Add LKJ distribution for cholesky factor of correlation matrix #1336
Isn't this already possible?
Yes, you're right. But right now, if a user has `using PDMats`, the construction looks like `MvNormal(μ, PDMat(Cholesky(σ .* L.data, 'L', 0)))`. This isn't ideal for several reasons. First, it requires the user to specify args for `Cholesky`. A new
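For reference, here is a minimal runnable version of that workaround (the variable names and the `LKJ` sampling step are just illustrative assumptions to make it self-contained):

```julia
using Distributions, PDMats, LinearAlgebra

d = 3
μ = zeros(d)
σ = [0.5, 1.0, 2.0]            # standard deviations
R = rand(LKJ(d, 2.0))          # a correlation matrix
L = cholesky(Symmetric(R)).L   # its lower Cholesky factor

# Covariance is Diagonal(σ) * R * Diagonal(σ), so its Cholesky factor is Diagonal(σ) * L,
# i.e. row-wise scaling of L by σ. Only the lower triangle is referenced when uplo = 'L'.
dist = MvNormal(μ, PDMat(Cholesky(σ .* L.data, 'L', 0)))
x = rand(dist)
```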
Yes, I think this would be the better approach since it is more general and hence could be used also for other
One also has to think about the
Yes, these are good points.
My idea was just to add a distribution whose
IIRC
What problems do you have in mind?
At some point I sketched out some types I called
Would that be helpful here? Or is the best solution a completely separate type?
Oh, that is unfortunate. We should definitely confirm this though, because it would be nice if we could use
Yes, but
I'm thinking first a user comes from e.g. Stan, PyMC3, or one of the other popular PPLs and uses an
And in fact, while writing this, I'm liking more the idea of an
Good question. I lean towards a completely separate type for simplicity, unless there are ample use cases for these special wrappers. e.g. the only three uses I know of for
Second this.
We also have
Right. I can see how the wrapper construction might just get in the way in the PPL context. Not that these things are mutually exclusive. Anyhow, I read but did not really understand the Stan source code for sampling LKJ-Cholesky directly (here and here). The comments say "The implementation is Ben Goodrich's Cholesky factor-based approach to the C-vine method of [LKJ (JMA 2009)]." Is there a write-up of this anyplace, or would we just have to port the Stan code? Does that pose a licensing issue?
If we go this route, we should definitely have something like a
Yeah, I guess my real priority is that there be a simple constructor. But even defining
I think it's just using part of the algorithm in the paper to generate random canonical partial correlations, and then instead of pushing those through a map to a correlation matrix, it pushes them through a map directly to the cholesky factor. I think that's what their
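If it helps, here is my reading of that approach sketched in Julia (a hypothetical `sample_lkj_cholesky` helper based on the C-vine construction in the LKJ paper, not a port of the Stan code): draw the canonical partial correlations from shifted Beta distributions and map them straight to a lower-triangular Cholesky factor.

```julia
using Distributions, LinearAlgebra, Random

function sample_lkj_cholesky(rng::AbstractRNG, d::Int, η::Real)
    # Canonical partial correlations z[i, j] (j < i); the Beta shape depends on the
    # C-vine tree level, i.e. the column index j.
    z = zeros(d, d)
    β = η + (d - 1) / 2
    for j in 1:(d - 1)
        β -= 1 / 2
        for i in (j + 1):d
            z[i, j] = 2 * rand(rng, Beta(β, β)) - 1
        end
    end
    # Map the partial correlations directly to the Cholesky factor, row by row.
    L = Matrix{Float64}(I, d, d)
    for i in 2:d
        remainder = 1.0          # 1 - sum of squares of the entries so far in row i
        for j in 1:(i - 1)
            L[i, j] = z[i, j] * sqrt(remainder)
            remainder -= L[i, j]^2
        end
        L[i, i] = sqrt(remainder)
    end
    return LowerTriangular(L)
end

L = sample_lkj_cholesky(Random.default_rng(), 4, 2.0)
L * L'  # a draw from LKJ(4, 2.0) as a correlation matrix, without ever forming it first
```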
Can't it just be based on the paper and the documentation to avoid any licensing issues?
Good to know.
Yeah, that's the bit I don't understand yet. And note that
The bijective mapping in Bijectors is based on the partial correlations: https://github.com/TuringLang/Bijectors.jl/blob/master/src/bijectors/corr.jl
It's documented in the docstrings and based on https://mc-stan.org/docs/2_26/reference-manual/correlation-matrix-transform-section.html (IIRC one formula in the Stan docs was incorrect though).
Just what the doctor ordered. Thanks!
Great! I'll try to put together a PR implementing this next week. Also, after conversations with @cscherrer in JuliaMath/MeasureTheory.jl#101 (comment), I think it's worthwhile including an
Distributions already has an `LKJ(d, η)` whose support is `d × d` correlation matrices. More useful for probabilistic programming is the equivalent `LKJ` distribution whose support is the corresponding cholesky factor `L` (or equivalently `U`), because then the cholesky factor can be directly sampled and used, e.g. in the multivariate normal density, without needing to compute the cholesky decomposition.
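To make the "without needing to compute the cholesky decomposition" point concrete, this is the extra step a user currently pays with `LKJ` (illustrative snippet, not from any package):

```julia
using Distributions, LinearAlgebra

R = rand(LKJ(4, 2.0))          # sample a 4×4 correlation matrix
L = cholesky(Symmetric(R)).L   # the extra O(d³) factorization a Cholesky-factor distribution would avoid
```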
There are `d(d-1)/2` unique elements in the strict lower triangle `tril₋(R)` of a correlation matrix `R = L L^T`. Likewise, there are `d(d-1)/2` unique elements in the strict lower triangle `tril₋(L)` of its cholesky factor `L`. The map `ϕ: tril₋(R) → tril₋(L)` is bijective, and the log determinant of its Jacobian is `logdet(J) = \sum_{i=2}^d (i - d) \log(L_{ii})`. The resulting `logkernel` is only adjusted by `-logdet(J)`; since `det(R) = \prod_i L_{ii}^2`, it works out to `logkernel(L) = \sum_{i=2}^d (d - i + 2η - 2) \log(L_{ii})`. `logc0` is unchanged. As a result, we can trivially adapt the code from `LKJ` to define this distribution.
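As a rough sketch of that kernel (a hypothetical free function, not proposed API), assuming a lower-triangular factor `L` and shape parameter `η`:

```julia
using LinearAlgebra

# logkernel(L) = Σ_{i=2}^d (d - i + 2η - 2) * log(L[i, i]), per the derivation above;
# logc0 would be identical to the existing LKJ normalizing constant.
function lkj_cholesky_logkernel(L::LowerTriangular, η::Real)
    d = size(L, 1)
    return sum((d - i + 2η - 2) * log(L[i, i]) for i in 2:d)
end
```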
It would also be useful to have a convenient constructor for `MvNormal` that could take a mean vector, the cholesky factor of a correlation matrix, and a vector of standard deviations, or alternatively, a mean vector and the cholesky factor of the covariance matrix.
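For concreteness, such conveniences might look roughly like the following (hypothetical helper names and signatures, not an existing or agreed-upon API):

```julia
using Distributions, PDMats, LinearAlgebra

# Mean, Cholesky factor L of a correlation matrix, and standard deviations σ:
# the covariance is Diagonal(σ) * L * L' * Diagonal(σ), whose Cholesky factor is Diagonal(σ) * L.
mvnormal_from_corr_chol(μ, L::LowerTriangular, σ::AbstractVector) =
    MvNormal(μ, PDMat(Cholesky(σ .* Matrix(L), 'L', 0)))

# Mean and the Cholesky factorization of the covariance matrix itself.
mvnormal_from_cov_chol(μ, C::Cholesky) = MvNormal(μ, PDMat(C))
```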
Related work

AltDistributions.jl implements this distribution as `LKJL` (however, it pre-dates `LKJ` in this package). Likewise, this distribution is defined in several probabilistic programming languages, notably Stan (see documentation and implementation).

See also discussions in TuringLang/Turing.jl#1629 (comment) and TuringLang/Bijectors.jl#134 (comment).