Covariance conditioned on undirected graph #2067

BA1437 · 2023-08-26T02:27:48Z

BA1437
Aug 26, 2023

I'm working on a Turing model for fitting a covariance matrix and have been following along with the threads here and here. In my use case, I'm working with an inverse covariance matrix, Θ, conditioned on an undirected graph. Θ[i,j]=0 if node i and j are conditionally independent, i.e. if there is no edge connecting vertex i and j in the undirected graph. I'm sampling the prior from LKJ(), then zeroing the elements that don't share an edge in the undirected graph, but this leading to non-positive definite matrices, PosDefException: matrix is not positive definite; Cholesky factorization failed.

PositiveFactorizations doesn't seem to be an appropriate solution here because it perturbs the zero entries of Θ.
Any thoughts on work-arounds would be greatly appreciated.

# Specify Turing model.
@model correlation(x,C,N,j) = begin
    

    Omega ~ LKJ(j, 1) # LKJ prior on the correlation matrix

    Omega[C.==0].=0.
    
    Θ = PDMat(Omega) #This is the line causing the error

    # Iterate through each datapoint, and observe its likelihood.
    for i in 1:N
        x[i,:] ~ MvNormalCanon(Θ)
    end
end

#Initial precision matrix
S = [0.12   -0.01  0.0   -0.05
     -0.01   0.10  -0.02  0.0
     0.0     -0.02  0.11  -0.03
     -0.05   0.0   -0.03   0.13]

#undirected graph
conmat = [1 1 0 1;
        1 1 1 0;
        0 1 1 1;
        1 0 1 1];

D = MvNormalCanon(S)
N = 1000
data = rand(D,N)'
chain = sample(correlation(data,conmat, N,4), HMC(0.01, 5),1000)

torfjelde · 2023-09-01T19:02:26Z

torfjelde
Sep 1, 2023
Maintainer

Have you tried using a LKJCholesky for the parameterization instead? Then Θ = PDMat(Omega) should work without issues.

5 replies

BA1437 Sep 5, 2023
Author

If I were to use LKJCholesky, I'm not sure how I would do the conditioning. It wouldn't work to just set elements of the cholesky decomposition of Θ to zero based on the zero elements in the undirected graph since they don't occur in the same locations.

I guess the fundamental question is, how does one generate random positive definite matrices where some elements are locked at zero?

BA1437 Sep 6, 2023
Author

Essentially an analog to the rgwish function in R or equivalent for the LKJ dist. https://search.r-project.org/CRAN/refmans/BDgraph/html/rgwish.html

torfjelde Sep 14, 2023
Maintainer

Ah sorry! That was a bad suggestion 🤦 I didn't read your questions properly on the first read. I now get what you're asking 👍

how does one generate random positive definite matrices where some elements are locked at zero?

For MCMC (or rather, MH-based MCMC methods), we generally only require the ability to compute the log-pdf, not necessarily sample. So we can achieve this, then we're good.

From a brief look at the link you point to, it seems as if this is possible, but it just needs to be implemented.

Also, would it be possible to just perform a conditional decomposition here by hand, and just model the conditional variances directly?

Maybe some of @yebai @devmotion @sethaxen are more familiar with this

torfjelde Sep 14, 2023
Maintainer

Also, would it be possible to just perform a conditional decomposition here by hand, and just model the conditional variances directly?

What I mean is that if you have

S = [
    Ã  C̃;
    C̃' B̃
]

where C̃ is a diagonal, can you not do

Ã ~ LKJ(2, 1)
B̃ ~ LKJ(2, 1)

C̃_diag ~ filldist(InverseGamma(2, 3), 2)
C̃ = Diagonal(C̃_diag)

S = [
    Ã  C̃;
    C̃' B̃
]

(note that here I've swapped x[3] and x[4] so that C̃ is a diagonal matrix rather than off-diagonal).

Or something along these lines.

EDIT: Nah, the above won't work. C̃ requires more/different restrictions

sethaxen Sep 14, 2023
Collaborator

Haven't read this carefully yet, but any valid Cholesky factor produces a valid positive definite correlation matrix. The columns/rows of the factor live on the unit hemisphere. The constraints on the I,j elements of the correlation matrix being 0 are constraints that the corresponding rows/columns of the factor are orthogonal.

It's possible to generate such a Cholesky factor with a modification of our current bijector. I'd need to refresh my memory what the reference measure is for LKJ to know if that would also need an update with this constraint.

sethaxen · 2023-09-14T14:38:47Z

sethaxen
Sep 14, 2023
Collaborator

I believe the reference measure for LKJ is the Lebesgue measure on strict upper/lower triangle of the correlation matrix, so with these constraints the sensible reference measure is the Lebesgue measure on the same elements that are not constrained to be zero. For LKJCholesky, the reference measure is the (hemi-)spherical measure on the rows/columns of the factor. I think the sensible reference measure with these constraints would be more similar to the invariant measure on orthogonal matrices, but we would have to be careful defining it to know what the correct Jacobian correction should be.

4 replies

torfjelde Sep 14, 2023
Maintainer

And we can easily generalize the transform for LKJ to the zero-constrained one?

sethaxen Sep 14, 2023
Collaborator

Do you mean the bijector for Cholesky factors of correlation matrices? That should be generalizeable, but I can't be certain how easy it will be without doing some math first.

But before going there, @BA1437, it would be nice to know more about your application. How sparse is your PD matrix? If very sparse, then other approaches may be more suitable (e.g. something related to Cholesky factorization with permutation matrices).

BA1437 Sep 14, 2023
Author

Hi @sethaxen, The typical matrix I work with is 10x10, rarely above 20x20, and is approximately 30-40% non-zeros entries.

sethaxen Sep 14, 2023
Collaborator

Okay, yeah, then this approach is probably appropriate. I'll try to take a look at the math over the weekend.

sethaxen · 2023-09-16T20:36:04Z

sethaxen
Sep 16, 2023
Collaborator

Here's what I have so far: https://gist.github.com/sethaxen/0b1c9cf92cbea99035a0261e16921a69

It works by iteratively satisfying the constraints. For each column $j$, each column $i$ for $i < j$ that is orthogonal to column $j$ decreases the dimensionality of the hemisphere by 1. So we simply construct a point on a lower dimensional hemisphere and use the QR decomposition of the orthogonal columns to lift it to a point on a higher dimensional hemisphere.

Quick demo:

julia> b = VecCholeskyCorrBijectorWithZeros([(4, 3), (5, 3), (6, 1), (7, 2), (10, 8)]);

julia> y = randn(Bijectors.output_size(b, (10, 10)));

julia> x = inverse(b)(y);

julia> isposdef(x)
true

julia> Matrix(x)
10×10 Matrix{Float64}:
  1.0        -0.0907855     0.998123      0.0309052     0.0382051     0.0        -0.436952      0.215967      0.983914    0.301468
 -0.0907855   1.0          -0.138372      0.522425      0.805116      0.0598142  -8.89604e-17  -0.717        -0.258497    0.404107
  0.998123   -0.138372      1.0           4.32063e-18   5.69099e-18   0.0248355  -0.461901      0.243942      0.991967    0.251734
  0.0309052   0.522425      4.32063e-18   1.0           0.455836      0.240887   -0.0552835    -0.629001     -0.0816327   0.0239458
  0.0382051   0.805116      5.69099e-18   0.455836      1.0          -0.0195799   0.180512     -0.330956     -0.102061    0.300346
  0.0         0.0598142     0.0248355     0.240887     -0.0195799     1.0        -0.784509     -0.54447       0.0285604  -0.687657
 -0.436952   -8.89604e-17  -0.461901     -0.0552835     0.180512     -0.784509    1.0           0.361692     -0.458932    0.438751
  0.215967   -0.717         0.243942     -0.629001     -0.330956     -0.54447     0.361692      1.0           0.322642   -8.18197e-17
  0.983914   -0.258497      0.991967     -0.0816327    -0.102061      0.0285604  -0.458932      0.322642      1.0         0.194298
  0.301468    0.404107      0.251734      0.0239458     0.300346     -0.687657    0.438751     -8.18197e-17   0.194298    1.0

And sure, this is a straightforward generalization of VecCholeskyBijector:

julia> b = VecCholeskyCorrBijectorWithZeros(Tuple{Int,Int}[]);

julia> b2 = Bijectors.VecCholeskyBijector(:U);

julia> y = randn(Bijectors.output_size(b, (10, 10)));

julia> inverse(b)(y).U ≈ inverse(b2)(y).U
true

To-do:

verify that the non-uniqueness of the QR decomposition won't create discontinuities.
work out the Jacobian determinant. Applying the QR decomposition causes all entries in column $i$ of Matrix(x) to depend on all entries in column $i$ of $y$, so the Jacobian all the way to a correlation matrix is now block-triangular instead of triangular. (see below)
write out the forward transform.
work out if we can reuse any previous work to avoid performing many QR decompositions.

Here's the Jacobian:

julia> using ForwardDiff, Plots

julia> J = ForwardDiff.jacobian(y) do y
           x = inverse(b)(y)
           R = Matrix(x)
           z = similar(y)
           k = 1
           for j in axes(R, 2), i in 1:(j - 1)
               if (i, j) ∈ b.zero_inds || (j, i) ∈ b.zero_inds
                   continue
               end
               z[k] = R[i, j]
               k += 1
           end
           return z
       end;

julia> heatmap((iszero.(J))[reverse(axes(J, 1)), :]; colormap=:grays, colorbar=false, aspect_ratio=1, size=(200, 200), dpi=600)

0 replies

sethaxen · 2023-10-09T23:18:42Z

sethaxen
Oct 9, 2023
Collaborator

So it turns out the Jacobian determinant is a very simple modification to the standard Jacobian determinant, requiring only that one divides out the determinants of the R factors in the QR decompositions. I've updated the gist at https://gist.github.com/sethaxen/0b1c9cf92cbea99035a0261e16921a69 to include this. Here's a quick demo of how to use this code in Turing:

julia> using Turing

julia> @model function foo(N, η, b, M = Bijectors.output_size(b, (N, N))[1])
           y ~ filldist(Flat(), M)
           Rchol, logJ = with_logabsdet_jacobian(inverse(b), y)
           Turing.@addlogprob! logJ + logpdf(LKJCholesky(N, η), Rchol)
           return (; R=Matrix(Rchol))
       end;

julia> b = VecCholeskyCorrBijectorWithZeros([(4, 3), (5, 3), (6, 2), (6, 4), (7, 2), (10, 8)]);

julia> model = foo(10, 1.0, b);

julia> chns = sample(model, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×51×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 2.58 seconds
Compute duration  = 8.33 seconds
parameters        = y[1], y[2], y[3], y[4], y[5], y[6], y[7], y[8], y[9], y[10], y[11], y[12], y[13], y[14], y[15], y[16], y[17], y[18], y[19], y[20], y[21], y[22], y[23], y[24], y[25], y[26], y[27], y[28], y[29], y[30], y[31], y[32], y[33], y[34], y[35], y[36], y[37], y[38], y[39]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse     ess_bulk    ess_tail      rhat   ess_per_sec 
      Symbol   Float64   Float64   Float64      Float64     Float64   Float64       Float64 

        y[1]   -0.0014    0.3262    0.0036    8306.1741   2650.6651    1.0046      997.4990
        y[2]    0.0065    0.3370    0.0035    9004.9886   2239.7985    1.0030     1081.4205
        y[3]   -0.0000    0.3573    0.0036    9817.5543   2616.6600    1.0022     1179.0026
        y[4]   -0.0128    0.3851    0.0040    9337.8045   2719.1003    1.0018     1121.3888
        y[5]   -0.0010    0.3858    0.0040    9292.1477   2667.5592    1.0002     1115.9058
        y[6]   -0.0069    0.3661    0.0038    9555.6746   2645.6843    1.0031     1147.5531
        y[7]    0.0047    0.3740    0.0041    8427.9599   2314.7639    1.0018     1012.1244
        y[8]   -0.0023    0.3961    0.0040    9777.3013   3027.9953    1.0008     1174.1685
        y[9]    0.0019    0.3752    0.0041    8329.9344   2817.7893    1.0010     1000.3524
       y[10]    0.0013    0.3996    0.0043    8699.3030   2772.0395    1.0003     1044.7103
...

julia> Rs = first.(generated_quantities(model, chns));

julia> Rs_arr = permutedims(stack(Rs), (3, 4, 1, 2));

julia> dropdims(mean(Rs_arr; dims=(1, 2)); dims=(1, 2))
10×10 Matrix{Float64}:
  1.0          -0.000937718   0.000716397   0.00131169   -0.0112337    -0.00768082   -0.00955092    0.00433007   -0.000583165  -0.00612702
 -0.000937718   1.0           0.00477415   -0.00758961   -0.00111306    1.40106e-18  -7.06457e-19   0.00305375    0.000326224   0.000141046
  0.000716397   0.00477415    1.0          -3.49966e-19   2.00646e-18  -0.00141633   -0.000750534   0.00383163   -0.0052103     0.00375098
  0.00131169   -0.00758961   -3.49966e-19   1.0          -0.00740308    2.88631e-19   0.00290981    0.000568475   0.00339934    0.00365722
 -0.0112337    -0.00111306    2.00646e-18  -0.00740308    1.0          -0.00411793   -0.00111534   -0.00342769   -0.00271073    0.00231426
 -0.00768082    1.40106e-18  -0.00141633    2.88631e-19  -0.00411793    1.0           0.000787432  -0.00914072   -0.00736483   -0.00295641
 -0.00955092   -7.06457e-19  -0.000750534   0.00290981   -0.00111534    0.000787432   1.0          -0.00597936    0.00280843   -0.00632741
  0.00433007    0.00305375    0.00383163    0.000568475  -0.00342769   -0.00914072   -0.00597936    1.0          -0.000178056  -1.17235e-19
 -0.000583165   0.000326224  -0.0052103     0.00339934   -0.00271073   -0.00736483    0.00280843   -0.000178056   1.0           0.000328028
 -0.00612702    0.000141046   0.00375098    0.00365722    0.00231426   -0.00295641   -0.00632741   -1.17235e-19   0.000328028   1.0

julia> dropdims(std(Rs_arr; dims=(1, 2)); dims=(1, 2))
10×10 Matrix{Float64}:
 0.0       0.297735     0.295122     0.305087     0.314443     0.30063      0.298835    0.297618     0.303849     0.299865
 0.297735  7.32965e-17  0.28859      0.313681     0.309783     6.41493e-17  6.1412e-17  0.295638     0.293659     0.303158
 0.295122  0.28859      1.23095e-16  6.41305e-17  6.68502e-17  0.318834     0.300889    0.303083     0.298644     0.297913
 0.305087  0.313681     6.41305e-17  1.28809e-16  0.322199     6.30431e-17  0.301314    0.303051     0.300869     0.300524
 0.314443  0.309783     6.68502e-17  0.322199     1.52498e-16  0.301399     0.290621    0.297827     0.298588     0.300076
 0.30063   6.41493e-17  0.318834     6.30431e-17  0.301399     1.54036e-16  0.315676    0.305576     0.30645      0.303368
 0.298835  6.1412e-17   0.300889     0.301314     0.290621     0.315676     1.8129e-16  0.302016     0.303771     0.298926
 0.297618  0.295638     0.303083     0.303051     0.297827     0.305576     0.302016    1.99332e-16  0.299137     6.72064e-17
 0.303849  0.293659     0.298644     0.300869     0.298588     0.30645      0.303771    0.299137     2.08847e-16  0.296491
 0.299865  0.303158     0.297913     0.300524     0.300076     0.303368     0.298926    6.72064e-17  0.296491     2.08714e-16

julia> ess(Rs_arr)
10×10 Matrix{Float64}:
   NaN     9907.95  11237.6   2145.82  2093.74  4569.24  2003.48  9540.87  11305.5   3299.59
  9907.95  4227.96   6579.85  9770.15  9294.1   3826.64  4053.54  6731.9    6292.09  8570.7
 11237.6   6579.85   3842.25  3930.91  3927.51  7574.65  9761.37  5520.39   5772.56  5799.75
  2145.82  9770.15   3930.91  4146.77  4391.51  3501.25  8235.1   6993.72   6392.95  6221.25
  2093.74  9294.1    3927.51  4391.51  4164.1   7755.78  4670.25  3878.4    4921.05  4083.3
  4569.24  3826.64   7574.65  3501.25  7755.78  4070.78  4844.72  3868.94   4155.53  4241.81
  2003.48  4053.54   9761.37  8235.1   4670.25  4844.72  3879.63  3187.78   3069.21  3108.06
  9540.87  6731.9    5520.39  6993.72  3878.4   3868.94  3187.78  3668.2    2537.59  3821.87
 11305.5   6292.09   5772.56  6392.95  4921.05  4155.53  3069.21  2537.59   3844.7   2471.25
  3299.59  8570.7    5799.75  6221.25  4083.3   4241.81  3108.06  3821.87   2471.25  3862.0

julia> rhat(Rs_arr)
10×10 Matrix{Float64}:
 NaN        1.00113   1.00242   1.00069   1.00204  1.00113   1.0028    1.00227   1.00045  1.00066
   1.00113  0.999754  1.00207   1.00176   1.00015  1.00051   1.00159   1.0004    1.00027  1.00152
   1.00242  1.00207   1.001     1.00051   1.00013  1.00187   0.999544  1.00127   1.00014  1.00062
   1.00069  1.00176   1.00051   0.999743  1.00143  1.00227   1.00114   1.00034   1.00035  1.00177
   1.00204  1.00015   1.00013   1.00143   1.00062  1.00152   1.00262   1.00085   1.0006   1.00022
   1.00113  1.00051   1.00187   1.00227   1.00152  0.999803  1.00058   1.00088   1.00099  1.00254
   1.0028   1.00159   0.999544  1.00114   1.00262  1.00058   0.999535  1.00103   1.00037  0.999657
   1.00227  1.0004    1.00127   1.00034   1.00085  1.00088   1.00103   0.999928  1.00093  1.00001
   1.00045  1.00027   1.00014   1.00035   1.0006   1.00099   1.00037   1.00093   1.00115  1.00064
   1.00066  1.00152   1.00062   1.00177   1.00022  1.00254   0.999657  1.00001   1.00064  1.00061

If this is sufficiently useful, we might consider adding this directly to Bijectors (would need to work out the unconstraining transform also). Note, however, that still the bijector would need to directly be used, unless one had a way to override LKJCholesky's default bijector to specify this one be used. @torfjelde this in some way relates to our Slack discussion.

2 replies

torfjelde Oct 10, 2023
Maintainer

This is really cool @sethaxen . Like, really, really cool. Do you have a full write-up of this somewhere?:) I'm not too familiar with the LKJ transform, so can't quite see how it all fits together atm.

And yeah, something like what we discussed in Slack would be very much welcome!

In the meantime, one can make use of @sethaxen 's work above in any other model through @submodel, e.g.

@model function outer_model()
    @submodel (R,) = foo(...)
end

sethaxen Oct 10, 2023
Collaborator

This is really cool @sethaxen . Like, really, really cool.

Thanks! There are more optimizations possible, but they involve iteratively performing the QR decomposition, and it would be good to first estimate the complexity of this naive approach.

Do you have a full write-up of this somewhere?:) I'm not too familiar with the LKJ transform, so can't quite see how it all fits together atm.

Nothing here is LKJ-specific. Rather, it's specific to correlation matrices. One could just as easily use a WishartCholesky distribution (if it exists) to restrict the Wishart distribution to correlation matrices with structural zeros.

I have a messy Overleaf write-up I can share after some tidying up. ~~Or, I guess it could be a blog post~~.

And yeah, something like what we discussed in Slack would be very much welcome!

Cool! In a few weeks I'll return to that and try to figure out why it doesn't work.

Covariance conditioned on undirected graph #2067

Uh oh!

BA1437 Aug 26, 2023

Replies: 4 comments · 11 replies

Uh oh!

torfjelde Sep 1, 2023 Maintainer

Uh oh!

BA1437 Sep 5, 2023 Author

Uh oh!

BA1437 Sep 6, 2023 Author

Uh oh!

Uh oh!

torfjelde Sep 14, 2023 Maintainer

Uh oh!

Uh oh!

torfjelde Sep 14, 2023 Maintainer

Uh oh!

sethaxen Sep 14, 2023 Collaborator

Uh oh!

sethaxen Sep 14, 2023 Collaborator

Uh oh!

torfjelde Sep 14, 2023 Maintainer

Uh oh!

sethaxen Sep 14, 2023 Collaborator

Uh oh!

BA1437 Sep 14, 2023 Author

Uh oh!

sethaxen Sep 14, 2023 Collaborator

Uh oh!

Uh oh!

sethaxen Sep 16, 2023 Collaborator

Uh oh!

Uh oh!

sethaxen Oct 9, 2023 Collaborator

Uh oh!

torfjelde Oct 10, 2023 Maintainer

Uh oh!

Uh oh!

sethaxen Oct 10, 2023 Collaborator

BA1437
Aug 26, 2023

Replies: 4 comments 11 replies

torfjelde
Sep 1, 2023
Maintainer

BA1437 Sep 5, 2023
Author

BA1437 Sep 6, 2023
Author

torfjelde Sep 14, 2023
Maintainer

torfjelde Sep 14, 2023
Maintainer

sethaxen Sep 14, 2023
Collaborator

sethaxen
Sep 14, 2023
Collaborator

torfjelde Sep 14, 2023
Maintainer

sethaxen Sep 14, 2023
Collaborator

BA1437 Sep 14, 2023
Author

sethaxen Sep 14, 2023
Collaborator

sethaxen
Sep 16, 2023
Collaborator

sethaxen
Oct 9, 2023
Collaborator

torfjelde Oct 10, 2023
Maintainer

sethaxen Oct 10, 2023
Collaborator