Varying Slopes #1
Hi Josh, unfortunately I can't help you with this. I believe last year when Richard worked on the Turing versions of the models the LKJ Correlation distribution was not available. I'm not sure if that has changed since then. My plan is to revisit TuringModels.jl and update the models to AdvancedHMC after I have completed updating StatisticalRethinking.jl to the 2nd edition of the book. The SR update will definitely take a chunk of time (I expect until early next year and this delays the revision of TuringModels.jl). For this specific model/example, you could check with the Turing team, e.g. on Slack, as they are always very helpful. |
Thanks for the response. Yes, I noticed there was no LKJ distribution available in Turing and ended up asking about it in the Slack channel. They recommended opening an issue on the Turing repo which I plan on doing soon. I had assumed it may be possible to get a similar solution by using a different prior on the correlations. I've seen the Inverse Wishart being used here and I believe the Beta and Truncated Uniform being used here. I wasn't sure how to apply it to the examples in Rethinking though. Good luck with the 2ed updates! |
Hi Chris ( @itsdfish ), have you come across this and maybe used a different distribution for this in Turing? |
I have not used a multilevel model in Turing that specifies the relationship between coefficients. However, I believe Tamas has a version of the LKJ distribution here. In principle, it should interface with Turing without any modification. |
Thanks Chris, I tested it out with a simple example, but Turing didn't like the LKJL distribution as defined there.
I'll spend some more time with it though. I plan on opening an issue on the Turing repo and I'll definitely reference the AltDistributions package. |
Ah I see. It looks like the problem is that the type would also need a rand method to be used directly. If you don't need a rand function, a quick fix might be something like this:
|
I had to make a couple modifications to your code. Just bringing in a few more packages:

using ArgCheck, Parameters, Distributions, LinearAlgebra
import Distributions.logpdf
import LinearAlgebra: AbstractTriangular

struct LKJL{T <: Real} <: ContinuousMultivariateDistribution
    η::T
    function LKJL(η::T) where T <: Real
        @argcheck η > 0
        new{T}(η)
    end
end

function logpdf(d::LKJL, L::Union{AbstractTriangular, Diagonal})
    @unpack η = d
    z = diag(L)
    n = size(L, 1)
    sum(log.(z) .* ((n:-1:1) .+ 2*(η-1))) + log(2) * n
end

But still ended up getting hit with another error when I went to sample:
So maybe another method needs to be added for this to work. |
Would you be able to post a MWE? If so, I'll take a look. |
Sure, I've just modified this example I saw in another Turing Github issue. I'll post what I have so far using your suggestions:

using Turing, Random
Random.seed!(1);

# define the LKJ distribution
using ArgCheck, Parameters, Distributions, LinearAlgebra
import Distributions.logpdf
import LinearAlgebra: AbstractTriangular

struct LKJL{T <: Real} <: ContinuousMultivariateDistribution
    η::T
    function LKJL(η::T) where T <: Real
        @argcheck η > 0
        new{T}(η)
    end
end

function logpdf(d::LKJL, L::Union{AbstractTriangular, Diagonal})
    @unpack η = d
    z = diag(L)
    n = size(L, 1)
    sum(log.(z) .* ((n:-1:1) .+ 2*(η-1))) + log(2) * n
end

# Specify Turing model.
@model correlation(x, N) = begin
    # Create mu variables.
    mu = TArray{Any}(2)
    mu[1] ~ TruncatedNormal(0, 100, -10, 10)
    mu[2] ~ TruncatedNormal(0, 100, -10, 10)

    # Create sigma variables.
    sigma = TArray{Any}(2)
    sigma[1] ~ TruncatedNormal(0, 100, 0, 10)
    sigma[2] ~ TruncatedNormal(0, 100, 0, 10)

    # Create rho.
    rho ~ LKJL(2.0)

    # Generate covariance matrix.
    p = sigma[1]*sigma[2]*rho
    cov = [sigma[1]^2 p;
           p sigma[2]^2]

    # Iterate through each datapoint, and observe its likelihood.
    for i in 1:N
        x[i,:] ~ MvNormal([mu[1], mu[2]], cov)
    end
end

# Helper function to generate a covariance matrix.
function tcovar(sigma, rho)
    p = sigma[1]*sigma[2]*rho
    return [sigma[1]^2 p;
            p sigma[2]^2]
end

# Number of datapoints to generate.
N = 1000

# Target covariance matrix.
target_covariance = tcovar([1, 10], 0.5)

# Generate random data.
target_dist = MvNormal([0.0, 0.0], target_covariance)
df = rand(target_dist, N)'

# Sample using HMC.
chain = sample(correlation(df, N), HMC(0.01, 5), 1000)

Note that I'm using the master branch of Turing. So the |
Hmmm. I'm not very familiar with LKJ distributions, but it seems like there might be a type conflict. rho appears to be a scalar, but LKJL is expecting an AbstractTriangular or Diagonal, both of which are types of matrices (I think). I'm not sure how you would set something up like that in Turing. This might be more complex than I thought. It might be worth reporting the example above along with the issue you opened. Sorry I couldn't be of more help! |
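To make the type mismatch concrete, here is a small aside using the LKJL defined above (the matrix values are made up for illustration):

using LinearAlgebra

R = [1.0 0.5; 0.5 1.0]        # a 2×2 correlation matrix
Lchol = cholesky(R).L         # LowerTriangular, i.e. an AbstractTriangular
logpdf(LKJL(2.0), Lchol)      # works: the logpdf above accepts triangular/diagonal matrices
# logpdf(LKJL(2.0), 0.5)      # MethodError: there is no method for a scalar rho

So for this LKJL to be usable inside a Turing model, rho would have to enter the model as a (Cholesky factor of a) matrix rather than as a scalar.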
No problem, I appreciate the help though. Thanks! |
I'll have a look at it. @goedman thanks for pointing me to this. |
Hey @trappmartin, I just noticed that the LKJ distribution is now available in Distributions.jl. |
Cool, could you post an example here or open an issue on Turing.jl please. I haven't had time to look into it. Too many things going on atm. |
Sure thing, here's my adjusted version using the new LKJ distribution:

using Turing, Distributions, LinearAlgebra
@model correlation(x, N) = begin
    # Create mu variables.
    mu = TArray{Any}(2)
    mu[1] ~ truncated(Normal(0, 100), -10, 10)
    mu[2] ~ truncated(Normal(0, 100), -10, 10)

    # Create sigma variables.
    sigma = TArray{Any}(2)
    sigma[1] ~ truncated(Normal(0, 100), 0, 10)
    sigma[2] ~ truncated(Normal(0, 100), 0, 10)

    # prior on the correlation matrix
    rho ~ LKJ(2, 2)

    # Generate covariance matrix.
    cov = [sigma[1] 0; 0 sigma[2]] * rho * [sigma[1] 0; 0 sigma[2]]

    # Iterate through each datapoint, and observe its likelihood.
    for i in 1:N
        x[i,:] ~ MvNormal([mu[1], mu[2]], cov)
    end
end

# Helper function to generate a covariance matrix.
function tcovar(sigma, rho)
    p = sigma[1]*sigma[2]*rho
    return [sigma[1]^2 p;
            p sigma[2]^2]
end

# Number of datapoints to generate.
N = 1000

# Target covariance matrix.
target_covariance = tcovar([1, 10], 0.5)

# Generate random data.
target_dist = MvNormal([0.0, 0.0], target_covariance)
df = rand(target_dist, N)'

# Sample
chain = sample(correlation(df, N), NUTS(), 1000)

With this example I'm currently seeing the following error message:
Does the LKJ distribution perhaps still need a bijector defined for it in Bijectors.jl?
Any help would be appreciated. |
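As an aside, the covariance construction in the model above is the usual Sigma = diag(sigma) * rho * diag(sigma). A quick sanity check with made-up numbers:

using LinearAlgebra

sigma = [1.0, 10.0]
rho = [1.0 0.5; 0.5 1.0]
Sigma = Diagonal(sigma) * rho * Diagonal(sigma)
Sigma == [1.0 5.0; 5.0 100.0]   # true: variances on the diagonal, sigma1*sigma2*rho12 off it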
Just wanted to follow up here. I've opened an issue on Bijectors.jl. As soon as the LKJ is usable in Turing, I'll take a stab at a Turing version of this Rethinking model. |
That would be great, particularly now that we're seriously looking into a Turing version of StatisticalRethinking. |
This seems to work out of the box:

using Turing
using LinearAlgebra

@model function correlation(y)
    N, D = size(y)
    mu ~ filldist(truncated(Normal(0, 100), -10, 10), D)
    sigma ~ filldist(truncated(Normal(0, 100), 0, 10), D)

    # prior on the correlation matrix
    rho ~ LKJ(D, D)
    L = Diagonal(sigma) * rho

    # Iterate through each datapoint, and observe its likelihood.
    for i in 1:N
        y[i,:] ~ MvNormal(mu, L*L')
    end
    return L*L'
end

cov = [ 4.94072998 -4.93536067; -4.93536067 5.99552455 ]
x = rand(MvNormal([0.0, 1.0], cov), 10)
model = correlation(x')
sample(model, HMC(0.01,5), 1000)
|
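Since the model returns L*L', the implied covariance draws can also be pulled back out after sampling; one possible way, assuming a Turing/DynamicPPL version that provides generated_quantities (a sketch only):

using Statistics

chn = sample(model, HMC(0.01, 5), 1000)
covs = generated_quantities(model, chn)   # the reconstructed L*L' for every draw
mean(covs)                                # rough posterior-mean covariance matrix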
I guess it will work a bit better (higher ESS) if you use the bijection described by Mohamed in TuringLang/Bijectors.jl#108. https://gist.github.com/trappmartin/a8ceb7e7b1ce20ac737ab599da432469 |
Thanks for the example @trappmartin. I was able to run through your example but noticed the following error if I switched the sampler to
|
Did you observe the same when using the bijection in the gist? |
Yes, unfortunately, it's the same with the bijection and
|
I found these docs for the PyMC3 implementation:
Does L in the example above need to be lower-triangular? |
The Cholesky decomposition of a covariance matrix can be written as the product of a lower triangular matrix with its transpose (A = L*L') or, equivalently, as the transpose of an upper triangular matrix times that matrix (A = U'*U). In short, it can be rewritten using an upper triangular matrix. I think my code example might actually be wrong; I should have double-checked the LKJ distribution before. |
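Concretely, for an arbitrary symmetric positive-definite matrix (values made up for illustration):

using LinearAlgebra

A = [4.0 -2.0; -2.0 6.0]
C = cholesky(A)
C.L * C.L' ≈ A    # true: lower-triangular factor times its transpose
C.U' * C.U ≈ A    # true: equivalently, via the upper-triangular factor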
Thanks for all the help in investigating this btw. So I have to admit that I'm not too well versed in matrix algebra and I definitely need to educate myself some more on that topic. In seeing how you decided to construct the covariance matrix, I found the following Stan implementation:

data {
  int<lower=1> N; // number of observations
  int<lower=1> J; // dimension of observations
  vector[J] y[N]; // observations
  vector[J] Zero; // a vector of Zeros (fixed means of observations)
}
parameters {
  cholesky_factor_corr[J] Lcorr;
  vector<lower=0>[J] sigma;
}
model {
  y ~ multi_normal_cholesky(Zero, diag_pre_multiply(sigma, Lcorr));
  sigma ~ cauchy(0, 5);
  Lcorr ~ lkj_corr_cholesky(1);
}
generated quantities {
  matrix[J,J] Omega;
  matrix[J,J] Sigma;
  Omega <- multiply_lower_tri_self_transpose(Lcorr);
  Sigma <- quad_form_diag(Omega, sigma);
}

Where your L seems to correspond to Stan's diag_pre_multiply(sigma, Lcorr). Here's the adjusted Turing version:

using Turing
using Bijectors
using LinearAlgebra

Bijectors.bijector(d::LKJ) = Bijectors.PDBijector()

@model function correlation(y)
    N, D = size(y)
    mu ~ filldist(truncated(Normal(0, 100), -10, 10), D)
    sigma ~ filldist(truncated(Normal(0, 100), 0, 10), D)

    # prior on the correlation matrix
    rho ~ LKJ(D, D)
    L = Diagonal(sigma) * cholesky(rho).L # <- only change is here

    # Iterate through each datapoint, and observe its likelihood.
    for i in 1:N
        y[i,:] ~ MvNormal(mu, L*L')
    end
    return L*L'
end

cov = [ 4.94072998 -4.93536067; -4.93536067 5.99552455 ]
x = rand(MvNormal([0.0, 1.0], cov), 100)
model = correlation(x')
sample(model, HMC(0.01,5), 1000)

This samples but unfortunately has not so great results like before:
|
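For what it's worth, the line L = Diagonal(sigma) * cholesky(rho).L above plays the same role as Stan's diag_pre_multiply(sigma, Lcorr), and L*L' then reproduces quad_form_diag(Omega, sigma). A small check with made-up values:

using LinearAlgebra

sigma = [2.0, 3.0]
rho = [1.0 0.4; 0.4 1.0]
L = Diagonal(sigma) * cholesky(rho).L                # Stan: diag_pre_multiply(sigma, Lcorr)
L * L' ≈ Diagonal(sigma) * rho * Diagonal(sigma)     # true: same covariance either way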
Another odd thing I'm seeing here: it appears that when sampling from the prior,

sample(model, Prior(), 2000)

the diagonal on the correlation matrix is not fixed at 1. With

sample(model, MH(), 2000)

at least it's getting the diagonal right, however, the ESS is still very low.

I'd like to consolidate this issue to one place and not have it spread across repos. However, at this point I'm uncertain of what's the culprit: Turing, Bijectors, AdvancedHMC, DynamicPPL, ForwardDiff, etc. What would you recommend? |
I think it’s mostly an issue related to the constraints of LKJ. And an issue on Turing or Bijectors is the best place for it. |
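For reference, the constraint in question is that every draw from an LKJ prior is a correlation matrix, i.e. positive definite with ones on the diagonal, whereas the PDBijector used above presumably only targets generic positive-definite matrices. A minimal look at the target space, assuming the LKJ in Distributions.jl:

using Distributions, LinearAlgebra

R = rand(LKJ(2, 2.0))
diag(R)        # ≈ [1.0, 1.0] for every draw
isposdef(R)    # true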
Decided to close this and continue the discussion on the Bijectors.jl issue. Once the LKJ distribution is usable in Turing, I'll take a stab at a Turing version of this Rethinking model. |
Hi,
Thanks for putting this work out in the open. I just recently finished McElreath's excellent book and was attempting to translate some of the models over to Julia. I am having a ton of trouble figuring out how to specify a multilevel model in Turing with the correct multivariate normal prior for the varying intercepts and slopes, like in the cafes model (m13.1) in the book. Do you happen to have any examples of a varying intercept/slope model that uses the MvNormal prior?
Thanks again!
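For reference, once an LKJ prior works end to end in Turing, a varying-intercept/varying-slope model in the spirit of m13.1 could be sketched roughly as below. All variable names and priors here are illustrative guesses rather than the book's exact choices, and the sketch is untested:

using Turing, LinearAlgebra

# cafe: group index per observation, afternoon: 0/1 predictor, wait: outcome.
@model function varying_slopes(cafe, afternoon, wait, n_cafes)
    a ~ Normal(5, 2)                               # population-level intercept
    b ~ Normal(-1, 0.5)                            # population-level slope
    sigma_cafe ~ filldist(truncated(Cauchy(0, 2), 0, Inf), 2)
    sigma ~ truncated(Cauchy(0, 2), 0, Inf)
    rho ~ LKJ(2, 2.0)                              # prior on the 2×2 correlation matrix
    L = Diagonal(sigma_cafe) * cholesky(rho).L     # scale the Cholesky factor
    ab ~ filldist(MvNormal([a, b], L * L'), n_cafes)   # per-cafe intercepts and slopes
    for i in eachindex(wait)
        mu = ab[1, cafe[i]] + ab[2, cafe[i]] * afternoon[i]
        wait[i] ~ Normal(mu, sigma)
    end
end

Here ab would come out as a 2 × n_cafes matrix, with row 1 holding the per-cafe intercepts and row 2 the per-cafe slopes, their correlation governed by rho.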