
Correctness issue with MHFromPrior sampler #250

Open
miguelbiron opened this issue Dec 7, 2024 · 5 comments

Comments

@miguelbiron
Collaborator

Hi @sunxd3 -- I was testing the MHFromPrior sampler within Gibbs on one of Pigeons' recurring toy examples, and found that it doesn't target the correct posterior. The posterior is described in depth here. In particular, E[p] ≈ [0.7, 0.7], but the Gibbs sampler produces E[p] ≈ [0.5, 0.5] -- the prior mean -- indicating that it is just sampling from the prior.
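As a quick back-of-the-envelope check of that target value (my own computation, not taken from the Pigeons docs): in the large-data limit the posterior concentrates on the curve p[1]*p[2] = 0.5, and under the uniform prior the induced density of p[1] on [1/2, 1] is proportional to 1/p[1], so

# limiting posterior mean of each coordinate:
# E[p[1]] = ∫_{1/2}^1 p * (1/p) dp / ∫_{1/2}^1 (1/p) dp = 0.5 / log(2)
0.5 / log(2)  # ≈ 0.7213, consistent with the ≈ 0.7 quoted above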

The issue, I think, is here

values_proposed, logp_proposed = evaluate!!(rng, cond_model)

You are assuming that logp_proposed is the correct log-density of the proposal. But the issue is that evaluate!!(rng, cond_model) resamples every ~ statement -- even the ones that correspond to observations. So logp_proposed is computed under a random n_heads sampled from the likelihood, and is therefore unrelated to the actual observed n_heads. The observation never enters the acceptance ratio, so the sampler ends up targeting the prior instead of the posterior.
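For contrast, here is a self-contained sketch (plain Julia and Distributions, not JuliaBUGS code) of what the step should compute on this model: with the prior as the proposal, the prior terms cancel and the MH acceptance ratio reduces to the likelihood ratio of the fixed observation.

using Distributions, Random, Statistics

# independence MH with the prior as proposal, scoring the *fixed* observation
function mh_from_prior(rng, n_heads, n_flips, n_samples)
    loglik(p) = logpdf(Binomial(n_flips, p[1] * p[2]), n_heads)
    p = rand(rng, 2)                        # initial prior draw
    samples = Matrix{Float64}(undef, n_samples, 2)
    for i in 1:n_samples
        p_prop = rand(rng, 2)               # proposal = fresh prior draw
        if log(rand(rng)) < loglik(p_prop) - loglik(p)
            p = p_prop                      # accept with the likelihood ratio
        end
        samples[i, :] = p
    end
    return samples
end

samples = mh_from_prior(MersenneTwister(123), 50000, 100000, 100_000)
mean(samples; dims=1)  # ≈ [0.7 0.7] (mixing is slow; this is only a correctness check)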

Reproducer

using JuliaBUGS
using Random
using Distributions
using AbstractMCMC
using MCMCChains

# unidentifiable coin-toss model
unid_model_def = @bugs begin
    for i in 1:2
        p[i] ~ dunif(0,1)
    end
    p_prod = p[1]*p[2]
    n_heads ~ dbin(p_prod, n_flips)
end
# condition on the data: n_heads = 50000 heads out of n_flips = 100000 flips
unid_target_model = compile(unid_model_def, (; n_heads=50000, n_flips=100000))
n_samples = 2000
rng = MersenneTwister(123)
samples_and_stats = AbstractMCMC.sample(
    rng,
    AbstractMCMC.LogDensityModel(unid_target_model),
    JuliaBUGS.Gibbs(unid_target_model, JuliaBUGS.MHFromPrior()),
    n_samples;
    discard_initial=n_samples÷2
)

Result

Chains MCMC chain (2000×2×1 Array{Real, 3}):

Iterations        = 1001:1:3000
Number of chains  = 1
Samples per chain = 2000
parameters        = p[2], p[1]
internals         = 

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec 
      Symbol   Float64   Float64   Float64        Real     Float64   Float64       Missing 

        p[2]    0.5068    0.2884    0.0066   1948.4995   1923.9786    0.9999       missing
        p[1]    0.5124    0.2906    0.0066   1972.0720   1962.4925    0.9997       missing

The issue persists if we sample in constrained space

unid_target_model_constrained = JuliaBUGS.settrans(unid_target_model)
samples_and_stats = AbstractMCMC.sample(
    rng,
    AbstractMCMC.LogDensityModel(unid_target_model_constrained),
    JuliaBUGS.Gibbs(unid_target_model_constrained, JuliaBUGS.MHFromPrior()),
    n_samples;
    discard_initial=n_samples÷2
)

Result

Chains MCMC chain (2000×2×1 Array{Real, 3}):

Iterations        = 1001:1:3000
Number of chains  = 1
Samples per chain = 2000
parameters        = p[2], p[1]
internals         = 

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec 
      Symbol   Float64   Float64   Float64        Real     Float64   Float64       Missing 

        p[2]    0.4926    0.2869    0.0063   2076.9384   2001.8582    0.9995       missing
        p[1]    0.4980    0.2871    0.0067   1856.3094   2002.1115    1.0001       missing

@sunxd3
Member

sunxd3 commented Dec 9, 2024

@miguelbiron thanks for the great catch! This was pretty sloppy on my part -- sorry, let me get on fixing this.

@miguelbiron
Collaborator Author

No problem!

I wonder if the problem lies not in the Gibbs sampler but in the fact that evaluate!! resamples every tilde statement, including observations. Is that intended? It seems like it could lead to other issues beyond this sampler.

OTOH, that feature enables some relevant algorithms -- for example, the MCMC invariance statistical test we use in Pigeons, which needs exactly this kind of full joint resampling.
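Roughly, that test draws from the joint p(p, n_heads) in two ways and compares them; the forward pass needs exactly a resample-everything evaluation. A self-contained sketch on a small version of the toy model (a generic illustration of the idea, not the Pigeons implementation):

using Distributions, Random, Statistics

const n_flips = 100

# forward pass that resamples every ~ statement, observations included:
# the "marginal-conditional" draw, which a full-resampling evaluate!! gives for free
sample_joint(rng) = (p = rand(rng, 2); (p, rand(rng, Binomial(n_flips, p[1] * p[2]))))

# "successive-conditional" chain: alternate the kernel under test (here a
# correct MH-from-prior step targeting p(p | y)) with a refresh of y
function successive_conditional(rng, n)
    p, y = sample_joint(rng)
    draws = Vector{Tuple{Vector{Float64},Int}}(undef, n)
    for i in 1:n
        p_prop = rand(rng, 2)
        logr = logpdf(Binomial(n_flips, p_prop[1] * p_prop[2]), y) -
               logpdf(Binomial(n_flips, p[1] * p[2]), y)
        log(rand(rng)) < logr && (p = p_prop)
        y = rand(rng, Binomial(n_flips, p[1] * p[2]))
        draws[i] = (p, y)
    end
    return draws
end

rng = MersenneTwister(1)
mc = [sample_joint(rng) for _ in 1:10_000]
sc = successive_conditional(rng, 10_000)

# if the kernel leaves p(p | y) invariant, both collections follow the same joint,
# so e.g. the marginal means of p[1] should agree (both ≈ 0.5, the prior mean)
mean(t[1][1] for t in mc), mean(t[1][1] for t in sc)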

@sunxd3
Member

sunxd3 commented Dec 9, 2024

I think so (regarding the issue lying in evaluate!!). It was meant to just sample from the prior, but I was careless when I used it for Gibbs.

I think adding an option to not sample the conditioned variables is good. Actually, not sampling them is probably the correct default implementation -- sampling from the prior can then be implemented by first deconditioning all variables and running this evaluate function (the one that does not sample conditioned variables).
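Something like this, perhaps (a toy sketch of the semantics I mean; the names and the keyword are made up, not an actual API): a conditioned node keeps its stored value and only contributes its logpdf, while a free node is drawn from its prior.

using Distributions, Random

# toy sketch, NOT the real API: observed_value === nothing marks a free node
function eval_tilde(rng, dist, observed_value; sample_conditioned::Bool=false)
    value = if observed_value !== nothing && !sample_conditioned
        observed_value          # keep the conditioned value fixed
    else
        rand(rng, dist)         # draw from the prior
    end
    return value, logpdf(dist, value)
end

rng = MersenneTwister(0)
eval_tilde(rng, Binomial(100000, 0.5), 50000)   # observed: keeps 50000, returns its loglik
eval_tilde(rng, Uniform(0, 1), nothing)         # free: fresh prior draw and its logpdf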

@miguelbiron
Collaborator Author

That sounds reasonable! The situations where you actually want full resampling are pretty rare.

@sunxd3
Member

sunxd3 commented Dec 11, 2024

#253 introduced the option to skip resampling observed variables.

Will start on a fix for MHFromPrior soon.
