
EVI outputs wrong probabilities #107

Closed
RenatoGeh opened this issue Feb 14, 2022 · 5 comments

@RenatoGeh
Contributor

Hi,

Here's a minimal working example (up-to-date with the master branches of both ProbabilisticCircuits.jl and LogicCircuits.jl):

# Utility function to create literals
f(x) = PlainProbLiteralNode.(Int32.([-x, x]))
# Bernoulli parameters
W = map(x -> log.(x), [[.6, .4], [.5, .5], [.5, .5], [.8, .2]])
# Bernoulli nodes
A1, A2, B1, B2 = PlainSumNode.(f.([1, 1, 2, 2]), W)
# Products
P1, P2 = PlainMulNode.([[A1, B1], [A2, B2]])
# Root
C = PlainSumNode([P1, P2], log.([.2, .8]))

# Produce all possible assignments with two binary variables
D = DataFrame(all_valuations(1:2), :auto)

# Compute the EVI.
display(exp.(EVI(C, D)))
4-element Vector{Float64}:
 0.0032000000000000015
 0.004799999999999999
 0.012800000000000008
 0.0192

# Probabilities don't add up to one?

# Compute the MAR over the same complete assignments. Should be the same:
display(exp.(MAR(C, D)))
4-element Vector{Float32}:
 0.12
 0.14
 0.35999998
 0.38

# Showing the right probabilities.

Interestingly, from the few examples I tested on my machine, this seems to affect only non-deterministic PCs. Granted, I haven't examined the code thoroughly enough to say this with certainty. I'll take a look at it now and try to find a fix for this issue.
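For reference, the MAR numbers can be checked by hand from the parameters in the MWE, using plain Julia with no PC library (the mixture is C = 0.2·A1·B1 + 0.8·A2·B2):

```julia
# Bernoulli marginals read off the sum-node weights in the MWE:
# A1: P(X1=0)=0.6, A2: P(X1=0)=0.5, B1: P(X2=0)=0.5, B2: P(X2=0)=0.8
pA1 = [0.6, 0.4]; pA2 = [0.5, 0.5]
pB1 = [0.5, 0.5]; pB2 = [0.8, 0.2]

# Mixture of the two products with weights 0.2 and 0.8
p(x1, x2) = 0.2 * pA1[x1 + 1] * pB1[x2 + 1] + 0.8 * pA2[x1 + 1] * pB2[x2 + 1]

probs = [p(x1, x2) for x1 in 0:1 for x2 in 0:1]
# [0.38, 0.14, 0.36, 0.12] — the MAR values above, and they sum to 1
sum(probs)  # 1.0
```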

As a side note, it would perhaps be nice to have some in-code documentation. As it is, the code is very intimidating and somewhat opaque (for me, at least). It would help with contributing to Juice. :)

Thanks

@khosravipasha
Contributor

khosravipasha commented Feb 15, 2022

Thanks for the bug report. Interestingly, we used to have a test checking EVI == MAR on full data; maybe it was not catching this scenario. We usually just call MAR even on full data, so that's probably why we did not notice.

I will take a closer look into this example.

Also note that we are working on refactoring the library, and there are some big changes coming in version 0.4; EVI and MAR are being rewritten, especially the GPU versions.

> As a side note, it would perhaps be nice to have some in-code documentation. As it is, the code is very intimidating and somewhat opaque (for me, at least). It would help with contributing to Juice. :)

Yeah, the new version is less complicated and much cleaner; will add some more docs there :)

@guyvdbroeck
Member

I think EVI is not supposed to be run on non-deterministic circuits. It's essentially a faster version of MAR for when the data and circuit have no latents.
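For readers hitting this later, here's a minimal sketch (plain Julia, not the library's actual EVI code) of why determinism matters: in a deterministic sum node, at most one child is nonzero on any complete assignment, so a shortcut that looks at a single child agrees with the full weighted sum. At the non-deterministic root of the MWE, both products are nonzero, and such shortcuts diverge:

```julia
# Numerically stable log-sum-exp
logsumexp(v) = (m = maximum(v); m + log(sum(exp.(v .- m))))

# Root children of the MWE evaluated at x = (X1=0, X2=0); both are
# nonzero, so the root sum node is non-deterministic.
log_children = log.([0.6 * 0.5, 0.5 * 0.8])   # log P1(x), log P2(x)
log_weights  = log.([0.2, 0.8])

exp(logsumexp(log_weights .+ log_children))   # 0.38: the correct probability
exp(maximum(log_weights .+ log_children))     # 0.32: a single-child shortcut disagrees
```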

@RenatoGeh
Contributor Author

> Yeah, the new version is less complicated and much cleaner; will add some more docs there :)

Good to know! Looking forward to this new version.

> I think EVI is not supposed to be run on non-deterministic circuits. It's essentially a faster version of MAR for when the data and circuit have no latents.

I see. Is this set to change after the refactor? If not, it might be a good idea to add a note about this to both the EVI manual entry and the EVI docs.

@khosravipasha
Contributor

The current implementation of EVI will probably go away, so both MAR and EVI will call the same function, loglikelihoods.

@khosravipasha
Contributor

The next API for queries in v0.4 will look something like this. For example, you can compute log-likelihoods as follows; this handles both the case of missing values (marginals/MAR) and the case of no missing values (EVI), so for either query you call loglikelihoods. We are still porting some things, so not everything is done. More documentation and examples for other queries will be out when we release v0.4 (soon, hopefully).

CPU version: we use missing to denote missing values, so the data would be of type Array{Union{Missing, ...}}.

loglikelihoods(pc::ProbCircuit, data::Matrix)

GPU version: similarly, the data is of type CuArray{Union{Missing, ...}}.

loglikelihoods(bpc::CuBitsProbCircuit, data::CuArray; batch_size)
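A hedged usage sketch based only on the signatures quoted above (here C is the circuit from the MWE; the cu conversion and the batch_size value are my assumptions, since v0.4 is not released yet and the exact API may change):

```julia
# Sketch only — the exact v0.4 API may change before release.
# Row 1 has no missing values (EVI case); row 2 has a missing value (MAR case).
data = Matrix{Union{Missing, Bool}}([false false; false missing])
lls = loglikelihoods(C, data)  # one log-likelihood per row

# GPU version, per the CuBitsProbCircuit signature above (assumed usage):
# bpc = CuBitsProbCircuit(C)
# lls_gpu = loglikelihoods(bpc, cu(data); batch_size = 512)
```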
