Add SamplePSDD #79
Thanks for the PR, this should be very useful. I will have some time to review in more detail next week or so. For the tests, the "slow tests" job is fine and expected to fail. The "unit tests" portion currently runs on CPU; we run the GPU tests manually using a customized GitHub runner. Yes, the stack traces are not that informative, which is why we have a way of running individual tests
0b56db0 to e6b04f8
Thanks for the tips, Pasha. I added a few more tests, added I/O writing for On the issue of tests, I ran the individual tests and they passed with no issues: In the unit tests workflow, these same tests do pass, but they end up crashing some place else. This same (or at least similar) error seems to be occurring on |
You might want to merge the latest bug fix to master that gets rid of the failing MAP inference test.
Great, thanks Guy. Rebased master and now tests seem to pass fine.
Thanks for the PR.
Overall, looks good to me, few comments inside.
- What's the difference between `BayesModelComb` and `Ensemble`? In terms of storage/types they look the same. We also have SharedProbCircuit (but that's slightly different since the structures are shared). Here I assume each ProbCircuit can have a different structure, right?
  Also, you might want to rename `Ensemble` slightly, since it's for a specific use case where we have BDDs and logical constraints (unless I read it wrong).
- I need to double-check some interactions between the threads and the GPU later, but if it works now then it should be fine. One concern is that, for bigger circuits, having lots of threads might lead to memory issues. We can worry about this later, I guess.
It was a lot of code so did not take a full pass yet, but overall looks good to me. Will take another pass early next week, then can merge it.
src/ensembles/ensembles.jl
"""Split `X` into two partitions `A` and `B`, where `A` is a Bernoulli sample of each element in
`X` with probability `p` and `B=X∖A`. Guarantees at least one element in `A` and `B`."""
function bernoulli_partition(X::Vector{Int}, p::Float64)::Tuple{Vector{Int}, Vector{Int}}
This function seems general enough, maybe can move it to Utils or something (not a big deal).
https://github.com/Juice-jl/ProbabilisticCircuits.jl/tree/master/src/Utils
Alright. Will do.
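For readers following the thread, here is a minimal sketch of what a partition with the documented guarantee could look like. It is not the code from the PR: the name `bernoulli_partition_sketch`, the `rng` keyword, and the repair strategy (flipping one randomly chosen element when a side would be empty) are all assumptions made for illustration.

```julia
using Random

# Hypothetical re-implementation for illustration only; not the PR's code.
# Each element of X goes to A with probability p, everything else goes to B.
function bernoulli_partition_sketch(X::Vector{Int}, p::Float64;
                                    rng::AbstractRNG = Random.default_rng())
    length(X) >= 2 || throw(ArgumentError("need at least two elements to fill both partitions"))
    mask = rand(rng, length(X)) .< p          # true => element goes to A
    # Repair degenerate draws so that neither side ends up empty (assumed strategy).
    if !any(mask)
        mask[rand(rng, 1:length(X))] = true
    elseif all(mask)
        mask[rand(rng, 1:length(X))] = false
    end
    return X[mask], X[.!mask]
end

# Usage: split ten variable indices, sending roughly 30% of them to A.
A, B = bernoulli_partition_sketch(collect(1:10), 0.3)
```

Because the helper depends only on `Random`, it has no ties to the ensemble code, which is consistent with the suggestion to move it into a utilities module.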
2. `:uniform` for uniform weights;
3. `:em` for Expectation-Maximization;
4. `:stacking` for mixture model Stacking;
"""
Awesome, thanks for the detailed comments. Also feel free to add links to your paper(s) on functions that are specific to that paper; I think this one would be one of them.
UAI is still on preliminaries. But I'll add them once UAI releases the final version. Thanks.
src/ensembles/ensembles.jl
end
"Samples a Vtree with a right bias of `p`. If `p<0`, then uniformly sample vtrees."
function sample_vtree(n::Int, p::Float64)::Vtree
similar comment to bernoulli_partition
src/ensembles/ensembles.jl
end
"Returns a(n index) partitioning a la k-fold."
function kfold(n::Int, p::Int)::Vector{Tuple{UnitRange, Vector{Int}}}
similar comment to bernoulli_partition
(i.e. move to utils)
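As an illustration of the index partitioning this signature suggests, the sketch below splits `1:n` into `k` contiguous folds. The tuple layout (held-out range first, remaining indices second), the handling of a non-divisible `n`, and the name `kfold_sketch` are assumptions; the PR's `kfold` may order or balance the folds differently.

```julia
# Hypothetical sketch; the (held-out range, remaining indices) layout is an assumption.
function kfold_sketch(n::Int, k::Int)::Vector{Tuple{UnitRange{Int}, Vector{Int}}}
    1 <= k <= n || throw(ArgumentError("need 1 <= k <= n"))
    folds = Vector{Tuple{UnitRange{Int}, Vector{Int}}}(undef, k)
    base, extra = divrem(n, k)                # the first `extra` folds get one more index
    start = 1
    for i in 1:k
        stop = start + base + (i <= extra ? 1 : 0) - 1
        held_out = start:stop
        # Everything outside the held-out range is used for training.
        rest = vcat(collect(1:start-1), collect(stop+1:n))
        folds[i] = (held_out, rest)
        start = stop + 1
    end
    return folds
end

# Usage: iterate over three folds of ten examples.
for (held_out, rest) in kfold_sketch(10, 3)
    println("held out: ", held_out, "  train on: ", rest)
end
```

Like `bernoulli_partition`, this only manipulates integer indices, so it would sit naturally in a shared utilities module.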
return passdown(U)
end
"Returns a structured probabilistic circuit compiled from a binary decision diagram."
nice, this is going to be very useful :)
this works with any BDD? any restriction to know about?
Should work with any BDD. I guess a restriction is that it's limited to the BDD.jl implementation: we assume a fixed (arbitrary) ordering, and so the PSDD will compile from a particularly ordered BDD.
return F
end
"Learns the weights of the Ensemble by Stacking, with `k` as the number of folds in k-fold."
If I recall correctly, stacking uses the outcome of another model as extra features, but that was mostly for classifiers. Is that also the case here?
Yeah, that's the idea, but in this case we used a density estimation version. Here's the reference paper.
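One way to read a density-estimation variant of stacking: evaluate each component on data it was not trained on (for example, the held-out part of each k-fold split), then fit mixture weights to those held-out likelihoods. The sketch below shows only the weight-fitting step, using EM with the components held fixed. Whether the PR does exactly this is an assumption; the function name `em_mixture_weights` and the matrix layout are hypothetical and not taken from the PR.

```julia
# Hypothetical sketch: EM over mixture weights with fixed components.
# `ll[i, j]` is the (held-out) log-likelihood of example i under component j.
function em_mixture_weights(ll::Matrix{Float64}; iters::Int = 100)
    n, k = size(ll)
    w = fill(1.0 / k, k)                          # start from uniform weights
    for _ in 1:iters
        # E-step: responsibilities r[i, j] ∝ w[j] * exp(ll[i, j]), computed stably per row.
        logr = ll .+ log.(w)'
        logr .-= maximum(logr, dims = 2)
        r = exp.(logr)
        r ./= sum(r, dims = 2)
        # M-step: each weight becomes the average responsibility of its component.
        w = vec(sum(r, dims = 1)) ./ n
    end
    return w
end

# Usage: two components scored on four held-out examples.
ll = log.([0.9 0.1; 0.8 0.2; 0.7 0.3; 0.2 0.8])
w = em_mixture_weights(ll)
```

Fitting the weights on held-out rather than training likelihoods is what separates this reading of stacking from plain EM over the ensemble weights: a component that overfits its training folds gets no credit for doing so.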
Thanks for the review, Pasha. Answering questions inline.
The difference is that
Yeah.
I kept it
Awesome. :) As a side note, I forgot to add some documentation. I'll write some and push them. I'll mark this WIP until I do. |
Would it be realistic to move the BinaryDecisionDiagrams.jl dependency and related functionality to LogicCircuits.jl? It already has an SDD compiler (which of course can also compile BDDs by fixing a right-linear vtree). I would prefer to keep all purely logical features there.
Do you mean moving the BDD code into LogicCircuits.jl? Or re-implementing the code using LogicCircuits.jl syntax through the existing types? I see some problems with the latter, since SamplePSDD relies on efficiently reducing, restricting and applying BDDs, which could introduce some overhead if we try to generalize with logic circuits. In the case of the former, I'd personally see no problem in moving BinaryDecisionDiagrams.jl to LogicCircuits.jl. Although one issue would be that the code would be somewhat disconnected from the rest.
I would prefer to move the BDD code into LogicCircuits.jl. Perhaps in future then we can have some interoperability between your BDD code and the other logic circuit types, some conversions, support for the LogicCircuits.jl compilation API, etc.
Yeah, I see no problem with that. I'll send the PR to LogicCircuits.jl.
@RenatoGeh Also the LogicCircuit PR is now pushed and in master.
Thanks for the heads-up, Pasha! I did have one issue when writing the docs. I wanted to show how to build a PSDD from a CNF file and ran into #80. Would you rather we deal with this in another PR and try to merge this right now, or try to resolve this in this one?
@RenatoGeh Yes this seems to be a separate issue, we can merge this one first and do the fix and extra docs in future commits. Thanks.
e90178a to 1c8b2e9
Ok, thanks. Refactored everything to use |
Thanks Renato, will take a look at the tests. It is probably a config issue, because the tests should also grab LogicCircuits from master in dev mode. I will just run them locally if not. We have been meaning to release a new version anyway.
Fixed the test errors; ran them locally and they passed. Will let the tests run here too, and will merge sometime next week if no other changes are needed.
Codecov Report
@@ Coverage Diff @@
## master #79 +/- ##
==========================================
+ Coverage 67.33% 69.05% +1.71%
==========================================
Files 40 45 +5
Lines 3043 3490 +447
==========================================
+ Hits 2049 2410 +361
- Misses 994 1080 +86
Alright, awesome. Thanks for that last fix, review and feedback. Just one last thing about the docs. I mention how to compile a PSDD from an SDD, but as we know that's broken for now. Just giving you a heads up in case we have anyone asking about it in the issues. |
Great, thanks. Yeah, for the SDD => PSDD, is it the same as issue #80? Will merge this PR now; if there is anything else, we can open extra issues.
It is.
Sure. |
Hi,
This PR proposes adding the `SamplePSDD` learning algorithm that is to appear at UAI 2021. If you guys think this is a good fit for Juice, I'd be happy to add the code to the library and make any adjustments to comply with Juice notation or anything like that.
I wrote some tests, but it seems tests were already failing for me before doing anything to the code. I'm not sure if it's because of my GPU (fails at the parameter estimation GPU tests) or if it's something I introduced in the PR. Even though it's in a completely different section of the code, the stack trace says it has something to do with the BDD dependency (which is weird, since it shouldn't have anything to do with it).
The CI also appears to give out an error when running the tests I added. Strangely, the CI log gives no clue whatsoever on what caused them, and running it on my local machine outputs no error on these particular test cases. Running the test files by themselves also gives no errors. I'm surely missing something here, but have no idea what exactly.
We tried adding a minimal number of dependencies, but ultimately we ended up having to import the following externals:
To avoid namespace shenanigans, I imported the BDD dependency in its own namespace, since there are quite a few name conflicts with `LogicCircuits`.
As to notation/naming conventions, I somewhat tried mimicking your namings, but not sure if I'm 100% compliant.
The `src/structurelearner/sample_psdd.jl` file implements `SamplePSDD` and any functions it needs. `src/ensembles/ensembles.jl` adds a type `Ensemble` for mixtures of not-necessarily-same-vtree probabilistic circuits. It adds the weight learning methods we mention in the paper (namely EM, likelihood weighting, uniform weights and stacking of mixtures). File `src/ensembles/bmc.jl` adds a `BayesModelComb` type for Bayesian Model Combination of PCs.
Let me know if you want me to change anything in the code.
Thanks