Note: pomegranate is currently being rewritten from the ground up using PyTorch. Until the rewrite is complete, it is being released as torchegranate. Check it out! https://github.com/jmschrei/torchegranate
Please consider citing the JMLR-MLOSS Manuscript if you've used pomegranate in your academic work!
pomegranate is a package for building probabilistic models in Python that is implemented in Cython for speed. A primary focus of pomegranate is to merge the easy-to-use API of scikit-learn with the modularity of probabilistic modeling to allow users to specify complicated models without needing to worry about implementation details. The models implemented here are built from the ground up with big data processing in mind and so natively support features like multi-threaded parallelism and out-of-core processing. Click on the binder badge above to interactively play with the tutorials!
pomegranate is pip-installable using `pip install pomegranate` and conda-installable using `conda install pomegranate`. If neither works, more detailed installation instructions can be found here.
If you get an error involving `pomegranate/base.c`, try installing with `pip install --no-cache-dir pomegranate`.
If you get an error involving `pomegranate/distributions/NeuralNetworkWrapper.c: No such file or directory`, try installing Cython first and then re-installing.
A few packages are not required to use pomegranate but are necessary for some specific functionality. For example, pandas is needed to run the tests involving I/O, matplotlib and pygraphviz are needed for plotting capabilities, and cupy is needed for GPU acceleration.
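As a quick way to see which of these optional packages are missing from an environment, something like the following could be run. This is a minimal sketch using only the standard library; the helper name `missing_optional_packages` and the `OPTIONAL_PACKAGES` table are hypothetical and not part of pomegranate.

```python
from importlib.util import find_spec

# Optional packages named above, mapped to the functionality that needs them.
# This table is illustrative and is not shipped with pomegranate.
OPTIONAL_PACKAGES = {
    "pandas": "tests involving I/O",
    "matplotlib": "plotting",
    "pygraphviz": "plotting",
    "cupy": "GPU acceleration",
}


def missing_optional_packages(packages=OPTIONAL_PACKAGES):
    """Return {package: purpose} for every package that cannot be found."""
    return {
        name: purpose
        for name, purpose in packages.items()
        if find_spec(name) is None  # None means the import system cannot locate it
    }


if __name__ == "__main__":
    for name, purpose in missing_optional_packages().items():
        print(f"{name} is not installed; it is needed for {purpose}")
```

The check only looks the packages up without importing them, so it stays fast even when a heavy package such as cupy is present.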
- Probability Distributions
- General Mixture Models
- Hidden Markov Models
- Naive Bayes and Bayes Classifiers
- Markov Chains
- Discrete Bayesian Networks
- Discrete Markov Networks
The discrete Bayesian networks also support novel work on structure learning in the presence of constraints through a constraint graph. These constraints can dramatically speed up structure learning through the use of loose general prior knowledge, and can frequently make the exact learning task take only polynomial time instead of exponential time. See the PeerJ manuscript for the theory and the pomegranate tutorial for the practical usage!
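To give a feel for why constraints shrink the search, here is a small pure-Python sketch. It does not use pomegranate and is not its implementation. It assumes a constraint graph whose nodes are groups of variables and whose edges say "variables in this group may be parents of variables in that group". The function names are hypothetical, and the counts ignore the acyclicity requirement.

```python
def allowed_parents(constraint_edges):
    """Map each variable to the set of variables permitted to be its parent.

    `constraint_edges` is a list of (parent_group, child_group) pairs, where
    each group is a tuple of variable indices. An edge means that any variable
    in parent_group may be a parent of any variable in child_group.
    """
    allowed = {}
    for parent_group, child_group in constraint_edges:
        # Make sure every variable that appears anywhere gets an entry.
        for variable in parent_group + child_group:
            allowed.setdefault(variable, set())
        for child in child_group:
            # A variable can never be its own parent, even under a self-loop.
            allowed[child].update(p for p in parent_group if p != child)
    return allowed


def candidate_parent_set_counts(allowed):
    """Count the subsets of permitted parents for each variable."""
    return {variable: 2 ** len(parents) for variable, parents in allowed.items()}


# Prior knowledge: variables 0 and 1 may influence 2 and 3, and nothing else.
edges = [((0, 1), (2, 3))]
allowed = allowed_parents(edges)
counts = candidate_parent_set_counts(allowed)
constrained_total = sum(counts.values())
unconstrained_total = 4 * 2 ** 3  # each of 4 variables: any subset of the other 3
```

In this toy case the constraint leaves 10 candidate parent sets to score instead of 32, and the gap widens quickly as the number of variables grows.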
To support the above algorithms, pomegranate has efficient implementations of the following:
- Kmeans/Kmeans++/Kmeans||
- Factor Graphs
- sklearn-like API
- Multi-threaded Training
- BLAS/GPU Acceleration
- Out-of-Core Learning
- Data Generators and IO
- Semi-supervised Learning
- Missing Value Support
- Customized Callbacks
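Out-of-core learning and multi-threaded training both become straightforward when a model can be fit from additive sufficient statistics: each batch is reduced to a few sums, and the sums are combined at the end. The sketch below illustrates that idea for a one-dimensional normal distribution in plain Python. It is an illustration under that assumption rather than pomegranate's code, and the class `RunningNormal` is hypothetical.

```python
import math


class RunningNormal:
    """Fit a 1-D normal distribution from batches without storing the data."""

    def __init__(self):
        self.n = 0           # number of samples seen so far
        self.total = 0.0     # running sum of x
        self.total_sq = 0.0  # running sum of x squared

    def summarize(self, batch):
        """Fold one batch into the running sufficient statistics."""
        for x in batch:
            self.n += 1
            self.total += x
            self.total_sq += x * x

    def from_summaries(self):
        """Return the maximum-likelihood (mean, std) from the stored sums."""
        mean = self.total / self.n
        variance = self.total_sq / self.n - mean ** 2
        # Clamp tiny negative values caused by floating-point rounding.
        return mean, math.sqrt(max(variance, 0.0))


# The two batches could come from different files or different workers.
model = RunningNormal()
model.summarize([1.0, 2.0, 3.0])
model.summarize([4.0, 5.0])
mean, std = model.from_summaries()
```

Because the statistics are plain sums, batches can be processed in any order or on separate threads and then added together. The sum-of-squares form used here is the simplest to read but can lose precision on data with a large mean relative to its spread.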
Please take a look at the tutorials folder, which includes several tutorials on how to effectively use pomegranate!
See the website for extensive documentation, API references, and FAQs about each of the models and supported features.
No good project is done alone, and so I'd like to thank all the previous contributors to YAHMM and all the current contributors to pomegranate, including the graduate students who share my office, whom I annoy on a regular basis by bouncing ideas off them.
pomegranate requires:
- Cython (only if building from source)
- NumPy
- SciPy
- NetworkX
- joblib
To run the tests, you also must have `nose` installed.
If you would like to contribute a feature, fork the master branch (fork the release if you are fixing a bug). Be sure to run the tests before changing any code. You'll need to have nose installed, since it provides the nosetests command. The following command will run all the tests:
python setup.py test
Let us know what you want to do just in case we're already working on an implementation of something similar. This way we can avoid any needless duplication of effort. Also, please don't forget to add tests for any new functions.