diff --git a/README.md b/README.md
index 4bd05ba..3774dad 100644
--- a/README.md
+++ b/README.md
@@ -3,15 +3,19 @@
 [![Travis](https://travis-ci.org/arthurmensch/modl.svg?branch=master)](https://travis-ci.org/arthurmensch/modl)
 [![Coveralls](https://coveralls.io/repos/github/arthurmensch/modl/badge.svg?branch=master)](https://coveralls.io/github/arthurmensch/modl?branch=master)
 
-This python package ([webpage](https://github.com/arthurmensch/modl)) implements our ICML'16 paper:
+This Python package ([webpage](https://github.com/arthurmensch/modl)) implements our two papers:
 
 >Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux.
-Dictionary Learning for Massive Matrix Factorization. International Conference
+[Stochastic Subsampling for Factorizing Huge Matrices](https://hal.archives-ouvertes.fr/hal-01431618v1). 2017.
+
+>Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux.
+[Dictionary Learning for Massive Matrix Factorization](https://hal.archives-ouvertes.fr/hal-01308934v2). International Conference
 on Machine Learning, Jun 2016, New York, United States. 2016
 
 It allows one to perform sparse/dense matrix factorization on fully-observed or missing data very efficiently, by leveraging random sampling with online learning.
+It can factorize terabyte-scale matrices with hundreds of latent components in a few hours.
 
-Reference paper is available on [HAL](https://hal.archives-ouvertes.fr/hal-01308934) / [arxiv](http://arxiv.org/abs/1605.00937). This package allows to reproduce the
+This package allows you to reproduce the
 experiments and figures from the papers.
 
 More importantly, it provides [scikit-learn](https://github.com/scikit-learn/scikit-learn) compatible
@@ -19,7 +23,7 @@ More importantly, it provides [https://github.com/scikit-learn/scikit-learn](sci
 
 ## Installing from source with pip
 
-Installation from source is simple In a command prompt:
+Installation from source is simple. In a command prompt:
 
 ```
 git clone https://github.com/arthurmensch/modl.git
@@ -30,66 +34,58 @@
 cd $HOME
 py.test --pyargs modl
 ```
 
-## Examples
+## Core code
 
-Two simple examples runs out-of-the box. Those are a good basis for understanding the API of `modl` estimators.
- - ADHD (rfMRI) sparse decomposition, relying on [nilearn](https://github.com/nilearn/nilearn)
- ```
- python examples/adhd_decompose.py
- ```
- - Movielens (User/Movie ratings) prediction
- ```
- python examples/recsys_predict.py
- ```
-
-For Movielens example, you will need to download the dataset, from [spira repository](https://github.com/mblondel/spira).
-```
-make download-movielens10m
-```
+The package essentially provides four estimators:
 
-## Experiments
+- `DictFact`, which computes a matrix factorization from NumPy arrays
+- `fMRIDictFact`, which computes sparse spatial maps from fMRI images
+- `ImageDictFact`, which computes a patch dictionary from an image
+- `RecsysDictFact`, which predicts ratings with a collaborative-filtering approach
 
-### Recommender systems
-Recommender systems experiments can be reproduced running the following command in the root repository.
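+All estimators follow a scikit-learn-like `fit`/`transform` API. Below is a minimal usage sketch for `DictFact`; the parameter and attribute names are illustrative assumptions, so check the class docstrings for the exact signatures.
+
+```
+import numpy as np
+
+from modl import DictFact
+
+# Toy dense data: 1,000 samples with 100 features.
+X = np.random.randn(1000, 100)
+
+# Learn a dictionary of 10 atoms (the latent dimension).
+dict_fact = DictFact(n_components=10, random_state=0)
+dict_fact.fit(X)
+
+D = dict_fact.components_      # learned dictionary, shape (10, 100)
+code = dict_fact.transform(X)  # loadings of X onto the dictionary
+```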
+## Examples
+
+### fMRI decomposition
+
+A fast-running example that decomposes a small resting-state fMRI dataset into a map of 70 components is provided:
 
 ```
-python examples/experimental/recsys/recsys_compare.py
+python examples/decompose_fmri.py
 ```
 
-You will need to download datasets beforehand:
+It can be adapted to run on the 2 TB HCP dataset by changing the `source` parameter to `'hcp'` (you will need to download the data first).
+
+### Hyperspectral images
+
+A fast-running example that extracts the patches of an HD image can be run with:
 
 ```
-make download-movielens1m
-make download-movielens10m
-make download-netflix
+python examples/decompose_image.py
 ```
 
-### HCP decomposition
+It can be adapted to run on AVIRIS data by changing the image source to `'aviris'` in the file.
+
+### Recommender systems
 
-You will need to retrieve the S500 release of the [HCP dataset](http://www.humanconnectome.org/data/) in some way
- beforehand. You may use the public S3 bucket, order filled hard-drives, or download it directly.
+Our core algorithm can perform collaborative filtering very efficiently:
 
-Edit `$HCPLOCATION` in the `Makefile` and run
 ```
-make hcp
+python examples/recsys_compare.py
 ```
 
-to create symlinks and download a useful mask.
-The HCP experiment can be reproduced as such:
+You will need to download datasets beforehand:
+
 ```
-# unmask data
-python examples/experiment/fmri/hcp_prepare.py
-# compare methods
-python examples/experiment/fmri/hcp_compare.py
-# analyse convergence
-python examples/experiment/fmri/hcp_analysis.py
-# plot results
-python examples/experiment/fmri/hcp_plot.py
+make download-movielens1m
+make download-movielens10m
 ```
 
-By default, results will be available in `$HOME/output/modl`
+## Future work
 
+- Remove the `sacred` dependency
+- Release a fetcher for HCP data from the S3 bucket
+- Release examples with larger datasets and benchmarks
 
 ## Contributions
 
diff --git a/modl/__init__.py b/modl/__init__.py
index ebf3a37..929fa3e 100644
--- a/modl/__init__.py
+++ b/modl/__init__.py
@@ -1,3 +1,4 @@
 from .dict_fact import DictFact
 from .image import ImageDictFact
-from .fmri import fMRIDictFact
\ No newline at end of file
+from .fmri import fMRIDictFact
+from .recsys import RecsysDictFact
\ No newline at end of file
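As a companion to the new `RecsysDictFact` export above, a minimal usage sketch follows; the constructor, `fit`, and `predict` calls are assumptions patterned on the scikit-learn-style API of the other estimators, so check the class docstring for the real signatures.

```
import numpy as np
import scipy.sparse as sp

from modl import RecsysDictFact

# Toy ratings matrix: 100 users x 50 items, with ~10% of entries observed.
X = sp.random(100, 50, density=0.1, format='csr',
              random_state=np.random.RandomState(0))

estimator = RecsysDictFact(n_components=10, random_state=0)
estimator.fit(X)               # learn latent factors from the observed entries
X_pred = estimator.predict(X)  # predicted scores at the observed positions
```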