Commit 0258cb6

Readme
Arthur Mensch committed Jan 17, 2017
1 parent 4d50df2 commit 0258cb6
Showing 2 changed files with 41 additions and 44 deletions.
82 changes: 39 additions & 43 deletions README.md
[![Travis](https://travis-ci.org/arthurmensch/modl.svg?branch=master)](https://travis-ci.org/arthurmensch/modl)
[![Coveralls](https://coveralls.io/repos/github/arthurmensch/modl/badge.svg?branch=master)](https://coveralls.io/github/arthurmensch/modl?branch=master)

This python package ([webpage](https://github.com/arthurmensch/modl)) implements our two papers from 2016:

>Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux.
[Stochastic Subsampling for Factorizing Huge Matrices](https://hal.archives-ouvertes.fr/hal-01431618v1). <hal-01431618> 2017.

>Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux.
[Dictionary Learning for Massive Matrix Factorization](https://hal.archives-ouvertes.fr/hal-01308934v2). International Conference
on Machine Learning, Jun 2016, New York, United States. 2016

It allows one to perform sparse / dense matrix factorization on fully-observed or missing data very efficiently, by leveraging random sampling with online learning.
It is able to factorize matrices of terabyte scale with hundreds of components in the latent space in a few hours.

This package allows one to reproduce the
experiments and figures from the papers.

More importantly, it provides [scikit-learn](https://github.com/scikit-learn/scikit-learn)-compatible
estimators that fully implement the proposed algorithms.

## Installing from source with pip

Installation from source is simple. In a command prompt:

```
git clone https://github.com/arthurmensch/modl.git
cd modl
pip install .
cd $HOME
py.test --pyargs modl
```

## Core code

The package essentially provides four estimators (a usage sketch follows the list):

- `DictFact`, which computes a matrix factorization from Numpy arrays
- `fMRIDictFact`, which computes sparse spatial maps from fMRI images
- `ImageDictFact`, which computes a patch dictionary from an image
- `RecsysDictFact`, which predicts scores with a collaborative filtering approach
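
As a quick taste of the API, here is a minimal sketch of `DictFact` on a plain Numpy array. It assumes the usual scikit-learn `fit`/`transform` conventions; the parameter names (`n_components`, `batch_size`, `random_state`) are illustrative assumptions, not a verbatim excerpt from the documentation.

```
import numpy as np

from modl import DictFact

# Toy dense data: 1000 samples with 300 features (illustrative only)
X = np.random.randn(1000, 300)

# Assumed scikit-learn-style constructor parameters
dict_fact = DictFact(n_components=20, batch_size=50, random_state=0)
dict_fact.fit(X)

D = dict_fact.components_      # learned dictionary, shape (20, 300)
code = dict_fact.transform(X)  # sample loadings, shape (1000, 20)
X_hat = code.dot(D)            # low-rank reconstruction of X
```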

## Examples

### fMRI decomposition

A fast-running example that decomposes a small resting-state fMRI dataset into a 70-component map is provided:

```
python examples/decompose_fmri.py
```

It can be adapted to run on the 2TB HCP dataset by changing the source parameter to 'hcp' (you will need to download the data first).
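
For orientation, a minimal sketch of such a decomposition on a small nilearn-fetched dataset might look as follows; the parameters and the `components_img_` attribute are assumptions borrowed from nilearn's decomposition conventions, not a verbatim excerpt from the example script.

```
from nilearn import datasets

from modl.fmri import fMRIDictFact

# Small resting-state fMRI dataset, fetched through nilearn
adhd = datasets.fetch_adhd(n_subjects=1)

# Assumed parameters, following nilearn decomposition conventions
dict_fact = fMRIDictFact(n_components=70, smoothing_fwhm=6, verbose=1)
dict_fact.fit(adhd.func)

# Learned sparse spatial maps as a 4D Nifti image (attribute name assumed)
components_img = dict_fact.components_img_
```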

### Hyperspectral images

A fast-running example that extracts the patches of an HD image can be run with:

```
python examples/decompose_image.py
```

It can be adapted to run on AVIRIS data by changing the image source to 'aviris' in the file.
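
Schematically, patch-dictionary learning with `ImageDictFact` could look like the sketch below; the constructor arguments and the 2D-array input are illustrative assumptions rather than the exact API.

```
import numpy as np

from modl import ImageDictFact

# Stand-in for a real HD image: a random grayscale array
image = np.random.rand(512, 512)

# Assumed parameters: 100 atoms learned from 8x8 patches
dict_fact = ImageDictFact(n_components=100, patch_size=8)
dict_fact.fit(image)

# Each row is one flattened patch atom (attribute per scikit-learn style)
patch_dictionary = dict_fact.components_
```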

### Recommender systems

Our core algorithm can be run to perform collaborative filtering very efficiently:

```
python examples/recsys_compare.py
```

You will need to download datasets beforehand:

```
make download-movielens1m
make download-movielens10m
```
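
To give a feel for the interface, here is a hedged sketch of collaborative filtering on a toy sparse ratings matrix; the `alpha` parameter and the `predict` method are assumed by analogy with scikit-learn-style estimators and may differ from the actual API.

```
import numpy as np
import scipy.sparse as sp

from modl import RecsysDictFact

# Toy user x item ratings; zeros stand for unobserved entries
ratings = sp.csr_matrix(np.array([[5., 0., 3., 0.],
                                  [4., 2., 0., 1.],
                                  [0., 1., 4., 5.]]))

# Assumed constructor parameters (regularization name is illustrative)
estimator = RecsysDictFact(n_components=2, alpha=0.1, random_state=0)
estimator.fit(ratings)

# Predict scores for all entries, including the missing ones
# (method name assumed)
predicted = estimator.predict(ratings)
```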

## Future work

- Remove the `sacred` dependency
- Release a fetcher for HCP data from the S3 bucket
- Release examples with larger datasets and benchmarks

## Contributions

3 changes: 2 additions & 1 deletion modl/__init__.py
from .dict_fact import DictFact
from .image import ImageDictFact
from .fmri import fMRIDictFact
from .recsys import RecsysDictFact
