This repository implements time series diffusion in the frequency domain. For more details, please read our paper: Time Series Diffusion in the Frequency Domain.
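As a rough illustration of what "frequency domain" means here, the sketch below maps a real-valued series to its discrete Fourier coefficients and back. It is a plain O(n²) DFT written for readability and is not taken from this repository. The representation used in the paper (normalization, how real and imaginary parts are stored, etc.) may differ.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier transform: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    """Inverse DFT; keeps the real part, which is exact for real-valued inputs."""
    n = len(X)
    return [
        (sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


if __name__ == "__main__":
    n = 8
    # A pure cosine with 2 cycles over the window.
    signal = [math.cos(2 * math.pi * 2 * t / n) for t in range(n)]
    spectrum = dft(signal)
    # Magnitude n/2 = 4 at bins 2 and n-2 = 6, approximately 0 elsewhere.
    print([round(abs(c), 6) for c in spectrum])
    # Round-trip error should be at floating-point noise level.
    print(max(abs(a - b) for a, b in zip(signal, idft(spectrum))))
```

Because the transform is invertible, anything generated in the frequency domain can be mapped back to a time series. In practice an FFT such as `numpy.fft.rfft` would replace the quadratic loop.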
Installing from the repository:
- Clone the repository.
- Create and activate a new environment with conda (with Python 3.10 or newer).
conda create -n fdiff python=3.10
conda activate fdiff
- Install the requirements.
pip install freqdiff
- If you intend to train models, make sure that wandb is correctly configured on your machine by following this guide.
- Some of the datasets are automatically downloaded by our scripts via the Kaggle API. Make sure to create a Kaggle token as explained here.
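Before launching a run that downloads data, a quick preflight check can save a failed job. The sketch below is a hypothetical helper, not part of the repository. It looks for credentials in the places the official Kaggle API client documents: the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables, or a `kaggle.json` token under `~/.kaggle` (or under `KAGGLE_CONFIG_DIR` if that variable is set). Newer client versions may also search other locations, so treat a negative result as a hint rather than a verdict.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def kaggle_credentials_available(env: Optional[Mapping[str, str]] = None,
                                 home: Optional[Path] = None) -> bool:
    """Best-effort check for credentials the Kaggle API client can pick up."""
    env = os.environ if env is None else env
    # Option 1: username and key supplied as environment variables.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True
    # Option 2: a kaggle.json token in KAGGLE_CONFIG_DIR, or in ~/.kaggle by default.
    config_dir = env.get("KAGGLE_CONFIG_DIR")
    if config_dir:
        token = Path(config_dir) / "kaggle.json"
    else:
        token = (home or Path.home()) / ".kaggle" / "kaggle.json"
    return token.is_file()


if __name__ == "__main__":
    if kaggle_credentials_available():
        print("Kaggle token found")
    else:
        print("No Kaggle token found")
```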
When the packages are installed, you are ready to train diffusion models!
In order to train models, you can simply run the following command:
python cmd/train.py
By default, this command trains a score model in the time domain on the ecg dataset. To modify this behaviour, you can use Hydra's override syntax. The following hyperparameters can be modified to retrain all the models appearing in the paper:
Hyperparameter | Description | Values |
---|---|---|
fourier_transform | Whether or not to train a diffusion model in the frequency domain. | true, false |
datamodule | Name of the dataset to use. | ecg, mimiciii, nasa, nasdaq, usdroughts |
datamodule.subdataset | For the NASA dataset only. Selects between the charge and discharge subsets. | charge, discharge |
datamodule.smoother_width | For the ECG dataset only. Width of the Gaussian kernel smoother applied in the frequency domain. | |
score_model | The backbone to use for the score model. | default, lstm |
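For example, the hypothetical helper below composes Hydra `key=value` overrides from the table and enumerates every combination of its values. The function names (`train_command`, `sweep`) are illustrative and not part of the repository. The full Cartesian product likely contains more runs than the paper reports, so trim the lists to the configurations you need. `datamodule.smoother_width` is left unset because the table lists no values for it.

```python
import itertools
import shlex

# Values taken from the hyperparameter table above.
DATASETS = ["ecg", "mimiciii", "nasa", "nasdaq", "usdroughts"]
NASA_SUBSETS = ["charge", "discharge"]
SCORE_MODELS = ["default", "lstm"]


def train_command(fourier_transform, datamodule, score_model="default",
                  subdataset=None, smoother_width=None):
    """Build the argv for one training run using Hydra override syntax."""
    if subdataset is not None and datamodule != "nasa":
        raise ValueError("datamodule.subdataset only applies to the nasa dataset")
    if smoother_width is not None and datamodule != "ecg":
        raise ValueError("datamodule.smoother_width only applies to the ecg dataset")
    args = [
        "python", "cmd/train.py",
        # Lowercase booleans, matching the values listed in the table.
        f"fourier_transform={str(fourier_transform).lower()}",
        f"datamodule={datamodule}",
        f"score_model={score_model}",
    ]
    if subdataset is not None:
        args.append(f"datamodule.subdataset={subdataset}")
    if smoother_width is not None:
        args.append(f"datamodule.smoother_width={smoother_width}")
    return args


def sweep():
    """Yield one command per combination of table values (NASA split into charge/discharge)."""
    for ft, ds, sm in itertools.product([True, False], DATASETS, SCORE_MODELS):
        for sub in (NASA_SUBSETS if ds == "nasa" else [None]):
            yield train_command(ft, ds, sm, subdataset=sub)


if __name__ == "__main__":
    for cmd in sweep():
        print(shlex.join(cmd))
```

A single run, such as a frequency-domain model on the NASA charge subset, corresponds to `shlex.join(train_command(True, "nasa", subdataset="charge"))`, i.e. `python cmd/train.py fourier_transform=true datamodule=nasa score_model=default datamodule.subdataset=charge`.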
At the end of training, your model is stored in the lightning_logs directory, in a folder named after the current run_id. You can find the run_id in the training logs and, if you have correctly configured wandb, in the wandb dashboard.
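If the run_id is not at hand, listing the folders under lightning_logs is a quick way to recover it. The helpers below (`list_run_ids`, `find_run_dir`) are hypothetical and assume each run folder sits directly under lightning_logs, named after its run_id, as described above.

```python
from pathlib import Path
from typing import List, Union


def list_run_ids(root: Union[str, Path] = "lightning_logs") -> List[str]:
    """Return run folder names under `root`, most recently modified first."""
    runs = [p for p in Path(root).iterdir() if p.is_dir()]
    runs.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [p.name for p in runs]


def find_run_dir(run_id: str, root: Union[str, Path] = "lightning_logs") -> Path:
    """Return the folder for `run_id`, assumed to sit directly under `root`."""
    run_dir = Path(root) / run_id
    if not run_dir.is_dir():
        # List what does exist to make typos in the run_id easy to spot.
        known = ", ".join(list_run_ids(root)) or "none"
        raise FileNotFoundError(f"No run folder {run_dir}; known run_ids: {known}")
    return run_dir


if __name__ == "__main__":
    print(list_run_ids())
```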
In order to sample from a trained model, you can simply run the following command:
python cmd/sample.py model_id=XYZ
where XYZ is the run_id of the model you want to sample from. At the end of sampling, the samples are stored in the lightning_logs directory, in a folder named after the current run_id.
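To sample from several trained models in one go (for instance, all the runs you plan to plot), a small wrapper around the command above could look like the sketch below. `sample_command` and `sample_all` are hypothetical helpers. They default to a dry run that only returns the commands, and they assume the script is invoked from the repository root.

```python
import subprocess
import sys
from typing import List, Sequence


def sample_command(run_id: str) -> List[str]:
    """Argv equivalent to `python cmd/sample.py model_id=<run_id>`."""
    if not run_id or any(ch.isspace() for ch in run_id):
        raise ValueError(f"invalid run_id: {run_id!r}")
    # sys.executable reuses the interpreter of the active (e.g. fdiff) environment.
    return [sys.executable, "cmd/sample.py", f"model_id={run_id}"]


def sample_all(run_ids: Sequence[str], dry_run: bool = True) -> List[List[str]]:
    """Validate every run_id first, then (unless dry_run) sample from each in turn."""
    commands = [sample_command(r) for r in run_ids]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stops at the first failing run
    return commands
```

Building all commands before running any of them means a malformed run_id is rejected up front instead of after earlier runs have already finished sampling.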
You can then reproduce the plots in the paper by adding the run_id to the run_list list in this notebook and running all cells.
If you wish to contribute, please make sure that your code is compliant with our tests and coding conventions. To do so, you should install the required testing packages with:
pip install freqdiff[test]
Then, you can run the tests with:
pytest
Before any commit, please make sure that your staged code is compliant with our coding conventions by running:
pre-commit
If you use this code, please acknowledge our work by citing:
@misc{crabbé2024time,
title={Time Series Diffusion in the Frequency Domain},
author={Jonathan Crabbé and Nicolas Huynh and Jan Stanczuk and Mihaela van der Schaar},
year={2024},
eprint={2402.05933},
archivePrefix={arXiv},
primaryClass={cs.LG}
}