This repository contains the code for the paper *MLEM: Generative and Contrastive Learning as Distinct Modalities for Event Sequences*.
Here we provide links for downloading the data. After downloading, each dataset should be placed in the directory `experiments/{dataset_name}/data/`.
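For instance, a minimal setup for a hypothetical dataset called `age` might look like the sketch below (the dataset name and archive name are assumptions, not fixed by this repo):

```sh
# Illustrative only: the dataset name "age" and the archive name are assumptions.
mkdir -p experiments/age/data
unzip age_data.zip -d experiments/age/data/
```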
Configurations are split into data configs and model configs. Data configs specify paths to data, the random seed, batch size, column names, and the specifics of contrastive augmentations. Model configs control model parameters, such as which model to train, hidden sizes, which encoder to use, normalizations, etc.
Data configurations for each dataset are placed in `configs/data_configs/{dataset_name}.py` for the Generative and MLEM models, and in `configs/data_configs/contrastive/{dataset_name}.py` for the Contrastive and Naive models. Model configurations are placed in `configs/model_configs/{required_model_name}/{dataset_name}.py`.
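For illustration, the resulting config tree is laid out as follows (names in braces are placeholders):

```
configs/
├── data_configs/
│   ├── {dataset_name}.py          # data config for Generative and MLEM models
│   └── contrastive/
│       └── {dataset_name}.py      # data config for Contrastive and Naive models
└── model_configs/
    └── {required_model_name}/
        └── {dataset_name}.py      # model config
```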
Due to the origins of the MLEM technique, the MLEM model is named Sigmoid here, after its loss function. The Naive method is called GC, which stands for Gen-Contrastive.
All experiments are run with the sh scripts. To change the dataset for an experiment, simply pass your dataset config to the corresponding sh script.
The `run_pipe_supervised.sh` script runs the supervised experiment. Inside the script one can change which configs to use, how to name the experiment, the number of epochs, and whether to use checkpoints.
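A typical invocation might look like the sketch below; the argument convention and dataset name are assumptions, so check the script header for the actual interface:

```sh
# Illustrative only: the config path and argument order are assumptions.
sh run_pipe_supervised.sh configs/data_configs/age.py
```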
The `run_pipe_contrastive.sh` script performs contrastive learning.
The `run_pipe_gen.sh` script performs generative modeling.
The `run_pipe_gen_contrastive.sh` script performs the procedure described as Naive in the paper.
The `run_pipe_sigmoid.sh` script performs MLEM modeling. However, before running this script, the path to a contrastive checkpoint must be placed in the MLEM config, and the pre-trained contrastive network configs must match the contrastive network configs inside the MLEM model config.
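A sketch of the expected order of operations (config paths and the dataset name are illustrative assumptions):

```sh
# 1) Pre-train the contrastive model first.
sh run_pipe_contrastive.sh configs/data_configs/contrastive/age.py

# 2) Copy the resulting checkpoint path into the MLEM (Sigmoid) model config,
#    and make sure its contrastive-network settings match the pre-trained ones.

# 3) Train MLEM.
sh run_pipe_sigmoid.sh configs/data_configs/age.py
```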
To evaluate the TPP task and the robustness of the embeddings, first run `tpp_dataset.py`, passing the desired dataset as an argument. This script generates refactored data for evaluation.
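For example (the argument form and dataset name are assumptions; check the script for its actual interface):

```sh
# Illustrative only: the dataset name and argument form are assumptions.
python tpp_dataset.py age
```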
Then run the `run_tpp.sh` script with suitable arguments: the path to the configs of the tested model, the data config of the dataset, the name of the current model, and the checkpoints to test.
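A sketch of such a call, with the four arguments listed above (all paths, names, and the argument order are assumptions):

```sh
# Illustrative only: paths, the model name, and the argument order are assumptions.
sh run_tpp.sh configs/model_configs/sigmoid/age.py \
    configs/data_configs/age.py \
    sigmoid \
    experiments/age/ckpt/sigmoid/best.ckpt
```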
The current version of the repo contains the appropriate configs. To obtain the same results, simply pass the right configs to the desired experiment setting.