Code for predicting phenotypes from cross-modal autoencoder embeddings, as used in the following paper.
All dependencies are provided in the environment.yml file, which can be used to create a conda environment (see conda instructions). Installation takes around 5 minutes.
main.py requires the following inputs (described in options_parser.py):
- Model embeddings
- Phenotypes
- Indices for train, validation, and test samples
- Choice of kernel (ntk, linear, laplace)
- Number of epochs for kernel regression (see the sketch below)
main.py will write the resulting phenotype predictions to the output location specified via the options in options_parser.py.
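
The kernel and epoch options control how the regression from embeddings to phenotypes is fit. The following is only a rough sketch of the idea (not the implementation in main.py); the Laplace-kernel bandwidth, learning rate, and toy data are illustrative assumptions:

```python
import numpy as np
from scipy.spatial.distance import cdist

def laplace_kernel(A, B, bandwidth=10.0):
    """Laplace kernel K(x, z) = exp(-||x - z||_2 / bandwidth)."""
    return np.exp(-cdist(A, B, metric="euclidean") / bandwidth)

def fit_kernel_regression(K_train, y_train, epochs=200, lr=1e-4):
    """Full-batch gradient descent on the dual coefficients alpha for the
    least-squares objective 0.5 * ||K_train @ alpha - y_train||^2."""
    alpha = np.zeros((K_train.shape[0], y_train.shape[1]))
    for _ in range(epochs):
        alpha -= lr * K_train @ (K_train @ alpha - y_train)
    return alpha

# Toy stand-ins for embedding coordinates and phenotype values.
rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(200, 64)), rng.normal(size=(50, 64))
y_train = rng.normal(size=(200, 3))

K_train = laplace_kernel(X_train, X_train)
alpha = fit_kernel_regression(K_train, y_train)
y_pred = laplace_kernel(X_test, X_train) @ alpha  # predictions for test samples
```

The linear kernel would simply replace `laplace_kernel(A, B)` with `A @ B.T`; the ntk option refers to the neural tangent kernel, which has its own closed-form expression and is not sketched here.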
- Model embeddings should be provided in TSV format. Our code uses 'sample_id' as the key identifying each MRI/ECG sample, and each row contains a tab-separated list of real values giving the coordinates of that sample in the latent space.
- Phenotypes should be provided in TSV format. Our code uses 'sample_id' to link each phenotype row to its corresponding embedding, and each row contains a tab-separated list of real-valued phenotypes (e.g. LVM, LVEDV).
- Train, validation, and test samples should be provided as sets (not lists) of sample_ids identifying the samples used for training, validation, and testing.
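
For reference, a minimal sketch of how these inputs could be prepared is shown below. The file names (embeddings.tsv, phenotypes.tsv, splits.pkl), the 70/15/15 split, and the use of pickle for storing the index sets are illustrative assumptions, not requirements of main.py:

```python
import pickle
import pandas as pd

# Hypothetical file names; the actual paths are passed to main.py via options_parser.py.
embeddings = pd.read_csv("embeddings.tsv", sep="\t").set_index("sample_id")
phenotypes = pd.read_csv("phenotypes.tsv", sep="\t").set_index("sample_id")

# Keep only sample_ids present in both tables so rows can be linked one-to-one.
shared_ids = embeddings.index.intersection(phenotypes.index)
embeddings = embeddings.loc[shared_ids]
phenotypes = phenotypes.loc[shared_ids]

# Build train / validation / test splits as sets (not lists) of sample_ids.
ids = list(shared_ids)
n_train, n_val = int(0.7 * len(ids)), int(0.15 * len(ids))
train_ids = set(ids[:n_train])
val_ids = set(ids[n_train:n_train + n_val])
test_ids = set(ids[n_train + n_val:])

# One possible way to store the splits for later use.
with open("splits.pkl", "wb") as f:
    pickle.dump({"train": train_ids, "val": val_ids, "test": test_ids}, f)
```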