MuAt

Mutation-Attention Model

Quick Start (Tested on Linux)

Clone muat repository
Go to muat repository
Create conda environment

conda env create -f muat-conda.yml

activate muat environment

conda activate muat

Download genome reference to ./ref:

wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz -O ./ref/ref.gz
gunzip ./ref/ref.gz

Make sure that preprocessing/dmm/annotate_mutations_with_bed.sh is permitted to be executed.

chmod 755 preprocessing/dmm/annotate_mutations_with_bed.sh

Predicting .vcf file with MuAt pretrained model :

python3 main.py --dataloader 'pcawg' --input-file '/path/to/muat/data/raw/vcf/file.vcf' --reference '/path/to/muat/ref/ref' --load-ckpt-file '/path/to/muat/bestckpt/wgs/motif+position_features.pthx' --output-pred-dir '/path/to/muat/data/raw/output/' --single-pred-vcf --get-features

Predicting .vcf file with MuAt ensemble model :

python3 main.py --dataloader 'pcawg' --input-file '/path/to/muat/data/raw/vcf/file.vcf' --reference '/path/to/muat/ref/ref' --load-ckpt-dir '/path/to/muat/bestckpt/wgs/ensamble/' --output-pred-dir '/path/to/muat/data/raw/output/' --ensemble --get-features

Predicting .vcf file (GRCh38) : *add --convert-hg38-hg19

python3 main.py --dataloader 'pcawg' --input-file '/path/to/muat/data/raw/vcf/file.vcf' --reference '/path/to/muat/ref/ref' --load-ckpt-dir '/path/to/muat/bestckpt/wgs/ensamble/' --output-pred-dir '/path/to/muat/data/raw/output/' --ensemble --get-features --convert-hg38-hg19

Training PCAWG

Download dataset:

run scripts/download_icgc.sh

Downloading data approximately may take five hours to complete.

Preprocessing

Create conda environment

conda env create -f muat-pre-conda.yml

Activate muat-pre env

conda activate muat-pre

Copy scripts/dmmpre.sh, scripts/preprocessdmm.sh, scripts/annotate_preprocessed.sh to <data_dir>/release_28/
Edit dmmpre.sh assigning to dependencies (check the dmmpre.sh for information)
run preprocessdmm.sh > if successful, this will give an output of 'somatic_mutations.<icgc_project>.dmm.k256.tsv.gz' in every icgc project folder
annotate_preprocessed.sh > ex: ./annotate_preprocessed.sh ./BLCA-US/somatic_mutations.BLCA-US.dmm.k256.tsv.gz BLCA-US. If successful, this file will give an output of somatic_mutations.<icgc_project>.dmm.k256.annotated.tsv.gz in every icgc project folder

Create Dataset for MuAt

run preprocessing/create_PCAWG_dataset.py (adjust the dependencies accordingly)

Training from scratch

python3 main.py --dataloader 'pcawg' --block-size 5000 --n-class 24 --n-layer 1 --n-head 1 --n-emb 256 --motif --mut-type 'SNV+MNV' --fold 1 --input-data-dir '/path/to/input/data/dir/' --save-ckpt-dir '/path/to/saveckptdir/' --train

Predicting from scratch

python3 main.py --dataloader 'pcawg' --block-size 5000 --context-length 3 --n-class 24 --n-layer 1 --n-head 1 --n-emb 256 --motif --mut-type 'SNV+MNV' --fold 1 --input-data-dir '/path/to/input/data/dir/' --load-ckpt-dir '/path/to/saveckptdir/' --predict

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
bestckpt		bestckpt
data/raw		data/raw
dataset		dataset
dataset_utils		dataset_utils
dmm2		dmm2
exec_utils		exec_utils
extcode		extcode
extfile		extfile
models		models
oldformat		oldformat
preprocessing		preprocessing
ref		ref
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README raw.md		README raw.md
README.md		README.md
README_puhti.md		README_puhti.md
dmm_errors.txt		dmm_errors.txt
main.py		main.py
main_old.py		main_old.py
muat-conda.yml		muat-conda.yml
muat-pre-conda.yml		muat-pre-conda.yml
preproc-requirements.txt		preproc-requirements.txt
run_examples.py		run_examples.py
sample_dataset.py		sample_dataset.py
step.txt		step.txt
testpy2.py		testpy2.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MuAt

Quick Start (Tested on Linux)

Training PCAWG

Download dataset:

Preprocessing

Create Dataset for MuAt

Training from scratch

Predicting from scratch

About

Releases

Packages

Languages

License

MLBioMed/muat

Folders and files

Latest commit

History

Repository files navigation

MuAt

Quick Start (Tested on Linux)

Training PCAWG

Download dataset:

Preprocessing

Create Dataset for MuAt

Training from scratch

Predicting from scratch

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages