Python scripts to make dosage files without ambiguous SNPs and INDELs directly from `vcf.gz`. Both scripts work essentially the same way: the outputs are a sample `.txt` file (which is required to run PrediXcan) and one dosage `txt.gz` file per chromosome, including non-autosomes (if provided). The only difference is that `TOPMed_vcf2dosage_a.py` takes as input a `.vcf.gz` containing only a single chromosome, whereas `TOPMed_vcf2dosage_b.py` takes as input a `vcf.gz` containing multiple chromosomes.
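
To illustrate what turning a VCF record into a dosage row involves, here is a minimal sketch. It is not taken from the scripts themselves: the function name `parse_dosages` is hypothetical, and it assumes each record carries a `DS` (dosage) entry in its FORMAT column, which may not match how the scripts actually derive dosages.

```python
def parse_dosages(vcf_line):
    """Return (chrom, pos, ref, alt, dosages) for one VCF data line.

    Assumption: the FORMAT column (9th field) contains a DS entry.
    Missing values (".") are not handled in this sketch.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _variant_id, ref, alt = fields[:5]

    # FORMAT describes the order of the colon-separated per-sample values.
    ds_index = fields[8].split(":").index("DS")

    # Sample columns start at the 10th field.
    dosages = [float(sample.split(":")[ds_index]) for sample in fields[9:]]
    return chrom, int(pos), ref, alt, dosages


# Made-up example record with two samples.
line = "chr22\t16050075\t.\tA\tG\t.\tPASS\t.\tGT:DS\t0|0:0.01\t0|1:1.0\n"
print(parse_dosages(line))
```

For a real file, the same function could be applied to every non-header line read through `gzip.open(path, "rt")`.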
The scripts can also be customized if needed. As provided, they rename all SNPs to the `chr#:pos:ref:alt` format and convert chrX, chrXY, chrY, and chrM to their numeric versions.
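
The filtering and renaming described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the scripts' actual code: all function names are hypothetical, the numeric chromosome codes follow the PLINK convention (X=23, Y=24, XY=25, M=26), which the text above does not specify, and the ID is assumed to keep a literal `chr` prefix.

```python
# Strand-ambiguous SNPs are those whose alleles are complementary bases.
AMBIGUOUS_PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}

# Assumed PLINK-style numeric codes; the real scripts may use other values.
CHROM_CODES = {"X": "23", "Y": "24", "XY": "25", "M": "26", "MT": "26"}


def is_indel(ref, alt):
    """Treat anything other than single-base alleles as an INDEL.

    Note: multi-allelic ALT strings such as "A,C" are also rejected here.
    """
    return len(ref) != 1 or len(alt) != 1


def is_ambiguous(ref, alt):
    """True for A/T and C/G SNPs."""
    return (ref.upper(), alt.upper()) in AMBIGUOUS_PAIRS


def keep_variant(ref, alt):
    """Keep only unambiguous single-nucleotide variants."""
    return not is_indel(ref, alt) and not is_ambiguous(ref, alt)


def normalize_chrom(chrom):
    """Strip an optional 'chr' prefix and map non-autosomes to numbers."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return CHROM_CODES.get(name.upper(), name)


def make_snp_id(chrom, pos, ref, alt):
    """Build an ID in the chr#:pos:ref:alt format."""
    return f"chr{normalize_chrom(chrom)}:{pos}:{ref}:{alt}"


print(keep_variant("A", "G"), make_snp_id("chrX", 100, "A", "G"))
```

If a different ID layout or chromosome coding is needed, `make_snp_id` and `CHROM_CODES` are the only pieces of this sketch that would change.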
Imported libraries:
- argparse
- gzip
- os
- sys
NOTE: this is a modified version of the script found here.