Skip to content

rare disease

Brent Pedersen edited this page Apr 19, 2021 · 52 revisions

using slivar for rare disease research

Full analysis for trios with unaffected parents

The following commands encapsulate the current best-practices for finding denovos, x-linked denovos, recessives for X and autosome (including compound hets). For exomes, these 2 commands will often result in <20 variants per trio without even considering the variant effect (missense/synonymous/etc). And a majority of those variants are pairs of a compound-het so the number of unique genes is often under < 10. The commands use the javascript functions in a file named slivar-functions.js those are easily modified to allow more/less stringency or different logic.

## set these variables
vcf=path/to/vcf
ped=path/to/ped
fasta=/path/to/fasta
# ftp://ftp.ensembl.org/pub/grch37/release-84/gff3/homo_sapiens/
# or hg38: ftp://ftp.ensembl.org/pub/release-96/gff3/homo_sapiens/
gff=/Homo_sapiens.GRCh38.96.gff3.gz 
cohort=mycohort
## end variables

bcftools csq -s - --ncsq 40 -g $gff -l -f $fasta $vcf -O u \
| slivar expr --vcf - \
    --ped $ped \
    -o vcfs/$cohort.vcf \
    --pass-only \
    -g /home/brentp/src/slivar/gnomad.hg38.v2.zip \
    --info 'INFO.impactful && INFO.gnomad_popmax_af < 0.01 && variant.FILTER == "PASS" && variant.ALT[0] != "*"' \
    --js ../slivar/js/slivar-functions.js \
    --family-expr 'denovo:fam.every(segregating_denovo) && INFO.gnomad_popmax_af < 0.001' \
    --family-expr 'recessive:fam.every(segregating_recessive)' \
    --family-expr 'x_denovo:(variant.CHROM == "X" || variant.CHROM == "chrX") && fam.every(segregating_denovo_x) && INFO.gnomad_popmax_af < 0.001' \
    --family-expr 'x_recessive:(variant.CHROM == "X" || variant.CHROM == "chrX") && fam.every(segregating_recessive_x)' \
    --trio 'comphet_side:comphet_side(kid, mom, dad) && INFO.gnomad_nhomalt < 10' \

# compound-hets then groups by gene (it requires and automatically uses annotation from VEP/SnpEff/Bcftools to infer gene):
slivar compound-hets -v vcfs/$cohort.vcf \
    --sample-field comphet_side --sample-field denovo -p $ped > vcfs/$cohort.ch.vcf

Note that for multi-sample VCFS, -s arguments to compound-hets make sure that only the trio(s) that passed the filters in the upstream slivar command are used to compose the 2 ends of a compound heterozgyote.

For speed, this can be automatically parallelized using pslivar.

For hg38, it's wise to add -g topmed.hg38.dbsnp.151.zip to the first command and then add INFO.topmed_af < 0.05 to the --info expression ( -g can be added multiple times). This removes common variants that are found in hg38, but not present in gnomAD (which is lifted from GRCh37). Get the zip file for topmed along with the other zip files here

Note that this uses impactful which will remove, for example synonymous, intronic, intergenic. If the VCF has not been previously annotated with VEP, bcftools, or snpEff, this will not be available.

The TSV command can be used to get these filtered VCFs into spreadsheets for final evaluation. See its wiki page for more detail

slivar-functions.js also contains a function, segregating_dominant for cases with a dominant mode of inheritance. Users can add this as needed, but remember that dominant will yield many more candidate variants.