PDIVAS : Pathogenicity Predictor for Deep-Intronic Variants causing Aberrant Splicing

UPDATE info

Update to v1.2.0 (2024/11/13)

- The PDIVAS subcommand vcf2tsv can now handle and output the sample columns of VCF files.
- The SpliceAI annotation file (grch38.txt) was updated to GENCODE V47.
- Fixed bugs in the exceptional PDIVAS outputs ('wo_annots' and 'out_of_scope').

Summary

PDIVAS is a pathogenicity predictor for deep-intronic variants causing aberrant splicing.
Deep-intronic variants can create pathogenic pseudoexons or extended exons, which disrupt normal gene expression and can cause Mendelian diseases.
PDIVAS efficiently prioritizes causal candidates among the vast number of deep-intronic variants detected by whole-genome sequencing.
PDIVAS predictions cover variants in protein-coding genes on the autosomes and the X chromosome.
This command-line interface works with variant files in VCF format.

PDIVAS uses a random forest model to classify pathogenic and benign variants, based on features from:

- The splicing predictors SpliceAI (Jaganathan et al., Cell 2019) and MaxEntScan (Yeo and Burge, J. Comput. Biol. 2004)
(*) The output module of SpliceAI was customized for the PDIVAS features (see Option2 for details).
- The human splicing constraint score ConSplice (Cormier et al., BMC Bioinformatics 2022)

Reference & contact

Kurosawa et al. BMC Genomics 2023
[email protected] (Ryo Kurosawa at Kyoto University)

<Option1>
Prediction with the PDIVAS-precomputed files (SNVs + short indels (1-4 nt))

For a quick start with PDIVAS, use the score-precomputed file here. It comprehensively annotates possible rare SNVs and short indels (1-4 nt) in 4,512 Mendelian disease genes. To annotate your VCF file, run the commands below, for example.

0. Installation

conda install -c bioconda vcfanno
git clone https://github.com/brentp/vcfanno.git

1. Setting score-precomputed files

Download the score-precomputed file above and create a configuration file (see https://github.com/brentp/vcfanno).

vi ./conf.toml

Write the following:

[[annotation]]
file="./PDIVAS_precomputed/GRCh38/PDIVAS_precomputed_short_GRCh38.vcf.gz"
# ID and FILTER are special fields that pull the ID and FILTER columns from the VCF
fields = ["PDIVAS"]
ops=["self"]
names=["PDIVAS"]

2. Perform PDIVAS annotation

# Move to your working directory (here, the examples directory of this repository).
cd examples

# Perform annotation
vcfanno -lua ./vcfanno/example/custom.lua ./conf.toml ./ex.vcf > output_precomp.vcf
# Compare output_precomp.vcf with output_precomp_expect.vcf.gz to confirm that the annotation succeeded.
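The comparison can be scripted. A minimal sketch, assuming that only the data lines and the `#CHROM` header line need to agree: `##` meta-information lines are dropped because they may differ for reasons unrelated to the annotation (e.g. tool versions). The helper name `compare_vcf_bodies` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare two VCFs after dropping "##" meta-information lines.
# Plain and gzip/bgzip-compressed inputs are both accepted: "gzip -cdf" passes
# uncompressed input through unchanged.
compare_vcf_bodies() {
  local observed="$1" expected="$2"
  if diff <(gzip -cdf "$observed" | grep -v '^##') \
          <(gzip -cdf "$expected" | grep -v '^##') > /dev/null; then
    echo "match"
  else
    echo "differ"
    return 1
  fi
}
```

Usage: `compare_vcf_bodies output_precomp.vcf output_precomp_expect.vcf.gz`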

<Option2>
Perform annotation of individual features and calculation of PDIVAS scores

For more comprehensive annotation than the precomputed files provide, run PDIVAS as described below.

0-1. Installation

# Creating new conda environments for PDIVAS is recommended.
# Solving the environments may take a while.
conda create -n PDIVAS -c bioconda -c conda-forge spliceai tensorflow==2.6.2 pdivas bcftools vcfanno
conda create -n VEP -c conda-forge -c bioconda perl==5.26.2 ensembl-vep==105

Successful installation was verified with Anaconda version 23.3.1.

0-2. Custom settings

- Output-customized SpliceAI for the PDIVAS conda environment
https://github.com/shiro-kur/SpliceAI

git clone https://github.com/shiro-kur/SpliceAI.git
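# Adjust ~/miniconda3 and python3.9 to match your conda installation and the Python version of the PDIVAS environment.
cp -r SpliceAI/spliceai/* ~/miniconda3/envs/PDIVAS/lib/python3.9/site-packages/spliceai/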

- VEP custom settings

# Download VEP cache files
$ mkdir -p ~/Ref/.vep
$ cd ~/Ref/.vep
$ wget https://ftp.ensembl.org/pub/release-113/variation/vep/homo_sapiens_vep_113_GRCh38.tar.gz
$ tar xzf homo_sapiens_vep_113_GRCh38.tar.gz

# Set up MaxEntScan
$ mkdir -p ~/Ref/.vep/Plugin/MaxEntScan
$ cd ~/Ref/.vep/Plugin/MaxEntScan
$ wget http://hollywood.mit.edu/burgelab/maxent/download/fordownload.tar.gz
$ tar xzf fordownload.tar.gz

# Set up ConSplice
$ cd ~/Ref/.vep
$ wget https://storage.cloud.google.com/pdivas/ConSplice_for_PDIVAS/ConSplice.50bp_region.inverse_proportion_refo_hg38.bed.gz
$ tabix -f ConSplice.50bp_region.inverse_proportion_refo_hg38.bed.gz

The ConSplice file was derived from the original score file by Cormier et al. (BMC Bioinformatics 2022).

1. Preprocess the VCF file (split multi-allelic sites into biallelic sites)

conda activate PDIVAS
bcftools norm -m - multi.vcf > bi.vcf
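To confirm that the split worked, one can count the records whose ALT column still lists several comma-separated alleles; after `bcftools norm -m -` the count should be 0. A minimal sketch with awk; `count_multiallelic` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count data records whose ALT column (5th field) still
# contains a comma, i.e. lists more than one alternate allele.
count_multiallelic() {
  gzip -cdf "$1" | awk -F'\t' '!/^#/ && $5 ~ /,/ { n++ } END { print n + 0 }'
}
```

Usage: `count_multiallelic bi.vcf`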

2. Add gene annotations, MaxEntScan scores, and ConSplice scores with VEP.

conda activate VEP
vep \
--cache --offline --cache_version 107 --assembly GRCh38 --hgvs --pick_allele_gene \
--fasta ./references/hg38.fa.gz --vcf --force \
--custom ./references/ConSplice.50bp_region.inverse_proportion_refo_hg38.bed.gz,ConSplice,bed,overlap,0 \
--plugin MaxEntScan,./references/MaxEntScan/fordownload,SWA,NCSS \
--fields "Consequence,SYMBOL,Gene,INTRON,HGVSc,STRAND,ConSplice,MES-SWA_acceptor_diff,MES-SWA_acceptor_alt,MES-SWA_donor_diff,MES-SWA_donor_alt" \
--compress_output bgzip \
-i ./examples/ex.vcf.gz -o ./examples/ex_vep.vcf.gz
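With `--vcf`, VEP writes its annotations into the CSQ INFO field and declares the field order in the `##INFO=<ID=CSQ,...Format: ...>` header line. A quick check that the fields requested with `--fields` (e.g. ConSplice and the MaxEntScan columns) are present can be sketched as follows; `csq_fields` and `check_csq_fields` are hypothetical helpers, and the parsing assumes the header line ends with `Format: A|B|...">`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "|"-separated field names declared in the
# CSQ header line, one per line.
csq_fields() {
  gzip -cdf "$1" | grep -m1 '^##INFO=<ID=CSQ,' \
    | sed -e 's/.*Format: //' -e 's/">$//' | tr '|' '\n'
}

# Hypothetical helper: report each required field missing from the CSQ header;
# returns non-zero if any is missing.
check_csq_fields() {
  local vcf="$1"; shift
  local fields f missing=0
  fields=$(csq_fields "$vcf")
  for f in "$@"; do
    if ! grep -qxF -- "$f" <<< "$fields"; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_csq_fields ./examples/ex_vep.vcf.gz ConSplice MES-SWA_acceptor_diff MES-SWA_donor_diff && echo "CSQ fields OK"`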

3. Add output-customized SpliceAI scores

conda activate PDIVAS
spliceai -I examples/ex_vep.vcf.gz -O examples/ex_vep_AI.vcf -R hg38.fa -A grch38 -D 300 -M 1
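To check how many records received a SpliceAI annotation, the records carrying a given INFO key can be counted. A minimal sketch; `count_info_key` is a hypothetical helper, and the key name `SpliceAI` is an assumption carried over from upstream SpliceAI (the output-customized version may use a different key).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count data records whose INFO column (8th field)
# contains the given key (as "KEY=value" or a bare KEY flag).
count_info_key() {
  local vcf="$1" key="$2"
  gzip -cdf "$vcf" | awk -F'\t' -v key="$key" '
    !/^#/ {
      total++
      n = split($8, kv, ";")
      for (i = 1; i <= n; i++) {
        split(kv[i], p, "=")
        if (p[1] == key) { hit++; break }
      }
    }
    END { printf "%d/%d records with %s\n", hit, total, key }'
}
```

Usage: `count_info_key examples/ex_vep_AI.vcf SpliceAI`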

4. Detect deep-intronic variants and perform PDIVAS prediction

pdivas predict -I examples/ex_vep_AI.vcf -O examples/ex_vep_AI_PD.vcf.gz -F off
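For a quick look at the GENE_ID|PDIVAS_score predictions without running vcf2tsv, they can be listed one gene per line. A minimal sketch; `pdivas_pairs` is a hypothetical helper, and it assumes that the INFO key is `PDIVAS` and that per-gene entries are separated by commas (the usual VCF convention for multiple values).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print tab-separated CHROM, POS, REF, ALT, GENE_ID, PDIVAS
# for every per-gene entry of the (assumed) PDIVAS INFO key.
pdivas_pairs() {
  gzip -cdf "$1" | awk -F'\t' -v OFS='\t' '
    !/^#/ {
      n = split($8, kv, ";")
      for (i = 1; i <= n; i++) {
        if (substr(kv[i], 1, 7) != "PDIVAS=") continue
        # Split the value into comma-separated GENE_ID|score entries.
        m = split(substr(kv[i], 8), entries, ",")
        for (j = 1; j <= m; j++) {
          split(entries[j], gs, "|")
          print $1, $2, $4, $5, gs[1], gs[2]
        }
      }
    }'
}
```

Usage: `pdivas_pairs examples/ex_vep_AI_PD.vcf.gz | head`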

5. (Optional) Convert the VCF file with PDIVAS annotations to a TSV file (one gene annotation per line)

pdivas vcf2tsv -I examples/ex_vep_AI_PD.vcf.gz -O examples/ex_vep_AI_PD.tsv

Usage of PDIVAS command line

1. $ pdivas predict

Required parameters:

-I: Input VCF (.vcf/.vcf.gz) with the variants of interest.
-O: Output VCF (.vcf/.vcf.gz) with PDIVAS predictions in the format GENE_ID|PDIVAS_score. Variants in multiple genes receive a separate prediction for each gene.

Optional parameters:

-F: Filtering function (off/on). Output all variants (-F off; default) or only deep-intronic variants with PDIVAS scores (-F on).

Details of the PDIVAS INFO fields:

ID	Description
GENE_ID	Ensembl gene ID based on GENCODE V41 (GRCh38) or V19 (GRCh37)
PDIVAS	Predicted result:
	Pattern 1: float value from 0.000 to 1.000 (the higher, the more deleterious)
	Exceptions (output with '-F off'; filtered out with '-F on'):
	Pattern 2: 'wo_annots', variants lacking VEP or SpliceAI annotations
	Pattern 3: 'out_of_scope', variants outside the PDIVAS prediction scope (chrY, non-coding genes, or non-deep-intronic variants)
	Pattern 4: 'no_gene_match', variants whose gene annotations do not match between VEP and SpliceAI
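To see how many predictions fall outside Pattern 1, the entries can be tallied by pattern. A minimal sketch; `tally_pdivas` is a hypothetical helper that assumes the INFO key `PDIVAS` and comma-separated entries, and accepts exception labels either after `GENE_ID|` or on their own (the exact layout of exception entries is an assumption). Any value that looks like a non-negative decimal number is counted as `score`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count PDIVAS entries per pattern ("score" for Pattern 1,
# otherwise the exception label itself), printed as "label<TAB>count".
tally_pdivas() {
  gzip -cdf "$1" | awk -F'\t' '
    !/^#/ {
      n = split($8, kv, ";")
      for (i = 1; i <= n; i++) {
        if (substr(kv[i], 1, 7) != "PDIVAS=") continue
        m = split(substr(kv[i], 8), entries, ",")
        for (j = 1; j <= m; j++) {
          k = split(entries[j], gs, "|")
          # Use the part after GENE_ID| if present, otherwise the whole entry.
          v = (k >= 2) ? gs[2] : gs[1]
          label = (v ~ /^[0-9]+(\.[0-9]+)?$/) ? "score" : v
          count[label]++
        }
      }
    }
    END { for (c in count) print c "\t" count[c] }' | sort
}
```

Usage: `tally_pdivas examples/ex_vep_AI_PD.vcf.gz`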

2. $ pdivas vcf2tsv

Required parameters:

-I: *Input VCF (.vcf/.vcf.gz) with VEP, SpliceAI, and PDIVAS annotations.
-O: Path to the output TSV file.
*The input VCF is valid only if it was generated through this pipeline.

Interpretation of PDIVAS scores

More details are available in Kurosawa et al. medRxiv 2023.

Threshold	Sensitivity (*1)	Candidates per individual (*2)
>=0.082	95%	26.8
>=0.151	90%	14.5
>=0.340	85%	6.7
>=0.501	80%	4.1
>=0.575	75%	3.0
>=0.763	70%	1.2

(*1) Sensitivities were calculated on curated pathogenic deep-intronic variants in a test dataset.
(*2) Candidates of pathogenic deep-intronic variants were obtained through the process described below. (WGS: Whole-genome sequencing)
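Given a threshold from the table above, candidates can be extracted per gene. A minimal sketch; `filter_pdivas` is a hypothetical helper that assumes the INFO key `PDIVAS` and comma-separated GENE_ID|score entries; exception labels are skipped because they are not scores.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print tab-separated CHROM, POS, REF, ALT, GENE_ID, score
# for every per-gene PDIVAS score >= the given threshold.
filter_pdivas() {
  local vcf="$1" threshold="$2"
  gzip -cdf "$vcf" | awk -F'\t' -v OFS='\t' -v t="$threshold" '
    !/^#/ {
      n = split($8, kv, ";")
      for (i = 1; i <= n; i++) {
        if (substr(kv[i], 1, 7) != "PDIVAS=") continue
        m = split(substr(kv[i], 8), entries, ",")
        for (j = 1; j <= m; j++) {
          if (split(entries[j], gs, "|") < 2) continue
          # Compare numerically only when the value looks like a score.
          if (gs[2] ~ /^[0-9]+(\.[0-9]+)?$/ && gs[2] + 0 >= t + 0)
            print $1, $2, $4, $5, gs[1], gs[2]
        }
      }
    }'
}
```

For example, `filter_pdivas examples/ex_vep_AI_PD.vcf.gz 0.151` keeps predictions at the threshold that reached 90% sensitivity in the test dataset.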
