Bash Script for Annotation and TSV generation

Usage:

Install Prerequisites:
- Ensure you have Java installed.
- Download and set up snpEff and SnpSift.
- Obtain the necessary databases: dbSNP, ClinVar, gnomAD, dbNSFPand GRCh37.p13

Run the Script:

Execute the script using the following commands:

# Make the script executable
chmod +x annotate_script.sh

# Run the script
./annotate_script.sh

Input:
- Provide the path to your input VCF file when prompted.
Output:
- Annotated VCF files are generated at each annotation step.
- The final annotated TSV file is named ${output_vcf}.extractedTSV.tsv.

The Bash Script - annotate_script.sh

#!/bin/bash
# Paths to tools and databases
# (Update these paths according to your setup)
snpEff="/home/hansi98/Databases/other_db/snpEff/snpEff.jar"
snpSift="/home/hansi98/Databases/other_db/snpEff/SnpSift.jar"
java="/usr/bin/java"
dbsnp="/mnt/d/Hansi/All_20180423.vcf.gz"
clinvar="/home/hansi98/Databases/other_db/clinvar.vcf"
gnomad="/mnt/d/Hansi/release_2.1.1_vcf_exomes_gnomad.exomes.r2.1.1.sites.vcf.bgz"
dbnsfp="/mnt/d/Hansi/dbNSFP4.1a.txt.gz"

# Prompt for the input VCF file path
read -p "Enter the path to your input VCF file: " input_vcf
# Verify that the input VCF file exists
if [ ! -f "$input_vcf" ]; then
    echo "Error: Input VCF file not found: $input_vcf"
    exit 1
fi

# Prompt for the desired output VCF file name
read -p "Save the annotated VCF as (.vcf): " output_vcf

# Annotation using SNPSift and dbSNP
$java -Xmx4g -jar "$snpSift" annotate -v "$dbsnp" "$input_vcf" > "${output_vcf}.snpSift.vcf"

# Annotation using SNPeff
$java -Xmx4g -jar "$snpEff" -canon -v GRCh37.p13 -noStats "${output_vcf}.snpSift.vcf" > "${output_vcf}.snpEff.vcf" 
# Annotation using SNPSift and ClinVar
$java -Xmx4g -jar "$snpSift" annotate -v "$clinvar" "${output_vcf}.snpEff.vcf" > "${output_vcf}.clinvar.vcf"

# Annotation using SNPSift and GNOMAD
$java -Xmx4g -jar "$snpSift" annotate -v "$gnomad" "${output_vcf}.clinvar.vcf" > "${output_vcf}.gnomad.vcf"

# Annotation using SNPSift and dbNSFP
$java -Xmx4g -jar "$snpSift" dbnsfp -v -db "$dbnsfp" -f rs_dbSNP151,HGVSc_snpEff,HGVSp_snpEff,VEP_canonical,Denisova,FATHMM_pred,fathmm-MKL_coding_score,Eigen-raw_coding,GERP++_RS,phyloP100way_vertebrate,phyloP30way_mammalian,1000Gp3_AF,1000Gp3_SAS_AF,genename,ExAC_AF,phastCons100way_vertebrate,Polyphen2_HDIV_pred,MutationTaster_pred,SIFT_pred,PROVEAN_pred,CADD_raw_hg19,CADD_phred_hg19,gnomAD_exomes_AF,gnomAD_exomes_SAS_AF,ExAC_AF,ExAC_SAS_AF,gnomAD_exomes_SAS_nhomalt,gnomAD_genomes_AF,gnomAD_genomes_nhomalt,gnomAD_genomes_SAS_AF,gnomAD_genomes_SAS_nhomalt,clinvar_id,clinvar_clnsig,clinvar_trait,clinvar_review "${output_vcf}.gnomad.vcf" > "${output_vcf}.dbnsfp.vcf"

# Cleanup intermediate files (comment out if you want to keep them)
rm "${output_vcf}.snpSift.vcf" "${output_vcf}.snpEff.vcf" "${output_vcf}.clinvar.vcf" "${output_vcf}.gnomad.vcf" 

# Extract relevant fields and form TSV
$java -Xmx4g -jar "$snpSift" extractFields -v -s "," -e "N/A" "${output_vcf}.dbnsfp.vcf" FILTER QUAL CHROM POS ID AF REF ALT \
ANN[0].ANNOTATION ANN[0].IMPACT ANN[0].GENE ANN[0].FEATUREID ANN[0].BIOTYPE ANN[0].RANK ANN[0].HGVS_C ANN[0].HGVS_P \
non_cancer_AF non_neuro_AF controls_AF non_topmed_AF AF_sas AF_amr AF_nfe AF_eas AF_afr AF_nfe_onf AF_eas_oea AF_nfe_nwe AF_nfe_seu AF_nfe_swe AF_eas_jpn AF_eas_kor AF_fin AF_asj AF_nfe_est AF_oth \
non_neuro_nhomalt_popmax controls_nhomalt_popmax non_topmed_nhomalt_popmax nhomalt_popmax non_cancer_nhomalt_popmax \
rs_dbSNP151 HGVSc_snpEff HGVSp_snpEff VEP_canonical Denisova FATHMM_pred fathmm-MKL_coding_score Eigen-raw_coding GERP++_RS phyloP100way_vertebrate phyloP30way_mammalian 1000Gp3_AF 1000Gp3_SAS_AF genename ExAC_AF phastCons100way_vertebrate Polyphen2_HDIV_pred MutationTaster_pred SIFT_pred PROVEAN_pred CADD_raw_hg19 CADD_phred_hg19 gnomAD_exomes_AF gnomAD_exomes_SAS_AF ExAC_AF ExAC_SAS_AF gnomAD_exomes_SAS_nhomalt gnomAD_genomes_AF gnomAD_genomes_nhomalt gnomAD_genomes_SAS_AF gnomAD_genomes_SAS_nhomalt clinvar_id clinvar_clnsig clinvar_trait clinvar_review > "${output_vcf}.extractedTSV.tsv"

echo "Annotation completed. Output written to: ${output_vcf}.dbnsfp.vcf and ${output_vcf}.extractedTSV.tsv"

Explanation of Annotation Steps:

Annotation using SNPSift and dbSNP:
- Annotates variants with dbSNP information.
Annotation using SNPeff:
- Annotates variants with SNP effect predictions.
Annotation using SNPSift and ClinVar:
- Annotates variants with ClinVar information.
Annotation using SNPSift and gnomAD:
- Annotates variants with gnomAD allele frequency information.
Annotation using SNPSift and dbNSFP:
- Annotates variants with various functional predictions from dbNSFP.

Cleanup:

Intermediate files are removed by default. If you want to keep them, comment out the cleanup section at the end of the script.

Note:

Update the paths in the script to match your local setup.

For additional resources, refer to the README.

For questions or support, contact us at hansithewarapperuma@gmail.com.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

documentation.md

documentation.md

Bash Script for Annotation and TSV generation

Files

documentation.md

Latest commit

History

documentation.md

File metadata and controls

Bash Script for Annotation and TSV generation