This repository contains a Nextflow DSL2 pipeline for annotating genetic variants in VCF files using snpEff, SnpSift, and the dbSNP and dbNSFP databases. The pipeline filters the input VCF files, annotates them in several steps, and writes a single tab-separated annotation file.
Make sure the required dependencies are installed before running the pipeline.

The pipeline runs the following processes:

- `FilterInputFiles`: Filters input VCF files using PLINK 2 to retain PASS variants with a maximum of 2 alleles.
- `AnnotateWithRSID`: Annotates variants with RSIDs using SnpSift and the dbSNP database.
- `AnnotateWithImpact`: Annotates variants with functional impact using snpEff and a specified reference genome.
- `FullyAnnotateWithDbSNP`: Performs comprehensive annotation using SnpSift and the dbNSFP database, including information on gene impact, gnomAD data, REVEL scores, ClinVar information, and more.
- `ExtractFields`: Extracts relevant fields from the annotated VCF files and creates a tab-separated text file with a header for downstream analysis.
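The filtering criterion used by `FilterInputFiles` (PASS variants with at most 2 alleles) can be illustrated on a toy file. The sketch below is only an illustration written with `awk`; the pipeline itself uses PLINK 2 for this step, and the helper name `filter_pass_biallelic` and the demo records are made up for the example.

```shell
set -eu

# Keep header lines, plus records whose FILTER column (7) is PASS and whose
# ALT column (5) holds a single allele (no comma), i.e. at most 2 alleles
# in total when REF is counted.
# Hypothetical helper: the pipeline performs this step with PLINK 2.
filter_pass_biallelic() {
    awk -F'\t' '/^#/ || ($7 == "PASS" && $5 !~ /,/)' "$1"
}

# Tiny made-up VCF used only for this demonstration.
demo_vcf="$(mktemp)"
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '1\t100\t.\tA\tG\t50\tPASS\t.\n'      # kept: PASS, biallelic
    printf '1\t200\t.\tC\tT\t10\tLowQual\t.\n'   # dropped: not PASS
    printf '1\t300\t.\tG\tA,T\t60\tPASS\t.\n'    # dropped: 3 alleles
} > "$demo_vcf"

filter_pass_biallelic "$demo_vcf"
```

Only the two header lines and the record at position 100 pass the filter in this toy input.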
- Clone the repository:

      git clone https://github.com/IARCbioinfo/snpeff_annotation-nf
      cd snpeff_annotation-nf

- Adjust the `nextflow.config` file if necessary. The package versions are specified in the `environment.yml` file.
- Run the pipeline with:

      nextflow run main.nf -profile conda
| Name | Default value | Description |
|---|---|---|
| `--input_folder_with_VCF_files` | `${baseDir}/VCFs/` | Folder containing `*vcf.gz` files |

| Name | Default value | Description |
|---|---|---|
| `--reference_genome` | `GRCh37.75` | Reference genome |
| `--dbNSF_path` | `${baseDir}/dbNSFP4.1a.txt.gz` | dbNSFP database |
| `--dbSNP_path` | `${baseDir}/dbsnp150.vcf.gz` | dbSNP database |
| `--output_path` | `${baseDir}/output` | Output folder |
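Before launching, it can help to confirm that the input folder actually contains `*vcf.gz` files. The following is a minimal sketch of such a pre-flight check; the helper `count_vcfs` and the demo folder are hypothetical and not part of the pipeline.

```shell
set -eu

# Count the *vcf.gz files directly inside a folder.
# Hypothetical helper, not provided by the pipeline.
count_vcfs() {
    find "$1" -maxdepth 1 -name '*vcf.gz' | wc -l | tr -d ' '
}

# Demo folder standing in for the value of --input_folder_with_VCF_files.
demo_dir="$(mktemp -d)"
touch "$demo_dir/sample1.vcf.gz" "$demo_dir/sample2.vcf.gz" "$demo_dir/notes.txt"

n="$(count_vcfs "$demo_dir")"
if [ "$n" -eq 0 ]; then
    echo "No *vcf.gz files found in $demo_dir" >&2
    exit 1
fi
echo "Found $n VCF file(s) in $demo_dir"
```

With the inputs in place, the defaults from the tables can be overridden on the command line, for example `nextflow run main.nf -profile conda --input_folder_with_VCF_files /path/to/VCFs/ --output_path /path/to/output`, where both paths are placeholders.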
The final annotated and extracted information will be available in the output directory as `full_annotation.txt`.
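Because the output is a tab-separated file with a header, a quick way to see which columns it contains is to number the header fields. The sketch below assumes nothing about the real column names: the demo file and its columns are made up, and `list_columns` is a hypothetical helper.

```shell
set -eu

# Print each header column of a tab-separated file with its 1-based index.
# Hypothetical helper for inspecting a file such as full_annotation.txt.
list_columns() {
    head -n 1 "$1" | tr '\t' '\n' | awk '{ print NR ": " $0 }'
}

# Made-up stand-in for full_annotation.txt; the real columns are
# determined by the ExtractFields process.
demo_file="$(mktemp)"
printf 'CHROM\tPOS\tID\tREF\tALT\n1\t100\t.\tA\tG\n' > "$demo_file"

list_columns "$demo_file"
```

The column indices can then be used with tools such as `cut -f` to pull out individual fields.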
- Adjust the memory requirements and other resource settings in the `nextflow.config` file.
- Customize the annotation processes in the `main.nf` script based on your specific requirements.
- This pipeline utilizes various bioinformatics tools and databases, including PLINK, bcftools, SnpSift, snpEff, dbNSFP, and dbSNP.
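
As an alternative to editing `nextflow.config` directly when adjusting memory requirements, resource settings can be kept in a separate override file. This is a minimal sketch: the file name `custom.config` and the values are placeholders rather than recommendations, and whether they suit this pipeline's processes is an assumption.

```shell
set -eu

# Write a small Nextflow config override; the values are placeholders.
workdir="$(mktemp -d)"
cat > "$workdir/custom.config" <<'EOF'
process {
    memory = '8 GB'
    cpus = 2
}
EOF

# Show what was written.
cat "$workdir/custom.config"
```

Such a file could then be supplied with Nextflow's `-c` option, for example `nextflow run main.nf -profile conda -c custom.config`.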