This repository contains some example code for processing reads from QIAGEN QIAseq DNA enrichment kits.
The top level script run_qiaseq_dna.py process reads in fastq format to produce an annotated vcf along with relevant UMI and read level metrics. Variants are called using smCounter. The smCounter-v1 variant calling procedure is described here:
The smCounter variant caller uses a statistical model that requires both raw sequencer base calls and identification of unique input molecules using the UMI tag and the genome position of the random fragmentation site.
smCounter-v2 publication is under review. The pre-print is available here :
https://www.biorxiv.org/content/early/2018/03/14/281659
There are additional scripts under misc_workflow/ which contain the following :
-
run_dedup - Use the Picard MarkDuplicates utility from the Broad Institute to remove PCR duplicate reads using the UMI and both paired-end read start locations on the reference genome. This might be useful for subsequent germline variant calling, or structural variant detection applications. Note that this script does not filter ligation chimera reads, nor does it remove PCR duplicate reads caused by internal re-priming by a downstream SPE primer.
-
run_consensus - Use the "fgbio" package from Fulcrum Genomics to create a single consensus read pair for each input molecule. This might be useful to prepare consensus read alignments for a subsequent SNP/indel variant calling procedure that does not use UMI-tagged reads (such as VarDict, MuTect, etc.). Users might need to tune the fgbio parameters (in addtion to variant calling parameters) for their application. We have not performed any variant calling performance benchmarking using the consensus-read BAM generated by this pipeline.
This package contains modules that trim common regions of the reads, align reads to the reference genome, identify putative original input molecules, and trim SPE primer regions from the genome alignments. These steps generate a BAM file suitable for subsequent variant calling.
This package contains auxiliary modules that provide read accounting and enrichment summary metrics such as uniformity and fragment length distribution.
This package contains the smCounter-v1 variant caller
This package contains the smCounter-v2 variant caller
This package contains downstream VCF annotation using snpEff.
The python modules in this repository have many dependencies on third-party NGS software (e.g. BWA, samtools, etc.) and GNU Linux utilities (sort, zcat, etc). Please DO NOT ATTEMPT to use the python modules in this git repository without first running the code on the example read set using our Docker image:
### Pull the docker image
sudo docker pull rpadmanabhan9/qiaseq-dna:pipeline
### Run a container from the image above interactively, mounting your run directory i.e. directory where the output files will be created
sudo docker run -it -v /home/your_fav_dir/:/mnt/qiaseq-run/ rpadmanabhan9/qiaseq-dna:pipeline
### Change directory and get the latest code from github
cd /srv/qgen/code/
git clone --recursive https://github.com/qiaseq/qiaseq-dna.git
### Change to run directory and copy over parameters file
cd /mnt/qiaseq-run/
cp /srv/qgen/code/qiaseq-dna/run_sm_counter_v1.params.txt ./
### Edit the bottom of run_consensus.params.txt if you need to change the read set and primer file
### Run the pipeline
python /srv/qgen/code/qiaseq-dna/run_qiaseq_dna.py run_sm_counter_v1.params.txt v1 single NEB_S2 > run.log 2>&1 &
The parameters are explained below :
run_sm_counter_v1.params.txt : The config file with prepopulated parameters.
v1 : smCounter variant caller version to use. You can specify v1 or v2. Please use run_sm_counter_v2.params.txt if specifying v2.
single : Whether this is a single read set analysis or tumor-normal.
NEB_S2 : Corresponds to the name of the readSet, should match the section in the params file. For tumor-normal analysis please specify the two read set names delimited by a space.
For ion torrent reads
### convert unmapped bam to fastq, this can be done inside the container
bedtools bamtofastq -i {ubam} -fq {fastq1}
uBam : your unmapped bam file
fastq1 : R1 fastq file name
Update the parameters in the config file :
uBam: Add a new parameter in the readSet section in the bottom with the path to the unmapped bam file
readFile1: Should be the R1 fastq obtained from the above bedtools command
readFile2: Replace 'R1' with 'R2' in readFile1
platform: Change this to IonTorrent
Run the python script run_qiaseq_dna.py as before
The dependencies are fully documented in the Dockerfile in this repository.
Please address questions to [email protected], with CC to [email protected].