- GENERAL-INFO
- INSTALL
- RUN
- OUTPUT
- FLOW-DIAGRAM
- TOOLS
- PANGOLIN
- DOCKER
- SINGULARITY
- CONTRIBUTORS
- REMARKS
- REFERENCE
- DISCLAIMER
- LICENSE
This GitHub repository contains an automated pipeline dedicated to analysing EasySeq SARS-CoV-2 (COVID-19) sequencing data. It has been validated with 150/151 bp paired-end reads.
We advise re-downloading conda.tar.gz after each update to make sure all conda environments are in place.
- Fixed incorrect primer trimming, which resulted in an incorrect BA.2 ORF1A:I3758V mutation call
- New primer files added
- New primerVersion option to select the primer version
- This version includes the final fix for the HV69-70 region
- Fixed handling of non-covered regions and indels in the consensus (bcftools 1.12)
- Implemented the new SARS-CoV-2 lineage nomenclature by using a newer pangolin version
- Use code version v0.7.0 or later
- Implemented lofreq for variant calling, which gives much more accurate calls in the report. The consensus output is mostly unaffected.
- Use release v0.5.2 from GitHub: https://github.com/JordyCoolen/easyseq_covid19/releases/tag/v0.5.2
In short:
- Automated pipeline to analyse Illumina EasySeq COVID-19 samples to a variant report
- The pipeline cleans the Illumina sequencing data
- Uses the SARS-CoV-2 reference genome (NC_045512.2)
- Custom EasySeq Primer filtering and correction
- Mutations and deletions are measured
- Fasta consensus of the sample is created
- Lineage is determined
- Output is available in a structured way
- Full QC reports are created
- PDF and HTML report as output
- install Docker on your operating system
- docker pull jonovox/easyseq_covid19:latest
- download the newest release of the pipeline via https://github.com/JordyCoolen/easyseq_covid19/releases
- extract the source code
- go into the extracted/project folder
- download conda environments via: https://surfdrive.surf.nl/files/index.php/s/ggoLXzMoa5iSZYa
- extract conda.tar.gz into the extracted project folder from the previous steps
- proceed to the RUN examples
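The extraction step above can be scripted so that a wrong archive layout is caught before the first run. This is a minimal sketch, not part of the repository: extract_conda_envs is a hypothetical helper, and it assumes that conda.tar.gz unpacks to a top-level conda/ directory (inferred from the /workflow/conda/env-pangolin path used in the pangolin commands).

```bash
# Hypothetical helper for the "extract conda.tar.gz" install step.
# Assumption: conda.tar.gz unpacks to a top-level conda/ directory
# (inferred from the /workflow/conda/env-pangolin path used for pangolin).
extract_conda_envs() {
    local project_dir="$1"   # extracted release (project) folder
    local archive="$2"       # downloaded conda.tar.gz

    # Fail early if either input is missing
    if [ ! -d "$project_dir" ] || [ ! -f "$archive" ]; then
        echo "project folder or archive not found" >&2
        return 1
    fi

    # Unpack the pre-built conda environments into the project folder
    tar -xzf "$archive" -C "$project_dir" || return 1

    # Confirm the expected conda/ directory is now in place
    if [ ! -d "$project_dir/conda" ]; then
        echo "no conda/ directory after extraction; check the archive layout" >&2
        return 1
    fi
    echo "conda environments extracted to $project_dir/conda"
}
```

Example call (paths are illustrative): extract_conda_envs ./easyseq_covid19 ~/Downloads/conda.tar.gz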
- first run the test sample to set everything in place
- the first run of the variant pipeline deploys additional conda environments needed to complete the installation. This can take a while.
- open a Docker runtime container from the image with write permissions
sh docker/run.sh covid jonovox/easyseq_covid19:latest
- run the test sample inside the container
nextflow run COVID.nf --sampleName test -resume --outDir /workflow/output/test --reads "/workflow/input/test_OUT01_R{1,2}.fastq.gz"
- you can also process multiple samples sequentially (not in parallel)
bash scripts/run_batch.sh <path to folders containing the fastq.gz file> <extension of files> <threads> jonovox/easyseq_covid19:latest
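The sequential batch idea can be illustrated by generating one nextflow command per read pair, in the same form as the test command above. This is a hypothetical dry-run sketch and not the contents of scripts/run_batch.sh. It assumes reads are named <sample>_R1<ext> / <sample>_R2<ext> and that the folder is visible as /workflow/input inside the container. Note that the test command uses the sample name test for files prefixed test_OUT01; in this sketch the full prefix becomes the sample name.

```bash
# Hypothetical dry-run sketch, NOT scripts/run_batch.sh.
# Assumptions: reads are named <sample>_R1<ext> and <sample>_R2<ext>,
# and the folder is available as /workflow/input inside the container.
print_batch_commands() {
    local input_dir="$1"          # folder containing the fastq.gz files
    local ext="${2:-.fastq.gz}"   # file extension of the reads
    local r1 sample

    for r1 in "$input_dir"/*_R1"$ext"; do
        [ -e "$r1" ] || continue                  # no match: glob stays literal
        sample="$(basename "$r1" "_R1$ext")"      # strip _R1<ext> to get the prefix
        # Skip samples without a matching R2 file
        if [ ! -e "$input_dir/${sample}_R2$ext" ]; then
            echo "missing R2 for $sample" >&2
            continue
        fi
        # One command per sample; running the lines in order is sequential
        printf 'nextflow run COVID.nf --sampleName %s -resume --outDir /workflow/output/%s --reads "/workflow/input/%s_R{1,2}%s"\n' \
            "$sample" "$sample" "$sample" "$ext"
    done
}
```

Inside the container, the printed commands could be saved and run one after another, for example: print_batch_commands /workflow/input .fastq.gz > batch_commands.sh followed by bash batch_commands.sh.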
/workflow/output/test/
├── QC
│ ├── multiqc_data
│ │ ├── multiqc.log
│ │ ├── multiqc_data.json
│ │ ├── multiqc_fastp.txt
│ │ ├── multiqc_general_stats.txt
│ │ ├── multiqc_snpeff.txt
│ │ └── multiqc_sources.txt
│ ├── multiqc_report.html
│ ├── stats.txt
│ ├── test.fastp.json
│ ├── test.mosdepth.global.dist.txt
│ ├── test.mosdepth.summary.txt
│ ├── test.per-base.bed.gz
│ ├── test.per-base.bed.gz.csi
│ └── test_snpEff.csv
├── annotation
│ ├── snpEff_summary.html
│ ├── test_annot_table.txt
│ ├── test_snpEff.csv
│ └── test_snpEff.genes.txt
├── lineage
│ └── lineage_report.csv
├── mapping
│ ├── test.bam
│ ├── test.bam.bai
│ ├── test.final.bam
│ └── test.final.bam.bai
├── rawvcf
│ └── test.raw.vcf
├── report
│ ├── parameters.txt
│ ├── test.fasta
│ ├── test.html
│ └── test.pdf
├── uncovered
│ ├── test_noncov.bed
│ └── test_ubiq.bed
└── vcf
├── notpassed
│ └── test.notpassed.vcf
├── test.final.vcf
├── test.final.vcf.gz
├── test.final.vcf.gz.csi
└── test.variants.vcf
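After a run, a quick check that the main files from the tree above exist and are non-empty can catch a failed step before the report is used. This is a sketch; check_outputs is a hypothetical helper, and the selection of "key" files is our own choice rather than something the pipeline defines.

```bash
# Hypothetical sketch: verify that key output files from the tree above
# exist and are non-empty. The file selection is an assumption.
check_outputs() {
    local out_dir="$1"   # e.g. /workflow/output/test
    local sample="$2"    # e.g. test
    local missing=0 f

    for f in \
        "report/${sample}.pdf" \
        "report/${sample}.html" \
        "report/${sample}.fasta" \
        "vcf/${sample}.final.vcf" \
        "lineage/lineage_report.csv" \
        "QC/multiqc_report.html"; do
        # -s is true only for files that exist and have a size > 0
        if [ ! -s "$out_dir/$f" ]; then
            echo "missing or empty: $f" >&2
            missing=$((missing + 1))
        fi
    done

    # Exit status 0 only when nothing is missing
    [ "$missing" -eq 0 ]
}
```

For the test sample: check_outputs /workflow/output/test test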
- nextflow
- python
- conda/bioconda
- fastp
- BWA MEM
- samtools
- bcftools
- lofreq
- mosdepth
- bedtools
- snpEff
- multiQC
- pangolin v3.0.5 (pangoLEARN 2021-06-05) (default in conda.tar.gz)
sh docker/run.sh covid jonovox/easyseq_covid19:latest
conda activate /workflow/conda/env-pangolin
pangolin --update
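pangolin writes its calls to lineage/lineage_report.csv, a comma-separated file whose header includes taxon and lineage columns. Because the column order can differ between pangolin versions, this sketch looks the columns up by header name. print_lineage is a hypothetical helper, and it assumes a simple CSV without quoted commas.

```bash
# Hypothetical sketch: print sample name and lineage call from pangolin's
# lineage_report.csv, locating columns by header name, not position.
# Assumes a plain comma-separated file without quoted commas.
print_lineage() {
    local report="$1"   # e.g. /workflow/output/test/lineage/lineage_report.csv
    awk -F',' '
        NR == 1 {
            # Map each header name to its column index
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("taxon" in col) || !("lineage" in col)) {
                print "unexpected header in lineage report" > "/dev/stderr"
                exit 1
            }
            next
        }
        { print $(col["taxon"]) "\t" $(col["lineage"]) }
    ' "$report"
}
```

For the test sample: print_lineage /workflow/output/test/lineage/lineage_report.csv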
cd easyseq_covid19
docker build --rm -t <image name> ./
singularity build <imagename>.simg docker://jonovox/easyseq_covid19:latest
Department of Medical Microbiology and Radboudumc Center for Infectious Diseases, Radboud university medical center, Nijmegen, The Netherlands
- J.P.M. Coolen
NimaGen B.V., Nijmegen, The Netherlands
- R.A. Lammerts (NimaGen B.V., Nijmegen, The Netherlands)
- J.T. Vonk (Student HAN Bioinformatics, Nijmegen, The Netherlands)
spike S
21765-21770 HV 69-70 deletion
Versions 1 and 2 of the EasySeq RC-PCR SARS-CoV-2 WGS kit do not completely cover the region 21765-21770 / HV 69-70.
If you use these versions of the WGS kit please use:
variant pipeline v0.5.2
https://github.com/JordyCoolen/easyseq_covid19/releases/tag/v0.5.2
This version solves the incomplete coverage of region 21765-21770 with a template-based strategy using KMA.
This method determines which template matches best: either the wild type (NC_045512.2) or
a variant containing the 21765-21770 / HV 69-70 deletion. The result of this strategy
is projected into the VCF to ensure correct output. This works well for now because no other deletions are
currently known at this exact location.
variant pipeline v0.7.0 or later
In version 3 of the EasySeq RC-PCR SARS-CoV-2 WGS kit, the 21765-21770 / HV 69-70 region is
completely covered thanks to a new primer design. This version of the variant pipeline correctly handles
data obtained with version 3.
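To check whether a run reports the HV 69-70 deletion, one can scan vcf/<sample>.final.vcf for deletions whose deleted bases overlap 21765-21770. This is a minimal awk-based sketch; find_deletions_in_region is a hypothetical helper. It assumes simple left-anchored deletions (the ALT allele is a prefix of REF, since VCF includes the base before an indel) and skips multi-allelic and symbolic ALT alleles. Checking for overlap rather than an exact position tolerates small shifts from variant normalization.

```bash
# Hypothetical sketch: list VCF deletions whose deleted bases overlap a region,
# by default 21765-21770 (HV 69-70, NC_045512.2 coordinates).
# Assumes left-anchored deletions (ALT is a prefix of REF); skips
# multi-allelic (",") and symbolic ("<...>") ALT alleles.
find_deletions_in_region() {
    local vcf="$1" start="${2:-21765}" end="${3:-21770}"
    awk -F'\t' -v s="$start" -v e="$end" '
        /^#/ { next }                                   # skip header lines
        {
            pos = $2; ref = $4; alt = $5
            if (alt ~ /[,<]/) next                      # multi-allelic or symbolic
            if (length(ref) <= length(alt)) next        # not a deletion
            if (substr(ref, 1, length(alt)) != alt) next  # not left-anchored
            del_start = pos + length(alt)               # first deleted base
            del_end   = pos + length(ref) - 1           # last deleted base
            if (del_start <= e && del_end >= s)         # interval overlap test
                print $1 "\t" pos "\t" ref "\t" alt "\t" del_start "-" del_end
        }
    ' "$vcf"
}
```

For the test sample: find_deletions_in_region /workflow/output/test/vcf/test.final.vcf. An empty result means no deletion overlapping the region was reported.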
For citing this work please cite:
Coolen, J. P., Wolters, F., Tostmann, A., van Groningen, L. F., Bleeker-Rovers, C. P., Tan, E. C., ... & Melchers, W. J. (2021). SARS-CoV-2 whole-genome sequencing using reverse complement PCR: For easy, fast and accurate outbreak and variant analysis. Journal of Clinical Virology, 144, 104993. https://doi.org/10.1016/j.jcv.2021.104993
Please also cite the other programs used; see the list of tools.
This is for research use only. The code and pipeline are under continuous development, and we cannot guarantee fully error-free results, especially given the fast developments in SARS-CoV-2/COVID-19 sequencing and the continuously mutating nature of the virus.