SNPless powered by Nextflow

snpless-nf - A Nextflow pipeline for time-course analysis with bacterial NGS whole-genome data.

Introduction

Pipeline summary

  1. QC
    1. FASTQC FastQC
    2. TRIM Trimmomatic
    3. PEAR pear
  2. GENMAP GenMap
  3. ASSEMBLY
    1. UNICYCLER Unicycler
    2. PROKKA prokka
  4. MAPPING
    1. BRESEQ breseq >> SAMTOOLS samtools add read group
    2. MINIMAP2 minimap2 >> SAMBLASTER samblaster remove duplicates
    3. BWA BWA >> SAMBLASTER samblaster remove duplicates
    4. COVERAGE samtools
  5. SNPCALLING
    1. FREEBAYES freebayes >> VCFFILTER vcflib >> VT Vt normalize >> decompose
    2. BCFTOOLS bcftools mpileup, call, vcfutils.pl varFilter >> VT Vt normalize >> decompose
    3. LOFREQ LoFreq indelqual, index, call-parallel
    4. VARSCAN varscan mpileup2snp, mpileup2indel
    5. MPILEUP samtools >> parse_mpileup.py >> annotate_pvalues
    6. GDCOMPARE gdtools
  6. SVCALLING
    1. PINDEL pindel
    2. GRIDSS GRIDSS
  7. FILTERING/MERGING
    1. BEDTOOLS bedtools
  8. ANNOTATION
    1. SNPEFF SnpEff
  9. PLOTTING
    1. PLOT R

Additional tools used for data conversion and data analysis:

Quickstart

  1. Install Nextflow (>=21.10.0)

Install Nextflow by using the following command:

curl -s https://get.nextflow.io | bash

or

Install Nextflow by using conda:

conda create -n nf python=3
conda activate nf
conda install -c bioconda nextflow
  2. Download the pipeline

git clone https://github.com/kullrich/snpless-nf.git

  3. Test the pipeline on a minimal dataset with a single command:

Using nextflow conda environment:

conda activate nf
nextflow run snpless-nf -profile test

  4. Start running your own analysis:

Check the necessary input files!

nextflow run snpless-nf --input <samples.tsv> --reference <genome.fna> --gff3 <genome.gff3> --proteins <genome.gbff>
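
Before launching a run, it can help to confirm that the installed Nextflow meets the minimum version named in the first step and that the input files are in place. The following is a minimal sketch and not part of snpless-nf: `version_ge` and `check_inputs` are hypothetical helper names, `sort -V` requires a `sort` with version-sort support (e.g. GNU coreutils), and the parsing assumes that `nextflow -version` prints a version of the form `X.Y.Z`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not shipped with snpless-nf.

# Return 0 if version $1 is greater than or equal to version $2.
# Relies on version sort: the smaller of the two versions is listed first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Return 0 only if every given path is an existing, non-empty file.
check_inputs() {
    local rc=0 f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Only query Nextflow if it is on the PATH.
if command -v nextflow >/dev/null 2>&1; then
    # Assumption: the first X.Y.Z pattern in the output is the version.
    installed="$(nextflow -version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)"
    if [ -z "$installed" ]; then
        echo "could not determine the Nextflow version" >&2
    elif version_ge "$installed" "21.10.0"; then
        echo "Nextflow $installed is recent enough"
    else
        echo "Nextflow $installed is older than 21.10.0" >&2
    fi
fi
```

With the helpers loaded, a call such as `check_inputs samples.tsv genome.fna genome.gff3 genome.gbff` (placeholder file names) reports any missing or empty file on standard error before the pipeline is started.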

Full example dataset

Get example files (8.6 GB)

Download via wget:

cd snpless-nf/examples
wget -O behringer2018.tar.gz https://owncloud.gwdg.de/index.php/s/fqD9ik2s3FReOUn/download
tar -xvf behringer2018.tar.gz

Download via weblink:

behringer2018 - samples 113, 129, 221

Run full example dataset

Using nextflow conda environment:

conda activate nf
nextflow run snpless-nf --input behringer2018/behringer2018_113.txt --reference behringer2018/GCF_000005845.2_ASM584v2_genomic.fna --gff3 behringer2018/GCF_000005845.2_ASM584v2_genomic.gff --proteins behringer2018/GCF_000005845.2_ASM584v2_genomic.gbff
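
The example download covers three samples (113, 129, 221). A minimal sketch for preparing one run per sample follows; it only prints the commands and does not start anything. Two things here are assumptions extrapolated from the command above rather than documented facts: that the sample sheets for 129 and 221 follow the naming of `behringer2018_113.txt`, and that all reference files sit inside `behringer2018/`. `build_command` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Dry run: print one nextflow command per example sample.
# Assumption: sample sheets are named behringer2018_<sample>.txt and
# all reference files live in the behringer2018/ directory.

data_dir="behringer2018"
ref_prefix="${data_dir}/GCF_000005845.2_ASM584v2_genomic"

# Print the nextflow command line for a single sample id.
build_command() {
    local sample="$1"
    printf 'nextflow run snpless-nf --input %s --reference %s --gff3 %s --proteins %s\n' \
        "${data_dir}/behringer2018_${sample}.txt" \
        "${ref_prefix}.fna" "${ref_prefix}.gff" "${ref_prefix}.gbff"
}

for sample in 113 129 221; do
    build_command "$sample"
done
```

Once the printed commands look right, they can be run one at a time. Whether the results of consecutive runs end up in separate directories depends on the pipeline's output settings, so check the parameters documentation before running several samples from the same working directory.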

Pipeline usage

see a detailed description here: usage

Input files

Pipeline parameters

see a detailed description here: parameters

Pipeline output

see a detailed description here: output

Licence

MIT (see LICENSE)

Contributing Code

If you would like to contribute to snpless-nf, please file an issue so that one can establish a statement of need, avoid redundant work, and track progress on your contribution.

Before you do a pull request, you should always file an issue and make sure that someone from the snpless-nf developer team agrees that it’s a problem, and is happy with your basic proposal for fixing it.

Once an issue has been filed and we've identified how to best orient your contribution with package development as a whole, fork the main repo, branch off a feature branch from master, commit and push your changes to your fork and submit a pull request for snpless-nf:master.

By contributing to this project, you agree to abide by the Code of Conduct terms.

Bug reports

Please report any errors or requests regarding snpless-nf to Kristian Ullrich ([email protected])

Code of Conduct - Participation guidelines

This repository adheres to the Contributor Covenant code of conduct, which applies to all interactions you have within this project (see Code of Conduct).

See also the Max Planck Society Code of Conduct and its policy against sexualized discrimination, harassment and violence.

By contributing to this project, you agree to abide by its terms.

References - Examples

Behringer, Megan G., et al. "Escherichia coli cultures maintain stable subpopulation structure during long-term evolution." Proceedings of the National Academy of Sciences 115.20 (2018): E4642-E4650. https://www.pnas.org/content/115/20/E4642.short

References - Tools

  1. Good, Benjamin H., et al. "The dynamics of molecular evolution over 60,000 generations." Nature 551.7678 (2017): 45-50. link
  2. Di Tommaso, Paolo, et al. "Nextflow enables reproducible computational workflows." Nature biotechnology 35.4 (2017): 316-319. link
  3. Andrews, Simon. "FastQC: a quality control tool for high throughput sequence data. 2010." (2017): W29-33. link
  4. Bolger, Anthony M., Marc Lohse, and Bjoern Usadel. "Trimmomatic: a flexible trimmer for Illumina sequence data." Bioinformatics 30.15 (2014): 2114-2120. link
  5. Zhang, Jiajie, et al. "PEAR: a fast and accurate Illumina Paired-End reAd mergeR." Bioinformatics 30.5 (2014): 614-620. link
  6. Pockrandt, Christopher, et al. "GenMap: ultra-fast computation of genome mappability." Bioinformatics 36.12 (2020): 3687-3692. link
  7. Wick, Ryan R., et al. "Unicycler: resolving bacterial genome assemblies from short and long sequencing reads." PLoS computational biology 13.6 (2017): e1005595. link
  8. Seemann, Torsten. "Prokka: rapid prokaryotic genome annotation." Bioinformatics 30.14 (2014): 2068-2069. link
  9. Deatherage, Daniel E., and Jeffrey E. Barrick. "Identification of mutations in laboratory-evolved microbes from next-generation sequencing data using breseq." Engineering and analyzing multicellular systems. Humana Press, New York, NY, 2014. 165-188. link
  10. Li, Heng. "Minimap2: pairwise alignment for nucleotide sequences." Bioinformatics 34.18 (2018): 3094-3100. link
  11. Li, Heng. "Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM." arXiv preprint arXiv:1303.3997 (2013). link
  12. Faust, Gregory G., and Ira M. Hall. "SAMBLASTER: fast duplicate marking and structural variant read extraction." Bioinformatics 30.17 (2014): 2503-2505. link
  13. Li, Heng, et al. "The sequence alignment/map format and SAMtools." Bioinformatics 25.16 (2009): 2078-2079. link
  14. Garrison, Erik, and Gabor Marth. "Haplotype-based variant detection from short-read sequencing." arXiv preprint arXiv:1207.3907 (2012). link
  15. Wilm, Andreas, et al. "LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets." Nucleic acids research 40.22 (2012): 11189-11201. link
  16. Koboldt, Daniel C., et al. "VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing." Genome research 22.3 (2012): 568-576. link
  17. Ye, Kai, et al. "Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads." Bioinformatics 25.21 (2009): 2865-2871. link
  18. Cameron, Daniel L., et al. "GRIDSS2: comprehensive characterisation of somatic structural variation using single breakend variants and structural variant phasing." bioRxiv (2021): 2020-07. link
  19. Quinlan, Aaron R., and Ira M. Hall. "BEDTools: a flexible suite of utilities for comparing genomic features." Bioinformatics 26.6 (2010): 841-842. link
  20. Cingolani, Pablo, et al. "A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3." Fly 6.2 (2012): 80-92. link
  21. Wickham, Hadley. "ggplot2." Wiley Interdisciplinary Reviews: Computational Statistics 3.2 (2011): 180-185. link