Merge branch 'dev' into MarieLataretu/issue55
Showing 19 changed files with 195 additions and 88 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
# Changelog | ||
|
||
## [v1.0.0-alpha] - 2023-09-30 | ||
|
||
### Changed | ||
|
||
- changed input parameter usage: | ||
- before: `--[nano|illumina|illumina_single_end|fasta]` | ||
- now: `--input_type [nano|illumina|illumina_single_end|fasta] --input *.fastq` | ||
- changed workflow figure to a nicer figure | ||
- changed workflow structure (introducing subworkflows) | ||
- input files with the suffix `clean` are not allowed | ||
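For anyone updating old command lines, the parameter change listed above can be sketched as a small rewrite step. This is a Python illustration only, not part of the pipeline. It assumes the old type flag was followed directly by the input path or glob, which the changelog does not state explicitly; `migrate_args` and the example file names are hypothetical.

```python
# Input types named in the changelog entry above.
INPUT_TYPES = ("nano", "illumina", "illumina_single_end", "fasta")


def migrate_args(old_args):
    """Rewrite old-style arguments into the `--input_type ... --input ...` form.

    Assumption: the old flag (e.g. `--nano`) was followed directly by the
    input path or glob. All other arguments are passed through unchanged.
    """
    new_args = []
    i = 0
    while i < len(old_args):
        arg = old_args[i]
        name = arg[2:] if arg.startswith("--") else None
        if name in INPUT_TYPES:
            if i + 1 >= len(old_args):
                raise ValueError(f"{arg} is missing its input path")
            new_args += ["--input_type", name, "--input", old_args[i + 1]]
            i += 2
        else:
            new_args.append(arg)
            i += 1
    return new_args


if __name__ == "__main__":
    # Hypothetical old-style call; file and directory names are placeholders.
    print(migrate_args(["--nano", "reads.fastq", "--output", "results"]))
```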
|
||
### Added | ||
|
||
- added CHANGELOG.md, Citations.md and citation information | ||
- added `--cleanup_work_dir` to remove work dir files after a successful run | ||
- added `--min_clip` to filter mapped reads by soft-clipped length | ||
- added `--dcs_strict` to use only DCS reads with artificial ends | ||
- added `stub` command for Nextflow prototyping | ||
- added `idxstats` | ||
|
||
### Fixed | ||
|
||
- pipeline report with timestamp | ||
- `--split-prefix` parameter for `minimap2` | ||
- make concat contamination more efficient |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
# CLEAN: Citations | ||
|
||
## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/) | ||
|
||
> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311. | ||
## Pipeline tools | ||
|
||
- [BBMap](https://sourceforge.net/projects/bbmap/) | ||
|
||
- [BEDTools](https://www.ncbi.nlm.nih.gov/pubmed/20110278/) | ||
|
||
> Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PubMed PMID: 20110278; PubMed Central PMCID: PMC2832824. | ||
- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) | ||
|
||
- [Minimap2](https://pubmed.ncbi.nlm.nih.gov/29750242/) | ||
> Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018 Sep 15;34(18):3094-3100. doi: 10.1093/bioinformatics/bty191. PMID: 29750242; PMCID: PMC6137996. | ||
- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/) | ||
|
||
> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. | ||
- [NanoPlot](https://pubmed.ncbi.nlm.nih.gov/37171891/) | ||
|
||
> De Coster W, Rademakers R. NanoPack2: Population scale evaluation of long-read sequencing data. Bioinformatics. 2023 May 12;39(5):btad311. doi: 10.1093/bioinformatics/btad311. Epub ahead of print. PMID: 37171891; PMCID: PMC10196664. | ||
- [SAMtools](https://www.ncbi.nlm.nih.gov/pubmed/19505943/) | ||
|
||
> Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PubMed PMID: 19505943; PubMed Central PMCID: PMC2723002. | ||
- [QUAST](https://pubmed.ncbi.nlm.nih.gov/23422339/) | ||
|
||
> Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013 Apr 15;29(8):1072-5. doi: 10.1093/bioinformatics/btt086. Epub 2013 Feb 19. PMID: 23422339; PMCID: PMC3624806. | ||
## Software packaging/containerisation tools | ||
|
||
- [Anaconda](https://anaconda.com) | ||
|
||
> Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web. | ||
- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/) | ||
|
||
> Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506. | ||
- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/) | ||
|
||
> da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671. | ||
- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241) | ||
|
||
- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/) | ||
> Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
BSD 3-Clause License | ||
|
||
Copyright (c) 2022, Martin Hölzer | ||
|
||
Redistribution and use in source and binary forms, with or without | ||
modification, are permitted provided that the following conditions are met: | ||
|
||
1. Redistributions of source code must retain the above copyright notice, this | ||
list of conditions and the following disclaimer. | ||
|
||
2. Redistributions in binary form must reproduce the above copyright notice, | ||
this list of conditions and the following disclaimer in the documentation | ||
and/or other materials provided with the distribution. | ||
|
||
3. Neither the name of the copyright holder nor the names of its | ||
contributors may be used to endorse or promote products derived from | ||
this software without specific prior written permission. | ||
|
||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" | ||
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | ||
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | ||
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE | ||
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR | ||
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER | ||
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, | ||
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | ||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,11 +2,11 @@ | |
|
||
A decontamination workflow for short reads, long reads and assemblies. | ||
|
||
![](https://img.shields.io/badge/nextflow-19.10.0-brightgreen) | ||
![](https://img.shields.io/badge/nextflow-21.04.0-brightgreen) | ||
![](https://img.shields.io/badge/uses-docker-blue.svg) | ||
![](https://img.shields.io/badge/uses-conda-yellow.svg) | ||
|
||
Email: [email protected], marie.lataretu@uni-jena.de | ||
Email: [email protected], lataretum@rki.de | ||
|
||
## Objective | ||
|
||
|
@@ -102,8 +102,20 @@ Included in this repository are: | |
|
||
... for reasons. More can be easily added! Just write me, add an issue or make a pull request. | ||
|
||
## Flowchart | ||
## Workflow | ||
|
||
![chart](data/figures/workflow.png) | ||
|
||
<sub><sub>The icons and diagram components that make up the schematic view were originally designed by James A. Fellow Yates & nf-core under a CC0 license (public domain).</sub></sub> | ||
<sub><sub>The icons and diagram components that make up the schematic view were originally designed by James A. Fellow Yates & nf-core under a CC0 license (public domain).</sub></sub> | ||
|
||
## Citations | ||
|
||
If you use `CLEAN` in your work, please consider citing our preprint: | ||
|
||
> Targeted decontamination of sequencing data with CLEAN | ||
> | ||
> Marie Lataretu, Sebastian Krautwurst, Adrian Viehweger, Christian Brandt, Martin Hölzer | ||
> | ||
> bioRxiv 2023.08.05.552089; doi: https://doi.org/10.1101/2023.08.05.552089 | ||
Additionally, an extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,7 +10,7 @@ Author: [email protected] | |
|
||
// Parameters sanity checking | ||
|
||
Set valid_params = ['max_cores', 'cores', 'max_memory', 'memory', 'profile', 'help', 'input', 'input_type', 'list', 'host', 'own', 'control', 'keep', 'rm_rrna', 'bbduk', 'bbduk_kmer', 'bbduk_qin', 'reads_rna', 'min_clip', 'dcs_strict', 'output', 'multiqc_dir', 'nf_runinfo_dir', 'databases', 'condaCacheDir', 'singularityCacheDir', 'singularityCacheDir', 'cloudProcess', 'conda-cache-dir', 'singularity-cache-dir', 'cloud-process', 'publish_dir_mode'] // don't ask me why there is also 'conda-cache-dir', 'singularity-cache-dir', 'cloud-process' | ||
Set valid_params = ['max_cores', 'cores', 'max_memory', 'memory', 'profile', 'help', 'input', 'input_type', 'list', 'host', 'own', 'control', 'keep', 'rm_rrna', 'bbduk', 'bbduk_kmer', 'bbduk_qin', 'reads_rna', 'min_clip', 'dcs_strict', 'output', 'multiqc_dir', 'nf_runinfo_dir', 'databases', 'cleanup_work_dir','condaCacheDir', 'singularityCacheDir', 'singularityCacheDir', 'cloudProcess', 'conda-cache-dir', 'singularity-cache-dir', 'cloud-process', 'publish_dir_mode'] // don't ask me why there is also 'conda-cache-dir', 'singularity-cache-dir', 'cloud-process' | ||
def parameter_diff = params.keySet() - valid_params | ||
if (parameter_diff.size() != 0){ | ||
exit 1, "ERROR: Parameter(s) $parameter_diff is/are not valid in the pipeline!\n" | ||
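The sanity check in this hunk is a set difference: every supplied parameter name that is not in the allow-list is reported. The Python sketch below shows the same idea outside Nextflow; `invalid_params` and the shortened allow-list are illustrative, not taken from `main.nf`. Because the Groovy variable is declared as a `Set`, the repeated `'singularityCacheDir'` entry should collapse into one element and not affect the result.

```python
def invalid_params(given, valid):
    """Return the supplied parameter names that are not allowed, sorted."""
    return sorted(set(given) - set(valid))


if __name__ == "__main__":
    # Shortened allow-list for illustration; the real list is in the hunk above.
    valid = {"input", "input_type", "min_clip", "dcs_strict", "cleanup_work_dir"}
    # Hypothetical user input with a misspelled parameter name.
    given = {"input": "reads.fastq", "input_type": "nano", "cleanup_workdir": True}

    unknown = invalid_params(given, valid)
    if unknown:
        print(f"ERROR: Parameter(s) {unknown} is/are not valid in the pipeline!")
```

A misspelling such as `cleanup_workdir` is caught here before any process starts, which is the reason the new `cleanup_work_dir` name has to be added to the list in this commit.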
|
@@ -143,6 +143,8 @@ if ( params.rm_rrna ){ | |
|
||
if ( params.host ) { | ||
hostNameChannel = Channel.from( params.host ).splitCsv().flatten() | ||
} else { | ||
hostNameChannel = Channel.empty() | ||
} | ||
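The effect of `splitCsv().flatten()` on `params.host` is that a comma-separated value becomes one channel item per host name, and the new `else` branch yields an empty channel when no host is given. A rough Python equivalent, with a hypothetical `host_names` helper and placeholder host labels, might look like this:

```python
import csv


def host_names(host_param):
    """Split a comma-separated host parameter into individual names.

    Returns an empty list when the parameter is unset, mirroring the
    empty channel created in the `else` branch.
    """
    if not host_param:
        return []
    # csv.reader yields one row per input line; flatten the rows into names.
    return [name for row in csv.reader([host_param]) for name in row]


if __name__ == "__main__":
    print(host_names("hostA,hostB"))  # placeholder host labels
    print(host_names(None))
```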
|
||
// user defined fasta sequence | ||
|
@@ -189,7 +191,7 @@ include { qc } from './workflows/qc_wf' | |
**************************/ | ||
|
||
workflow { | ||
prepare_contamination(nanoControlFastaChannel, illuminaControlFastaChannel, rRNAChannel) | ||
prepare_contamination(nanoControlFastaChannel, illuminaControlFastaChannel, rRNAChannel, hostNameChannel, ownFastaChannel) | ||
contamination = prepare_contamination.out | ||
|
||
clean(input_ch, contamination, nanoControlBedChannel) | ||
|
@@ -266,7 +268,7 @@ def helpMSG() { | |
${c_green}--bbduk_qin${c_reset} set quality ASCII encoding for bbduk [default: $params.bbduk_qin; options are: 64, 33, auto] | ||
${c_green}--reads_rna${c_reset} add this flag for noisy direct RNA-Seq Nanopore data [default: $params.reads_rna] | ||
${c_green}--min_clip${c_reset} filter mapped reads by soft-clipped lenth (left + right). If >= 1 total | ||
${c_green}--min_clip${c_reset} filter mapped reads by soft-clipped length (left + right). If >= 1 total | ||
number; if < 1 relative to read length | ||
${c_green}--dcs_strict${c_reset} filter out alignments that cover artificial ends of the ONT DCS to discriminate between Lambda Phage and DCS | ||
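The `--min_clip` help text describes two modes: a value of 1 or more is an absolute number of soft-clipped bases (left plus right), and a value below 1 is a fraction of the read length. The Python sketch below works through that rule on CIGAR strings. It is an illustration under assumptions, not the pipeline's implementation: the function names are hypothetical, the read length is derived from the CIGAR string, and treating a read as filtered when its clipped length is strictly greater than the threshold is a guess at the comparison direction.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume bases of the read (query) sequence.
QUERY_OPS = set("MIS=X")


def parse_cigar(cigar):
    """Return the CIGAR string as a list of (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def soft_clipped_length(cigar):
    """Sum the soft-clipped bases at the left and right end of the read."""
    # Hard clips (H) can only sit outside soft clips, so drop them first.
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    if not ops:
        return 0
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left + right


def read_length(cigar):
    """Read length implied by the CIGAR string (soft clips included)."""
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_OPS)


def clip_threshold(min_clip, length):
    """Absolute threshold: min_clip itself if >= 1, else a fraction of length."""
    return min_clip if min_clip >= 1 else min_clip * length


def exceeds_min_clip(cigar, min_clip):
    """Assumption: a read is flagged when clipped bases exceed the threshold."""
    return soft_clipped_length(cigar) > clip_threshold(min_clip, read_length(cigar))


if __name__ == "__main__":
    cigar = "10S80M10S"  # 20 soft-clipped bases in a 100-base read
    print(exceeds_min_clip(cigar, 15))    # absolute mode
    print(exceeds_min_clip(cigar, 0.25))  # relative mode, threshold 25 bases
```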
|
@@ -287,6 +289,10 @@ def helpMSG() { | |
--condaCacheDir defines the path where environments (conda) are cached [default: $params.condaCacheDir] | ||
--singularityCacheDir defines the path where images (singularity) are cached [default: $params.singularityCacheDir] | ||
${c_yellow}Miscellaneous:${c_reset} | ||
--cleanup_work_dir deletes all files in the work directory after a successful completion of a run [default: $params.cleanup_work_dir] | ||
${c_dim}warning: if true, the option will prevent the use of the resume feature!${c_reset}
${c_yellow}Profile:${c_reset} | ||
You can merge different profiles for different setups, e.g. | ||
|
@@ -303,6 +309,7 @@ def helpMSG() { | |
docker | ||
singularity | ||
conda | ||
mamba | ||
ebi (lsf,singularity; preconfigured for the EBI cluster) | ||
yoda (lsf,singularity; preconfigured for the EBI YODA cluster) | ||
|