feat(technology): add ion torrent processing (#383)
* add ont toolchain
* move medka
* add primer clipping
* add env for primer clipping
* integrate reads in common.smk
* fmt
* add medaka_model to test config
* update sample sheet
* add check for benchmark data
* fmt
* add ci for ont
* change schema for new sample sheet
* linting
* add ARTIC_v3_adapters
* fmt
* add ARTIC primers to resources
* integrate ont pipieline
* fmt
* add missing wildcards
* agg is_sample_technongoly in is_technology
* change ont assembly from canu to spades se
* fmt
* make artic primers version adjustable
* adjust fastp, kranken and samtools depth
* rename rules
* adjust kallisto
* fmt
* fmt
* fix: kallisto_metrics log
* fmt
* fix: get_kraken_output date input
* ci: change to -o
* ci: add github ont reads
* config update
* adjust kallisto input
* fmt
* adjust assembler comparison for illumina only
* fmt
* fix: wildcard -> wildcards
* refactor: get_technology
* fmt
* fix: call function
* ci: rmv amplicon file
* add threads for canu
* add corThreads
* add maxMemory
* rmv restrictions
* add corThreads
* add threads
* add redMemory
* change to maxThreads and maxMemory
* Revert "change to maxThreads and maxMemory" This reverts commit f053dcb.
* add oeaMemory
* add debug statement
* print kallisto metrics
* add missing space
* add print for debug
* add if
* add testing return
* fmt
* adjust test log date and names
* make canu params for testing
* fmt
* add lambda expression for testing
* fix typo
* fill empty rows in qc data sheet with "0"
* add nano qc
* fmt
* update spades env
* add vcf to medaka output
* add polishing with medeka on de novo assembly
* rmv print debug statements
* refactor masking script
* fmt
* fix masked sequence writer
* remove "manual" fasta parser
* fmt
* deal with empty rki filter
* update sample sheet generation
* extract ont read numbers for qc table
* fmt
* fix read counting
* fix spades assembeler path due to version update
* fmt
* change to contigs.fasta
* use raw_contigs with pe spades
* add if statement
* change to wildcard
* change *
* add missing wildcards
* rmv canu correct folder
* remove to long string from "Other Variants" column
* add medaka variant calling
* fmt
* fix kraken
* change to trimmed and not corrected reads for polishing
* Revert "change to trimmed and not corrected reads for polishing" This reverts commit 81b3030.
* add missing gz
* make rki-filter less errorprone regarding samplenames
* fmt
* add human removal
* fix samtools in bamclipper
* add consesus
* add identity overview
* fmt
* fixes
* more fixes
* change path of indicators
* update report descriptions
* update more descs
* updats scripts
* Change Pangolin Call to Lineage
* comments
* fix quast call
* fixes, comments
* fmt
* fix in masking script
* add longshot
* fmt
* change logging of porechop_primer_trimming
* add porechop debug
* update report generation
* cleanup rule all
* fmt
* remove debug
* add gzip keep flag
* add polishing of consenus
* fmt
* fmt
* fmt
* fmt
* touch ups
* fmt
* fmt
* rmv unnecessary code
* add ion torrent
* fmt
* refactor todos
* improvements
* fmt
* update selection functions
* add docstrings
* updat samplesheet
* fmt
* renaming
* rmv patterns from get_reads_after_qc
* update testing
* fix technology matrix
* split all and benchmarks
* fmt
* fmt
* add data for compare_assemblers
* fix indent
* indent
* another indent
* rmv paranthesis
* Always download test data
* update artifacts
* add amplicon tests
* add missing patterns
* fmt
* update assembler config
* update test config, main.yml
* add qoutes
* add missing gz
* fix path
* add pe flags for assembler comparison
* rmv contigs output flag
* temp fix adapters
* add conda caching

Co-authored-by: simakro <[email protected]>
Showing 21 changed files with 800 additions and 395 deletions.
@@ -49,9 +49,124 @@ jobs:
           snakefile: workflow/Snakefile
           stagein: mamba install -n snakemake -c conda-forge peppy
           args: "--lint"


+  Technology-Tests:
+    runs-on: ubuntu-latest
+    env:
+      GISAID_API_TOKEN: ${{ secrets.GISAID_API_TOKEN }}
+    needs:
+      - Formatting
+      - Linting
+    strategy:
+      matrix:
+        rule: [all, all -npr]
+        technology: [all, illumina, ont, ion]
+        seq_method: [shotgun, amplicon]
+    steps:
+      - uses: actions/checkout@v2
+
+      - name: Cache conda dependencies
+        uses: actions/cache@v2
+        with:
+          path: |
+            .tests/.snakemake/conda
+          key: technology-${{ runner.os }}-${{ matrix.rule }}-${{ matrix.technology }}-${{ matrix.seq_method }}-${{ hashFiles('*.tests/.snakemake/conda/*.yaml') }}
+
+      - name: Prepare test data for all technologies
+        if: steps.test-data.outputs.cache-hit != true && (startsWith(matrix.rule, 'all') && matrix.technology == 'all' || matrix.rule == 'compare_assemblers')
+        run: |
+          if [[ "${{ matrix.seq_method }}" = "shotgun" ]] ; then export AMPLICON=0; else export AMPLICON=1; fi
+          mkdir -p .tests/data
+          curl -L https://github.com/thomasbtf/small-kraken-db/raw/master/B.1.1.7.reads.1.fastq.gz > .tests/data/B117.1.fastq.gz
+          curl -L https://github.com/thomasbtf/small-kraken-db/raw/master/B.1.1.7.reads.1.fastq.gz > .tests/data/B117.2.fastq.gz
+          curl -L https://github.com/thomasbtf/small-kraken-db/raw/master/ont_reads.fastq.gz > .tests/data/ont_reads.fastq.gz
+          curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR574/003/ERR5745913/ERR5745913.fastq.gz > .tests/data/ion_reads.fastq.gz
+          echo sample_name,fq1,fq2,date,is_amplicon_data,technology > .tests/config/pep/samples.csv
+          echo illumina-test,data/B117.1.fastq.gz,data/B117.2.fastq.gz,2022-01-01,$AMPLICON,illumina >> .tests/config/pep/samples.csv
+          echo ont-test,data/ont_reads.fastq.gz,,2022-01-01,$AMPLICON,ont >> .tests/config/pep/samples.csv
+          echo ion-test,data/ion_reads.fastq.gz,,2022-01-01,$AMPLICON,ion >> .tests/config/pep/samples.csv
+      - name: Prepare test data for Illumina
+        if: steps.test-data.outputs.cache-hit != true && (startsWith(matrix.rule, 'all') && matrix.technology == 'illumina' || matrix.rule == 'compare_assemblers')
+        run: |
+          if [[ "${{ matrix.seq_method }}" = "shotgun" ]] ; then export AMPLICON=0; else export AMPLICON=1; fi
+          mkdir -p .tests/data
+          curl -L https://github.com/thomasbtf/small-kraken-db/raw/master/B.1.1.7.reads.1.fastq.gz > .tests/data/B117.1.fastq.gz
+          curl -L https://github.com/thomasbtf/small-kraken-db/raw/master/B.1.1.7.reads.1.fastq.gz > .tests/data/B117.2.fastq.gz
+          echo sample_name,fq1,fq2,date,is_amplicon_data,technology > .tests/config/pep/samples.csv
+          echo illumina-test,data/B117.1.fastq.gz,data/B117.2.fastq.gz,2022-01-01,$AMPLICON,illumina >> .tests/config/pep/samples.csv
+      - name: Prepare test data for Oxford Nanopore
+        if: steps.test-data.outputs.cache-hit != true && (startsWith(matrix.rule, 'all') && matrix.technology == 'ont' || matrix.rule == 'compare_assemblers')
+        run: |
+          if [[ "${{ matrix.seq_method }}" = "shotgun" ]] ; then export AMPLICON=0; else export AMPLICON=1; fi
+          mkdir -p .tests/data
+          curl -L https://github.com/thomasbtf/small-kraken-db/raw/master/ont_reads.fastq.gz > .tests/data/ont_reads.fastq.gz
+          echo sample_name,fq1,date,is_amplicon_data,technology > .tests/config/pep/samples.csv
+          echo ont-test,data/ont_reads.fastq.gz,2022-01-01,$AMPLICON,ont >> .tests/config/pep/samples.csv
+      - name: Prepare test data for Ion Torrent
+        if: steps.test-data.outputs.cache-hit != true && (startsWith(matrix.rule, 'all') && matrix.technology == 'ion' || matrix.rule == 'compare_assemblers')
+        run: |
+          if [[ "${{ matrix.seq_method }}" = "shotgun" ]] ; then export AMPLICON=0; else export AMPLICON=1; fi
+          mkdir -p .tests/data
+          curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR574/003/ERR5745913/ERR5745913.fastq.gz > .tests/data/ion_reads.fastq.gz
+          echo sample_name,fq1,date,is_amplicon_data,technology > .tests/config/pep/samples.csv
+          echo ion-test,data/ion_reads.fastq.gz,2022-01-01,$AMPLICON,ion >> .tests/config/pep/samples.csv
+      - name: Use smaller reference files for testing
+        if: steps.test-resources.outputs.cache-hit != true
+        run: |
+          # mkdir -p .tests/resources/minikraken-8GB
+          # curl -SL https://github.com/thomasbtf/small-kraken-db/raw/master/human_k2db.tar.gz | tar zxvf - -C .tests/resources/minikraken-8GB --strip 1
+          mkdir -p .tests/resources/genomes
+          curl -SL "https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=NC_000021.9&db=nuccore&report=fasta" | gzip -c > .tests/resources/genomes/human-genome.fna.gz
+      - name: Simulate GISAID download
+        run: |
+          mkdir -p .tests/results/benchmarking/tables
+          echo -e "resources/genomes/B.1.1.7.fasta\nresources/genomes/B.1.351.fasta" > .tests/results/benchmarking/tables/strain-genomes.txt
+          mkdir -p .tests/resources/genomes
+          curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=MZ314997.1&rettype=fasta" | sed '$ d' > .tests/resources/genomes/B.1.1.7.fasta
+          curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=MZ314998.1&rettype=fasta" | sed '$ d' > .tests/resources/genomes/B.1.351.fasta
+      - name: Test rule ${{ matrix.rule }} on ${{ matrix.technology }} ${{ matrix.seq_method }} data
+        uses: snakemake/[email protected]
+        with:
+          directory: .tests
+          snakefile: workflow/Snakefile
+          args: "--use-conda --show-failed-logs --cores 2 --resources ncbi_api_requests=1 --conda-cleanup-pkgs cache --conda-frontend mamba ${{ matrix.rule }}"
+
+      - name: Test report
+        uses: snakemake/[email protected]
+        if: startsWith(matrix.rule, 'all -npr') != true
+        with:
+          directory: .tests
+          snakefile: workflow/Snakefile
+          args: "${{ matrix.rule }} --report report.zip"
+
+      - name: Upload report
+        uses: actions/upload-artifact@v2
+        if: matrix.technology == 'all' && matrix.rule != 'all -npr'
+        with:
+          name: report-rule-${{ matrix.rule }}-${{ matrix.technology }}-${{ matrix.seq_method }}
+          path: .tests/report.zip
+
+      - name: Upload logs
+        uses: actions/upload-artifact@v2
+        if: matrix.technology == 'all' && matrix.rule != 'all -npr'
+        with:
+          name: log-rule-${{ matrix.rule }}-technology-${{ matrix.technology }}
+          path: .tests/logs/
+
+      - name: Change permissions for caching
+        run: sudo chmod -R 755 .tests/.snakemake/conda
+
+      - name: Print disk space
+        run: sudo df -h
+
-  Testing:
+  Benchmarks-Tests:
     runs-on: ubuntu-latest
     env:
       GISAID_API_TOKEN: ${{ secrets.GISAID_API_TOKEN }}
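Review note: the four `Prepare test data` steps in this hunk differ only in which reads they download and which rows they append to `.tests/config/pep/samples.csv`. A minimal sketch of that row-writing logic as a single shell function follows. The function name `write_sample_sheet` and its arguments are hypothetical and not part of the workflow. Unlike the single-technology ONT and Ion Torrent steps in the hunk, the sketch always writes the `fq2` column and leaves it empty for single-end data.

```shell
# Hypothetical helper, not part of the workflow: writes a sample sheet for one
# technology (illumina, ont, ion) or for all of them.
# Usage: write_sample_sheet TECHNOLOGY SEQ_METHOD OUTPUT_FILE
write_sample_sheet() {
  local technology="$1" seq_method="$2" out="$3"
  local amplicon=1
  # Mirrors the CI steps: shotgun maps to 0, anything else to 1.
  if [ "$seq_method" = "shotgun" ]; then amplicon=0; fi

  echo "sample_name,fq1,fq2,date,is_amplicon_data,technology" > "$out"
  if [ "$technology" = "all" ] || [ "$technology" = "illumina" ]; then
    echo "illumina-test,data/B117.1.fastq.gz,data/B117.2.fastq.gz,2022-01-01,$amplicon,illumina" >> "$out"
  fi
  if [ "$technology" = "all" ] || [ "$technology" = "ont" ]; then
    echo "ont-test,data/ont_reads.fastq.gz,,2022-01-01,$amplicon,ont" >> "$out"
  fi
  if [ "$technology" = "all" ] || [ "$technology" = "ion" ]; then
    echo "ion-test,data/ion_reads.fastq.gz,,2022-01-01,$amplicon,ion" >> "$out"
  fi
}
```

Called as `write_sample_sheet ont amplicon samples.csv`, this would write the header plus one ONT row with `is_amplicon_data` set to 1.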
@@ -60,10 +175,18 @@ jobs:
       - Linting
     strategy:
       matrix:
-        rule: [all, all -npr, benchmark_strain_calling, benchmark_assembly, benchmark_mixtures, benchmark_non_sars_cov_2, compare_assemblers, benchmark_reads]
+        rule: [benchmark_strain_calling, benchmark_assembly, benchmark_mixtures, benchmark_non_sars_cov_2, benchmark_reads, compare_assemblers]
     steps:
       - uses: actions/checkout@v2

+      - name: Cache conda dependencies
+        uses: actions/cache@v2
+        with:
+          path: |
+            .tests/.snakemake/conda
+          key: benchmarks-${{ runner.os }}-${{ matrix.rule }}-${{ matrix.technology }}-${{ matrix.seq_method }}-${{ hashFiles('*.tests/.snakemake/conda/*.yaml') }}
+
+
       # TODO caches are currently completely misleading, as they lead to certain files becoming present on disk which might
       # then hide failures that would otherwise be seen.

@@ -145,14 +268,16 @@ jobs:
       # ${{ runner.os }}-sars-cov-benchmark-dependencies-${{ steps.get-date.outputs.date }}-
       # ${{ runner.os }}-sars-cov-benchmark-dependencies-

-      - name: Download test data
-        if: steps.test-data.outputs.cache-hit != true && (startsWith(matrix.rule, 'all') || matrix.rule == 'compare_assemblers')

+      - name: Prepare test data
+        if: steps.test-data.outputs.cache-hit != true
         run: |
           mkdir -p .tests/data
           curl -L https://github.com/thomasbtf/small-kraken-db/raw/master/B.1.1.7.reads.1.fastq.gz > .tests/data/B117.1.fastq.gz
           curl -L https://github.com/thomasbtf/small-kraken-db/raw/master/B.1.1.7.reads.1.fastq.gz > .tests/data/B117.2.fastq.gz
+          curl -L https://github.com/thomasbtf/small-kraken-db/raw/master/ont_reads.fastq.gz > .tests/data/ont_reads.fastq.gz
+          echo sample_name,fq1,fq2,date,is_amplicon_data,technology > .tests/config/pep/samples.csv
+          echo illumina-test,data/B117.1.fastq.gz,data/B117.2.fastq.gz,2022-01-01,0,illumina >> .tests/config/pep/samples.csv
       - name: Use smaller reference files for testing
         if: steps.test-resources.outputs.cache-hit != true
         run: |
@@ -169,8 +294,7 @@ jobs:
           curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=MZ314997.1&rettype=fasta" | sed '$ d' > .tests/resources/genomes/B.1.1.7.fasta
           curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=MZ314998.1&rettype=fasta" | sed '$ d' > .tests/resources/genomes/B.1.351.fasta
-      - name: Test rule ${{ matrix.rule }}
+      - name: Test rule ${{ matrix.rule }}
         uses: snakemake/[email protected]
         with:
           directory: .tests
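Review note: the two `curl ... | sed '$ d'` context lines in this hunk drop the final line of each downloaded FASTA. In `sed`, the address `$` selects the last input line and `d` deletes it. A minimal sketch on made-up input follows. The wrapper name `strip_last_line` is hypothetical, the sequence is invented, and the idea that the efetch response ends with an empty line is an assumption, not something the commit states.

```shell
# Hypothetical wrapper around the sed expression used in the workflow:
# copies stdin to stdout without its last line.
strip_last_line() {
  sed '$ d'
}

# Invented FASTA record followed by a trailing empty line; only the header and
# the sequence line are printed.
printf '>seq1\nACGT\n\n' | strip_last_line
```

If the input has no trailing empty line, the same expression removes the last sequence line instead, which is worth keeping in mind when reusing it on other sources.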
@@ -185,16 +309,16 @@ jobs:
           snakefile: workflow/Snakefile
           args: "${{ matrix.rule }} --report report.zip"

-      - name: Upload report
-        uses: actions/upload-artifact@v2
-        with:
-          name: report-${{ matrix.rule }}
-          path: .tests/report.zip
+      # - name: Upload report
+      #   uses: actions/upload-artifact@v2
+      #   with:
+      #     name: report-rule-${{ matrix.rule }}
+      #     path: .tests/report.zip

       - name: Upload logs
         uses: actions/upload-artifact@v2
         with:
-          name: log-${{ matrix.rule }}
+          name: log-rule-${{ matrix.rule }}
           path: .tests/logs/

       # - name: Unit test
@@ -226,7 +350,7 @@ jobs:
           cat .tests/results/benchmarking/assembly/pseudoassembly.csv
           if [[ $(tail -1 .tests/results/benchmarking/assembly/pseudoassembly.csv) < 0.95 ]]
           then
-            echo "Pseudoassembly bechmarking failed. There is at least one assembly where the contigs do not cover 95% of the original sequence (see above)."
+            echo "Pseudoassembly benchmarking failed. There is at least one assembly where the contigs do not cover 95% of the original sequence (see above)."
             exit 1
           else
             echo "Pseudoassembly was successful."
@@ -238,7 +362,7 @@ jobs:
           cat .tests/results/benchmarking/assembly/assembly.csv
           if [[ $(tail -1 .tests/results/benchmarking/assembly/assembly.csv) < 0.8 ]]
           then
-            echo "Assembly bechmarking failed. There is at least one assembly where the contigs do not cover 80% of the original sequence (see above)."
+            echo "Assembly benchmarking failed. There is at least one assembly where the contigs do not cover 80% of the original sequence (see above)."
             exit 1
           else
             echo "Assembly was successful."
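Review note: both benchmark checks compare the last line of a CSV against a threshold with `[[ ... < ... ]]`. Inside `[[ ]]`, bash's `<` compares strings lexicographically, so the check is only reliable while the values happen to sort like numbers. A minimal sketch of a numeric comparison using `awk` follows, assuming the last line of the CSV holds a single decimal value. The function name `below_threshold` is hypothetical and not part of the workflow.

```shell
# Hypothetical helper: exits 0 when VALUE is numerically smaller than THRESHOLD.
# Usage: below_threshold VALUE THRESHOLD
below_threshold() {
  # "+ 0" forces awk to treat both operands as numbers.
  awk -v value="$1" -v threshold="$2" 'BEGIN { exit !(value + 0 < threshold + 0) }'
}

# Example with literal values; in the workflow the first argument would come
# from something like $(tail -1 <csv file>).
if below_threshold 0.75 0.8; then
  echo "coverage below threshold"
else
  echo "coverage ok"
fi
```

The difference shows up with values such as `10` and `9`: a string comparison sorts `10` before `9`, while the numeric comparison does not.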
@@ -261,8 +385,8 @@ jobs:
             echo "Workflow sucessfully identified samples as non-sars-cov-2 in all cases."
           fi
-      - name: Print disk space
-        run: sudo df -h

       - name: Change permissions for caching
         run: sudo chmod -R 755 .tests/.snakemake/conda

+      - name: Print disk space
+        run: sudo df -h
This file was deleted.
@@ -1,3 +1,4 @@
 sample_name,fq1,fq2,date,is_amplicon_data,technology
-NAME,PATH/TO/fq1,PATH/TO/fq2,ID,0,illumina
-NAME,PATH/TO/fq1,,ID,1,ont
+SAMPLE_NAME_1,PATH/TO/fq1,PATH/TO/fq2,SEQUENCING_DATE,0,illumina # Required information for a sample sequencing on the Illumina platform
+SAMPLE_NAME_2,PATH/TO/fq,,SEQUENCING_DATE,1,ont # Required information for a sample sequencing on the Oxford Nanopore platform
+SAMPLE_NAME_3,PATH/TO/fq,,SEQUENCING_DATE,1,ion # Required information for a sample sequencing on the Ion Torrent platform
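Review note: the template now lists three values for the `technology` column: `illumina`, `ont`, and `ion`. A small sketch of a check over that column follows. The helper name `check_technologies` is hypothetical and not part of the workflow, and the sketch assumes a filled-in sheet with no commas inside fields and without the trailing `#` comments shown in the template.

```shell
# Hypothetical helper: prints every data row whose sixth column (technology) is
# not illumina, ont, or ion, and exits non-zero if at least one such row exists.
# The header row (line 1) is skipped.
# Usage: check_technologies SAMPLE_SHEET
check_technologies() {
  awk -F, '
    NR > 1 && $6 !~ /^(illumina|ont|ion)$/ { print; bad = 1 }
    END { exit bad }
  ' "$1"
}
```

A call such as `check_technologies config/pep/samples.csv || echo "unknown technology found"` would then report rows with an unexpected value; the path here is only an example.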
@@ -3,5 +3,3 @@ channels:
   - conda-forge
 dependencies:
   - bamclipper =1.0
-  - fgbio = 1.3
-  - samtools = 1.9
@@ -0,0 +1,5 @@
+channels:
+  - bioconda
+  - conda-forge
+dependencies:
+  - fgbio = 1.3
@@ -2,4 +2,4 @@ channels:
   - bioconda
   - conda-forge
 dependencies:
-  - samtools =1.10
+  - samtools =1.14