Seqkit #59

sarahjeeeze · 2024-10-29T14:53:30Z

Add seqkit stats nf-core module

PR checklist

github-actions · 2024-10-29T14:57:46Z

`nf-core pipelines lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 76b882d

+| ✅ 191 tests passed       |+
#| ❔   1 tests were ignored |#
!| ❗  21 tests had warnings |!

❗ Test warnings:

readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
pipeline_todos - TODO string in main.nf: Remove this line if you don't need a FASTA file
pipeline_todos - TODO string in nextflow.config: Specify your pipeline's command line flags
pipeline_todos - TODO string in nextflow.config: Optionally, you can add a pipeline-specific nf-core config at https://github.com/nf-core/configs
pipeline_todos - TODO string in README.md: TODO nf-core:
pipeline_todos - TODO string in README.md: Include a figure that guides the user through the major workflow steps. Many nf-core
pipeline_todos - TODO string in README.md: Fill in short bullet-pointed list of the default steps in the pipeline
pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.
pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
pipeline_todos - TODO string in test.config: Specify the paths to your test data on nf-core/test-datasets
pipeline_todos - TODO string in test.config: Give any required params for the test so that command line flags are not needed
pipeline_todos - TODO string in base.config: Check the defaults for all processes
pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required

❔ Tests ignored:

files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-seqinspector_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-seqinspector_logo_light.png
files_exist - File found: docs/images/nf-core-seqinspector_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: conf/igenomes_ignored.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-seqinspector_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowSeqinspector.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Found nf-schema plugin
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: validation.help.enabled
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable found: validation.help.beforeText
nextflow_config - Config variable found: validation.help.afterText
nextflow_config - Config variable found: validation.help.command
nextflow_config - Config variable found: validation.summary.beforeText
nextflow_config - Config variable found: validation.summary.afterText
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config variable (correctly) not found: params.max_cpus
nextflow_config - Config variable (correctly) not found: params.max_memory
nextflow_config - Config variable (correctly) not found: params.max_time
nextflow_config - Config variable (correctly) not found: params.validationFailUnrecognisedParams
nextflow_config - Config variable (correctly) not found: params.validationLenientMode
nextflow_config - Config variable (correctly) not found: params.validationSchemaIgnoreParams
nextflow_config - Config variable (correctly) not found: params.validationShowHiddenParams
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 1.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.sample_size= 0
nextflow_config - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes/
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/nf-core/test-datasets/
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-seqinspector_logo_light.png matches the template
files_unchanged - docs/images/nf-core-seqinspector_logo_light.png matches the template
files_unchanged - docs/images/nf-core-seqinspector_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Nextflow minimum version badge matched config. Badge: 24.04.2, Config: 24.04.2
plugin_includes - No wrong validation plugin imports have been found
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (0 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: nf-test.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: template_version_comment.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains a matching 'report_comment'.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
modules_config - conf/modules.config found and not ignored.
modules_config - SEQTK_SAMPLE found in conf/modules.config and Nextflow scripts.
modules_config - FASTQC found in conf/modules.config and Nextflow scripts.
modules_config - SEQKIT_STATS found in conf/modules.config and Nextflow scripts.
modules_config - MULTIQC_GLOBAL found in conf/modules.config and Nextflow scripts.
modules_config - MULTIQC_PER_TAG found in conf/modules.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 3.0.2

Run details

nf-core/tools version 3.0.2
Run at 2024-10-30 10:41:32

MatthiasZepper

Thank you for your contribution!

Ultimately, the pipeline will allow users to flexibly choose which tools are run, so having more tools in the pipeline is always great, but what exactly was your rationale here?

Admittedly, I just skimmed over the description, but to me it seems that it is predominantly meant to run on FASTA files and for judging the quality of genome assemblies. Unless I missed some relevant arguments, its application to sequencing reads does not, in my opinion, yield many meaningful statistics, at least for Illumina; for Nanopore it probably does.

Do you happen to have a Nanopore example? For Illumina, this is basically all you get:

| file | format | type | num_seqs | sum_len | min_len | avg_len | max_len |
| --- | --- | --- | --- | --- | --- | --- | --- |
| S11_L001_R1_001.fastq.gz | FASTQ | DNA | 35875355 | 5417178605 | 151 | 151.0 | 151 |
| S11_L001_R1_001.fastq.gz | FASTQ | DNA | 35875355 | 322878195 | 9 | 9.0 | 9 |
| S11_L001_R2_001.fastq.gz | FASTQ | DNA | 35875355 | 5417178605 | 151 | 151.0 | 151 |

But with regard to the code, this already looks very good and tidy!

I am only missing a publishDir directive in the config, so that the reports from the stats output channel are published in a subfolder of the outdir.
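The point about fixed-length Illumina reads can be checked mechanically. The following is a minimal Python sketch, not part of the PR, that parses the table from the comment above and flags files in which every read has the same length. It assumes the raw output is tab-separated (seqkit's tabular mode); the function names `has_uniform_read_length` and `uniform_length_flags` are hypothetical helpers introduced only for illustration.

```python
import csv
import io

# Tab-separated `seqkit stats` output, copied from the Illumina example above.
# Assumption: the raw report uses tabs as the field separator.
SEQKIT_STATS_TSV = (
    "file\tformat\ttype\tnum_seqs\tsum_len\tmin_len\tavg_len\tmax_len\n"
    "S11_L001_R1_001.fastq.gz\tFASTQ\tDNA\t35875355\t5417178605\t151\t151.0\t151\n"
    "S11_L001_R1_001.fastq.gz\tFASTQ\tDNA\t35875355\t322878195\t9\t9.0\t9\n"
    "S11_L001_R2_001.fastq.gz\tFASTQ\tDNA\t35875355\t5417178605\t151\t151.0\t151\n"
)


def has_uniform_read_length(row):
    """Return True when min_len equals max_len, i.e. all reads share one length."""
    return int(row["min_len"]) == int(row["max_len"])


def uniform_length_flags(tsv_text):
    """Parse seqkit stats TSV text and return one boolean flag per data row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [has_uniform_read_length(row) for row in reader]


flags = uniform_length_flags(SEQKIT_STATS_TSV)
print(flags)  # [True, True, True]
```

When every flag is true, the length columns carry no information beyond the number of sequencing cycles, which is the concern raised for untrimmed Illumina data.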


MatthiasZepper · 2024-10-29T17:41:18Z

conf/modules.config

@@ -22,6 +22,10 @@ process {
        ext.args = '--quiet'
    }

+    withName: SEQKIT_STATS {
+        ext.args = ''


Don't you want to publish the results in the outdir as well? Or only in MultiQC?

Yeah, it is defined here. I guess you would only override it if you don't want your output published in the standard way.

seqinspector/conf/modules.config

Line 15 in 9cd232c

publishDir = [

Oooh fancy - a central publishDir directive for all modules ?!? That's smart and has totally escaped me...sorry then for the false accusations!

MatthiasZepper · 2024-10-29T17:46:52Z

workflows/seqinspector.nf

+    //
+    // MODULE: Run SEQKIT_STATS
+    //
+    SEQKIT_STATS (


Currently, you are neither mixing the output into the MultiQC channel nor publishing it in the outdir. Unless I am missing something, the results of the run are therefore not used at all?

Yeah, I am outputting them.

MatthiasZepper · 2024-10-29T17:55:55Z

docs/output.md

+
+</details>
+
+[SeqkitStats](https://bioinf.shenwei.me/seqkit/usage/#stats) it gives general quality metrics about your sequenced reads including average read lengths, GC(%) and n50's. For further reading and documentation see the [Seqkit help pages]([Seqkit help](https://bioinf.shenwei.me/seqkit/)).


It is true that SeqkitStats computes some quality metrics, but to the best of my knowledge it is more useful for FASTA files and genome assemblies than for sequencing reads?

For example, for an Illumina run, the average read length prior to trimming is already known, because it corresponds to the number of cycles. For Nanopore, admittedly, such a statistic is more useful, so you may want to structure the documentation accordingly?

Updated the text a little. Should I specifically mention that this is more useful for Nanopore data? For Illumina etc. you do still get N50s and so on, but I agree it is less useful. Want me to add a Nanopore test? :D

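| file | format | type | num_seqs | sum_len | min_len | avg_len | max_len | Q1 | Q2 | Q3 | sum_gap | N50 | N50_num | Q20(%) | Q30(%) | AvgQual | GC(%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sample1_R1.fastq.gz | FASTQ | DNA | 1377513 | 411872902 | 35 | 299.0 | 301 | 300.0 | 301.0 | 301.0 | 0 | 301 | 1 | 99.10 | 97.54 | 29.31 | 38.53 |
| sample1_R2.fastq.gz | FASTQ | DNA | 1377513 | 411840994 | 35 | 299.0 | 301 | 300.0 | 301.0 | 301.0 | 0 | 301 | 1 | 97.11 | 93.54 | 25.78 | 38.54 |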
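To illustrate why the N50 in the table above simply equals the read length for short-read data, here is a minimal Python sketch of the standard N50 definition: the length L such that sequences of length L or longer together contain at least half of all bases. This is an assumption about the usual definition, not a verified copy of seqkit's implementation, and both length lists are hypothetical values invented for the example.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical short-read sample: almost every read has the full cycle length.
illumina_like = [301] * 9 + [35]
# Hypothetical long-read sample with a broad length distribution.
nanopore_like = [500, 1000, 2000, 4000, 8000, 9000]

print(n50(illumina_like))  # 301
print(n50(nanopore_like))  # 8000
```

For the near-uniform list the N50 collapses to the dominant read length, whereas for the broad distribution it summarises where most of the bases sit, which matches the suggestion that the metric is more informative for Nanopore data.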


sarahjeeeze added 2 commits October 28, 2024 15:52

add seqkit module

dad5d6d

seqkit

d806380

sarahjeeeze added 5 commits October 29, 2024 15:20

update docs

48ccc66

docs

879b579

fix-ci

1f582fe

actually add the module

de4bee8

changelog

9cd232c

MatthiasZepper requested changes Oct 29, 2024


sarahjeeeze added 2 commits October 30, 2024 09:55

update docs

eeb1577

mangled

60cb044

sarahjeeeze force-pushed the seqkit branch from fca5c18 to 60cb044 Compare October 30, 2024 10:11

fix gnarly spacing

f802608

MatthiasZepper approved these changes Oct 30, 2024


unmangle modules.json

76b882d
