Enhanced pipeline logic to support user-defined `sample_name` input #24

kylacochrane · 2024-08-07T19:06:11Z

For IRIDA Next Use
This PR enhances the pipeline configuration and logic, allowing users to specify a sample_name for their project samples in IRIDA Next. All output files generated by the pipeline will be named using the user-provided sample_name instead of the IRIDA Next Identifier.

In IRIDA Next, each sample is automatically assigned an identifier by the platform. The changes in this PR ensure that the iridanext.output.json file is generated using this IRIDA Next Identifier (set to meta.irida_id in the pipeline), guaranteeing that output files and metadata are accurately linked to the correct sample on the IRIDA Next platform.

For local use, the pipeline includes logic to set sample_name equal to sample (i.e., meta.id = meta.irida_id), so local users don't need to provide both sample and sample_name in the input samplesheet.

Additional logic is included to rename duplicates and deal with spaces in the sample_name parameter.
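The fallback described above can be sketched outside Nextflow. This is a minimal Python illustration, not the pipeline's Groovy code: the helper `resolve_meta`, the `fastq_1`/`fastq_2` columns, and the sample values are all hypothetical. It only shows `meta.irida_id` always coming from `sample`, and `meta.id` falling back to `sample` when `sample_name` is missing or blank.

```python
import csv
import io


def resolve_meta(row):
    """Build a meta dict from one samplesheet row (hypothetical helper)."""
    irida_id = row["sample"]
    # Treat a missing column and an empty cell the same way.
    name = (row.get("sample_name") or "").strip()
    # Fall back to the sample identifier when no sample_name is given.
    return {"irida_id": irida_id, "id": name if name else irida_id}


# Hypothetical samplesheet: the second row leaves sample_name empty.
SAMPLESHEET = """sample,sample_name,fastq_1,fastq_2
SAMPLE1,my isolate,r1.fq.gz,r2.fq.gz
SAMPLE2,,r3.fq.gz,r4.fq.gz
"""

metas = [resolve_meta(row) for row in csv.DictReader(io.StringIO(SAMPLESHEET))]
for meta in metas:
    print(meta)
```

In this sketch the identifier used for linking results back to IRIDA Next (`irida_id`) is never altered; only the display name (`id`) changes.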

This pipeline has been tested on a local instance of IRIDA Next, which has also been updated to support the additional sample_name parameter as detailed here: phac-nml/irida-next#678

PR checklist

This comment contains a description of changes (with reason).
If you've fixed a bug or added code that should be tested, add tests!
Make sure your code lints (nf-core lint).
Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).

github-actions · 2024-08-07T19:07:26Z

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit f6e6b0f

✅ 134 tests passed
❔ 26 tests were ignored
❗ 3 tests had warnings

❗ Test warnings:

nextflow_config - Config manifest.version should end in dev: 2.1.1
schema_lint - Schema $id should be https://raw.githubusercontent.com/phac-nml/speciesabundance/master/nextflow_schema.json; found https://raw.githubusercontent.com/phac-nml/speciesabundance/main/nextflow_schema.json
nfcore_yml - nf-core version not set in .nf-core.yml

❔ Tests ignored:

files_exist - File is ignored: assets/nf-core-speciesabundance_logo_light.png
files_exist - File is ignored: docs/images/nf-core-speciesabundance_logo_light.png
files_exist - File is ignored: docs/images/nf-core-speciesabundance_logo_dark.png
files_exist - File is ignored: .github/workflows/awstest.yml
files_exist - File is ignored: .github/workflows/awsfulltest.yml
files_exist - File is ignored: lib/Utils.groovy
files_exist - File is ignored: lib/WorkflowMain.groovy
files_exist - File is ignored: lib/NfcoreTemplate.groovy
files_exist - File is ignored: lib/WorkflowSpeciesabundance.groovy
nextflow_config - Config variable ignored: manifest.name
nextflow_config - Config variable ignored: manifest.homePage
files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
files_unchanged - File ignored due to lint config: assets/email_template.html
files_unchanged - File ignored due to lint config: assets/email_template.txt
files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
files_unchanged - File does not exist: assets/nf-core-speciesabundance_logo_light.png
files_unchanged - File does not exist: docs/images/nf-core-speciesabundance_logo_light.png
files_unchanged - File does not exist: docs/images/nf-core-speciesabundance_logo_dark.png
files_unchanged - File ignored due to lint config: docs/README.md
actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/speciesabundance/speciesabundance/.github/workflows/awstest.yml
actions_awsfulltest - actions_awsfulltest
pipeline_name_conventions - pipeline_name_conventions

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-speciesabundance_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.taxonomic_level= S
nextflow_config - Config default value correct: params.kmer_len= 100
nextflow_config - Config default value correct: params.top_n= 5
nextflow_config - Config default value correct: params.max_cpus= 4
nextflow_config - Config default value correct: params.max_memory= 2.GB
nextflow_config - Config default value correct: params.max_time= 1.h
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.validate_params= true
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
readme - README Zenodo placeholder was replaced with DOI.
pipeline_todos - No TODO strings found
template_strings - Did not find any Jinja template strings (106 files)
schema_lint - Schema lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: branch.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
base_config - CUSTOM_DUMPSOFTWAREVERSIONS found in conf/base.config and Nextflow scripts.
modules_config - conf/modules.config found and not ignored.
modules_config - SAMPLESHEET_CHECK found in conf/modules.config and Nextflow scripts.
modules_config - FASTP_TRIM found in conf/modules.config and Nextflow scripts.
modules_config - KRAKEN2 found in conf/modules.config and Nextflow scripts.
modules_config - BRACKEN found in conf/modules.config and Nextflow scripts.
modules_config - CUSTOM_DUMPSOFTWAREVERSIONS found in conf/modules.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline

Run details

nf-core/tools version 2.14.1
Run at 2024-09-25 21:25:49

apetkau

Thanks so much @kylacochrane . This is amazing work. Looks great in IRIDA Next 😄

I have a few comments for you below. I will go through some more review of this next week.

workflows/speciesabundance.nf

tests/main.nf.test

assets/schema_input.json

tests/main.nf.test

workflows/speciesabundance.nf

emarinier · 2024-08-12T14:44:09Z

workflows/speciesabundance.nf

+            // Ensure ID is unique by appending meta.irida_id if needed
+            while (processedIDs.contains(meta.id)) {
+                meta.id = "${meta.id}_${meta.irida_id}"
+            }


I think this is a useful solution to ensure unique keys, but I wonder if all of this is better serviced by unique keys with strict patterns (the same criteria as the previous IDs) handled by the schema_input.json code?
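For readers less familiar with the loop in the diff above, here is a rough Python sketch of the same idea. It assumes that each final ID is recorded in the set of seen IDs after the loop, which the excerpt does not show; the function name `make_unique` and the sample values are hypothetical.

```python
def make_unique(metas):
    """Append irida_id to any id that has already been seen (sketch)."""
    seen = set()
    for meta in metas:
        # Each pass lengthens the id, so the loop cannot cycle forever.
        while meta["id"] in seen:
            meta["id"] = f'{meta["id"]}_{meta["irida_id"]}'
        seen.add(meta["id"])
    return metas


samples = [
    {"id": "A", "irida_id": "S1"},
    {"id": "A", "irida_id": "S2"},
    {"id": "A_S2", "irida_id": "S3"},
]
unique_ids = [m["id"] for m in make_unique(samples)]
print(unique_ids)
```

The third sample shows why a `while` is used rather than a single `if`: a name that collides with an already-suffixed ID gets suffixed as well.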

I had an idea about a different way of creating unique keys, so as not to rely on appending meta.irida_id to meta.id.

The idea is, rather than appending meta.irida_id we append the index of the sample in the samplesheet.csv, so that every sample from {1..n} has _1 to _n appended.

My logic is that it might be less confusing to have _n for every sample rather than samples with that long IRIDA-Next ID appended to them. I'd suggest doing it to every sample, so users are not confused as to why it shows up. The code I was thinking would look something like this (my use of closures is amateur at best, so bear with me here).

Channel.fromSamplesheet("input")
    .toList()
    .flatMap {
        it.withIndex().collect { entry, idx ->
            entry[0].id = "${entry[0].id}_${idx}"
            return [entry[0], entry[1], entry[2]]
        }
    }
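The index-suffix idea can also be shown in plain Python. This is only a sketch with a hypothetical helper `suffix_with_index` and made-up entries shaped like `[meta, fastq_1, fastq_2]`. It counts from 1 to match the `_1` to `_n` description; as far as I know, Groovy's `withIndex()` counts from 0 unless an offset is passed, so the snippet above would produce `_0` for the first sample.

```python
def suffix_with_index(entries, start=1):
    """Return copies of the entries with _<index> appended to each meta id."""
    return [
        ({**meta, "id": f'{meta["id"]}_{idx}'}, fastq_1, fastq_2)
        for idx, (meta, fastq_1, fastq_2) in enumerate(entries, start=start)
    ]


entries = [
    ({"id": "A"}, "a_R1.fq.gz", "a_R2.fq.gz"),
    ({"id": "A"}, "b_R1.fq.gz", "b_R2.fq.gz"),
]
indexed = suffix_with_index(entries)
print([meta["id"] for meta, _, _ in indexed])
```

Because every sample gets a suffix, two rows with the same name can never collide, at the cost of changing names that were already unique.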

@emarinier thanks for the comment. The issue with unique keys/strict patterns is that the pipeline will then error if an incorrect value is passed. Since IRIDA Next is pretty loose on sample names (e.g., it allows spaces in names), this could lead to errors when running those samples unless they were renamed.

@sgsutcliffe thanks for your suggestion. I know we have already spoken about this for other pipelines/the best way forward.

sgsutcliffe

Looks great! Thanks for taking the time to show me this running on IRIDA Next

sgsutcliffe

Looks good! One last thing to change before merging is to update the documentation (README.md, CHANGELOG.md, docs/usage.md) with the new samplesheet format.

apetkau

Thanks so much @kylacochrane . This is amazing work 😄 . Sorry for taking so long to review it.

modules/local/topN/main.nf

apetkau

Thanks so much for adding this documentation Kyla. This looks really great 😄

I just have one comment in-line.

README.md

apetkau

Looks great. Thanks so much 😄

sgsutcliffe · 2024-09-25T20:25:01Z

CHANGELOG.md

+
+### `Changed`
+
+- Added the ability to include a `sample_name` column in the input samplesheet.csv. Allows for compatibility with IRIDA-Next input configuration [PR24](https://github.com/phac-nml/speciesabundance/pull/24)


I added this to change descriptions (left it out originally but it was requested to be added). Basically create sub-bullet points with this information:

- sample_name special characters will be replaced with "_"
- If no sample_name is supplied in the column, sample will be used
- To avoid repeated values for sample_name, all sample_name values will be suffixed with the unique sample value from the input file

sgsutcliffe · 2024-09-25T20:28:51Z

README.md

+
+`sample_name`, allows more flexibility in naming output files or sample identification. Unlike `sample`, `sample_name` is not required to contain unique values. `Nextflow` requires unique sample names, and therefore in the instance of repeat `sample_names`, `sample` will be suffixed to any `sample_name`. Non-alphanumeric characters (excluding `_`,`-`,`.`) will be replaced with `"_"`.
+
+An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.
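The character replacement rule in the README text above can be illustrated with a short Python sketch. The regular expression is my reading of "non-alphanumeric characters (excluding `_`, `-`, `.`)" and is an assumption, not the pattern used in the pipeline; the function name `sanitize` is hypothetical.

```python
import re

# Anything that is not a letter, digit, underscore, hyphen or dot.
_INVALID = re.compile(r"[^A-Za-z0-9_.\-]")


def sanitize(sample_name):
    """Replace each disallowed character with an underscore (sketch)."""
    return _INVALID.sub("_", sample_name)


print(sanitize("my sample #1"))
print(sanitize("a.b-c_d"))
```

Each disallowed character is replaced one for one, so a run such as a space followed by `#` becomes two underscores rather than being collapsed into one.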


I would suggest putting this example samplesheet under the Input header, and then linking tests/data/samplename_samplesheet.csv under the IRIDA-Next Optional Input Configuration, since that is the example samplesheet that shows the sample_name column, the way you did in your docs/usage.

sgsutcliffe · 2024-09-25T20:31:00Z

docs/usage.md


-An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.
+An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline, which includes the `sample_name` column.


Same suggestion here: link the test samplesheet.

sgsutcliffe

Just a few minor comments (suggestions). Looks good.

kylacochrane · 2024-09-25T21:07:07Z

Just a few minor comments (suggestions). Looks good.

Thanks for the suggestions Steven! I made the changes: 1af1a8f

Enhanced pipeline logic to support user-defined

d3e2792

kylacochrane added 6 commits August 7, 2024 15:37

Updated processes to ensure CSVTK generates files with a 'sample_id' column based on 'sample' (meta.irida_id) for proper nf-iridanext plugin metadata functionality.

25984f7

Fixed linting and EC issues

822e93d

Updated CI testing to incorporate sample_name.

03933e8

Update workflow logic to set meta.id to meta.irida_id if sample_name is not provided in the samplesheet

e4a1ae0

Enhanced test coverage by adding a scenario where sample_name is omitted in the input samplesheet.

94bc34c

Fixed linting issue

b96f2bc

kylacochrane requested review from apetkau, emarinier and sgsutcliffe August 8, 2024 19:57

apetkau requested changes Aug 9, 2024

workflows/speciesabundance.nf Outdated

workflows/speciesabundance.nf Outdated

tests/main.nf.test

emarinier requested changes Aug 12, 2024

kylacochrane added 3 commits August 12, 2024 11:38

Fixed typo

3a081bb

Replace non-alphanumeric characters in sample IDs with underscores

cc6a23d

Updated main.nf.test to include tests for various sample_name scenarios

5dd474e

kylacochrane marked this pull request as ready for review August 12, 2024 18:12

sgsutcliffe approved these changes Aug 14, 2024

This was referenced Sep 9, 2024

Update: Include sample_name IRIDA-Next input column phac-nml/staramrnf#28

Merged

Update: Include sample_name IRIDA-Next input column phac-nml/snvphylnfc#26

Merged

Update input_schema

f56b9e7

kylacochrane requested review from sgsutcliffe, emarinier and apetkau September 17, 2024 20:11

sgsutcliffe suggested changes Sep 18, 2024

emarinier approved these changes Sep 18, 2024

This was referenced Sep 18, 2024

Minor Release: 0.2.0 phac-nml/staramrnf#29

Merged

Update: Include sample_name IRIDA-Next input column phac-nml/arboratornf#23

Merged

apetkau requested changes Sep 23, 2024

modules/local/topN/main.nf

Update documentation

4a457d9

apetkau requested changes Sep 25, 2024

README.md

Updated README.md

b78388a

apetkau approved these changes Sep 25, 2024

kylacochrane requested a review from sgsutcliffe September 25, 2024 19:59

sgsutcliffe reviewed Sep 25, 2024

sgsutcliffe approved these changes Sep 25, 2024

Updates to documentation for readability

1af1a8f

kylacochrane closed this Sep 25, 2024

kylacochrane reopened this Sep 25, 2024

Edits to README and usage docs

f6e6b0f

kylacochrane merged commit 513b58b into dev Sep 25, 2024
4 checks passed

sgsutcliffe mentioned this pull request Oct 2, 2024

Update: Add sample_name for IRIDA-Next integration phac-nml/gasnomenclature#30

Merged

9 tasks

sgsutcliffe mentioned this pull request Oct 25, 2024

add sample_name as possible column in samplesheet phac-nml/gasclustering#31

Merged

10 tasks

sgsutcliffe mentioned this pull request Nov 6, 2024

Add sample_name column in samplesheet compatibility phac-nml/fetchdatairidanext#19

Merged

10 tasks

sgsutcliffe mentioned this pull request Nov 22, 2024

Minor Release 1.2.0 phac-nml/fetchdatairidanext#20

Merged
