Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update: Include sample_name IRIDA-Next input column #26

Merged
merged 19 commits into from
Sep 23, 2024
Merged

Conversation

sgsutcliffe
Copy link
Contributor

@sgsutcliffe sgsutcliffe commented Sep 16, 2024

Modified the template for input samplesheet.csv file to include the sample_name column in addition to sample in-line with changes to IRIDA-Next update as seen with the speciesabundance pipeline and staramrnf. What this means is that the output files and the sample name will be changed to sample_name if a sample_name is called. If snvphylnfc is being locally then the sample_name can be left blank.

Made a few changes:
- sample_name special characters will be replaced with "_"
- If no sample_name is supplied in the column sample will be used
- To avoid repeat values for sample_name all sample_name values will be suffixed with sample
- Tests to check that the variety of different sample_names work with the

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).
  • Tested out in IRIDA-Next locally

@sgsutcliffe sgsutcliffe changed the base branch from main to dev September 16, 2024 15:59
@phac-nml phac-nml deleted a comment from github-actions bot Sep 16, 2024
Copy link

github-actions bot commented Sep 16, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 1419462

+| ✅ 143 tests passed       |+
#| ❔  27 tests were ignored |#
!| ❗   3 tests had warnings |!

❗ Test warnings:

  • nextflow_config - Config manifest.version should end in dev: 2.1.1
  • schema_lint - Schema $id should be https://raw.githubusercontent.com/phac-nml/snvphylnfc/master/nextflow_schema.json
    Found https://raw.githubusercontent.com/phac-nml/snvphylnfc/main/nextflow_schema.json
  • nfcore_yml - nf-core version not set in .nf-core.yml

❔ Tests ignored:

  • files_exist - File is ignored: assets/nf-core-snvphylnfc_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-snvphylnfc_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-snvphylnfc_logo_dark.png
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • files_exist - File is ignored: lib/Utils.groovy
  • files_exist - File is ignored: lib/WorkflowMain.groovy
  • files_exist - File is ignored: lib/NfcoreTemplate.groovy
  • files_exist - File is ignored: lib/WorkflowSnvphylnfc.groovy
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
  • files_unchanged - File ignored due to lint config: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
  • files_unchanged - File does not exist: assets/nf-core-snvphylnfc_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-snvphylnfc_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-snvphylnfc_logo_dark.png
  • files_unchanged - File ignored due to lint config: docs/README.md
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/snvphylnfc/snvphylnfc/.github/workflows/awstest.yml
  • actions_awsfulltest - actions_awsfulltest
  • pipeline_name_conventions - pipeline_name_conventions

✅ Tests passed:

Run details

  • nf-core/tools version 2.14.1
  • Run at 2024-09-23 21:35:44

@sgsutcliffe
Copy link
Contributor Author

Tested it in IRIDA-Next locally. Looks good!

@sgsutcliffe
Copy link
Contributor Author

An issue that has arisen due to these changes is that when we modify the sample_name to allow it correspond to allowable characters for the pipeline we break the expected outcome of the function select_reference. In addition now it is possible may use the sample column still to select the reference fasta files.

CHANGELOG.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
workflows/snvphylnfc.nf Outdated Show resolved Hide resolved
kylacochrane
kylacochrane previously approved these changes Sep 18, 2024
@kylacochrane kylacochrane self-requested a review September 18, 2024 19:00
@kylacochrane kylacochrane dismissed their stale review September 18, 2024 19:49

Approved the wrong PR

Copy link
Contributor

@kylacochrane kylacochrane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the one change in the CHANGELOG 😄
Great job Steven!

CHANGELOG.md Outdated
- Modified the template for input csv file to include a `sample_name` column in addition to `sample` in-line with changes to [IRIDA-Next update] as seen with the [speciesabundance pipeline]
- `sample_name` special characters will be replaced with `"_"`
- If no `sample_name` is supplied in the column `sample` will be used
- To avoid repeat values for `sample_name` all `sample_name` values will be suffixed with the index of the `input` samplesheet.csv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking on this - is the plan to use the index or append sample_name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is always one place in documentation where I forget to update things to the newest version! Thanks for catching this!

`sample` is a unique identifier, designed to be used internally or in IRIDA-Next, or when `sample_name` is not provided.

`sample_name`, allows more flexibility in naming output files or sample identification. Unlike `sample`, `sample_name` is not required to contain unique values. `Nextflow` requires unique sample names, and therefore in the instance of repeat `sample_names`, `sample` will be suffixed to any `sample_name`. Non-alphanumeric characters (excluding `_`,`-`,`.`) will be replaced with `"_"`.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this description was much needed!!!
Just one comment on how it slightly differs from the CHANGELOG.md where index was suggested as the suffix should there be repeat sample_names 😄

@@ -243,7 +261,8 @@ def select_reference(refgenome, reference_sample_id, sample_assemblies) {
log.debug "Selecting reference genome ${reference_genome} from '--refgenome'."
}
else if (reference_sample_id) {
reference_genome = sample_assemblies.filter { it[0] == reference_sample_id && it[1] != null}
// Check each meta category (meta.id, meta.id_alt, meta.irida_id) for a match to params.reference_sample_id
reference_genome = sample_assemblies.filter { (it[0].id == reference_sample_id || it[0].irida_id == reference_sample_id || it[0].id_alt == reference_sample_id) && it[1] != null}
.ifEmpty { error("The provided reference sample ID (${reference_sample_id}) is either missing or has no associated reference assembly.") }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so good!!!

Copy link
Member

@apetkau apetkau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for implementing this Steven, everyone else for their comments 😄

I ran this in IRIDA Next and it all works for me. No other comments.

Copy link
Contributor

@kylacochrane kylacochrane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great Steven!

@sgsutcliffe
Copy link
Contributor Author

Added in a change suggested by @emarinier to remove the necessity of adding in meta.id_alt to the schema.

@sgsutcliffe
Copy link
Contributor Author

Rather than troubleshooting why the change without meta.id_alt was not working I decided to go with the original solution.

@sgsutcliffe sgsutcliffe merged commit d6d8796 into dev Sep 23, 2024
4 checks passed
@sgsutcliffe sgsutcliffe deleted the add-sample-name branch September 23, 2024 21:42
@sgsutcliffe sgsutcliffe mentioned this pull request Oct 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants