Update: Include sample_name IRIDA-Next input column #26

Merged
merged 19 commits into from
Sep 23, 2024
12 changes: 12 additions & 0 deletions CHANGELOG.md
Expand Up @@ -3,6 +3,18 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Development

### `Changed`

- Modified the input CSV template to include a `sample_name` column in addition to `sample`, in line with the [IRIDA-Next update] and as seen with the [speciesabundance pipeline]
  - Special characters in `sample_name` will be replaced with `"_"`
  - If no `sample_name` is supplied, the value in the `sample` column will be used
  - To avoid repeated `sample_name` values, all `sample_name` values will be suffixed with the unique `sample` value from the input file

[IRIDA-Next update]: https://github.com/phac-nml/irida-next/pull/678
[speciesabundance pipeline]: https://github.com/phac-nml/speciesabundance/pull/24

## [2.1.1] - 2024/08/21

### `Changed`
Expand Down
21 changes: 15 additions & 6 deletions README.md
Expand Up @@ -8,19 +8,28 @@ This is the [nf-core](https://nf-co.re/)-based pipeline for [SNVPhyl](https://sn

Input is provided to SNVPhyl in the form of a samplesheet (passed as `--input samplesheet.csv`). This samplesheet is a CSV-formatted file, which may be provided as a URI (ex: a file path or web address), and has the following format:

| sample | fastq_1 | fastq_2 | reference_assembly | metadata_1 | metadata_2 | metadata_3 | metadata_4 | metadata_5 | metadata_6 | metadata_7 | metadata_8 |
| ------- | -------------------------- | -------------------------- | ---------------------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| SAMPLE1 | /path/to/sample1_fastq1.fq | /path/to/sample1_fastq2.fq | /path/to/sample1_assembly.fa | meta1 | meta2 | meta3 | meta4 | meta5 | meta6 | meta7 | meta8 |
| SAMPLE2 | /path/to/sample2_fastq1.fq | | | meta1 | meta2 | meta3 | meta4 | meta5 | meta6 | meta7 | meta8 |
| sample | sample_name | fastq_1 | fastq_2 | reference_assembly | metadata_1 | metadata_2 | metadata_3 | metadata_4 | metadata_5 | metadata_6 | metadata_7 | metadata_8 |
| ------- | ------------ | -------------------------- | -------------------------- | ---------------------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| SAMPLE1 | sample_name1 | /path/to/sample1_fastq1.fq | /path/to/sample1_fastq2.fq | /path/to/sample1_assembly.fa | meta1 | meta2 | meta3 | meta4 | meta5 | meta6 | meta7 | meta8 |
| SAMPLE2 | sample_name2 | /path/to/sample2_fastq1.fq | | | meta1 | meta2 | meta3 | meta4 | meta5 | meta6 | meta7 | meta8 |

The columns are defined as follows:

- `sample`: The unique sample identifier to associate with the reads (and optionally the reference assembly).
- `sample`: Mandatory. The unique sample identifier to associate with the reads (and optionally the reference assembly).
- `sample_name`: Optional. Overrides `sample` for outputs (filenames and sample names) and reference assembly identification.
- `fastq_1`: A URI (ex: a file path or web address) to either single-end FASTQ-formatted reads or one file of a pair of paired-end FASTQ-formatted reads.
- `fastq_2`: (Optional) If `fastq_1` is paired-end, then this field is a URI to the other file of the read pair associated with `fastq_1`.
- `reference_assembly`: (Optional) A URI to a reference assembly associated with the sample, so that it may be referenced on the command line by the sample identifier for use as the reference for the whole pipeline. However, it may be easier to leave these fields blank and specify the reference using the `--refgenome` parameter.
- `metadata_1...8`: (Optional) Permits up to 8 columns for user-defined contextual metadata associated with each `sample`. Refer to [Metadata](#metadata) for more information.

### When to use `sample` vs `sample_name`

Either can be used to identify the reference assembly with the parameter `--reference_sample_id`.

`sample` is a unique identifier, designed to be used internally or in IRIDA-Next, or when `sample_name` is not provided.

`sample_name` allows more flexibility in naming output files and identifying samples. Unlike `sample`, `sample_name` is not required to contain unique values. Nextflow requires unique sample names, so when `sample_name` values repeat, `sample` will be appended as a suffix to the `sample_name`. Non-alphanumeric characters (excluding `_`,`-`,`.`) will be replaced with `"_"`.
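
The `sample_name` rules described above can be illustrated with a small Python sketch. This is not the pipeline's own code (that logic lives in the Nextflow workflow and is not part of this diff). The function name `resolve_names`, the `_` separator before the suffix, and the choice to suffix only repeated names are assumptions made for illustration.

```python
import re
from collections import Counter


def resolve_names(rows):
    """Return one output name per (sample, sample_name) pair.

    Hypothetical helper: the separator and the "suffix only duplicates"
    behaviour are assumptions, not confirmed pipeline behaviour.
    """
    cleaned = []
    for sample, sample_name in rows:
        # Fall back to `sample` when no `sample_name` is supplied.
        name = sample_name if sample_name else sample
        # Replace anything that is not alphanumeric, "_", "-" or "." with "_".
        cleaned.append(re.sub(r"[^A-Za-z0-9_.-]", "_", name))

    counts = Counter(cleaned)
    # Suffix repeated names with the unique `sample` value.
    return [
        f"{name}_{sample}" if counts[name] > 1 else name
        for (sample, _), name in zip(rows, cleaned)
    ]


print(resolve_names([("SAMPLE1", "A 1#"), ("SAMPLE2", "B2"), ("SAMPLE3", "B2")]))
```

With the values used in `assets/samplesheet.csv` in this PR, the sketch would map `A 1#` to `A_1_` and the two `B2` entries to `B2_SAMPLE2` and `B2_SAMPLE3`.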

Review comment (Contributor):

I think this description was much needed!!!
Just one comment on how it slightly differs from the CHANGELOG.md, where index was suggested as the suffix should there be repeat sample_names 😄

The structure of this file is defined in [assets/schema_input.json](assets/schema_input.json). Please see [assets/samplesheet.csv](assets/samplesheet.csv) to see an example of a samplesheet for this pipeline.

# Parameters
Expand All @@ -45,7 +54,7 @@ The optional parameters are as follows:
### Reference

- `--refgenome`: a URI to the reference genome to use during pipeline analysis
- `--reference_sample_id`: the sample identifier of a sample in the samplesheet that contains a provided `reference_assembly` to use as a reference genome during pipeline analysis
- `--reference_sample_id`: the sample identifier of a sample (`sample` or `sample_name`) in the samplesheet that contains a provided `reference_assembly` to use as a reference genome during pipeline analysis

Please use only one of `--refgenome` or `--reference_sample_id` and not both.
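
As a rough illustration of the two reference options, the Python sketch below picks a reference from samplesheet rows. It is a sketch under stated assumptions rather than the pipeline's implementation: the function `select_reference`, the dictionary row layout, matching against the raw (uncleaned) `sample_name`, and raising an error when both options are given are all hypothetical.

```python
def select_reference(rows, refgenome=None, reference_sample_id=None):
    """Return the reference URI implied by the two (mutually exclusive) options.

    Hypothetical helper; `rows` is a list of dicts keyed by samplesheet column.
    """
    if refgenome and reference_sample_id:
        raise ValueError("Use only one of --refgenome or --reference_sample_id")
    if refgenome:
        return refgenome
    if reference_sample_id:
        for row in rows:
            # The identifier may match either `sample` or `sample_name`.
            matches = reference_sample_id in (row.get("sample"), row.get("sample_name"))
            if matches and row.get("reference_assembly"):
                return row["reference_assembly"]
        raise ValueError(f"No reference_assembly found for '{reference_sample_id}'")
    # Assumption: with neither option set, no reference is selected here.
    return None


rows = [
    {"sample": "SAMPLE1", "sample_name": "A 1#", "reference_assembly": "reference.fasta"},
    {"sample": "SAMPLE2", "sample_name": "B2", "reference_assembly": ""},
]
print(select_reference(rows, reference_sample_id="A 1#"))
```

Under these assumptions, passing either `SAMPLE1` or `A 1#` selects the same assembly, while `B2` fails because that row has no `reference_assembly`.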

Expand Down
8 changes: 4 additions & 4 deletions assets/samplesheet.csv
@@ -1,4 +1,4 @@
sample,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
SAMPLE1,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,1.1,2.1,3.1,4.1,5.1,6.1,7.1,8.1
SAMPLE2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,1.1,2.2,3.2,4.2,5.2,6.2,7.1,8.2
SAMPLE3,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,1.2,2.2,3.3,4.3,5.3,6.2,,8.3
sample,sample_name,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
SAMPLE1,A 1#,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,1.1,2.1,3.1,4.1,5.1,6.1,7.1,8.1
SAMPLE2,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,1.1,2.2,3.2,4.2,5.2,6.2,7.1,8.2
SAMPLE3,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,1.2,2.2,3.3,4.3,5.3,6.2,,8.3
9 changes: 7 additions & 2 deletions assets/schema_input.json
Expand Up @@ -10,9 +10,14 @@
"sample": {
"type": "string",
"pattern": "^\\S+$",
"meta": ["id"],
"meta": ["irida_id"],
"unique": true,
"errorMessage": "Sample name must be provided and cannot contain spaces"
"errorMessage": "Sample must be provided and cannot contain spaces"
},
"sample_name": {
"type": "string",
"meta": ["id", "id_alt"],
"errorMessage": "Sample name is optional, if provided will replace sample for filenames and outputs"
},
"fastq_1": {
"type": "string",
Expand Down
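
The constraints placed on `sample` in the schema hunk above (present, matching `^\S+$`, and unique) can be sketched in Python as follows. This only mirrors the three rules for illustration; it does not reproduce how the pipeline's schema validation actually runs, and the function name `validate_samples` and the error wording for duplicates are assumptions.

```python
import re


def validate_samples(samples):
    """Return a list of error strings for a column of `sample` values.

    Hypothetical helper mirroring the schema rules: required,
    no whitespace (pattern ^\\S+$), and unique.
    """
    errors = []
    seen = set()
    for idx, sample in enumerate(samples, start=1):
        if not sample or not re.fullmatch(r"\S+", sample):
            errors.append(f"row {idx}: Sample must be provided and cannot contain spaces")
        elif sample in seen:
            errors.append(f"row {idx}: duplicate sample '{sample}'")
        seen.add(sample)
    return errors


print(validate_samples(["SAMPLE1", "SAMPLE2", "SAMPLE 3", "SAMPLE2"]))
```

Note that `sample_name` carries no `pattern` or `unique` constraint in this hunk, so values such as `A 1#` or a repeated `B2` would not be rejected at this stage.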
1 change: 1 addition & 0 deletions conf/iridanext.config
@@ -1,6 +1,7 @@
iridanext {
enabled = true
output {
idkey = "irida_id"
path = "${params.outdir}/iridanext.output.json.gz"
overwrite = true
files {
Expand Down
2 changes: 1 addition & 1 deletion conf/test_full.config
Expand Up @@ -15,5 +15,5 @@ params {
config_profile_description = 'Full test dataset to check pipeline function'

// Input data for full size test
input = 'https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/samplesheet.csv'
input = "${projectDir}/assets/samplesheet.csv"
}
9 changes: 5 additions & 4 deletions docs/usage.md
Expand Up @@ -14,19 +14,20 @@ You will need to create a samplesheet with information about the samples you wou

### Full samplesheet

The input samplesheet can contain the following columns: `sample`, `fastq_1`, `fastq_2`, `reference_assembly`, and `metadata_1` - `metadata_8`. The sample IDs within a samplesheet should be unique.
The input samplesheet can contain the following columns: `sample`, `sample_name`, `fastq_1`, `fastq_2`, `reference_assembly`, and `metadata_1` - `metadata_8`. The sample IDs within a samplesheet should be unique.

A final samplesheet file consisting of both single- and paired-end data may look something like the one below.

```console
sample,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
SAMPLE1,/path/to/sample1_fastq1.fq,/path/to/sample1_fastq2.fq,/path/to/sample1_assembly.fa,,,,,,,,
SAMPLE2,/path/to/sample2_fastq1.fq,,,,,,,,,,
sample,sample_name,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
SAMPLE1,A1,/path/to/sample1_fastq1.fq,/path/to/sample1_fastq2.fq,/path/to/sample1_assembly.fa,,,,,,,,
SAMPLE2,B2,/path/to/sample2_fastq1.fq,,,,,,,,,,
```
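
A samplesheet in the updated layout could also be written programmatically. The following Python sketch uses the standard `csv` module and the column order shown above; the helper name `write_samplesheet` and the example paths are illustrative assumptions, not part of the pipeline.

```python
import csv
import io

# Column order taken from the updated samplesheet header.
COLUMNS = (
    ["sample", "sample_name", "fastq_1", "fastq_2", "reference_assembly"]
    + [f"metadata_{i}" for i in range(1, 9)]
)


def write_samplesheet(rows, handle):
    """Write rows (dicts keyed by column name) as a samplesheet CSV.

    Missing columns are written as empty fields.
    """
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


buffer = io.StringIO()
write_samplesheet(
    [
        {
            "sample": "SAMPLE1",
            "sample_name": "A1",
            "fastq_1": "/path/to/sample1_fastq1.fq",
            "fastq_2": "/path/to/sample1_fastq2.fq",
            "reference_assembly": "/path/to/sample1_assembly.fa",
        },
        {"sample": "SAMPLE2", "sample_name": "B2", "fastq_1": "/path/to/sample2_fastq1.fq"},
    ],
    buffer,
)
print(buffer.getvalue())
```

Leaving `sample_name` out of a row simply writes an empty field, which matches the column being optional.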

| Column | Description |
| ---------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `sample` | Custom sample name. Samples should be unique within a samplesheet. |
| `sample_name` | Sample name used in outputs (filenames and sample names) |
| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| `fastq_2` | (Optional) Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| `reference_assembly` | (Optional) Full path to a FASTA file representing a reference assembly derived from this sample. This field provides a method for selecting a reference genome for the whole pipeline. |
Expand Down
3 changes: 1 addition & 2 deletions nextflow_schema.json
Expand Up @@ -47,8 +47,7 @@
"reference_sample_id": {
"type": "string",
"fa_icon": "fas fa-file",
"description": "The sample ID from which to use the associated FASTA-format assembly as a reference.",
"pattern": "^\\S+$"
"description": "The sample ID from which to use the associated FASTA-format assembly as a reference."
},
"refgenome": {
"type": "string",
Expand Down
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
8 changes: 4 additions & 4 deletions tests/data/samplesheets/samplesheet1.csv
@@ -1,4 +1,4 @@
sample,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
SAMPLE1,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8
SAMPLE2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8
SAMPLE3,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8
sample,sample_name,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
SAMPLE1,A 1#,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8
SAMPLE2,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8
SAMPLE3,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8
8 changes: 4 additions & 4 deletions tests/data/samplesheets/samplesheet_few-metadata.csv
@@ -1,5 +1,5 @@
sample,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3
SAMPLE1,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,1.1,2.1,3.1
SAMPLE2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,,2.2,3.2
SAMPLE3,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,1.3,2.3,,
sample,sample_name,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3
SAMPLE1,A 1#,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,1.1,2.1,3.1
SAMPLE2,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,,2.2,3.2
SAMPLE3,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,1.3,2.3,,

8 changes: 4 additions & 4 deletions tests/data/samplesheets/samplesheet_little-metadata.csv
@@ -1,4 +1,4 @@
sample,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
SAMPLE1,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,,,,1.4,,,,
SAMPLE2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,,,,,,,,
SAMPLE3,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,3.1,3.2,,,,,,3.8
sample,sample_name,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
SAMPLE1,A 1#,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,,,,1.4,,,,
SAMPLE2,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,,,,,,,,
SAMPLE3,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,3.1,3.2,,,,,,3.8
8 changes: 4 additions & 4 deletions tests/data/samplesheets/samplesheet_no-metadata.csv
@@ -1,4 +1,4 @@
sample,fastq_1,fastq_2,reference_assembly
SAMPLE1,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta
SAMPLE2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,
SAMPLE3,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,
sample,sample_name,fastq_1,fastq_2,reference_assembly
SAMPLE1,A 1#,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta
SAMPLE2,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,
SAMPLE3,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,