Merge pull request #26 from phac-nml/add-sample-name
Update: Include sample_name IRIDA-Next input column
sgsutcliffe authored Sep 23, 2024
2 parents c6db4fe + 1419462 commit d6d8796
Showing 29 changed files with 169 additions and 123 deletions.
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,18 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Development

### `Changed`

- Modified the template for the input CSV file to include a `sample_name` column in addition to `sample`, in line with changes in the [IRIDA-Next update] and as seen in the [speciesabundance pipeline]
- `sample_name` special characters will be replaced with `"_"`
- If no `sample_name` is supplied in the column, `sample` will be used
- To avoid repeated values for `sample_name`, all `sample_name` values will be suffixed with the unique `sample` value from the input file

[IRIDA-Next update]: https://github.com/phac-nml/irida-next/pull/678
[speciesabundance pipeline]: https://github.com/phac-nml/speciesabundance/pull/24

## [2.1.1] - 2024/08/21

### `Changed`
21 changes: 15 additions & 6 deletions README.md
@@ -8,19 +8,28 @@ This is the [nf-core](https://nf-co.re/)-based pipeline for [SNVPhyl](https://sn

Input is provided to SNVPhyl in the form of a samplesheet (passed as `--input samplesheet.csv`). This samplesheet is a CSV-formatted file, which may be provided as a URI (ex: a file path or web address), and has the following format:

| sample | fastq_1 | fastq_2 | reference_assembly | metadata_1 | metadata_2 | metadata_3 | metadata_4 | metadata_5 | metadata_6 | metadata_7 | metadata_8 |
| ------- | -------------------------- | -------------------------- | ---------------------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| SAMPLE1 | /path/to/sample1_fastq1.fq | /path/to/sample1_fastq2.fq | /path/to/sample1_assembly.fa | meta1 | meta2 | meta3 | meta4 | meta5 | meta6 | meta7 | meta8 |
| SAMPLE2 | /path/to/sample2_fastq1.fq | | | meta1 | meta2 | meta3 | meta4 | meta5 | meta6 | meta7 | meta8 |
| sample | sample_name | fastq_1 | fastq_2 | reference_assembly | metadata_1 | metadata_2 | metadata_3 | metadata_4 | metadata_5 | metadata_6 | metadata_7 | metadata_8 |
| ------- | ------------ | -------------------------- | -------------------------- | ---------------------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| SAMPLE1 | sample_name1 | /path/to/sample1_fastq1.fq | /path/to/sample1_fastq2.fq | /path/to/sample1_assembly.fa | meta1 | meta2 | meta3 | meta4 | meta5 | meta6 | meta7 | meta8 |
| SAMPLE2 | sample_name2 | /path/to/sample2_fastq1.fq | | | meta1 | meta2 | meta3 | meta4 | meta5 | meta6 | meta7 | meta8 |

The columns are defined as follows:

- `sample`: The unique sample identifier to associate with the reads (and optionally the reference assembly).
- `sample`: Mandatory unique sample identifier. The unique sample identifier to associate with the reads (and optionally the reference assembly).
- `sample_name`: Optional, and overrides `sample` for outputs (filenames and sample names) and reference assembly identification.
- `fastq_1`: A URI (ex: a file path or web address) to either single-end FASTQ-formatted reads or one pair of pair-end FASTQ-formatted reads.
- `fastq_2`: (Optional) If `fastq_1` is paired-end, then this field is a URI to reads that are the other pair of reads associated with `fastq_1`.
- `reference_assembly`: (Optional) A URI to a reference assembly associated with the sample, so that it may be referenced on the command line by the sample identifier for use as the reference for the whole pipeline. However, it may be easier to leave these fields blank and specify the reference using the `--refgenome` parameter.
- `metadata_1...8`: (Optional) Permits up to 8 columns for user-defined contextual metadata associated with each `sample`. Refer to [Metadata](#metadata) for more information.

### When to use `sample` vs `sample_name`

Either can be used to identify the reference assembly with the parameter `--reference_sample_id`.

`sample` is a unique identifier, designed to be used internally or in IRIDA-Next, or when `sample_name` is not provided.

`sample_name` allows more flexibility in naming output files or sample identification. Unlike `sample`, `sample_name` is not required to contain unique values. Nextflow requires unique sample names, so when `sample_name` values repeat, `sample` will be suffixed to any `sample_name`. Non-alphanumeric characters (excluding `_`, `-`, `.`) will be replaced with `"_"`.
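
The naming rules described above can be sketched in Python. This is an illustration only and not the pipeline's own code: the function name `resolve_output_names` is hypothetical, and the sketch assumes that only repeated names receive the `sample` suffix and that the suffix is joined with an underscore, neither of which is pinned down by the text.

```python
import re
from collections import Counter

# Anything other than letters, digits, "_", "-" and "." is replaced with "_".
_UNSAFE = re.compile(r"[^A-Za-z0-9_.\-]")


def resolve_output_names(rows):
    """Map each unique `sample` to the name used for outputs.

    `rows` is a list of (sample, sample_name) pairs; sample_name may be empty.
    """
    # Fall back to `sample` when no `sample_name` is given, then sanitize.
    cleaned = [(sample, _UNSAFE.sub("_", name or sample)) for sample, name in rows]
    counts = Counter(name for _, name in cleaned)
    resolved = {}
    for sample, name in cleaned:
        # ASSUMPTION: only repeated names get the suffix, joined with "_".
        resolved[sample] = f"{name}_{sample}" if counts[name] > 1 else name
    return resolved


rows = [("SAMPLE1", "A 1#"), ("SAMPLE2", "B2"), ("SAMPLE3", "B2"), ("SAMPLE4", "")]
print(resolve_output_names(rows))
# {'SAMPLE1': 'A_1_', 'SAMPLE2': 'B2_SAMPLE2', 'SAMPLE3': 'B2_SAMPLE3', 'SAMPLE4': 'SAMPLE4'}
```

Keying the result by `sample` keeps the lookup unambiguous even when several rows share a `sample_name`.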

The structure of this file is defined in [assets/schema_input.json](assets/schema_input.json). Please see [assets/samplesheet.csv](assets/samplesheet.csv) to see an example of a samplesheet for this pipeline.

# Parameters
@@ -45,7 +54,7 @@ The optional parameters are as follows:
### Reference

- `--refgenome`: a URI to the reference genome to use during pipeline analysis
- `--reference_sample_id`: the sample identifier of a sample in the samplesheet that contains a provided `reference_assembly` to use as a reference genome during pipeline analysis
- `--reference_sample_id`: the sample identifier of a sample (`sample` or `sample_name`) in the samplesheet that contains a provided `reference_assembly` to use as a reference genome during pipeline analysis

Please use only one of `--refgenome` or `--reference_sample_id` and not both.
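
As a rough illustration of how the two reference options interact, the following Python sketch picks a reference from samplesheet rows. It is not taken from the pipeline: `pick_reference` is a hypothetical helper, the identifier is assumed to be compared against the raw `sample` and `sample_name` values, and returning `None` when neither option is given is also an assumption.

```python
def pick_reference(rows, refgenome=None, reference_sample_id=None):
    """Return the URI of the reference to use, or None if none was requested.

    `rows` is a list of dicts keyed by samplesheet column name.
    """
    if refgenome and reference_sample_id:
        raise ValueError("use only one of --refgenome or --reference_sample_id")
    if refgenome:
        return refgenome
    if not reference_sample_id:
        return None  # ASSUMPTION: no reference requested
    for row in rows:
        # ASSUMPTION: the identifier may match either column, compared as-is.
        if reference_sample_id in (row.get("sample"), row.get("sample_name")):
            if not row.get("reference_assembly"):
                raise ValueError(f"{reference_sample_id!r} has no reference_assembly")
            return row["reference_assembly"]
    raise ValueError(f"no sample matches {reference_sample_id!r}")


rows = [
    {"sample": "SAMPLE1", "sample_name": "A1", "reference_assembly": "/path/to/sample1_assembly.fa"},
    {"sample": "SAMPLE2", "sample_name": "B2", "reference_assembly": ""},
]
print(pick_reference(rows, reference_sample_id="A1"))
# /path/to/sample1_assembly.fa
```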

8 changes: 4 additions & 4 deletions assets/samplesheet.csv
@@ -1,4 +1,4 @@
sample,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
SAMPLE1,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,1.1,2.1,3.1,4.1,5.1,6.1,7.1,8.1
SAMPLE2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,1.1,2.2,3.2,4.2,5.2,6.2,7.1,8.2
SAMPLE3,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,1.2,2.2,3.3,4.3,5.3,6.2,,8.3
sample,sample_name,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
SAMPLE1,A 1#,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,1.1,2.1,3.1,4.1,5.1,6.1,7.1,8.1
SAMPLE2,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,1.1,2.2,3.2,4.2,5.2,6.2,7.1,8.2
SAMPLE3,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,1.2,2.2,3.3,4.3,5.3,6.2,,8.3
9 changes: 7 additions & 2 deletions assets/schema_input.json
@@ -10,9 +10,14 @@
"sample": {
"type": "string",
"pattern": "^\\S+$",
"meta": ["id"],
"meta": ["irida_id"],
"unique": true,
"errorMessage": "Sample name must be provided and cannot contain spaces"
"errorMessage": "Sample must be provided and cannot contain spaces"
},
"sample_name": {
"type": "string",
"meta": ["id", "id_alt"],
"errorMessage": "Sample name is optional, if provided will replace sample for filenames and outputs"
},
"fastq_1": {
"type": "string",
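
The constraints this schema places on the two columns can be approximated with a small Python check. This is a sketch, not the validator the pipeline uses: `check_rows` is a hypothetical helper, and the duplicate message is invented wording, since the schema only declares `"unique": true` for `sample`.

```python
import re

# Whole-string equivalent of the "^\\S+$" pattern on the "sample" property.
SAMPLE_PATTERN = re.compile(r"\S+")


def check_rows(rows):
    """Return a list of problems found in (sample, sample_name) pairs."""
    problems = []
    seen = set()
    for line_no, (sample, _sample_name) in enumerate(rows, start=1):
        if not SAMPLE_PATTERN.fullmatch(sample or ""):
            problems.append(
                f"row {line_no}: Sample must be provided and cannot contain spaces"
            )
        elif sample in seen:
            # Hypothetical wording for the uniqueness violation.
            problems.append(f"row {line_no}: duplicate sample {sample!r}")
        seen.add(sample)
        # `sample_name` has no pattern or uniqueness rule, so it is not checked.
    return problems


print(check_rows([("SAMPLE1", "A 1#"), ("SAMPLE2", "B2"), ("SAMPLE3", "B2")]))
# []
```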
1 change: 1 addition & 0 deletions conf/iridanext.config
@@ -1,6 +1,7 @@
iridanext {
enabled = true
output {
idkey = "irida_id"
path = "${params.outdir}/iridanext.output.json.gz"
overwrite = true
files {
2 changes: 1 addition & 1 deletion conf/test_full.config
@@ -15,5 +15,5 @@ params {
config_profile_description = 'Full test dataset to check pipeline function'

// Input data for full size test
input = 'https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/samplesheet.csv'
input = "${projectDir}/assets/samplesheet.csv"
}
9 changes: 5 additions & 4 deletions docs/usage.md
@@ -14,19 +14,20 @@ You will need to create a samplesheet with information about the samples you wou

### Full samplesheet

The input samplesheet can contain the following columns: `sample`, `fastq_1`, `fastq_2`, `reference_assembly`, and `metadata_1` - `metadata_8`. The sample IDs within a samplesheet should be unique.
The input samplesheet can contain the following columns: `sample`, `sample_name`, `fastq_1`, `fastq_2`, `reference_assembly`, and `metadata_1` - `metadata_8`. The sample IDs within a samplesheet should be unique.

A final samplesheet file consisting of both single- and paired-end data may look something like the one below.

```console
sample,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
SAMPLE1,/path/to/sample1_fastq1.fq,/path/to/sample1_fastq2.fq,/path/to/sample1_assembly.fa,,,,,,,,
SAMPLE2,/path/to/sample2_fastq1.fq,,,,,,,,,,
sample,sample_name,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
SAMPLE1,A1,/path/to/sample1_fastq1.fq,/path/to/sample1_fastq2.fq,/path/to/sample1_assembly.fa,,,,,,,,
SAMPLE2,B2,/path/to/sample2_fastq1.fq,,,,,,,,,,
```
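
A samplesheet with this column layout could be generated with Python's standard `csv` module, as in the sketch below. The helper name `build_samplesheet` is hypothetical and the paths are the placeholder paths from the example; columns left out of a row are written as empty fields.

```python
import csv
import io

# Column order as listed for the full samplesheet.
COLUMNS = ["sample", "sample_name", "fastq_1", "fastq_2", "reference_assembly"] + [
    f"metadata_{i}" for i in range(1, 9)
]


def build_samplesheet(rows):
    """Render a list of row dicts as samplesheet CSV text."""
    buffer = io.StringIO()
    # restval="" fills any column that a row does not provide.
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(build_samplesheet([
    {"sample": "SAMPLE1", "sample_name": "A1",
     "fastq_1": "/path/to/sample1_fastq1.fq", "fastq_2": "/path/to/sample1_fastq2.fq",
     "reference_assembly": "/path/to/sample1_assembly.fa"},
    {"sample": "SAMPLE2", "sample_name": "B2", "fastq_1": "/path/to/sample2_fastq1.fq"},
]))
```

`csv.DictWriter` raises a `ValueError` for keys that are not in `COLUMNS`, which catches misspelled column names early.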

| Column | Description |
| ---------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `sample` | Custom sample name. Samples should be unique within a samplesheet. |
| `sample_name` | Sample name used in outputs (filenames and sample names) |
| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| `fastq_2` | (Optional) Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| `reference_assembly` | (Optional) Full path to a FASTA file representing a reference assembly derived from this sample. This field provides a method for selecting a reference genome for the whole pipeline. |
3 changes: 1 addition & 2 deletions nextflow_schema.json
@@ -47,8 +47,7 @@
"reference_sample_id": {
"type": "string",
"fa_icon": "fas fa-file",
"description": "The sample ID from which to use the associated FASTA-format assembly as a reference.",
"pattern": "^\\S+$"
"description": "The sample ID from which to use the associated FASTA-format assembly as a reference."
},
"refgenome": {
"type": "string",
File renamed without changes (9 files).
8 changes: 4 additions & 4 deletions tests/data/samplesheets/samplesheet1.csv
@@ -1,4 +1,4 @@
sample,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
SAMPLE1,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8
SAMPLE2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8
SAMPLE3,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8
sample,sample_name,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
SAMPLE1,A 1#,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8
SAMPLE2,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8
SAMPLE3,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8
8 changes: 4 additions & 4 deletions tests/data/samplesheets/samplesheet_few-metadata.csv
@@ -1,5 +1,5 @@
sample,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3
SAMPLE1,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,1.1,2.1,3.1
SAMPLE2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,,2.2,3.2
SAMPLE3,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,1.3,2.3,,
sample,sample_name,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3
SAMPLE1,A 1#,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,1.1,2.1,3.1
SAMPLE2,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,,2.2,3.2
SAMPLE3,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,1.3,2.3,,

8 changes: 4 additions & 4 deletions tests/data/samplesheets/samplesheet_little-metadata.csv
@@ -1,4 +1,4 @@
sample,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
SAMPLE1,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,,,,1.4,,,,
SAMPLE2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,,,,,,,,
SAMPLE3,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,3.1,3.2,,,,,,3.8
sample,sample_name,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
SAMPLE1,A 1#,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,,,,1.4,,,,
SAMPLE2,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,,,,,,,,
SAMPLE3,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,3.1,3.2,,,,,,3.8
8 changes: 4 additions & 4 deletions tests/data/samplesheets/samplesheet_no-metadata.csv
@@ -1,4 +1,4 @@
sample,fastq_1,fastq_2,reference_assembly
SAMPLE1,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta
SAMPLE2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,
SAMPLE3,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,
sample,sample_name,fastq_1,fastq_2,reference_assembly
SAMPLE1,A 1#,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta
SAMPLE2,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,
SAMPLE3,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,