diff --git a/CHANGELOG.md b/CHANGELOG.md index f56ee7d..f697c3d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,6 +3,18 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## Development + +### `Changed` + +- Modified the template for input csv file to include a `sample_name` column in addition to `sample` in-line with changes to [IRIDA-Next update] as seen with the [speciesabundance pipeline] + - `sample_name` special characters will be replaced with `"_"` + - If no `sample_name` is supplied in the column `sample` will be used + - To avoid repeat values for `sample_name` all `sample_name` values will be suffixed with the unique `sample` value from the input file + +[IRIDA-Next update]: https://github.com/phac-nml/irida-next/pull/678 +[speciesabundance pipeline]: https://github.com/phac-nml/speciesabundance/pull/24 + ## [2.1.1] - 2024/08/21 ### `Changed` diff --git a/README.md b/README.md index 2ec0dce..ddf8eb6 100644 --- a/README.md +++ b/README.md @@ -8,19 +8,28 @@ This is the [nf-core](https://nf-co.re/)-based pipeline for [SNVPhyl](https://sn Input is provided to SNVPhyl in the form of a samplesheet (passed as `--input samplesheet.csv`). 
This samplesheet is a CSV-formated file, which may be provided as a URI (ex: a file path or web address), and has the following format: -| sample | fastq_1 | fastq_2 | reference_assembly | metadata_1 | metadata_2 | metadata_3 | metadata_4 | metadata_5 | metadata_6 | metadata_7 | metadata_8 | -| ------- | -------------------------- | -------------------------- | ---------------------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | -| SAMPLE1 | /path/to/sample1_fastq1.fq | /path/to/sample1_fastq2.fq | /path/to/sample1_assembly.fa | meta1 | meta2 | meta3 | meta4 | meta5 | meta6 | meta7 | meta8 | -| SAMPLE2 | /path/to/sample2_fastq1.fq | | | meta1 | meta2 | meta3 | meta4 | meta5 | meta6 | meta7 | meta8 | +| sample | sample_name | fastq_1 | fastq_2 | reference_assembly | metadata_1 | metadata_2 | metadata_3 | metadata_4 | metadata_5 | metadata_6 | metadata_7 | metadata_8 | +| ------- | ------------ | -------------------------- | -------------------------- | ---------------------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | +| SAMPLE1 | sample_name1 | /path/to/sample1_fastq1.fq | /path/to/sample1_fastq2.fq | /path/to/sample1_assembly.fa | meta1 | meta2 | meta3 | meta4 | meta5 | meta6 | meta7 | meta8 | +| SAMPLE2 | sample_name2 | /path/to/sample2_fastq1.fq | | | meta1 | meta2 | meta3 | meta4 | meta5 | meta6 | meta7 | meta8 | The columns are defined as follows: -- `sample`: The unique sample identifier to associate with the reads (and optionally the reference assembly). +- `sample`: Mandatory unique sample identifier. The unique sample identifier to associate with the reads (and optionally the reference assembly). +- `sample_name`: Optional, and overrides `sample` for outputs (filenames and sample names) and reference assembly identification. 
- `fastq_1`: A URI (ex: a file path or web address) to either single-end FASTQ-formatted reads or one pair of pair-end FASTQ-formatted reads. - `fastq_2`: (Optional) If `fastq_1` is paired-end, then this field is a URI to reads that are the other pair of reads associated with `fastq_1`. - `reference_assembly`: (Optional) A URI to a reference assembly associated with the sample, so that it may be referenced on the command line by the sample identifier for use as the reference for the whole pipeline. However, it may be easier to leave these fields blank and specify the reference using the `--refgenome` parameter. - `metadata_1...8`: (Optional) Permits up to 8 columns for user-defined contextual metadata associated with each `sample`. Refer to [Metadata](#metadata) for more information. +### When to use `sample` vs `sample_name` + +Either can be used to identify the reference assembly with the parameter `--reference_sample_id`. + +`sample` is a unique identifier, designed to be used internally or in IRIDA-Next, or when `sample_name` is not provided. + +`sample_name`, allows more flexibility in naming output files or sample identification. Unlike `sample`, `sample_name` is not required to contain unique values. `Nextflow` requires unique sample names, and therefore in the instance of repeat `sample_names`, `sample` will be suffixed to any `sample_name`. Non-alphanumeric characters (excluding `_`,`-`,`.`) will be replaced with `"_"`. + The structure of this file is defined in [assets/schema_input.json](assets/schema_input.json). Please see [assets/samplesheet.csv](assets/samplesheet.csv) to see an example of a samplesheet for this pipeline. 
# Parameters @@ -45,7 +54,7 @@ The optional parameters are as follows: ### Reference - `--refgenome`: a URI to the reference genome to use during pipeline analysis -- `--reference_sample_id`: the sample identifier of a sample in the samplesheet that contains a provided `reference_assembly` to use as a reference genome during pipeline analysis +- `--reference_sample_id`: the sample identifier of a sample (`sample` or `sample_name`) in the samplesheet that contains a provided `reference_assembly` to use as a reference genome during pipeline analysis Please use only one of `--refgenome` or `--reference_sample_id` and not both. diff --git a/assets/samplesheet.csv b/assets/samplesheet.csv index 8f3eb49..9a3c2b7 100644 --- a/assets/samplesheet.csv +++ b/assets/samplesheet.csv @@ -1,4 +1,4 @@ -sample,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8 -SAMPLE1,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,1.1,2.1,3.1,4.1,5.1,6.1,7.1,8.1 -SAMPLE2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,1.1,2.2,3.2,4.2,5.2,6.2,7.1,8.2 -SAMPLE3,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,1.2,2.2,3.3,4.3,5.3,6.2,,8.3 +sample,sample_name,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8 +SAMPLE1,A 
1#,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,1.1,2.1,3.1,4.1,5.1,6.1,7.1,8.1 +SAMPLE2,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,1.1,2.2,3.2,4.2,5.2,6.2,7.1,8.2 +SAMPLE3,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,1.2,2.2,3.3,4.3,5.3,6.2,,8.3 diff --git a/assets/schema_input.json b/assets/schema_input.json index e6f6d73..714cda3 100644 --- a/assets/schema_input.json +++ b/assets/schema_input.json @@ -10,9 +10,14 @@ "sample": { "type": "string", "pattern": "^\\S+$", - "meta": ["id"], + "meta": ["irida_id"], "unique": true, - "errorMessage": "Sample name must be provided and cannot contain spaces" + "errorMessage": "Sample must be provided and cannot contain spaces" + }, + "sample_name": { + "type": "string", + "meta": ["id", "id_alt"], + "errorMessage": "Sample name is optional, if provided will replace sample for filenames and outputs" }, "fastq_1": { "type": "string", diff --git a/conf/iridanext.config b/conf/iridanext.config index 31e99ae..1a42cd3 100644 --- a/conf/iridanext.config +++ b/conf/iridanext.config @@ -1,6 +1,7 @@ iridanext { enabled = true output { + idkey = "irida_id" path = "${params.outdir}/iridanext.output.json.gz" overwrite = true files { diff --git a/conf/test_full.config b/conf/test_full.config index bfccaaa..a3bf02e 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -15,5 +15,5 @@ params { config_profile_description = 'Full test dataset to check pipeline 
function' // Input data for full size test - input = 'https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/samplesheet.csv' + input = "${projectDir}/assets/samplesheet.csv" } diff --git a/docs/usage.md b/docs/usage.md index 3844be4..a82fa81 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -14,19 +14,20 @@ You will need to create a samplesheet with information about the samples you wou ### Full samplesheet -The input samplesheet can contain the following columns: `sample`, `fastq_1`, `fastq_2`, `reference_assembly`, and `metadata_1` - `metadata_8`. The sample IDs within a samplesheet should be unique. +The input samplesheet can contain the following columns: `sample`, `sample_name`, `fastq_1`, `fastq_2`, `reference_assembly`, and `metadata_1` - `metadata_8`. The sample IDs within a samplesheet should be unique. A final samplesheet file consisting of both single- and paired-end data may look something like the one below. ```console -sample,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8 -SAMPLE1,/path/to/sample1_fastq1.fq,/path/to/sample1_fastq2.fq,/path/to/sample1_assembly.fa,,,,,,,, -SAMPLE2,/path/to/sample2_fastq1.fq,,,,,,,,,, +sample,sample_name,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8 +SAMPLE1,A1,/path/to/sample1_fastq1.fq,/path/to/sample1_fastq2.fq,/path/to/sample1_assembly.fa,,,,,,,, +SAMPLE2,B2,/path/to/sample2_fastq1.fq,,,,,,,,,, ``` | Column | Description | | ---------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `sample` | Custom sample name. Samples should be unique within a samplesheet. 
| +| `sample_name` | Sample name used in outputs (filenames and sample names) | | `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | | `fastq_2` | (Optional) Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | | `reference_assembly` | (Optional) Full path to a FASTA file representing a reference assembly derived from this sample. This field provides a method for selecting a reference genome for the whole pipeline. | diff --git a/nextflow_schema.json b/nextflow_schema.json index 798e7c9..00e4485 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -47,8 +47,7 @@ "reference_sample_id": { "type": "string", "fa_icon": "fas fa-file", - "description": "The sample ID from which to use the associated FASTA-format assembly as a reference.", - "pattern": "^\\S+$" + "description": "The sample ID from which to use the associated FASTA-format assembly as a reference." 
}, "refgenome": { "type": "string", diff --git a/tests/data/SAMPLE1.json.gz b/tests/data/A_1_.json.gz similarity index 100% rename from tests/data/SAMPLE1.json.gz rename to tests/data/A_1_.json.gz diff --git a/tests/data/SAMPLE1.simple.json.gz b/tests/data/A_1_.simple.json.gz similarity index 100% rename from tests/data/SAMPLE1.simple.json.gz rename to tests/data/A_1_.simple.json.gz diff --git a/tests/data/SAMPLE1_1.fastq b/tests/data/A_1__1.fastq similarity index 100% rename from tests/data/SAMPLE1_1.fastq rename to tests/data/A_1__1.fastq diff --git a/tests/data/SAMPLE1_2.fastq b/tests/data/A_1__2.fastq similarity index 100% rename from tests/data/SAMPLE1_2.fastq rename to tests/data/A_1__2.fastq diff --git a/tests/data/SAMPLE1_sorted.bam b/tests/data/A_1__sorted.bam similarity index 100% rename from tests/data/SAMPLE1_sorted.bam rename to tests/data/A_1__sorted.bam diff --git a/tests/data/SAMPLE2.simple.json.gz b/tests/data/B2.simple.json.gz similarity index 100% rename from tests/data/SAMPLE2.simple.json.gz rename to tests/data/B2.simple.json.gz diff --git a/tests/data/SAMPLE3.simple.json.gz b/tests/data/B2_SAMPLE3.simple.json.gz similarity index 100% rename from tests/data/SAMPLE3.simple.json.gz rename to tests/data/B2_SAMPLE3.simple.json.gz diff --git a/tests/data/SAMPLE3_sorted.bam b/tests/data/B2_SAMPLE3_sorted.bam similarity index 100% rename from tests/data/SAMPLE3_sorted.bam rename to tests/data/B2_SAMPLE3_sorted.bam diff --git a/tests/data/SAMPLE2_sorted.bam b/tests/data/B2_sorted.bam similarity index 100% rename from tests/data/SAMPLE2_sorted.bam rename to tests/data/B2_sorted.bam diff --git a/tests/data/samplesheets/samplesheet1.csv b/tests/data/samplesheets/samplesheet1.csv index 946a3df..b29d0c6 100644 --- a/tests/data/samplesheets/samplesheet1.csv +++ b/tests/data/samplesheets/samplesheet1.csv @@ -1,4 +1,4 @@ -sample,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8 
-SAMPLE1,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8 -SAMPLE2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8 -SAMPLE3,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8 +sample,sample_name,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8 +SAMPLE1,A 1#,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8 +SAMPLE2,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8 +SAMPLE3,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8 diff --git a/tests/data/samplesheets/samplesheet_few-metadata.csv b/tests/data/samplesheets/samplesheet_few-metadata.csv index fddabb7..65ffe53 100644 --- 
a/tests/data/samplesheets/samplesheet_few-metadata.csv +++ b/tests/data/samplesheets/samplesheet_few-metadata.csv @@ -1,5 +1,5 @@ -sample,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3 -SAMPLE1,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,1.1,2.1,3.1 -SAMPLE2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,,2.2,3.2 -SAMPLE3,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,1.3,2.3,, +sample,sample_name,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3 +SAMPLE1,A 1#,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,1.1,2.1,3.1 +SAMPLE2,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,,2.2,3.2 +SAMPLE3,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,1.3,2.3,, diff --git a/tests/data/samplesheets/samplesheet_little-metadata.csv b/tests/data/samplesheets/samplesheet_little-metadata.csv index fa120bd..ef2471d 100644 --- 
a/tests/data/samplesheets/samplesheet_little-metadata.csv +++ b/tests/data/samplesheets/samplesheet_little-metadata.csv @@ -1,4 +1,4 @@ -sample,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8 -SAMPLE1,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,,,,1.4,,,, -SAMPLE2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,,,,,,,, -SAMPLE3,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,3.1,3.2,,,,,,3.8 +sample,sample_name,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8 +SAMPLE1,A 1#,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,,,,1.4,,,, +SAMPLE2,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,,,,,,,, +SAMPLE3,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,3.1,3.2,,,,,,3.8 diff --git 
a/tests/data/samplesheets/samplesheet_no-metadata.csv b/tests/data/samplesheets/samplesheet_no-metadata.csv index 26de889..ca29eae 100644 --- a/tests/data/samplesheets/samplesheet_no-metadata.csv +++ b/tests/data/samplesheets/samplesheet_no-metadata.csv @@ -1,4 +1,4 @@ -sample,fastq_1,fastq_2,reference_assembly -SAMPLE1,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta -SAMPLE2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,, -SAMPLE3,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,, +sample,sample_name,fastq_1,fastq_2,reference_assembly +SAMPLE1,A 1#,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta +SAMPLE2,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,, +SAMPLE3,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,, diff --git a/tests/data/samplesheets/samplesheet_tab-metadata.csv b/tests/data/samplesheets/samplesheet_tab-metadata.csv index cf0094a..193fafc 
100644 --- a/tests/data/samplesheets/samplesheet_tab-metadata.csv +++ b/tests/data/samplesheets/samplesheet_tab-metadata.csv @@ -1,5 +1,5 @@ -sample,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8 -SAMPLE1,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,a b,,,,,,, -SAMPLE2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,,,,,a b,,, -SAMPLE3,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,,,,,,,,a b +sample,sample_name,fastq_1,fastq_2,reference_assembly,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8 +SAMPLE1,A 1#,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/a_2.fastq,https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta,a b,,,,,,, +SAMPLE2,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/b_2.fastq,,,,,,a b,,, +SAMPLE3,B2,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_1.fastq,https://raw.githubusercontent.com/phac-nml/snvphyl-galaxy-cli/development/example-data/fastqs/c_2.fastq,,,,,,,,,a b diff --git 
a/tests/data/snvAlignment.phy b/tests/data/snvAlignment.phy index 8711194..7515971 100644 --- a/tests/data/snvAlignment.phy +++ b/tests/data/snvAlignment.phy @@ -1,6 +1,6 @@ 4 2 -SAMPLE1 AA -SAMPLE2 TA -SAMPLE3 AC +A_1_ AA +B2 TA +B2_SAMPLE3 AC reference AA diff --git a/tests/modules/local/freebayes/main.nf.test b/tests/modules/local/freebayes/main.nf.test index c10fb04..510cdd6 100644 --- a/tests/modules/local/freebayes/main.nf.test +++ b/tests/modules/local/freebayes/main.nf.test @@ -4,7 +4,7 @@ nextflow_process { script "modules/local/freebayes/main.nf" process "FREEBAYES" - test("SAMPLE1") { + test("A_1_") { when { params { @@ -13,7 +13,7 @@ nextflow_process { } process { """ - input[0] = new Tuple(["id": "SAMPLE1"], file("$baseDir/tests/data/SAMPLE1_sorted.bam")) + input[0] = new Tuple(["id": "A_1_"], file("$baseDir/tests/data/A_1__sorted.bam")) input[1] = params.refgenome """ } @@ -39,7 +39,7 @@ nextflow_process { } - test("SAMPLE2") { + test("B2") { when { params { @@ -48,7 +48,7 @@ nextflow_process { } process { """ - input[0] = new Tuple(["id": "SAMPLE2"], file("$baseDir/tests/data/SAMPLE2_sorted.bam")) + input[0] = new Tuple(["id": "B2"], file("$baseDir/tests/data/B2_sorted.bam")) input[1] = params.refgenome """ } @@ -75,7 +75,7 @@ nextflow_process { } - test("SAMPLE3") { + test("B2_SAMPLE3") { when { params { @@ -84,7 +84,7 @@ nextflow_process { } process { """ - input[0] = new Tuple(["id": "SAMPLE3"], file("$baseDir/tests/data/SAMPLE3_sorted.bam")) + input[0] = new Tuple(["id": "B2_SAMPLE3"], file("$baseDir/tests/data/B2_SAMPLE3_sorted.bam")) input[1] = params.refgenome """ } diff --git a/tests/modules/local/phyml/main.nf.test b/tests/modules/local/phyml/main.nf.test index fb344c1..f8b2b9c 100644 --- a/tests/modules/local/phyml/main.nf.test +++ b/tests/modules/local/phyml/main.nf.test @@ -31,9 +31,9 @@ nextflow_process { lines = path(phylogeneticTree[0]).readLines() - assert lines.join("\n").contains("SAMPLE1") - assert 
lines.join("\n").contains("SAMPLE2") - assert lines.join("\n").contains("SAMPLE3") + assert lines.join("\n").contains("A_1_") + assert lines.join("\n").contains("B2") + assert lines.join("\n").contains("B2_SAMPLE3") } } diff --git a/tests/modules/local/smaltmap/main.nf.test b/tests/modules/local/smaltmap/main.nf.test index c557e23..bc92d63 100644 --- a/tests/modules/local/smaltmap/main.nf.test +++ b/tests/modules/local/smaltmap/main.nf.test @@ -13,7 +13,7 @@ nextflow_process { } process { """ - input[0] = new Tuple(["id": "SAMPLE1"], [file("$baseDir/tests/data/SAMPLE1_1.fastq"), file("$baseDir/tests/data/SAMPLE1_2.fastq")]) + input[0] = new Tuple(["id": "A_1_"], [file("$baseDir/tests/data/A_1__1.fastq"), file("$baseDir/tests/data/A_1__2.fastq")]) input[1] = file("$baseDir/tests/data/reference.fasta.fai") input[2] = file("$baseDir/tests/data/reference.sma") input[3] = file("$baseDir/tests/data/reference.smi") @@ -25,7 +25,7 @@ nextflow_process { assert process.success with(process.out) { - assert bams[0][0].id == "SAMPLE1" + assert bams[0][0].id == "A_1_" assert path(bams[0][1]).exists() } } diff --git a/tests/pipelines/main.nf.test b/tests/pipelines/main.nf.test index aee669a..87d336f 100644 --- a/tests/pipelines/main.nf.test +++ b/tests/pipelines/main.nf.test @@ -37,9 +37,9 @@ nextflow_pipeline { assert lines.contains("Number of sites filtered: 0") // check filtered density files: - assert path("$launchDir/results/consolidate/SAMPLE1_filtered_density.txt").exists() - assert path("$launchDir/results/consolidate/SAMPLE2_filtered_density.txt").exists() - assert path("$launchDir/results/consolidate/SAMPLE3_filtered_density.txt").exists() + assert path("$launchDir/results/consolidate/A_1__filtered_density.txt").exists() + assert path("$launchDir/results/consolidate/B2_filtered_density.txt").exists() + assert path("$launchDir/results/consolidate/B2_SAMPLE3_filtered_density.txt").exists() lines = path("$launchDir/results/cat/cat_invalid_positions.txt").readLines() assert 
lines.contains("#Calculation and writing of high density regions has completed.") @@ -55,31 +55,31 @@ nextflow_pipeline { lines = path("$launchDir/results/phyml/phylogeneticTree.newick").readLines() - assert lines.join("\n").contains("SAMPLE1") - assert lines.join("\n").contains("SAMPLE2") - assert lines.join("\n").contains("SAMPLE3") + assert lines.join("\n").contains("A_1_") + assert lines.join("\n").contains("B2") + assert lines.join("\n").contains("B2_SAMPLE3") // check WRITE_METADATA file lines = path("$launchDir/results/write/metadata.tsv").readLines() assert lines.size() == 4 assert lines.contains("id\tmyheader_1\tmyheader_2\tmyheader_3\tmyheader_4\tmyheader_5\tmyheader_6\tmyheader_7\tmyheader_8") - assert lines.contains("SAMPLE1\t1.1\t1.2\t1.3\t1.4\t1.5\t1.6\t1.7\t1.8") - assert lines.contains("SAMPLE2\t2.1\t2.2\t2.3\t2.4\t2.5\t2.6\t2.7\t2.8") - assert lines.contains("SAMPLE3\t3.1\t3.2\t3.3\t3.4\t3.5\t3.6\t3.7\t3.8") + assert lines.contains("A_1_\t1.1\t1.2\t1.3\t1.4\t1.5\t1.6\t1.7\t1.8") + assert lines.contains("B2\t2.1\t2.2\t2.3\t2.4\t2.5\t2.6\t2.7\t2.8") + assert lines.contains("B2_SAMPLE3\t3.1\t3.2\t3.3\t3.4\t3.5\t3.6\t3.7\t3.8") // Check that ArborView output is created def actual_arborview = path("$launchDir/results/arbor/SNVPhyl_ArborView.html") assert actual_arborview.exists() - assert actual_arborview.text.contains("id\\tmyheader_1\\tmyheader_2\\tmyheader_3\\tmyheader_4\\tmyheader_5\\tmyheader_6\\tmyheader_7\\tmyheader_8\\nSAMPLE1\\t1.1\\t1.2\\t1.3\\t1.4\\t1.5\\t1.6\\t1.7\\t1.8\\nSAMPLE2\\t2.1\\t2.2\\t2.3\\t2.4\\t2.5\\t2.6\\t2.7\\t2.8\\nSAMPLE3\\t3.1\\t3.2\\t3.3\\t3.4\\t3.5\\t3.6\\t3.7\\t3.8") + assert actual_arborview.text.contains("id\\tmyheader_1\\tmyheader_2\\tmyheader_3\\tmyheader_4\\tmyheader_5\\tmyheader_6\\tmyheader_7\\tmyheader_8\\nA_1_\\t1.1\\t1.2\\t1.3\\t1.4\\t1.5\\t1.6\\t1.7\\t1.8\\nB2\\t2.1\\t2.2\\t2.3\\t2.4\\t2.5\\t2.6\\t2.7\\t2.8\\nB2_SAMPLE3\\t3.1\\t3.2\\t3.3\\t3.4\\t3.5\\t3.6\\t3.7\\t3.8") // check MAKE_SNV output lines = 
path("$launchDir/results/make/snvMatrix.tsv").readLines()

-            assert lines[0] == "strain\tSAMPLE2\tSAMPLE3\tSAMPLE1\treference\t"
-            assert lines.contains("SAMPLE1\t1\t1\t0\t0\t")
-            assert lines.contains("SAMPLE2\t0\t2\t1\t1\t")
-            assert lines.contains("SAMPLE3\t2\t0\t1\t1\t")
+            assert lines[0] == "strain\tB2\tB2_SAMPLE3\tA_1_\treference\t"
+            assert lines.contains("A_1_\t1\t1\t0\t0\t")
+            assert lines.contains("B2\t0\t2\t1\t1\t")
+            assert lines.contains("B2_SAMPLE3\t2\t0\t1\t1\t")

             // check IRIDA Next JSON file
             lines = path("$launchDir/results/iridanext.output.json.gz").linesGzip.join("\n")
@@ -127,14 +127,14 @@ nextflow_pipeline {

             assert lines.size() == 4
             assert lines.contains("id\tmyheader_1\tmyheader_2\tmyheader_3\tmyheader_4\tmyheader_5\tmyheader_6\tmyheader_7\tmyheader_8")
-            assert lines.contains("SAMPLE1\t\t\t\t\t\t\t\t")
-            assert lines.contains("SAMPLE2\t\t\t\t\t\t\t\t")
-            assert lines.contains("SAMPLE3\t\t\t\t\t\t\t\t")
+            assert lines.contains("A_1_\t\t\t\t\t\t\t\t")
+            assert lines.contains("B2\t\t\t\t\t\t\t\t")
+            assert lines.contains("B2_SAMPLE3\t\t\t\t\t\t\t\t")

             // Check that ArborView output is created
             def actual_arborview = path("$launchDir/results/arbor/SNVPhyl_ArborView.html")
             assert actual_arborview.exists()
-            assert actual_arborview.text.contains("id\\tmyheader_1\\tmyheader_2\\tmyheader_3\\tmyheader_4\\tmyheader_5\\tmyheader_6\\tmyheader_7\\tmyheader_8\\nSAMPLE1\\t\\t\\t\\t\\t\\t\\t\\t\\nSAMPLE2\\t\\t\\t\\t\\t\\t\\t\\t\\nSAMPLE3\\t\\t\\t\\t\\t\\t\\t\\t")
+            assert actual_arborview.text.contains("id\\tmyheader_1\\tmyheader_2\\tmyheader_3\\tmyheader_4\\tmyheader_5\\tmyheader_6\\tmyheader_7\\tmyheader_8\\nA_1_\\t\\t\\t\\t\\t\\t\\t\\t\\nB2\\t\\t\\t\\t\\t\\t\\t\\t\\nB2_SAMPLE3\\t\\t\\t\\t\\t\\t\\t\\t")

             // check IRIDA Next JSON file
             lines = path("$launchDir/results/iridanext.output.json.gz").linesGzip.join("\n")
@@ -182,14 +182,14 @@ nextflow_pipeline {

             assert lines.size() == 4
             assert lines.contains("id\tmyheader_1\tmyheader_2\tmyheader_3\tmyheader_4\tmyheader_5\tmyheader_6\tmyheader_7\tmyheader_8")
-            assert lines.contains("SAMPLE1\t\t\t\t1.4\t\t\t\t")
-            assert lines.contains("SAMPLE2\t\t\t\t\t\t\t\t")
-            assert lines.contains("SAMPLE3\t3.1\t3.2\t\t\t\t\t\t3.8")
+            assert lines.contains("A_1_\t\t\t\t1.4\t\t\t\t")
+            assert lines.contains("B2\t\t\t\t\t\t\t\t")
+            assert lines.contains("B2_SAMPLE3\t3.1\t3.2\t\t\t\t\t\t3.8")

             // Check that ArborView output is created
             def actual_arborview = path("$launchDir/results/arbor/SNVPhyl_ArborView.html")
             assert actual_arborview.exists()
-            assert actual_arborview.text.contains("id\\tmyheader_1\\tmyheader_2\\tmyheader_3\\tmyheader_4\\tmyheader_5\\tmyheader_6\\tmyheader_7\\tmyheader_8\\nSAMPLE1\\t\\t\\t\\t1.4\\t\\t\\t\\t\\nSAMPLE2\\t\\t\\t\\t\\t\\t\\t\\t\\nSAMPLE3\\t3.1\\t3.2\\t\\t\\t\\t\\t\\t3.8")
+            assert actual_arborview.text.contains("id\\tmyheader_1\\tmyheader_2\\tmyheader_3\\tmyheader_4\\tmyheader_5\\tmyheader_6\\tmyheader_7\\tmyheader_8\\nA_1_\\t\\t\\t\\t1.4\\t\\t\\t\\t\\nB2\\t\\t\\t\\t\\t\\t\\t\\t\\nB2_SAMPLE3\\t3.1\\t3.2\\t\\t\\t\\t\\t\\t3.8")

             // check IRIDA Next JSON file
             lines = path("$launchDir/results/iridanext.output.json.gz").linesGzip.join("\n")
@@ -254,14 +254,14 @@ nextflow_pipeline {

             assert lines.size() == 4
             assert lines.contains("id\tmetadata_1\tmetadata_2\tmetadata_3\tmetadata_4\tmetadata_5\tmetadata_6\tmetadata_7\tmetadata_8")
-            assert lines.contains("SAMPLE1\t\t\t\t\t\t\t\t")
-            assert lines.contains("SAMPLE2\t\t\t\t\t\t\t\t")
-            assert lines.contains("SAMPLE3\t\t\t\t\t\t\t\t")
+            assert lines.contains("A_1_\t\t\t\t\t\t\t\t")
+            assert lines.contains("B2\t\t\t\t\t\t\t\t")
+            assert lines.contains("B2_SAMPLE3\t\t\t\t\t\t\t\t")

             // Check that ArborView output is created
             def actual_arborview = path("$launchDir/results/arbor/SNVPhyl_ArborView.html")
             assert actual_arborview.exists()
-            assert actual_arborview.text.contains("id\\tmetadata_1\\tmetadata_2\\tmetadata_3\\tmetadata_4\\tmetadata_5\\tmetadata_6\\tmetadata_7\\tmetadata_8\\nSAMPLE1\\t\\t\\t\\t\\t\\t\\t\\t\\nSAMPLE2\\t\\t\\t\\t\\t\\t\\t\\t\\nSAMPLE3\\t\\t\\t\\t\\t\\t\\t\\t")
+            assert actual_arborview.text.contains("id\\tmetadata_1\\tmetadata_2\\tmetadata_3\\tmetadata_4\\tmetadata_5\\tmetadata_6\\tmetadata_7\\tmetadata_8\\nA_1_\\t\\t\\t\\t\\t\\t\\t\\t\\nB2\\t\\t\\t\\t\\t\\t\\t\\t\\nB2_SAMPLE3\\t\\t\\t\\t\\t\\t\\t\\t")

             // check IRIDA Next JSON file
             lines = path("$launchDir/results/iridanext.output.json.gz").linesGzip.join("\n")
@@ -304,14 +304,14 @@ nextflow_pipeline {

             assert lines.size() == 4
             assert lines.contains("id\tmyheader_1\tmyheader_2\tmyheader_3\tmetadata_4\tmetadata_5\tmetadata_6\tmetadata_7\tmetadata_8")
-            assert lines.contains("SAMPLE1\t1.1\t2.1\t3.1\t\t\t\t\t")
-            assert lines.contains("SAMPLE2\t\t2.2\t3.2\t\t\t\t\t")
-            assert lines.contains("SAMPLE3\t1.3\t2.3\t\t\t\t\t\t")
+            assert lines.contains("A_1_\t1.1\t2.1\t3.1\t\t\t\t\t")
+            assert lines.contains("B2\t\t2.2\t3.2\t\t\t\t\t")
+            assert lines.contains("B2_SAMPLE3\t1.3\t2.3\t\t\t\t\t\t")

             // Check that ArborView output is created
             def actual_arborview = path("$launchDir/results/arbor/SNVPhyl_ArborView.html")
             assert actual_arborview.exists()
-            assert actual_arborview.text.contains("id\\tmyheader_1\\tmyheader_2\\tmyheader_3\\tmetadata_4\\tmetadata_5\\tmetadata_6\\tmetadata_7\\tmetadata_8\\nSAMPLE1\\t1.1\\t2.1\\t3.1\\t\\t\\t\\t\\t\\nSAMPLE2\\t\\t2.2\\t3.2\\t\\t\\t\\t\\t\\nSAMPLE3\\t1.3\\t2.3\\t\\t\\t\\t\\t\\t")
+            assert actual_arborview.text.contains("id\\tmyheader_1\\tmyheader_2\\tmyheader_3\\tmetadata_4\\tmetadata_5\\tmetadata_6\\tmetadata_7\\tmetadata_8\\nA_1_\\t1.1\\t2.1\\t3.1\\t\\t\\t\\t\\t\\nB2\\t\\t2.2\\t3.2\\t\\t\\t\\t\\t\\nB2_SAMPLE3\\t1.3\\t2.3\\t\\t\\t\\t\\t\\t")

             // check IRIDA Next JSON file
             lines = path("$launchDir/results/iridanext.output.json.gz").linesGzip.join("\n")
diff --git a/tests/workflows/snvphylnfc.nf.test b/tests/workflows/snvphylnfc.nf.test
index aa69a3a..37e6006 100644
--- a/tests/workflows/snvphylnfc.nf.test
+++ b/tests/workflows/snvphylnfc.nf.test
@@ -8,7 +8,7 @@ nextflow_workflow {

         when {
             params {
-                input = "https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/samplesheet.csv"
+                input = "$baseDir/assets/samplesheet.csv"
                 refgenome = "https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/reference.fasta"
                 outdir = "results"
             }
@@ -29,9 +29,9 @@ nextflow_workflow {
             assert lines.contains("Number of sites filtered: 0")

             // check filtered density files:
-            assert path("$launchDir/results/consolidate/SAMPLE1_filtered_density.txt").exists()
-            assert path("$launchDir/results/consolidate/SAMPLE2_filtered_density.txt").exists()
-            assert path("$launchDir/results/consolidate/SAMPLE3_filtered_density.txt").exists()
+            assert path("$launchDir/results/consolidate/A_1__filtered_density.txt").exists()
+            assert path("$launchDir/results/consolidate/B2_filtered_density.txt").exists()
+            assert path("$launchDir/results/consolidate/B2_SAMPLE3_filtered_density.txt").exists()

             lines = path("$launchDir/results/cat/cat_invalid_positions.txt").readLines()
             assert lines.contains("#Calculation and writing of high density regions has completed.")
@@ -47,17 +47,17 @@ nextflow_workflow {

             lines = path("$launchDir/results/phyml/phylogeneticTree.newick").readLines()

-            assert lines.join("\n").contains("SAMPLE1")
-            assert lines.join("\n").contains("SAMPLE2")
-            assert lines.join("\n").contains("SAMPLE3")
+            assert lines.join("\n").contains("A_1_")
+            assert lines.join("\n").contains("B2")
+            assert lines.join("\n").contains("B2_SAMPLE3")

             // check MAKE_SNV output
             lines = path("$launchDir/results/make/snvMatrix.tsv").readLines()

-            assert lines[0] == "strain\tSAMPLE2\tSAMPLE3\tSAMPLE1\treference\t"
-            assert lines.contains("SAMPLE1\t1\t1\t0\t0\t")
-            assert lines.contains("SAMPLE2\t0\t2\t1\t1\t")
-            assert lines.contains("SAMPLE3\t2\t0\t1\t1\t")
+            assert lines[0] == "strain\tB2\tB2_SAMPLE3\tA_1_\treference\t"
+            assert lines.contains("A_1_\t1\t1\t0\t0\t")
+            assert lines.contains("B2\t0\t2\t1\t1\t")
+            assert lines.contains("B2_SAMPLE3\t2\t0\t1\t1\t")

             // check IRIDA Next JSON file
             lines = path("$launchDir/results/iridanext.output.json.gz").linesGzip.join("\n")
@@ -78,8 +78,8 @@ nextflow_workflow {

         when {
             params {
-                input = "https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/samplesheet.csv"
-                reference_sample_id = "SAMPLE1"
+                input = "$baseDir/assets/samplesheet.csv"
+                reference_sample_id = "A_1_"
                 outdir = "results"
             }
             workflow {}
@@ -99,9 +99,9 @@ nextflow_workflow {
             assert lines.contains("Number of sites filtered: 0")

             // check filtered density files:
-            assert path("$launchDir/results/consolidate/SAMPLE1_filtered_density.txt").exists()
-            assert path("$launchDir/results/consolidate/SAMPLE2_filtered_density.txt").exists()
-            assert path("$launchDir/results/consolidate/SAMPLE3_filtered_density.txt").exists()
+            assert path("$launchDir/results/consolidate/A_1__filtered_density.txt").exists()
+            assert path("$launchDir/results/consolidate/B2_filtered_density.txt").exists()
+            assert path("$launchDir/results/consolidate/B2_SAMPLE3_filtered_density.txt").exists()

             lines = path("$launchDir/results/cat/cat_invalid_positions.txt").readLines()
             assert lines.contains("#Calculation and writing of high density regions has completed.")
@@ -117,17 +117,17 @@ nextflow_workflow {

             lines = path("$launchDir/results/phyml/phylogeneticTree.newick").readLines()

-            assert lines.join("\n").contains("SAMPLE1")
-            assert lines.join("\n").contains("SAMPLE2")
-            assert lines.join("\n").contains("SAMPLE3")
+            assert lines.join("\n").contains("A_1_")
+            assert lines.join("\n").contains("B2")
+            assert lines.join("\n").contains("B2_SAMPLE3")

             // check MAKE_SNV output
             lines = path("$launchDir/results/make/snvMatrix.tsv").readLines()

-            assert lines[0] == "strain\tSAMPLE2\tSAMPLE3\tSAMPLE1\treference\t"
-            assert lines.contains("SAMPLE1\t1\t1\t0\t0\t")
-            assert lines.contains("SAMPLE2\t0\t2\t1\t1\t")
-            assert lines.contains("SAMPLE3\t2\t0\t1\t1\t")
+            assert lines[0] == "strain\tB2\tB2_SAMPLE3\tA_1_\treference\t"
+            assert lines.contains("A_1_\t1\t1\t0\t0\t")
+            assert lines.contains("B2\t0\t2\t1\t1\t")
+            assert lines.contains("B2_SAMPLE3\t2\t0\t1\t1\t")

             // check IRIDA Next JSON file
             lines = path("$launchDir/results/iridanext.output.json.gz").linesGzip.join("\n")
@@ -180,9 +180,9 @@ nextflow_workflow {
             assert path("$launchDir/results").exists()

             // check that no "_filtered_density.txt" files exist:
-            assert path("$launchDir/results/consolidate/SAMPLE1_filtered_density.txt").exists() == false
-            assert path("$launchDir/results/consolidate/SAMPLE2_filtered_density.txt").exists() == false
-            assert path("$launchDir/results/consolidate/SAMPLE3_filtered_density.txt").exists() == false
+            assert path("$launchDir/results/consolidate/A_1__filtered_density.txt").exists() == false
+            assert path("$launchDir/results/consolidate/B2_filtered_density.txt").exists() == false
+            assert path("$launchDir/results/consolidate/B2_SAMPLE3_filtered_density.txt").exists() == false

             lines = path("$launchDir/results/cat/cat_invalid_positions.bed").readLines()
             assert lines.contains("#Calculation and writing of high density regions has completed.") == false
@@ -194,7 +194,7 @@ nextflow_workflow {

         when {
             params {
-                input = "https://raw.githubusercontent.com/phac-nml/snvphylnfc/dev/assets/samplesheet.csv"
+                input = "$baseDir/assets/samplesheet.csv"
                 refgenome = "https://github.com/phac-nml/snvphylnfc/raw/dev/tests/data/reference.fasta.gz"
                 outdir = "results"
             }
@@ -215,9 +215,9 @@ nextflow_workflow {
             assert lines.contains("Number of sites filtered: 0")

             // check filtered density files:
-            assert path("$launchDir/results/consolidate/SAMPLE1_filtered_density.txt").exists()
-            assert path("$launchDir/results/consolidate/SAMPLE2_filtered_density.txt").exists()
-            assert path("$launchDir/results/consolidate/SAMPLE3_filtered_density.txt").exists()
+            assert path("$launchDir/results/consolidate/A_1__filtered_density.txt").exists()
+            assert path("$launchDir/results/consolidate/B2_filtered_density.txt").exists()
+            assert path("$launchDir/results/consolidate/B2_SAMPLE3_filtered_density.txt").exists()

             lines = path("$launchDir/results/cat/cat_invalid_positions.txt").readLines()
             assert lines.contains("#Calculation and writing of high density regions has completed.")
@@ -233,17 +233,17 @@ nextflow_workflow {

             lines = path("$launchDir/results/phyml/phylogeneticTree.newick").readLines()

-            assert lines.join("\n").contains("SAMPLE1")
-            assert lines.join("\n").contains("SAMPLE2")
-            assert lines.join("\n").contains("SAMPLE3")
+            assert lines.join("\n").contains("A_1_")
+            assert lines.join("\n").contains("B2")
+            assert lines.join("\n").contains("B2_SAMPLE3")

             // check MAKE_SNV output
             lines = path("$launchDir/results/make/snvMatrix.tsv").readLines()

-            assert lines[0] == "strain\tSAMPLE2\tSAMPLE3\tSAMPLE1\treference\t"
-            assert lines.contains("SAMPLE1\t1\t1\t0\t0\t")
-            assert lines.contains("SAMPLE2\t0\t2\t1\t1\t")
-            assert lines.contains("SAMPLE3\t2\t0\t1\t1\t")
+            assert lines[0] == "strain\tB2\tB2_SAMPLE3\tA_1_\treference\t"
+            assert lines.contains("A_1_\t1\t1\t0\t0\t")
+            assert lines.contains("B2\t0\t2\t1\t1\t")
+            assert lines.contains("B2_SAMPLE3\t2\t0\t1\t1\t")

             // check IRIDA Next JSON file
             lines = path("$launchDir/results/iridanext.output.json.gz").linesGzip.join("\n")
diff --git a/workflows/snvphylnfc.nf b/workflows/snvphylnfc.nf
index b0b18cb..e704b64 100644
--- a/workflows/snvphylnfc.nf
+++ b/workflows/snvphylnfc.nf
@@ -84,21 +84,39 @@ workflow SNVPHYL {

     ch_versions = Channel.empty()

+    // Track processed IDs
+    def processedIDs = [] as Set
+
     // Create a new channel of metadata from a sample sheet
     // NB: `input` corresponds to `params.input` and associated sample sheet schema
     input = Channel.fromSamplesheet("input")

         // Map the inputs so that they conform to the nf-core-expected "reads" format.
         // Either [meta, [fastq_1], reference_assembly]
         // or [meta, [fastq_1, fastq_2], reference_assembly] if fastq_2 exists
+        // and remove non-alphanumeric characters in sample_names (meta.id), whilst also correcting for duplicate sample_names (meta.id)
         .map { meta, fastq_1, fastq_2, reference_assembly ->
+            if (!meta.id) {
+                meta.id = meta.irida_id
+            } else {
+                // Non-alphanumeric characters (excluding _,-,.) will be replaced with "_"
+                meta.id = meta.id.replaceAll(/[^A-Za-z0-9_.\-]/, '_')
+            }
+            // Ensure ID is unique by appending meta.irida_id if needed
+            while (processedIDs.contains(meta.id)) {
+                meta.id = "${meta.id}_${meta.irida_id}"
+            }
+            // Add the ID to the set of processed IDs
+            processedIDs << meta.id
+
             fastq_2 ? tuple(meta, [ file(fastq_1), file(fastq_2) ], reference_assembly) : tuple(meta, [ file(fastq_1) ], file(reference_assembly))}

+
     // Channel of read tuples (meta, [fastq_1, fastq_2*]):
     reads = input.map { meta, reads, reference_assembly -> tuple(meta, reads) }

-    // Channel of sample tuples (sample ID, assembly):
-    sample_assemblies = input.map { meta, reads, reference_assembly -> tuple(meta.id, reference_assembly ? reference_assembly : null) }
+    // Channel of sample tuples (meta, assembly):
+    sample_assemblies = input.map { meta, reads, reference_assembly -> tuple(meta, reference_assembly ? reference_assembly : null) }

     reference_genome = select_reference(params.refgenome, params.reference_sample_id, sample_assemblies)

@@ -243,7 +261,8 @@ def select_reference(refgenome, reference_sample_id, sample_assemblies) {
         log.debug "Selecting reference genome ${reference_genome} from '--refgenome'."
     }
     else if (reference_sample_id) {
-        reference_genome = sample_assemblies.filter { it[0] == reference_sample_id && it[1] != null}
+        // Check each meta category (meta.id, meta.id_alt, meta.irida_id) for a match to params.reference_sample_id
+        reference_genome = sample_assemblies.filter { (it[0].id == reference_sample_id || it[0].irida_id == reference_sample_id || it[0].id_alt == reference_sample_id) && it[1] != null}
             .ifEmpty { error("The provided reference sample ID (${reference_sample_id}) is either missing or has no associated reference assembly.") }
             .map { it[1] }
             .first()
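
Note on the ID handling added to `workflows/snvphylnfc.nf`: the `.map` closure falls back to `meta.irida_id` when `meta.id` is empty, replaces characters outside `A-Za-z0-9_.-` with `_`, and appends `meta.irida_id` until the name is unique. The standalone Groovy sketch below mirrors that logic outside a Nextflow channel so it can be run on its own. The helper name `normalizeIds`, the list-of-maps input, and the sample values are assumptions for illustration and are not part of the pipeline. The `.toString()` call is also an addition in this sketch, so that the `Set` only ever holds plain `String` values.

```groovy
// Hypothetical helper (not in the pipeline): applies the same fallback,
// sanitisation and de-duplication steps as the closure in the diff.
def normalizeIds(List<Map> metas) {
    def processedIDs = [] as Set
    metas.collect { meta ->
        def id = meta.id
        if (!id) {
            // No sample_name supplied: fall back to the unique `sample` value
            id = meta.irida_id
        } else {
            // Replace anything outside A-Z, a-z, 0-9, "_", "." and "-" with "_"
            id = id.replaceAll(/[^A-Za-z0-9_.\-]/, '_')
        }
        // Append the unique sample value until the name no longer collides
        while (processedIDs.contains(id)) {
            id = "${id}_${meta.irida_id}".toString()
        }
        processedIDs << id
        // Return a copy of the map with the normalised id
        meta + [id: id]
    }
}

// Illustrative input values only
def result = normalizeIds([
    [irida_id: 'SAMPLE1', id: 'A 1!'],
    [irida_id: 'SAMPLE2', id: 'B2'],
    [irida_id: 'SAMPLE3', id: 'B2'],
    [irida_id: 'SAMPLE4', id: ''],
])
assert result*.id == ['A_1_', 'B2', 'B2_SAMPLE3', 'SAMPLE4']
```

The names `A_1_`, `B2` and `B2_SAMPLE3` asserted in the updated tests are consistent with this behaviour: one name with special characters replaced, one unchanged, and one duplicate suffixed with its `sample` value.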