Adds fq/lint for early validation of FASTQs
Validating FASTQs early prevents the pipeline from running on invalid FASTQ files, which makes it more efficient at achieving its ultimate objective of checking FASTQ validity.

It adds three parameters:
 - `--skip_linting` which disables the linting of FASTQs (linting runs by default)
 - `--fq_lint_args` which is a string of arguments to pass to the linting tool
 - `--continue_with_lint_fail` which is a boolean to determine whether to continue if the linting fails

Together, these three options give the user a high degree of control over how the pipeline lints, which should cover most use cases.
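
As an illustration of how the three parameters combine, here is a minimal sketch of a custom Nextflow config. The file name `custom.config` is hypothetical, and the `--disable-validator P001` value is an assumption taken from the fq README rather than from this commit, so check it against the fq version you run.

```groovy
// custom.config: a hypothetical file, passed with `nextflow run ... -c custom.config`
params {
    // Linting runs by default; set this to true to skip the FQ_LINT step.
    skip_linting            = false

    // Extra arguments forwarded to `fq lint` through ext.args.
    // Assumption: this flag and validator code come from the fq README.
    fq_lint_args            = '--disable-validator P001'

    // Keep processing the remaining samples when one sample fails linting.
    continue_with_lint_fail = true
}
```

With `continue_with_lint_fail` left at its default of `false`, a lint failure stops the run instead.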

Closes nf-core#31
adamrtalbot committed Dec 19, 2024
1 parent 1f7dc68 commit 7072d59
Showing 16 changed files with 372 additions and 14 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -17,6 +17,7 @@ Initial release of nf-core/seqinspector, created with the [nf-core](https://nf-c
- [#51](https://github.com/nf-core/seqinspector/pull/51) Add nf-test to CI.
- [#63](https://github.com/nf-core/seqinspector/pull/63) Contribution guidelines added about displaying results for new tools
- [#53](https://github.com/nf-core/seqinspector/pull/53) Add FastQ-Screen database multiplexing and limit scope of nf-test in CI.
- [#67](https://github.com/nf-core/seqinspector/pull/67) Add FASTQ linting for early validation

### `Fixed`

2 changes: 2 additions & 0 deletions CITATIONS.md
@@ -10,6 +10,8 @@
## Pipeline tools

- [FQ](https://github.com/stjude-rust-labs/fq)

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

> Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].
5 changes: 3 additions & 2 deletions README.md
@@ -31,9 +31,10 @@
workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->

1. Lint FASTQs with ([`fq`](https://github.com/stjude-rust-labs/fq))
1. Subsample reads ([`Seqtk`](https://github.com/lh3/seqtk))
2. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
3. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
1. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))

## Usage

9 changes: 9 additions & 0 deletions conf/modules.config
@@ -18,6 +18,15 @@ process {
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]

withName: 'FQ_LINT' {
ext.args = { params.fq_lint_args }
errorStrategy = {
task.exitStatus in ((130..145) + 104) ? 'retry' :
params.continue_with_lint_fail ? 'ignore' :
'finish'
}
}

withName: SEQTK_SAMPLE {
ext.args = '-s100'
}
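
The `errorStrategy` closure configured for `FQ_LINT` above picks one of three Nextflow strategies: `retry` for exit codes treated as transient, `ignore` when `continue_with_lint_fail` is set, and `finish` otherwise. The following plain Groovy sketch mirrors that decision outside Nextflow. The function name `lintErrorStrategy` is hypothetical, and the exit status `1` used for a lint failure is an assumption for illustration.

```groovy
// Hypothetical helper mirroring the errorStrategy closure in conf/modules.config.
// It is not part of the pipeline; it only reproduces the decision logic.
String lintErrorStrategy(int exitStatus, boolean continueWithLintFail) {
    // Same set as in the config: the range 130..145 plus the single code 104.
    List<Integer> retryCodes = (130..145) + 104
    if (exitStatus in retryCodes) {
        return 'retry'    // resubmit the task
    }
    // Any other failure is either skipped or ends the run in an orderly way.
    return continueWithLintFail ? 'ignore' : 'finish'
}

// Assumption: 1 stands in for a genuine lint failure.
assert lintErrorStrategy(137, false) == 'retry'
assert lintErrorStrategy(104, true) == 'retry'
assert lintErrorStrategy(1, true) == 'ignore'
assert lintErrorStrategy(1, false) == 'finish'
```

Because the retry check comes first, `continue_with_lint_fail` only affects failures whose exit status falls outside the retry set.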
49 changes: 38 additions & 11 deletions modules.json
@@ -8,38 +8,59 @@
"bowtie2/build": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
"installed_by": [
"modules"
]
},
"fastqc": {
"branch": "master",
"git_sha": "08108058ea36a63f141c25c4e75f9f872a5b2296",
"installed_by": ["modules"]
"installed_by": [
"modules"
]
},
"fastqscreen/buildfromindex": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
"installed_by": [
"modules"
]
},
"fastqscreen/fastqscreen": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"],
"installed_by": [
"modules"
],
"patch": "modules/nf-core/fastqscreen/fastqscreen/fastqscreen-fastqscreen.diff"
},
"fq/lint": {
"branch": "master",
"git_sha": "a1abf90966a2a4016d3c3e41e228bfcbd4811ccc",
"installed_by": [
"modules"
]
},
"multiqc": {
"branch": "master",
"git_sha": "cf17ca47590cc578dfb47db1c2a44ef86f89976d",
"installed_by": ["modules"]
"installed_by": [
"modules"
]
},
"seqfu/stats": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
"installed_by": [
"modules"
]
},
"seqtk/sample": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
"installed_by": [
"modules"
]
}
}
},
@@ -48,20 +69,26 @@
"utils_nextflow_pipeline": {
"branch": "master",
"git_sha": "c2b22d85f30a706a3073387f30380704fcae013b",
"installed_by": ["subworkflows"]
"installed_by": [
"subworkflows"
]
},
"utils_nfcore_pipeline": {
"branch": "master",
"git_sha": "51ae5406a030d4da1e49e4dab49756844fdd6c7a",
"installed_by": ["subworkflows"]
"installed_by": [
"subworkflows"
]
},
"utils_nfschema_plugin": {
"branch": "master",
"git_sha": "2fd2cd6d0e7b273747f32e465fdc6bcc3ae0814e",
"installed_by": ["subworkflows"]
"installed_by": [
"subworkflows"
]
}
}
}
}
}
}
}
5 changes: 5 additions & 0 deletions modules/nf-core/fq/lint/environment.yml
33 changes: 33 additions & 0 deletions modules/nf-core/fq/lint/main.nf
43 changes: 43 additions & 0 deletions modules/nf-core/fq/lint/meta.yml
63 changes: 63 additions & 0 deletions modules/nf-core/fq/lint/tests/main.nf.test
25 changes: 25 additions & 0 deletions modules/nf-core/fq/lint/tests/main.nf.test.snap
2 changes: 2 additions & 0 deletions modules/nf-core/fq/lint/tests/tags.yml

The six files listed above are generated files and are not rendered by default.

7 changes: 7 additions & 0 deletions nextflow.config
@@ -13,6 +13,13 @@ params {
// Input options
input = null
sample_size = 0

// Options
skip_linting = false
fq_lint_args = ""
continue_with_lint_fail = false


// References
genome = null
fasta = null
28 changes: 27 additions & 1 deletion nextflow_schema.json
@@ -31,7 +31,6 @@
},
"outdir": {
"type": "string",
"default": null,
"format": "directory-path",
"description": "The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.",
"fa_icon": "fas fa-folder-open"
@@ -50,6 +49,30 @@
}
}
},
"validation_options": {
"title": "Validation options",
"type": "object",
"description": "Options for validating and screening FASTQ files.",
"default": "",
"properties": {
"skip_linting": {
"type": "boolean",
"default": false,
"description": "Whether to lint the FASTQs before performing QC on the sequences",
"help_text": "FASTQ files will be linted with FQ early in the pipeline. If they fail validation, the pipeline will terminate preventing expensive quality control steps being performed on the other samples. If ignoring FQ is enabled, quality control will be performed on the remaining samples."
},
"fq_lint_args": {
"type": "string",
"description": "Arguments to pass to FQ lint",
"help_text": "Arguments to pass to FQ lint. This can be used to disable overly strict linting. See https://github.com/stjude-rust-labs/fq?tab=readme-ov-file#lint for more information."
},
"continue_with_lint_fail": {
"type": "boolean",
"description": "Whether to continue with the pipeline if linting fails for a single sample.",
"help_text": "If set to true, the pipeline will continue with the remaining samples if linting fails for a single sample. If set to false, the pipeline will terminate if linting fails for a single sample."
}
}
},
"reference_genome_options": {
"title": "Reference genome options",
"type": "object",
@@ -245,6 +268,9 @@
{
"$ref": "#/$defs/input_output_options"
},
{
"$ref": "#/$defs/validation_options"
},
{
"$ref": "#/$defs/reference_genome_options"
},