-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fastq > cram using nf-core #17
base: develop
Are you sure you want to change the base?
Conversation
…ure/add_trimgalore
… RNASeq names in run sh
…n_nextflow_RNAseq.sh
…alore_branch Feature/add multiqc to trimgalore branch
…mail function (use analysis_id in fields).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current pipeline (main.nf in repo main dir) is divided into 3 steps:
- subworkflow for pipeline initiation
- DXNEXTFLOWRNA, a umcugenetics-specific workflow
- subworkflow for pipeline completion
There are three folders that contain pieces of the workflow:
- modules
- subworkflows
- workflows
The flow can be followed from the main.nf scripts in eacht (sub)folder.
main.nf: include { DXNEXTFLOWRNA } from './workflows/dxnextflowrna'
include { PIPELINE_INITIALISATION } from './subworkflows/local/utils_umcugenetics_dxnextflowrna_pipeline'
include { PIPELINE_COMPLETION } from './subworkflows/local/utils_umcugenetics_dxnextflowrna_pipeline'
The DXNEXTFLOWRNA workflow consists of modules, subworkflows and functions:
// MODULES
include { MULTIQC } from '../modules/nf-core/multiqc/main'
// SUBWORKFLOWS
include { FASTQ_BAM_QC } from '../subworkflows/local/fastq_bam_qc'
include { FASTQ_TRIM_FILTER_ALIGN_DEDUP } from '../subworkflows/local/fastq_trim_filter_align_dedup'
workflows/dxnextflowrna.nf: definition of reference channels, input channels and workflows
SUBWORKFLOW: Run fastq_trim_filter_align_dedup contains TrimGalore, SortMeRNA, STAR and SAMTOOLS and UMITOOLS (ends with CRAM files)
SUBWORKFLOW: Run fastq_bam_qc contains FastQC, PICARD and PRESEQ voor QC
MODULE: MultiQC
Modules are unmodified. Whenever modifications have been made, files have been moved from a nf-core folder to a local folder. Do the files in nf-core folder still need to be maintained? This seems to superfluous.
For example, there is no functional difference between subworkflows/nf-core/utils_nfcore_pipeline and subworkflows/local/utils_nfcore_pipeline; can one of the folders be removed?
Names of processes and parameters look informative and are consistent with the conventions in https://nf-co.re/docs/guidelines/components/overview
Currently, the workflow cannot be tested by anyone other than the author, because of the temporary fix for SortMeRNA, which requires access to the file
references/sortmerna/sortmerna_v4.3.4_db/smr_v4.3_sensitive_db.fasta
It is unclear to me at this moment whether follow-up analysis tools can work with CRAM files or will in the future; featureCounts for exammple (see https://www.biostars.org/p/9577607/). Is it possible for the time being to also export the bam files?
Alignment of blocks of similar entries is sometimes done as described in https://nf-co.re/docs/contributing/code_editors_and_styling/harshil_alignment and sometimes not. For example, the include blocks are nicely aligned on { and }, the workflow take channels are aligned on // but the emit channels are only aligned on = but not on //. Be consistent. Personally, I like this kind of aligment; it makes it easier to read, but some in our team do not agree. The nf-core team uses this alignment (see for example subworkflows/nf-core/bam_dedup_stats_samtools_umitools/main.nf), so I guess we should too.
Nice to see the versions of each module in the MultiQC html file. Just one looks strange: FastQC (, '0.12.1') Perhaps this can be fixed?
[![GitHub Actions Linting Status](https://github.com/UMCUGenetics/dxnextflowrna/workflows/nf-core%20linting/badge.svg)](https://github.com/UMCUGenetics/dxnextflowrna/actions?query=workflow%3A%22nf-core+linting%22)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX) | ||
<h1> | ||
<picture> | ||
<source media="(prefers-color-scheme: dark)" srcset="docs/images/umcugenetics-dxnextflowrna_logo_dark.png"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
files umcugenetics-dxnextflowrna_logo_light/dark.png are in folder assets, not in docs/images
genome_size: hg38_genome | ||
notrim: true | ||
read_length: 300 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't read_length be 150?
pattern: "[_.-](A|B)[A-Za-z0-9]{9}[_.-]S[0-9]+[_.-]L007[_.-](R2[_.-]001|2)$" | ||
"(L008 R1)": | ||
- "L007_R1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in line 108: should be L008_R1
@@ -0,0 +1,27 @@ | |||
name: DxNextflowRNA | |||
description: UMCU Genetics RNA Workflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be UMCU Genetics RNAseq Workflow, comparable to WES or WGS
@@ -0,0 +1,90 @@ | |||
# umcugenetics/dxnextflowrna pipeline parameters | |||
UMCU Genetics RNA Workflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be UMCU Genetics RNAseq Workflow, comparable to WES or WGS
| `multiqc_logo` | Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file | `string` | | | True | | ||
| `multiqc_methods_description` | Custom MultiQC yaml file containing HTML including a methods description. | `string` | | | | | ||
| `validate_params` | Boolean whether to validate parameters against the schema at runtime | `boolean` | True | | True | | ||
| `pipelines_testdata_base_path` | Base URL or local path to location of pipeline test dataset files | `string` | https://raw.githubusercontent.com/nf-core/test-datasets/ | | True | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
description: | | ||
List of input FastQ files of size 1 and 2 for single-end and paired-end data, | ||
respectively. | ||
- - meta: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this line have two "-"? Is that correct?
'https://depot.galaxyproject.org/singularity/multiqc:1.17--pyhdfd78af_0' : | ||
'biocontainers/multiqc:1.17--pyhdfd78af_0' }" | ||
'https://depot.galaxyproject.org/singularity/multiqc:1.25.1--pyhdfd78af_0' : | ||
'biocontainers/multiqc:1.25.1--pyhdfd78af_0' }" | ||
|
||
input: | ||
path multiqc_files, stageAs: "?/*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No parentheses for path statement. Guidelines specifies that parentheses are required within tuples, but not used for single statements. Not sure how that works with stageAs.
'https://depot.galaxyproject.org/singularity/multiqc:1.17--pyhdfd78af_0' : | ||
'biocontainers/multiqc:1.17--pyhdfd78af_0' }" | ||
'https://depot.galaxyproject.org/singularity/multiqc:1.25.1--pyhdfd78af_0' : | ||
'biocontainers/multiqc:1.25.1--pyhdfd78af_0' }" | ||
|
||
input: | ||
path multiqc_files, stageAs: "?/*" | ||
path(multiqc_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nf-core native code generally does not contain parentheses for single path statements. I would remove them here.
include { SAMTOOLS_CONVERT } from '../../modules/nf-core/samtools/convert/main' | ||
include { SAMTOOLS_INDEX } from '../../modules/nf-core/samtools/index/main' | ||
include { SAMTOOLS_MERGE } from '../../modules/nf-core/samtools/merge/main' | ||
include { SORTMERNA as SORTMERNA_READS } from '../../modules/nf-core/sortmerna/main' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you import SORTMERNA as SORTMERNA_READS? I don't see any conflicts if you would leave this as SORTMERNA.
What has changed?
DxNextflowRNA workflow to create a cram file (output) from fastq files (input).
Since this is a major refactor with current workflow in dev/main, the code review should focus on the new files instead of comparing the differences.
The branch name does not reflect the feature and is not conform our standards, I apologize. :)
Noteworthy in my opinion.
1. Main workflow
I defined two subworkflows;
workflow.OnComplete is responsible to send an email when workflow completed. I used nf-core functionality, which means emails differ from format compared to DxNextflowWES for example. This can be configured using templates, but is considered out-of-scope for v1.0.0.
CRAM output
If cram files will be used as input in follow-up analyses, the same reference files are required to be able to process cram files. Therefore, the reference files used to create the cram files are added to the multiqc report.
2. Naming conventions
I tried to use the naming conventions and guidelines of the nf-core community. As well as the code styling guidelines.
I decided that linting using
ruff
and pre-commit configuration will be part of future releases.3. MultiQC
MultiQC is added to the main workflow, to enable a single report per analysis.
Sample grouping is used (available since MultiQC v1.25), although it is not supported for all tools used in this pipeline.
4. Reference files
Instructions to generate/download the required reference files are added to the
README.md
.5. Dynamic resources
I used previous versions of the pipeline to calculate dynamic resources, since most tools are used in both pipelines. The current settings might need some tweaking over time.
6. SortMeRNA
SortMeRna is available as module in nf-core. However, I encountered a bug when using this tool. It appears I needed to update it to version >= 4.3.7. Solution: override container via modules.config. Once the update is available in nf-core, I should switch and remove the override again.
Index SortMeRNA.
Runtimes of SortMeRNA are optimized by creating a sortmerna index first. This is done by executing the tool with settings as configured SORTMERNA_INDEX. Convenietly, I created a workflow to do just that. I am not certain if I should add the workflow to this repo, or whether instructions in
README.md
would be sufficient. Please let me know your opinion :)7. Completion email
Waiting for fix in nf-core/tools#3304.
Implemented the fixes locally (moved nf-core/utils_nfcore_pipeline to local).
Until pull-request with fix is merged, local implementation is required.
8. Considered out-of-scope for release v1.0.0