fastq > cram using nf-core #17

ellendejong · 2024-11-19T10:38:05Z

What has changed?

The DxNextflowRNA workflow creates CRAM files (output) from FASTQ files (input).
Since this is a major refactor of the current workflow in dev/main, the code review should focus on the new files rather than on the differences with dev/main.

The branch name does not reflect the feature and does not conform to our standards, I apologize. :)

Noteworthy in my opinion.

1. Main workflow

I defined two subworkflows:

fastq > cram (trimming, filtering of rRNA, alignment, removal of duplicates, UMI deduplication)
QC tools

workflow.onComplete sends an email when the workflow has completed. I used nf-core functionality, which means the email format differs from that of DxNextflowWES, for example. This can be configured using templates, but is considered out-of-scope for v1.0.0.

CRAM output
If CRAM files are used as input for follow-up analyses, the same reference files that were used to create them are required to process them. Therefore, the reference files used to create the CRAM files are added to the MultiQC report.

2. Naming conventions

I tried to follow the naming conventions and guidelines of the nf-core community, as well as its code styling guidelines.
I decided that linting using ruff and a pre-commit configuration will be part of future releases.

3. MultiQC

MultiQC is added to the main workflow, to enable a single report per analysis.
Sample grouping is used (available since MultiQC v1.25), although it is not supported for all tools used in this pipeline.

4. Reference files

Instructions to generate/download the required reference files are added to the README.md.

5. Dynamic resources

I used previous versions of the pipeline to calculate dynamic resources, since most tools are used in both pipelines. The current settings might need some tweaking over time.
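
As an illustration of what such settings can look like, here is a minimal sketch in the nf-core style, where the requested resources grow with each retry. The process selector, the numbers and the retry limit are assumptions for illustration only, not the values used in this pipeline.

```groovy
// conf/base.config (hypothetical excerpt)
process {
    // Retry a failed task up to two times; each attempt requests more resources.
    errorStrategy = 'retry'
    maxRetries    = 2

    withName: '.*COLLECTRNASEQMETRICS' {
        // Closures are evaluated per task, so task.attempt scales the request.
        cpus   = 2
        memory = { 8.GB * task.attempt }
        time   = { 2.h * task.attempt }
    }
}
```

Tweaking over time then comes down to adjusting the base values per process.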

6. SortMeRNA

SortMeRNA is available as a module in nf-core. However, I encountered a bug when using this tool, and it turned out I needed to update it to version >= 4.3.7. Solution: override the container via modules.config. Once the update is available in nf-core, I should switch to it and remove the override again.
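
A minimal sketch of such an override, assuming the module is included under the names SORTMERNA_READS and SORTMERNA_INDEX. The image reference is a placeholder, not the actual tag used in this PR.

```groovy
// conf/modules.config (hypothetical excerpt)
process {
    withName: 'SORTMERNA_READS|SORTMERNA_INDEX' {
        // Placeholder image: replace <tag> with the SortMeRNA >= 4.3.7 build that is needed.
        container = 'quay.io/biocontainers/sortmerna:<tag>'
    }
}
```

Removing this block restores the container that is defined in the nf-core module itself.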

Index SortMeRNA.
Runtimes of SortMeRNA are optimized by creating a SortMeRNA index first. This is done by executing the tool with the settings configured for SORTMERNA_INDEX. Conveniently, I created a workflow to do just that. I am not certain whether I should add that workflow to this repo, or whether instructions in README.md would be sufficient. Please let me know your opinion :)

7. Completion email

Waiting for a fix in nf-core/tools#3304.
I implemented the fixes locally (moved nf-core/utils_nfcore_pipeline to local).
Until the pull request with the fix is merged, the local implementation is required.

8. Considered out-of-scope for release v1.0.0

Email layout adjustments will be something for the future :)
Docs (usage.md, output.md, README.md): currently limited to either the template version created by nf-core tools, or edited to add necessary information only.


mraves2

The current pipeline (main.nf in repo main dir) is divided into 3 steps:

subworkflow for pipeline initiation
DXNEXTFLOWRNA, a umcugenetics-specific workflow
subworkflow for pipeline completion

There are three folders that contain pieces of the workflow:

modules
subworkflows
workflows

The flow can be followed from the main.nf scripts in each (sub)folder.
main.nf: include { DXNEXTFLOWRNA } from './workflows/dxnextflowrna'
include { PIPELINE_INITIALISATION } from './subworkflows/local/utils_umcugenetics_dxnextflowrna_pipeline'
include { PIPELINE_COMPLETION } from './subworkflows/local/utils_umcugenetics_dxnextflowrna_pipeline'

The DXNEXTFLOWRNA workflow consists of modules, subworkflows and functions:
// MODULES
include { MULTIQC } from '../modules/nf-core/multiqc/main'
// SUBWORKFLOWS
include { FASTQ_BAM_QC } from '../subworkflows/local/fastq_bam_qc'
include { FASTQ_TRIM_FILTER_ALIGN_DEDUP } from '../subworkflows/local/fastq_trim_filter_align_dedup'

workflows/dxnextflowrna.nf: definition of reference channels, input channels and workflows
SUBWORKFLOW: Run fastq_trim_filter_align_dedup contains TrimGalore, SortMeRNA, STAR, SAMTOOLS and UMITOOLS (ends with CRAM files)
SUBWORKFLOW: Run fastq_bam_qc contains FastQC, PICARD and PRESEQ for QC
MODULE: MultiQC

Modules are unmodified. Whenever modifications were made, the files were moved from an nf-core folder to a local folder. Do the files in the nf-core folder still need to be maintained? This seems superfluous.
For example, there is no functional difference between subworkflows/nf-core/utils_nfcore_pipeline and subworkflows/local/utils_nfcore_pipeline; can one of the folders be removed?

Names of processes and parameters look informative and are consistent with the conventions in https://nf-co.re/docs/guidelines/components/overview

Currently, the workflow cannot be tested by anyone other than the author, because of the temporary fix for SortMeRNA, which requires access to the file
references/sortmerna/sortmerna_v4.3.4_db/smr_v4.3_sensitive_db.fasta

It is unclear to me at this moment whether follow-up analysis tools can work with CRAM files, or will be able to in the future; featureCounts, for example (see https://www.biostars.org/p/9577607/). Is it possible for the time being to also export the BAM files?

Alignment of blocks of similar entries is sometimes done as described in https://nf-co.re/docs/contributing/code_editors_and_styling/harshil_alignment and sometimes not. For example, the include blocks are nicely aligned on { and }, and the workflow take channels are aligned on //, but the emit channels are aligned only on = and not on //. Please be consistent. Personally, I like this kind of alignment because it makes the code easier to read, although some in our team do not agree. The nf-core team uses this alignment (see for example subworkflows/nf-core/bam_dedup_stats_samtools_umitools/main.nf), so I guess we should too.

Nice to see the versions of each module in the MultiQC html file. Just one looks strange: FastQC (, '0.12.1'). Perhaps this can be fixed?

mraves2 · 2025-01-10T16:13:13Z

README.md

-[![GitHub Actions Linting Status](https://github.com/UMCUGenetics/dxnextflowrna/workflows/nf-core%20linting/badge.svg)](https://github.com/UMCUGenetics/dxnextflowrna/actions?query=workflow%3A%22nf-core+linting%22)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)
+<h1>
+  <picture>
+    <source media="(prefers-color-scheme: dark)" srcset="docs/images/umcugenetics-dxnextflowrna_logo_dark.png">


The files umcugenetics-dxnextflowrna_logo_light.png and umcugenetics-dxnextflowrna_logo_dark.png are in the folder assets, not in docs/images.

mraves2 · 2025-01-10T16:14:22Z

assets/multiqc_config.yml

+  genome_size: hg38_genome
+  notrim: true
+  read_length: 300


Shouldn't read_length be 150?

mraves2 · 2025-01-10T16:15:18Z

assets/multiqc_config.yml

+      pattern: "[_.-](A|B)[A-Za-z0-9]{9}[_.-]S[0-9]+[_.-]L007[_.-](R2[_.-]001|2)$"
+  "(L008 R1)":
+    - "L007_R1"


Typo in line 108: should be L008_R1

mraves2 · 2025-01-10T16:17:20Z

assets/template.yml

@@ -0,0 +1,27 @@
+name: DxNextflowRNA
+description: UMCU Genetics RNA Workflow


Should be UMCU Genetics RNAseq Workflow, comparable to WES or WGS

mraves2 · 2025-01-10T16:17:38Z

docs/parameters.md

@@ -0,0 +1,90 @@
+# umcugenetics/dxnextflowrna pipeline parameters
+UMCU Genetics RNA Workflow


Should be UMCU Genetics RNAseq Workflow, comparable to WES or WGS

mraves2 · 2025-01-10T16:24:10Z

docs/parameters.md

+| `multiqc_logo` | Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file | `string` |  |  | True |
+| `multiqc_methods_description` | Custom MultiQC yaml file containing HTML including a methods description. | `string` |  |  |  |
+| `validate_params` | Boolean whether to validate parameters against the schema at runtime | `boolean` | True |  | True |
+| `pipelines_testdata_base_path` | Base URL or local path to location of pipeline test dataset files | `string` | https://raw.githubusercontent.com/nf-core/test-datasets/ |  | True |


URL https://raw.githubusercontent.com/nf-core/test-datasets/ does not exist, https://github.com/nf-core/test-datasets does.

mraves2 · 2025-01-10T16:26:17Z

modules/nf-core/fastqc/meta.yml

-      description: |
-        List of input FastQ files of size 1 and 2 for single-end and paired-end data,
-        respectively.
+  - - meta:


Why does this line have two "-"? Is that correct?

mraves2 · 2025-01-10T16:28:43Z

modules/nf-core/multiqc/main.nf

-        'https://depot.galaxyproject.org/singularity/multiqc:1.17--pyhdfd78af_0' :
-        'biocontainers/multiqc:1.17--pyhdfd78af_0' }"
+        'https://depot.galaxyproject.org/singularity/multiqc:1.25.1--pyhdfd78af_0' :
+        'biocontainers/multiqc:1.25.1--pyhdfd78af_0' }"

    input:
    path  multiqc_files, stageAs: "?/*"


No parentheses for the path statement. The guidelines specify that parentheses are required within tuples, but are not used for single statements. I am not sure how that works with stageAs.

mraves2 · 2025-01-10T16:29:56Z

modules/nf-core/multiqc/main.nf

-        'https://depot.galaxyproject.org/singularity/multiqc:1.17--pyhdfd78af_0' :
-        'biocontainers/multiqc:1.17--pyhdfd78af_0' }"
+        'https://depot.galaxyproject.org/singularity/multiqc:1.25.1--pyhdfd78af_0' :
+        'biocontainers/multiqc:1.25.1--pyhdfd78af_0' }"

    input:
    path  multiqc_files, stageAs: "?/*"
    path(multiqc_config)


nf-core native code generally does not contain parentheses for single path statements. I would remove them here.
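
For reference, a small sketch of the styles under discussion. The process name, the output and the script are made up; only the input declarations matter. The bare form with an option after a comma is the one already used by the nf-core MultiQC module shown in the diff above.

```groovy
process STYLE_EXAMPLE {
    input:
    tuple val(meta), path(reads)         // inside a tuple: parentheses, per the guidelines
    path  multiqc_files, stageAs: "?/*"  // single input with an option after a comma
    path  multiqc_config                 // single input, no parentheses

    output:
    path "listing.txt"

    script:
    """
    ls -R > listing.txt
    """
}
```

Written this way, stageAs does not force parentheses on a single path input, so both inputs can follow the same style.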

mraves2 · 2025-01-10T16:33:05Z

subworkflows/local/fastq_trim_filter_align_dedup.nf

+include { SAMTOOLS_CONVERT                  } from '../../modules/nf-core/samtools/convert/main'
+include { SAMTOOLS_INDEX                    } from '../../modules/nf-core/samtools/index/main'
+include { SAMTOOLS_MERGE                    } from '../../modules/nf-core/samtools/merge/main'
+include { SORTMERNA as SORTMERNA_READS      } from '../../modules/nf-core/sortmerna/main'


Why do you import SORTMERNA as SORTMERNA_READS? I don't see any conflicts if you would leave this as SORTMERNA.
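
For context: in Nextflow DSL2 an alias is needed when the same module is invoked more than once, because a process can be called only once in the same workflow context. A sketch of that situation, assuming (not confirmed by this diff) that the module is also included under a second name such as SORTMERNA_INDEX:

```groovy
// Hypothetical: the same nf-core module included twice under different names.
include { SORTMERNA as SORTMERNA_INDEX } from '../../modules/nf-core/sortmerna/main'
include { SORTMERNA as SORTMERNA_READS } from '../../modules/nf-core/sortmerna/main'
```

Each alias can then receive its own settings in modules.config through a withName selector. If the module is included only once, the alias is optional.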

ellendejong and others added 30 commits December 6, 2023 13:52

Added hello to readme.

a11396c

Merge remote-tracking branch 'origin/feature/nf-core_clean' into feature/add_trimgalore

98010a1

Added CustomModules and NexflowModules

6d9c0a9

add folder trimgalore to modules

2c56d94

Alternative workflow with trimgalore test

0853345

Add VSC and MacOS to gitignore.

fea920d

add whitespace to gitignore headers

7584f5a

Add run_nextflow_RNAseq.sh

b030c16

Line comments in run_nextflow_RNAseq.sh

03eba3d

change time run_nextflow_RNA_seq.sh

0ca8bcd

add run_nextflow_RNAseq_srun.sh

a093b5d

changed --fastq_path to --input run_nextflow_RNAseq.sh

3dedc53

change input variable

ed9ead0

add export java version statements & comments hello.nf

efa2926

replace java version exports

1c1208f

changed nextflow location

271a822

run_nextflow_RNAseq.sh to submit RNAseq workflow via sbatch

a449047

Add test_main.nf for testing trimgalore and star/align workflow

d6d2b74

Delete hello.nf and run_nextflow_RNAseq_srun.sh

f783c93

added test_main.nf to run in run_nextflow_RNAseq.sh

048be8c

changed nextflow run command & order

f8fb218

added profiles in nextflow.config

7a192f8

DEDUP and nf-schema.json fix for index and gtf warning

55f9db5

added EOT and SBATCH commands

bd8963c

Added samtools&featurecounts to run whole nf-pipeline with trimgalore

c192a1d

delete test_main.nf & update main.nf with trimgalore & changed WES to RNASeq names in run sh

462d3a5

changed nextflow run file test_main.nf to main.nf

bc6d8dc

changed sbatch time to 4h

969793f

changed output text, use of woorkflow_path and input parameters in run_nextflow_RNAseq.sh

0f486c2

changed output text in run.sh

9cc46fb

ellendejong and others added 12 commits November 15, 2024 11:35

Moved reference files

a8c4c2f

revert 8bbaaa3; add samtools_index again.

17e51e2

Add gencode_version_name and sortmerna_index_versions to nextflow_schema.

13ed8d1

Use samtools index .bai

626c594

add sortmerna index publishdir

ebddb00

Increase mem rseqc

922adcd

umitools dedup add ext.arg TMPDIR

ef2fa3c

change resources COLLECTRNASEQMETRICS

e71dff5

Remove tmpspace MultiQC

a820105

remove publishdir samtools index

515daa7

Move params into defined categories.

57596b0

Merge pull request #16 from UMCUGenetics/feature/add_multiqc_to_trimgalore_branch
Feature/add multiqc to trimgalore branch

a9f0ec3

ellendejong changed the title ~~DxNextflowRNA fastq > cram using nf-core modules and (sub)workflows~~ fastq > cram using nf-core Nov 19, 2024

ellendejong marked this pull request as draft November 27, 2024 11:04

ellendejong added 14 commits December 10, 2024 13:57

Update all modules and subworkflows by using templates of nf-core pipeline create

1490bab

Move subworkflow utils_nfcore_pipeline to local to change completionEmail function (use analysis_id in fields).

4773e00

Update README.

e130327

Add report_comment and change order in multiqc_config.

0f05879

update .nf-core.yml

641d6f0

Add analysis_id outside workflow (in run.sh).

2c681a3

Add CHANGELOG template.

307eeab

Add code of conduct.

1245078

Remove .docs from gitignore

c0e45ed

Remove help from fasta

7f40fd4

Update docs

854b7d7

remove section nextflow_config.

fcbb8d4

Auto-generate modules.json

13cd8ad

Delete local copy of sortmerna and use modules.config to override container to v4.3.7

e053145

ellendejong marked this pull request as ready for review December 17, 2024 08:13

mraves2 approved these changes Jan 10, 2025
