Subworkflow Infrastructure (#662)
* feat(subworkflows): Add align_bowtie2 subworkflow

For testing CI setup

* test(align_bowtie2): Add initial list of changes to test

* test(align_bowtie2): Add initial test

* refactor: Use tags to run subworkflows ci

Run the subworkflow CI for every underlying module a workflow uses, and let
the modules' pytest-modules definition be the source of truth.

* refactor: Use individual directories for subworkflows

* docs(align_bowtie2): Add initial meta.yml

Copied most of it from the bowtie2/align module.

* fix(align_bowtie2): Fix module include paths

* test(bam_sort_samtools): Add initial test

* ci(bam_sort_samtools): Add modules that trigger the tag

* test(bam_stats_samtools): Add initial test

* ci(bam_stats_samtools): Add keys to pick up changes

* docs(bam_samtools): Add initial meta.yml

* test(align_bowtie2): Fix path to subworkflow

* test(align_bowtie2): Update entry point

* fix(bam_sort_samtools): Update include paths

* test(bam_sort_samtools): Fix path

* style: Clean up addParams

* test(samtools_sort): Add suffix for test

* test(align_bowtie2): Add samtools_options for suffix

* test(bam_stats_samtools): Update path

* test(bam_stats_samtools): Use stats input

Otherwise it's just an example of how it's used in the bam_sort_samtools subworkflow

* ci(linting): Skip module linting of subworkflows

* ci(linting): Clean up startsWith statement

* test(bam_stats_samtools): Use single end test data for single end test

* test(bam_stats_samtools): Add expected files

* test(align_bowtie2): Add paired-end test

* test(align_bowtie2): Sort order of output

* test(align_bowtie2): Update hashes

* docs(align_bowtie2): Fix typo

* test(align_bowtie2): Update samtools output names

* test(align_bowtie2): Remove md5sums for bam/bai

* feat(subworkflows): Add nextflow.configs

These can be used for default settings in the future. They can then be
included in the conf/modules.config so that the params don't have to be
duplicated in the root nextflow.config.

* docs(subworkflows): Include modules instead of tools

* fix: Update to versions

* chore(align_bowtie2): Remove duplicate tag

* style: Format yamls

* test(subworkflows): Only check versions for modules

* chore: Update subworkflows to match rnaseq dev

* fix(subworkflows): Update paths

* fix(bam_sort_samtools): Fix sort parameters for testing

* Apply suggestions from code review

Co-authored-by: Harshil Patel <[email protected]>

* docs: Update TODOs with a message

* ci: Try using a matrix for strategy

* ci: Try passing an array

* Revert "ci: Try passing an array"

This reverts commit d3611fc.

Co-authored-by: Harshil Patel <[email protected]>
edmundmiller and drpatelh authored Oct 8, 2021
1 parent f479d4f commit c19671d
Showing 21 changed files with 524 additions and 5 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/nf-core-linting.yml
@@ -71,6 +71,8 @@ jobs:
       - name: Lint ${{ matrix.tags }}
         run: nf-core modules lint ${{ matrix.tags }}
+        # HACK
+        if: startsWith( matrix.tags, 'subworkflow' ) != true

       - uses: actions/cache@v2
         with:
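
The `if:` guard added above skips module linting for any tag that starts with `subworkflow`. A minimal Python sketch of that filter follows. The tag list is hypothetical, and the lowercase comparison reflects GitHub's documented behaviour that `startsWith` is not case sensitive.

```python
def should_lint(tag: str) -> bool:
    """Model of `startsWith( matrix.tags, 'subworkflow' ) != true`."""
    # GitHub's startsWith() ignores case, so compare lowercased strings.
    return not tag.lower().startswith("subworkflow")


# Hypothetical matrix of tags; only module tags should reach the lint step.
tags = ["samtools/sort", "bowtie2/align", "subworkflows/align_bowtie2"]
linted = [tag for tag in tags if should_lint(tag)]
print(linted)
```

Only the two module tags survive the filter. The subworkflow tag is skipped, not failed.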
7 changes: 6 additions & 1 deletion .github/workflows/pytest-workflow.yml
@@ -9,6 +9,11 @@ jobs:
   changes:
     name: Check for changes
     runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        filter:
+          ["tests/config/pytest_modules.yml", "tests/config/pytest_subworkflows.yml"]
     outputs:
       # Expose matched filters as job 'modules' output variable
       modules: ${{ steps.filter.outputs.changes }}
@@ -18,7 +23,7 @@ jobs:
       - uses: dorny/paths-filter@v2
         id: filter
         with:
-          filters: "tests/config/pytest_modules.yml"
+          filters: ${{ matrix.filter }}

   test:
     runs-on: ubuntu-20.04
47 changes: 47 additions & 0 deletions subworkflows/nf-core/align_bowtie2/main.nf
@@ -0,0 +1,47 @@
//
// Alignment with Bowtie2
//

params.align_options = [:]
params.samtools_sort_options = [:]
params.samtools_index_options = [:]
params.samtools_stats_options = [:]

include { BOWTIE2_ALIGN } from '../../../modules/bowtie2/align/main' addParams( options: params.align_options )
include { BAM_SORT_SAMTOOLS } from '../bam_sort_samtools/main' addParams( sort_options: params.samtools_sort_options, index_options: params.samtools_index_options, stats_options: params.samtools_stats_options )

workflow ALIGN_BOWTIE2 {
    take:
    reads // channel: [ val(meta), [ reads ] ]
    index // channel: /path/to/bowtie2/index/

    main:

    ch_versions = Channel.empty()

    //
    // Map reads with Bowtie2
    //
    BOWTIE2_ALIGN ( reads, index )
    ch_versions = ch_versions.mix(BOWTIE2_ALIGN.out.versions.first())

    //
    // Sort, index BAM file and run samtools stats, flagstat and idxstats
    //
    BAM_SORT_SAMTOOLS ( BOWTIE2_ALIGN.out.bam )
    ch_versions = ch_versions.mix(BAM_SORT_SAMTOOLS.out.versions)

    emit:
    bam_orig = BOWTIE2_ALIGN.out.bam // channel: [ val(meta), bam ]
    log_out = BOWTIE2_ALIGN.out.log // channel: [ val(meta), log ]
    fastq = BOWTIE2_ALIGN.out.fastq // channel: [ val(meta), fastq ]

    bam = BAM_SORT_SAMTOOLS.out.bam // channel: [ val(meta), [ bam ] ]
    bai = BAM_SORT_SAMTOOLS.out.bai // channel: [ val(meta), [ bai ] ]
    csi = BAM_SORT_SAMTOOLS.out.csi // channel: [ val(meta), [ csi ] ]
    stats = BAM_SORT_SAMTOOLS.out.stats // channel: [ val(meta), [ stats ] ]
    flagstat = BAM_SORT_SAMTOOLS.out.flagstat // channel: [ val(meta), [ flagstat ] ]
    idxstats = BAM_SORT_SAMTOOLS.out.idxstats // channel: [ val(meta), [ idxstats ] ]

    versions = ch_versions // channel: [ versions.yml ]
}
50 changes: 50 additions & 0 deletions subworkflows/nf-core/align_bowtie2/meta.yml
@@ -0,0 +1,50 @@
name: align_bowtie2
description: Align reads to a reference genome using bowtie2 then sort with samtools
keywords:
  - align
  - fasta
  - genome
  - reference
modules:
  - bowtie2/align
  - samtools/sort
  - samtools/index
  - samtools/stats
  - samtools/idxstats
  - samtools/flagstat
input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - reads:
      type: file
      description: |
        List of input FastQ files of size 1 and 2 for single-end and paired-end data,
        respectively.
  - index:
      type: file
      description: Bowtie2 genome index files
      pattern: '*.ebwt'
# TODO Update when we decide on a standard for subworkflow docs
output:
  - bam:
      type: file
      description: Output BAM file containing read alignments
      pattern: '*.{bam}'
  - versions:
      type: file
      description: File containing software versions
      pattern: 'versions.yml'
  - fastq:
      type: file
      description: Unaligned FastQ files
      pattern: '*.fastq.gz'
  - log:
      type: file
      description: Alignment log
      pattern: '*.log'
  # TODO Add samtools outputs
authors:
  - '@drpatelh'
2 changes: 2 additions & 0 deletions subworkflows/nf-core/align_bowtie2/nextflow.config
@@ -0,0 +1,2 @@
params.align_options = [:]
params.samtools_options = [:]
53 changes: 53 additions & 0 deletions subworkflows/nf-core/bam_sort_samtools/main.nf
@@ -0,0 +1,53 @@
//
// Sort, index BAM file and run samtools stats, flagstat and idxstats
//

params.sort_options = [:]
params.index_options = [:]
params.stats_options = [:]

include { SAMTOOLS_SORT } from '../../../modules/samtools/sort/main' addParams( options: params.sort_options )
include { SAMTOOLS_INDEX } from '../../../modules/samtools/index/main' addParams( options: params.index_options )
include { BAM_STATS_SAMTOOLS } from '../bam_stats_samtools/main' addParams( options: params.stats_options )

workflow BAM_SORT_SAMTOOLS {
    take:
    ch_bam // channel: [ val(meta), [ bam ] ]

    main:

    ch_versions = Channel.empty()

    SAMTOOLS_SORT ( ch_bam )
    ch_versions = ch_versions.mix(SAMTOOLS_SORT.out.versions.first())

    SAMTOOLS_INDEX ( SAMTOOLS_SORT.out.bam )
    ch_versions = ch_versions.mix(SAMTOOLS_INDEX.out.versions.first())

    SAMTOOLS_SORT.out.bam
        .join(SAMTOOLS_INDEX.out.bai, by: [0], remainder: true)
        .join(SAMTOOLS_INDEX.out.csi, by: [0], remainder: true)
        .map {
            meta, bam, bai, csi ->
                if (bai) {
                    [ meta, bam, bai ]
                } else {
                    [ meta, bam, csi ]
                }
        }
        .set { ch_bam_bai }

    BAM_STATS_SAMTOOLS ( ch_bam_bai )
    ch_versions = ch_versions.mix(BAM_STATS_SAMTOOLS.out.versions)

    emit:
    bam = SAMTOOLS_SORT.out.bam // channel: [ val(meta), [ bam ] ]
    bai = SAMTOOLS_INDEX.out.bai // channel: [ val(meta), [ bai ] ]
    csi = SAMTOOLS_INDEX.out.csi // channel: [ val(meta), [ csi ] ]

    stats = BAM_STATS_SAMTOOLS.out.stats // channel: [ val(meta), [ stats ] ]
    flagstat = BAM_STATS_SAMTOOLS.out.flagstat // channel: [ val(meta), [ flagstat ] ]
    idxstats = BAM_STATS_SAMTOOLS.out.idxstats // channel: [ val(meta), [ idxstats ] ]

    versions = ch_versions // channel: [ versions.yml ]
}
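
The `join(..., remainder: true)` chain in `BAM_SORT_SAMTOOLS` pairs each sorted BAM with whichever index `SAMTOOLS_INDEX` produced, `.bai` or `.csi`. The sketch below is a rough Python model of that selection, not Nextflow code. It assumes a plain string sample id in place of the `meta` map, uses hypothetical file names, and assumes every index belongs to a BAM that is present.

```python
from typing import Dict, List, Optional, Tuple


def pick_index(
    bams: Dict[str, str],
    bais: Dict[str, str],
    csis: Dict[str, str],
) -> List[Tuple[str, str, Optional[str]]]:
    """Pair each BAM with its .bai if one exists, otherwise its .csi."""
    paired = []
    for sample_id, bam in bams.items():
        # A missing entry plays the role of the null that
        # `remainder: true` puts in place of an absent join partner.
        bai = bais.get(sample_id)
        csi = csis.get(sample_id)
        paired.append((sample_id, bam, bai if bai else csi))
    return paired


# Hypothetical inputs: s1 was indexed as .bai, s2 as .csi.
bams = {"s1": "s1.sorted.bam", "s2": "s2.sorted.bam"}
bais = {"s1": "s1.sorted.bam.bai"}
csis = {"s2": "s2.sorted.bam.csi"}
ch_bam_bai = pick_index(bams, bais, csis)
print(ch_bam_bai)
```

Without `remainder: true`, a sample lacking a `.bai` would be dropped by the first join and never reach `BAM_STATS_SAMTOOLS`.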
41 changes: 41 additions & 0 deletions subworkflows/nf-core/bam_sort_samtools/meta.yml
@@ -0,0 +1,41 @@
name: bam_sort_samtools
description: Sort SAM/BAM/CRAM file
keywords:
  - sort
  - bam
  - sam
  - cram
modules:
  - samtools/sort
  - samtools/index
  - samtools/stats
  - samtools/idxstats
  - samtools/flagstat
input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - bam:
      type: file
      description: BAM/CRAM/SAM file
      pattern: '*.{bam,cram,sam}'
# TODO Update when we decide on a standard for subworkflow docs
output:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - bam:
      type: file
      description: Sorted BAM/CRAM/SAM file
      pattern: '*.{bam,cram,sam}'
  - versions:
      type: file
      description: File containing software versions
      pattern: 'versions.yml'
authors:
  - '@drpatelh'
  - '@ewels'
1 change: 1 addition & 0 deletions subworkflows/nf-core/bam_sort_samtools/nextflow.config
@@ -0,0 +1 @@
params.options = [:]
33 changes: 33 additions & 0 deletions subworkflows/nf-core/bam_stats_samtools/main.nf
@@ -0,0 +1,33 @@
//
// Run SAMtools stats, flagstat and idxstats
//

params.options = [:]

include { SAMTOOLS_STATS } from '../../../modules/samtools/stats/main' addParams( options: params.options )
include { SAMTOOLS_IDXSTATS } from '../../../modules/samtools/idxstats/main' addParams( options: params.options )
include { SAMTOOLS_FLAGSTAT } from '../../../modules/samtools/flagstat/main' addParams( options: params.options )

workflow BAM_STATS_SAMTOOLS {
    take:
    ch_bam_bai // channel: [ val(meta), [ bam ], [bai/csi] ]

    main:
    ch_versions = Channel.empty()

    SAMTOOLS_STATS ( ch_bam_bai )
    ch_versions = ch_versions.mix(SAMTOOLS_STATS.out.versions.first())

    SAMTOOLS_FLAGSTAT ( ch_bam_bai )
    ch_versions = ch_versions.mix(SAMTOOLS_FLAGSTAT.out.versions.first())

    SAMTOOLS_IDXSTATS ( ch_bam_bai )
    ch_versions = ch_versions.mix(SAMTOOLS_IDXSTATS.out.versions.first())

    emit:
    stats = SAMTOOLS_STATS.out.stats // channel: [ val(meta), [ stats ] ]
    flagstat = SAMTOOLS_FLAGSTAT.out.flagstat // channel: [ val(meta), [ flagstat ] ]
    idxstats = SAMTOOLS_IDXSTATS.out.idxstats // channel: [ val(meta), [ idxstats ] ]

    versions = ch_versions // channel: [ versions.yml ]
}
43 changes: 43 additions & 0 deletions subworkflows/nf-core/bam_stats_samtools/meta.yml
@@ -0,0 +1,43 @@
name: samtools_stats
description: Produces comprehensive statistics from SAM/BAM/CRAM file
keywords:
  - statistics
  - counts
  - bam
  - sam
  - cram
modules:
  - samtools/stats
  - samtools/idxstats
  - samtools/flagstat
input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - bam:
      type: file
      description: BAM/CRAM/SAM file
      pattern: '*.{bam,cram,sam}'
  - bai:
      type: file
      description: Index for BAM/CRAM/SAM file
      pattern: '*.{bai,crai,sai}'
# TODO Update when we decide on a standard for subworkflow docs
output:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - stats:
      type: file
      description: File containing samtools stats output
      pattern: '*.{stats}'
  - versions:
      type: file
      description: File containing software versions
      pattern: 'versions.yml'
authors:
  - '@drpatelh'
1 change: 1 addition & 0 deletions subworkflows/nf-core/bam_stats_samtools/nextflow.config
@@ -0,0 +1 @@
params.options = [:]
11 changes: 11 additions & 0 deletions tests/config/pytest_subworkflows.yml
@@ -0,0 +1,11 @@
subworkflows/align_bowtie2:
  - subworkflows/nf-core/align_bowtie2/**
  - tests/subworkflows/nf-core/align_bowtie2/**

subworkflows/bam_stats_samtools:
  - subworkflows/nf-core/bam_stats_samtools/**
  - tests/subworkflows/nf-core/bam_stats_samtools/**

subworkflows/bam_sort_samtools:
  - subworkflows/nf-core/bam_sort_samtools/**
  - tests/subworkflows/nf-core/bam_sort_samtools/**
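
The filter file above maps each test tag to the path globs that should trigger it. The following Python sketch approximates how a set of changed files resolves to tags. It uses `fnmatchcase`, whose `*` also matches `/`, as a stand-in for the glob matching done by `dorny/paths-filter`; the two are not identical in general, and the changed-file lists are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Same tags and globs as tests/config/pytest_subworkflows.yml in this commit.
FILTERS: Dict[str, List[str]] = {
    "subworkflows/align_bowtie2": [
        "subworkflows/nf-core/align_bowtie2/**",
        "tests/subworkflows/nf-core/align_bowtie2/**",
    ],
    "subworkflows/bam_stats_samtools": [
        "subworkflows/nf-core/bam_stats_samtools/**",
        "tests/subworkflows/nf-core/bam_stats_samtools/**",
    ],
    "subworkflows/bam_sort_samtools": [
        "subworkflows/nf-core/bam_sort_samtools/**",
        "tests/subworkflows/nf-core/bam_sort_samtools/**",
    ],
}


def matched_tags(changed_files: List[str]) -> List[str]:
    """Return every tag with at least one glob matching a changed file."""
    return [
        tag
        for tag, patterns in FILTERS.items()
        if any(fnmatchcase(path, pat) for path in changed_files for pat in patterns)
    ]


# Hypothetical change set touching two subworkflows.
changed = [
    "tests/subworkflows/nf-core/align_bowtie2/test.yml",
    "subworkflows/nf-core/bam_sort_samtools/main.nf",
]
triggered = matched_tags(changed)
print(triggered)
```

With only the globs shown here, a change under `modules/` triggers no subworkflow tag.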
2 changes: 1 addition & 1 deletion tests/modules/samtools/sort/main.nf
@@ -2,7 +2,7 @@

 nextflow.enable.dsl = 2

-include { SAMTOOLS_SORT } from '../../../../modules/samtools/sort/main.nf' addParams( options: [:] )
+include { SAMTOOLS_SORT } from '../../../../modules/samtools/sort/main.nf' addParams( options: ['suffix': '.sorted'] )

 workflow test_samtools_sort {
     input = [ [ id:'test', single_end:false ], // meta map
4 changes: 2 additions & 2 deletions tests/modules/samtools/sort/test.yml
@@ -4,5 +4,5 @@
     - samtools
     - samtools/sort
   files:
-    - path: output/samtools/test.bam
-      md5sum: bdc2d9e3f579f84df1e242207b627f89
+    - path: output/samtools/test.sorted.bam
+      md5sum: bbb2db225f140e69a4ac577f74ccc90f
27 changes: 27 additions & 0 deletions tests/subworkflows/nf-core/align_bowtie2/main.nf
@@ -0,0 +1,27 @@
#!/usr/bin/env nextflow

nextflow.enable.dsl = 2

include { BOWTIE2_BUILD } from '../../../../modules/bowtie2/build/main.nf' addParams( options: [:] )
include { ALIGN_BOWTIE2 } from '../../../../subworkflows/nf-core/align_bowtie2/main.nf' addParams( 'samtools_sort_options': ['suffix': '.sorted'] )

workflow test_align_bowtie2_single_end {
    input = [ [ id:'test', single_end:true ], // meta map
              [ file(params.test_data['sarscov2']['illumina']['test_1_fastq_gz'], checkIfExists: true) ]
            ]
    fasta = file(params.test_data['sarscov2']['genome']['genome_fasta'], checkIfExists: true)

    BOWTIE2_BUILD ( fasta )
    ALIGN_BOWTIE2 ( input, BOWTIE2_BUILD.out.index )
}

workflow test_align_bowtie2_paired_end {
    input = [ [ id:'test', single_end:false ], // meta map
              [ file(params.test_data['sarscov2']['illumina']['test_1_fastq_gz'], checkIfExists: true),
                file(params.test_data['sarscov2']['illumina']['test_2_fastq_gz'], checkIfExists: true) ]
            ]
    fasta = file(params.test_data['sarscov2']['genome']['genome_fasta'], checkIfExists: true)

    BOWTIE2_BUILD ( fasta )
    ALIGN_BOWTIE2 ( input, BOWTIE2_BUILD.out.index )
}
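
The test passes `samtools_sort_options` to `ALIGN_BOWTIE2`, which forwards it as `sort_options` to `BAM_SORT_SAMTOOLS`, which forwards it as `options` to `SAMTOOLS_SORT`. The sketch below models that renaming chain with plain dictionaries. `add_params` is a hypothetical helper that assumes `addParams` values take precedence over the defaults declared in the included script; it is not Nextflow's implementation.

```python
from typing import Any, Dict

Params = Dict[str, Any]


def add_params(defaults: Params, overrides: Params) -> Params:
    """Hypothetical model of `addParams`: overrides win over script defaults."""
    return {**defaults, **overrides}


# tests/.../align_bowtie2/main.nf includes ALIGN_BOWTIE2 with a sort suffix.
align_params = add_params(
    {
        "align_options": {},
        "samtools_sort_options": {},
        "samtools_index_options": {},
        "samtools_stats_options": {},
    },
    {"samtools_sort_options": {"suffix": ".sorted"}},
)

# ALIGN_BOWTIE2 includes BAM_SORT_SAMTOOLS and renames the keys.
sort_params = add_params(
    {"sort_options": {}, "index_options": {}, "stats_options": {}},
    {
        "sort_options": align_params["samtools_sort_options"],
        "index_options": align_params["samtools_index_options"],
        "stats_options": align_params["samtools_stats_options"],
    },
)

# BAM_SORT_SAMTOOLS includes SAMTOOLS_SORT with `options: params.sort_options`.
samtools_sort_params = add_params({}, {"options": sort_params["sort_options"]})
print(samtools_sort_params)
```

Under these assumptions the suffix set in the test reaches only `SAMTOOLS_SORT`; the index and stats modules keep their empty defaults.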