feat: pool inputs (#214)
* refactor: avoid clashing output for bwa mem processes

* feat: add cat module

* feat: pool input replicates

resolves #204

resolves #210

* refactor: create pool_inputs subworkflow
kelly-sovacool authored Nov 26, 2024
1 parent d116319 commit d9ae4d2
Showing 19 changed files with 564 additions and 45 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -1,6 +1,7 @@
## CHAMPAGNE development version

- The CHAMPAGNE nextflow workflow now has a version entry in `nextflow.config`, in compliance with nf-core. (#213, @kelly-sovacool)
+- Pool input (control) reads of the same sample name by default. Any inputs that should not be pooled must have different sample names in the samplesheet. (#214, @kelly-sovacool)

## CHAMPAGNE 0.4.0

24 changes: 12 additions & 12 deletions assets/samplesheet_human.csv
@@ -1,13 +1,13 @@
fastq_1,fastq_2,sample,rep,antibody,control
-/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129678.fastq.gz,,A549_CTCF,1,CTCF,A549_CTCF_INPUT_1
-/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129676.fastq.gz,,A549_CTCF,2,CTCF,A549_CTCF_INPUT_2
-/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129677.fastq.gz,,A549_CTCF,3,CTCF,A549_CTCF_INPUT_3
-/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129560.fastq.gz,,A549_CTCF_INPUT_1,,,
-/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129561.fastq.gz,,A549_CTCF_INPUT_2,,,
-/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129562.fastq.gz,,A549_CTCF_INPUT_3,,,
-/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14636612.fastq.gz,,A549_JUN,1,JUN,A549_JUN_INPUT_1
-/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14636613.fastq.gz,,A549_JUN,2,JUN,A549_JUN_INPUT_2
-/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14636614.fastq.gz,,A549_JUN,3,JUN,A549_JUN_INPUT_2
-/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14638304.fastq.gz,,A549_JUN_INPUT_1,,,
-/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14638305.fastq.gz,,A549_JUN_INPUT_2,,,
-/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14638306.fastq.gz,,A549_JUN_INPUT_3,,,
+/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129678.fastq.gz,,A549_CTCF,1,CTCF,A549_CTCF_INPUT
+/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129676.fastq.gz,,A549_CTCF,2,CTCF,A549_CTCF_INPUT
+/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129677.fastq.gz,,A549_CTCF,3,CTCF,A549_CTCF_INPUT
+/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129560.fastq.gz,,A549_CTCF_INPUT,1,,
+/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129561.fastq.gz,,A549_CTCF_INPUT,2,,
+/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129562.fastq.gz,,A549_CTCF_INPUT,3,,
+/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14636612.fastq.gz,,A549_JUN,1,JUN,A549_JUN_INPUT
+/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14636613.fastq.gz,,A549_JUN,2,JUN,A549_JUN_INPUT
+/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14636614.fastq.gz,,A549_JUN,3,JUN,A549_JUN_INPUT
+/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14638304.fastq.gz,,A549_JUN_INPUT,1,,
+/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14638305.fastq.gz,,A549_JUN_INPUT,2,,
+/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14638306.fastq.gz,,A549_JUN_INPUT,3,,
12 changes: 6 additions & 6 deletions assets/samplesheet_test.csv
@@ -1,7 +1,7 @@
sample,rep,fastq_1,fastq_2,antibody,control
-SPT5_T0,1,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822153_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822153_2.fastq.gz,SPT5,SPT5_INPUT_1
-SPT5_T0,2,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822154_1.fastq.gz,,SPT5,SPT5_INPUT_2
-SPT5_T15,1,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822157_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822157_2.fastq.gz,SPT5,SPT5_INPUT_1
-SPT5_T15,2,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822158_1.fastq.gz,,SPT5,SPT5_INPUT_2
-SPT5_INPUT_1,,https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/testdata/SRR5204809_Spt5-ChIP_Input1_SacCer_ChIP-Seq_ss100k_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/testdata/SRR5204809_Spt5-ChIP_Input1_SacCer_ChIP-Seq_ss100k_R2.fastq.gz,,
-SPT5_INPUT_2,,https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/testdata/SRR5204810_Spt5-ChIP_Input2_SacCer_ChIP-Seq_ss100k_R1.fastq.gz,,,
+SPT5_T0,1,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822153_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822153_2.fastq.gz,SPT5,SPT5_INPUT
+SPT5_T0,2,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822154_1.fastq.gz,,SPT5,SPT5_INPUT
+SPT5_T15,1,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822157_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822157_2.fastq.gz,SPT5,SPT5_INPUT
+SPT5_T15,2,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822158_1.fastq.gz,,SPT5,SPT5_INPUT
+SPT5_INPUT,1,https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/testdata/SRR5204809_Spt5-ChIP_Input1_SacCer_ChIP-Seq_ss100k_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/testdata/SRR5204809_Spt5-ChIP_Input1_SacCer_ChIP-Seq_ss100k_R2.fastq.gz,,
+SPT5_INPUT,2,https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/testdata/SRR5204810_Spt5-ChIP_Input2_SacCer_ChIP-Seq_ss100k_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/testdata/SRR5204810_Spt5-ChIP_Input2_SacCer_ChIP-Seq_ss100k_R2.fastq.gz,,
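
The samplesheet change above gives all input replicates one shared sample name and moves the replicate number into the `rep` column. A minimal Python sketch of how rows could be grouped for pooling under that convention follows. The file names are shortened placeholders and `group_inputs` is a hypothetical helper, not part of the pipeline; the "no antibody and no control" test is taken from the `check_samplesheet.py` change in this commit.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, abbreviated samplesheet in the new format (placeholder file names).
SAMPLESHEET = """\
sample,rep,fastq_1,fastq_2,antibody,control
SPT5_T0,1,t0_rep1_R1.fastq.gz,t0_rep1_R2.fastq.gz,SPT5,SPT5_INPUT
SPT5_T0,2,t0_rep2_R1.fastq.gz,,SPT5,SPT5_INPUT
SPT5_INPUT,1,input1_R1.fastq.gz,input1_R2.fastq.gz,,
SPT5_INPUT,2,input2_R1.fastq.gz,input2_R2.fastq.gz,,
"""


def group_inputs(samplesheet_text):
    """Map each input sample name to the list of its replicate rows."""
    pooled = defaultdict(list)
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        # A row with neither antibody nor control is treated as an input.
        if not row["antibody"] and not row["control"]:
            pooled[row["sample"]].append(row)
    return dict(pooled)


pools = group_inputs(SAMPLESHEET)
for name, rows in pools.items():
    print(name, [r["fastq_1"] for r in rows])
```

Under this grouping, inputs that must stay separate need distinct sample names, which matches the CHANGELOG note.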
15 changes: 10 additions & 5 deletions bin/check_samplesheet.py
@@ -1,9 +1,9 @@
#!/usr/bin/env python3

"""
-source: https://github.com/nf-core/chipseq/blob/51eba00b32885c4d0bec60db3cb0a45eb61e34c5/bin/check_samplesheet.py
+adapted from: https://github.com/nf-core/chipseq/blob/51eba00b32885c4d0bec60db3cb0a45eb61e34c5/bin/check_samplesheet.py
"""
-
+import collections
import os
import errno
import argparse
@@ -52,6 +52,7 @@ def check_samplesheet(file_in, file_out):
"""

sample_mapping_dict = {}
+input_dict = collections.defaultdict(list)
with open(file_in, "r", encoding="utf-8-sig") as fin:
## Check header
MIN_COLS = 2
@@ -144,6 +145,7 @@ def check_samplesheet(file_in, file_out):
"Line",
line,
)
+is_control_input = not antibody and not control

## Auto-detect paired-end/single-end
if not sample or not fastq_1:
@@ -172,7 +174,9 @@ def check_samplesheet(file_in, file_out):
print_error("Samplesheet contains duplicate rows!", "Line", line)
else:
sample_mapping_dict[sample].append(sample_info)
-# pprint.pprint(sample_mapping_dict)
+if is_control_input:
+input_dict[sample_basename].append(sample_info)
+
## Write validated samplesheet with appropriate columns
if len(sample_mapping_dict) > 0:
out_dir = os.path.dirname(file_out)
@@ -205,11 +209,12 @@ def check_samplesheet(file_in, file_out):
sample,
)

+# check that the control/input exists
for idx, val in enumerate(sample_mapping_dict[sample]):
control = val[-1]
-if control and control not in sample_mapping_dict.keys():
+if control and control not in input_dict.keys():
print_error(
-f"Control identifier has to match a provided sample identifier!",
+"Control identifier has to match a provided sample identifier!",
"Control",
control,
)
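
The control check above now looks names up in `input_dict` instead of `sample_mapping_dict`, so a `control` value has to name an input row rather than any sample in the sheet. A simplified, self-contained sketch of that behaviour follows. The row dictionaries and the `find_invalid_controls` helper are assumptions for illustration; the real script works on its own `sample_info` lists and keys `input_dict` by `sample_basename`, whose construction is not shown in this diff.

```python
from collections import defaultdict


def is_control_input(row):
    # Mirrors the diff: a row with neither antibody nor control is an input.
    return not row["antibody"] and not row["control"]


def find_invalid_controls(rows):
    """Return control names that do not refer to any input sample."""
    input_dict = defaultdict(list)
    for row in rows:
        if is_control_input(row):
            input_dict[row["sample"]].append(row)
    # Membership tests on a defaultdict do not create missing keys.
    return sorted(
        {row["control"] for row in rows
         if row["control"] and row["control"] not in input_dict}
    )


rows = [
    {"sample": "A549_CTCF", "antibody": "CTCF", "control": "A549_CTCF_INPUT"},
    # Points at an antibody sample, not an input: rejected by the new check.
    {"sample": "A549_JUN", "antibody": "JUN", "control": "A549_CTCF"},
    {"sample": "A549_CTCF_INPUT", "antibody": "", "control": ""},
]
invalid = find_invalid_controls(rows)
print(invalid)
```

With the earlier lookup against all sample names, the second row would have passed validation because `A549_CTCF` is itself a sample in the sheet.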
21 changes: 11 additions & 10 deletions main.nf
@@ -21,6 +21,7 @@ log.info """\
include { FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS as DOWNLOAD_FASTQ } from './subworkflows/nf-core/fastq_download_prefetch_fasterqdump_sratools'
include { INPUT_CHECK } from './subworkflows/local/input_check.nf'
include { PREPARE_GENOME } from './subworkflows/local/prepare_genome.nf'
+include { POOL_INPUTS } from './subworkflows/local/pool_inputs/'
include { FILTER_BLACKLIST } from './subworkflows/CCBR/filter_blacklist/'
include { ALIGN_GENOME } from "./subworkflows/local/align.nf"
include { DEDUPLICATE } from "./subworkflows/local/deduplicate.nf"
@@ -74,28 +75,27 @@ workflow CHIPSEQ {
INPUT_CHECK(file(params.input, checkIfExists: true), params.seq_center, contrast_sheet)

INPUT_CHECK.out.reads.set { raw_fastqs }
-raw_fastqs | CUTADAPT
-CUTADAPT.out.reads.set{ trimmed_fastqs }
+CUTADAPT(raw_fastqs).reads | POOL_INPUTS
+trimmed_fastqs = POOL_INPUTS.out.reads

PREPARE_GENOME()
chrom_sizes = PREPARE_GENOME.out.chrom_sizes

effective_genome_size = PREPARE_GENOME.out.effective_genome_size

FILTER_BLACKLIST(trimmed_fastqs, PREPARE_GENOME.out.blacklist_index)
ALIGN_GENOME(FILTER_BLACKLIST.out.reads, PREPARE_GENOME.out.reference_index)
-ALIGN_GENOME.out.bam.set{ aligned_bam }
+aligned_bam = ALIGN_GENOME.out.bam

DEDUPLICATE(aligned_bam, chrom_sizes, effective_genome_size)
-DEDUPLICATE.out.bam.set{ deduped_bam }
-DEDUPLICATE.out.tag_align.set{ deduped_tagalign }
+deduped_bam = DEDUPLICATE.out.bam
+deduped_tagalign = DEDUPLICATE.out.tag_align

-deduped_bam | PHANTOM_PEAKS
-PHANTOM_PEAKS.out.fraglen | PPQT_PROCESS
-PPQT_PROCESS.out.fraglen.set { frag_lengths }
+PHANTOM_PEAKS(deduped_bam).fraglen | PPQT_PROCESS
+frag_lengths = PPQT_PROCESS.out.fraglen

ch_multiqc = Channel.of()
if (params.run.qc) {
-QC(raw_fastqs, trimmed_fastqs, FILTER_BLACKLIST.out.n_surviving_reads,
+QC(raw_fastqs, CUTADAPT.out.reads, FILTER_BLACKLIST.out.n_surviving_reads,
aligned_bam, ALIGN_GENOME.out.aligned_flagstat, ALIGN_GENOME.out.filtered_flagstat,
deduped_bam, DEDUPLICATE.out.flagstat,
PHANTOM_PEAKS.out.spp, frag_lengths,
@@ -157,6 +157,7 @@ workflow CHIPSEQ {
)

}
+
}

if (!workflow.stubRun) {
5 changes: 5 additions & 0 deletions modules.json
@@ -100,6 +100,11 @@
"git_sha": "8fc1d24c710ebe1d5de0f2447ec9439fd3d9d66a",
"installed_by": ["modules"]
},
+"cat/cat": {
+"branch": "master",
+"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+"installed_by": ["modules"]
+},
"custom/sratoolsncbisettings": {
"branch": "master",
"git_sha": "20e78a9868eaa69c8cac91152397def32374b807",
5 changes: 5 additions & 0 deletions modules/nf-core/cat/cat/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

78 changes: 78 additions & 0 deletions modules/nf-core/cat/cat/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

43 changes: 43 additions & 0 deletions modules/nf-core/cat/cat/meta.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
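
The `cat/cat` module files are not rendered on this page, so how the module and the `POOL_INPUTS` subworkflow actually combine replicates is not shown. As a hedged illustration of why plain concatenation can pool gzipped FASTQ files at all: a gzip file may contain several members back to back, and standard readers decompress them as one stream. The sketch below uses a hypothetical `pool_fastqs` helper and tiny made-up reads; it is not the module's implementation.

```python
import gzip
import os
import shutil
import tempfile


def pool_fastqs(paths, out_path):
    """Concatenate gzipped files byte-for-byte into one multi-member gzip file."""
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return out_path


# Build two tiny gzipped "replicates" (made-up reads) in a temporary directory.
tmpdir = tempfile.mkdtemp()
rep_paths = []
for i, read in enumerate(["@r1\nACGT\n+\nIIII\n", "@r2\nTTGA\n+\nIIII\n"], start=1):
    path = os.path.join(tmpdir, f"input_rep{i}.fastq.gz")
    with gzip.open(path, "wt") as fh:
        fh.write(read)
    rep_paths.append(path)

pooled = pool_fastqs(rep_paths, os.path.join(tmpdir, "input_pooled.fastq.gz"))

# Python's gzip module reads all members of the pooled file as one stream.
with gzip.open(pooled, "rt") as fh:
    pooled_text = fh.read()
print(pooled_text)
```

For paired-end inputs, the R1 and R2 files would need to be concatenated in the same replicate order so that read pairs stay aligned between the two pooled files.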
