-
Notifications
You must be signed in to change notification settings - Fork 708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete lib directory and replace with utils_* subworkflows #1197
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I LOOOOOVE IT
Just please not that there is still an issue with the monochrome_logs params:
|
Yep, I think this is actually coming from the nf-validation plugin because I see the same issue with nf-core/fetchngs. Can you create an issue there please? I started trying to look at the nf-validation code base to find the issue but got distracted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future thoughts for as we update this.
workflow PREPARE_GENOME { | ||
take: | ||
fasta // file: /path/to/genome.fasta | ||
gtf // file: /path/to/genome.gtf | ||
gff // file: /path/to/genome.gff | ||
additional_fasta // file: /path/to/additional.fasta | ||
transcript_fasta // file: /path/to/transcript.fasta | ||
gene_bed // file: /path/to/gene.bed | ||
splicesites // file: /path/to/splicesites.txt | ||
bbsplit_fasta_list // file: /path/to/bbsplit_fasta_list.txt | ||
star_index // directory: /path/to/star/index/ | ||
rsem_index // directory: /path/to/rsem/index/ | ||
salmon_index // directory: /path/to/salmon/index/ | ||
kallisto_index // directory: /path/to/kallisto/index/ | ||
hisat2_index // directory: /path/to/hisat2/index/ | ||
bbsplit_index // directory: /path/to/rsem/index/ | ||
gencode // boolean: whether the genome is from GENCODE | ||
is_aws_igenome // boolean: whether the genome files are from AWS iGenomes | ||
biotype // string: if additional fasta file is provided biotype value to use when appending entries to GTF file | ||
prepare_tool_indices // list: tools to prepare indices for | ||
filter_gtf // boolean: whether to filter GTF file | ||
fasta // file: /path/to/genome.fasta | ||
gtf // file: /path/to/genome.gtf | ||
gff // file: /path/to/genome.gff | ||
additional_fasta // file: /path/to/additional.fasta | ||
transcript_fasta // file: /path/to/transcript.fasta | ||
gene_bed // file: /path/to/gene.bed | ||
splicesites // file: /path/to/splicesites.txt | ||
bbsplit_fasta_list // file: /path/to/bbsplit_fasta_list.txt | ||
star_index // directory: /path/to/star/index/ | ||
rsem_index // directory: /path/to/rsem/index/ | ||
salmon_index // directory: /path/to/salmon/index/ | ||
kallisto_index // directory: /path/to/kallisto/index/ | ||
hisat2_index // directory: /path/to/hisat2/index/ | ||
bbsplit_index // directory: /path/to/rsem/index/ | ||
gencode // boolean: whether the genome is from GENCODE | ||
featurecounts_group_type // string: The attribute type used to group feature types in the GTF file when generating the biotype plot with featureCounts | ||
aligner // string: Specifies the alignment algorithm to use - available options are 'star_salmon', 'star_rsem' and 'hisat2' | ||
pseudo_aligner // string: Specifies the pseudo aligner to use - available options are 'salmon'. Runs in addition to '--aligner' | ||
skip_gtf_filter // boolean: Skip filtering of GTF for valid scaffolds and/ or transcript IDs | ||
skip_bbsplit // boolean: Skip BBSplit for removal of non-reference genome reads | ||
skip_alignment // boolean: Skip all of the alignment-based processes within the pipeline | ||
skip_pseudo_alignment // boolean: Skip all of the pseudoalignment-based processes within the pipeline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this, I think this subworkflow is doing waaayyyyy too much.
// Determine whether to filter the GTF or not | ||
def filter_gtf = | ||
(( | ||
// Condition 1: Alignment is required and aligner is set | ||
!skip_alignment && aligner | ||
) || | ||
( | ||
// Condition 2: Pseudoalignment is required and pseudoaligner is set | ||
!skip_pseudo_alignment && pseudo_aligner | ||
) || | ||
( | ||
// Condition 3: Transcript FASTA file is not provided | ||
!transcript_fasta | ||
)) && | ||
( | ||
// Condition 4: --skip_gtf_filter is not provided | ||
!skip_gtf_filter | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Way too much stuff going on here. We should be able to simplify this by clarifying what filter_gtf
means. What are we filtering from the GTF? Why?
// | ||
// Print version and exit if required and dump pipeline parameters to JSON file | ||
// | ||
UTILS_NEXTFLOW_PIPELINE ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not happy with the name UTILS_NEXTFLOW_PIPELINE
, NEXTFLOW_PIPELINE_UTILITIES
makes more sense but it still doesn't say what the workflow actually does, making it hard to read.
If we didn't know what the subworkflow did, what would we think this was doing?
def pre_help_text = nfCoreLogo(params.monochrome_logs) | ||
def post_help_text = '\n' + workflowCitation() + '\n' + dashedLine(params.monochrome_logs) | ||
def String workflow_command = "nextflow run ${workflow.manifest.name} -profile <docker/singularity/.../institute> --input samplesheet.csv --genome GRCh37 --outdir <OUTDIR>" | ||
UTILS_NFVALIDATION_PLUGIN ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same problem here.
======================================================================================== | ||
*/ | ||
|
||
workflow PIPELINE_INITIALISATION { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is slightly better, but a function name is a verb so saying what it is doing is more clear for the developer.
// | ||
// Custom validation for pipeline parameters | ||
// | ||
validateInputParameters() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at this and without looking at the code knew exactly what it was doing. A+.
lib/
directory and replace withutils_*
subworkflows installed from nf-core/modulesPREPARE_GENOME
subworkflow to run outside of main rnaseq workflowCUSTOM_DUMPSOFTWAREVERSIONS
to use native MultiQC software version reportingtest_cache
profile as this will be superseded by using test data from S3 whilst porting to nf-test