-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding FuncotateSegments
as an option to the Somatic CNV WDL.
#5967
Conversation
@mwalker174 You should review the docs, etc. As well as the WDL itself. And also, FC deployment will need to know about this before the next release, since the imports changed. |
Codecov Report
@@ Coverage Diff @@
## master #5967 +/- ##
===============================================
- Coverage 86.929% 85.845% -1.084%
+ Complexity 32714 32489 -225
===============================================
Files 2013 2016 +3
Lines 151268 151460 +192
Branches 16604 16628 +24
===============================================
- Hits 131496 130021 -1475
- Misses 13718 15435 +1717
+ Partials 6054 6004 -50
|
@jonn-smith @mwalker174 Could one or both of you review, please? |
@@ -0,0 +1,151 @@ | |||
workflow CNVFuncotateSegmentsWorkflow { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have some comments at the top to help with input arguments / valid types ala the Funcotator WDL:
# Run Funcotator on a set of called variants from Mutect 2.
#
# Description of inputs:
#
# Required:
# String gatk_docker - GATK Docker image in which to run
# File ref_fasta - Reference FASTA file.
# File ref_fasta_index - Reference FASTA file index.
# File ref_fasta_dict - Reference FASTA file sequence dictionary.
# File variant_vcf_to_funcotate - Variant Context File (VCF) containing the variants to annotate.
# File variant_vcf_to_funcotate_index - Index file corresponding to the input Variant Context File (VCF) containing the variants to annotate.
# String reference_version - Version of the reference being used. Either `hg19` or `hg38`.
# String output_file_name - Path to desired output file.
# String output_format - Output file format (either VCF or MAF).
# Boolean compress - Whether to compress the resulting output file.
# Boolean use_gnomad - If true, will enable the gnomAD data sources in the data source tar.gz, if they exist.
#
# Optional:
# interval_list - Intervals to be used for traversal. If specified will only traverse the given intervals.
# data_sources_tar_gz - Path to tar.gz containing the data sources for Funcotator to create annotations.
# transcript_selection_mode - Method of detailed transcript selection. This will select the transcript for detailed annotation (either `CANONICAL` or `BEST_EFFECT`).
# transcript_selection_list - Set of transcript IDs to use for annotation to override selected transcript.
# annotation_defaults - Annotations to include in all annotated variants if the annotation is not specified in the data sources (in the format <ANNOTATION>:<VALUE>). This will add the specified annotation to every annotated variant if it is not already present.
# annotation_overrides - Override values for annotations (in the format <ANNOTATION>:<VALUE>). Replaces existing annotations of the given name with given values.
# gatk4_jar_override - Override Jar file containing GATK 4.0. Use this when overriding the docker JAR or when using a backend without docker.
# funcotator_extra_args - Extra command-line arguments to pass through to Funcotator. (e.g. " --exclude-field foo_field --exclude-field bar_field ")
#
# This WDL needs to decide whether to use the ``gatk_jar`` or ``gatk_jar_override`` for the jar location. As of cromwell-0.24,
# this logic *must* go into each task. Therefore, there is a lot of duplicated code. This allows users to specify a jar file
# independent of what is in the docker file. See the README.md for more info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
A couple of questions / requests.
|
||
|
||
echo "Removing $DATA_SOURCES_FOLDER" | ||
${removing_untared_datasources} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why bother doing this? My understanding is that you pull the image every time, so this won't persist. Is that not true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are running on-prem compute or local, you will have un-tar'ed datasources throughout your output directories. These will not get removed unless you are on the cloud.
I am open to suggestions for a better solution. Without WDL Directory support, I only know of sub-optimal choices.
No action.
${extra_args_arg} | ||
|
||
|
||
echo "Removing $DATA_SOURCES_FOLDER" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this echo and add it to the definition of removing_untared_datasources
to make it less confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if (select_first([is_run_funcotator, false])) { | ||
call CNVFuncotateSegments.CNVFuncotateSegmentsWorkflow as CNVFuncotateSegmentsWorkflow { | ||
input: | ||
input_seg_file = CallCopyRatioSegmentsTumor.called_copy_ratio_segments, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to expose any more input arguments here as possibilities now? It seems like it may be useful to add them all up-front.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
File funcotated_seg_simple_tsv = "${basename_input_seg_file}.funcotated.tsv" | ||
File funcotated_gene_list_tsv = "${basename_input_seg_file}.funcotated.tsv.gene_list.txt" | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | |
} | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
java -jar ${CROMWELL_JAR} run /home/travis/build/broadinstitute/gatk/scripts/cnv_wdl/somatic/cnv_somatic_pair_workflow.wdl \ | ||
-i cnv_somatic_pair_wes_do-gc_workflow_funcotator_mod.json | ||
##### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
##### | |
##### | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
"CNVSomaticPairWorkflow.normal_bam_idx": "/home/travis/build/broadinstitute/gatk/src/test/resources/large/cnv_somatic_workflows_test_files/SM-74NEG-v1-chr20-downsampled.deduplicated.cram.crai", | ||
"CNVSomaticPairWorkflow.read_count_pon": "/home/travis/build/broadinstitute/gatk/src/test/resources/large/cnv_somatic_workflows_test_files/wes-do-gc.pon.hdf5", | ||
"CNVSomaticPairWorkflow.ref_fasta_dict": "/home/travis/build/broadinstitute/gatk/src/test/resources/large/cnv_somatic_workflows_test_files/human_g1k_v37.chr-20.truncated.dict", | ||
"CNVSomaticPairWorkflow.ref_fasta_fai": "/home/travis/build/broadinstitute/gatk/src/test/resources/large/cnv_somatic_workflows_test_files/human_g1k_v37.chr-20.truncated.fasta.fai", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an observation, this is an odd reference to use for our data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just building off of existing infrastructure. We want this to run fast....
No action.
@@ -0,0 +1,15 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is one test JSON enough? It doesn't have all the arguments in it that are supported, so I'm wondering if it makes sense to explicitly put them in (or have multiple test cases).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is generally blocked by an inability to easily examine the outputs of a given cromwell run. This is captured in #6013
No action.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks solid, just a few minor comments.
@@ -432,6 +451,23 @@ workflow CNVSomaticPairWorkflow { | |||
preemptible_attempts = preemptible_attempts | |||
} | |||
} | |||
if (select_first([is_run_funcotator, false])) { | |||
call CNVFuncotateSegments.CNVFuncotateSegmentsWorkflow as CNVFuncotateSegmentsWorkflow { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add CNVFuncotateSegmentsWorkflow.funcotated_seg_simple_tsv
and CNVFuncotateSegmentsWorkflow.funcotated_gene_list_tsv
to the workflow outputs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also seem to be missing CNVOncotatorWorkflow
outputs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I missing something? Below is from the output block in the cnv_somatic_pair_workflow
File oncotated_called_file_tumor = select_first([CNVOncotatorWorkflow.oncotated_called_file, "null"])
File oncotated_called_gene_list_file_tumor = select_first([CNVOncotatorWorkflow.oncotated_called_gene_list_file, "null"])
For Funcotator: Done.
String funcotator_ref_version | ||
File? gatk4_jar_override | ||
File? funcotator_data_sources_tar_gz | ||
String? transcript_selection_mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add a comment listing valid values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in preamble as per @jonn-smith comment. No further action.
--ref-version ${funcotator_ref_version} \ | ||
--output-file-format SEG \ | ||
-R ${ref_fasta} \ | ||
--segments ${input_seg_file} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra space here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@mwalker174 @jonn-smith Back to you guys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me.
Adding
FuncotateSegments
as an option to the Somatic CNV WDL.