Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding FuncotateSegments as an option to the Somatic CNV WDL. #5967

Merged
merged 3 commits into from
Jun 21, 2019

Conversation

LeeTL1220
Copy link
Contributor

Adding FuncotateSegments as an option to the Somatic CNV WDL.

  • Added automated WDL test to find obvious failures.

@LeeTL1220
Copy link
Contributor Author

@mwalker174 You should review the docs, etc. As well as the WDL itself. And also, FC deployment will need to know about this before the next release, since the imports changed.

@codecov
Copy link

codecov bot commented May 24, 2019

Codecov Report

Merging #5967 into master will decrease coverage by 1.084%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##              master     #5967       +/-   ##
===============================================
- Coverage     86.929%   85.845%   -1.084%     
+ Complexity     32714     32489      -225     
===============================================
  Files           2013      2016        +3     
  Lines         151268    151460      +192     
  Branches       16604     16628       +24     
===============================================
- Hits          131496    130021     -1475     
- Misses         13718     15435     +1717     
+ Partials        6054      6004       -50
Impacted Files Coverage Δ Complexity Δ
...s/GermlineContigPloidyModelArgumentCollection.java 0% <0%> (-100%) 0% <0%> (-4%)
...mlineContigPloidyHybridADVIArgumentCollection.java 0% <0%> (-100%) 0% <0%> (-3%)
...ments/GermlineCNVHybridADVIArgumentCollection.java 0% <0%> (-100%) 0% <0%> (-3%)
...r/arguments/GermlineCallingArgumentCollection.java 0% <0%> (-100%) 0% <0%> (-4%)
...tools/examples/ExampleStreamingPythonExecutor.java 0% <0%> (-96.667%) 0% <0%> (-8%)
...ents/GermlineDenoisingModelArgumentCollection.java 0% <0%> (-96.552%) 0% <0%> (-4%)
...number/arguments/HybridADVIArgumentCollection.java 0% <0%> (-94.915%) 0% <0%> (-3%)
...der/utils/python/PythonScriptExecutorUnitTest.java 3.846% <0%> (-94.231%) 1% <0%> (-11%)
...der/tools/walkers/vqsr/CNNVariantPipelineTest.java 5.882% <0%> (-94.118%) 2% <0%> (-6%)
.../tools/copynumber/PostprocessGermlineCNVCalls.java 0% <0%> (-93.023%) 0% <0%> (-26%)
... and 45 more

@LeeTL1220
Copy link
Contributor Author

@jonn-smith @mwalker174 Could one or both of you review, please?

@@ -0,0 +1,151 @@
workflow CNVFuncotateSegmentsWorkflow {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have some comments at the top to help with input arguments / valid types ala the Funcotator WDL:

# Run Funcotator on a set of called variants from Mutect 2.
#
# Description of inputs:
#
#   Required:
#     String gatk_docker                  - GATK Docker image in which to run
#     File ref_fasta                      - Reference FASTA file.
#     File ref_fasta_index                - Reference FASTA file index.
#     File ref_fasta_dict                 - Reference FASTA file sequence dictionary.
#     File variant_vcf_to_funcotate       - Variant Context File (VCF) containing the variants to annotate.
#     File variant_vcf_to_funcotate_index - Index file corresponding to the input Variant Context File (VCF) containing the variants to annotate.
#     String reference_version            - Version of the reference being used.  Either `hg19` or `hg38`.
#     String output_file_name             - Path to desired output file.
#     String output_format                - Output file format (either VCF or MAF).
#     Boolean compress				      - Whether to compress the resulting output file.
#     Boolean use_gnomad                  - If true, will enable the gnomAD data sources in the data source tar.gz, if they exist.
#
#   Optional:
#     interval_list                       - Intervals to be used for traversal.  If specified will only traverse the given intervals.
#     data_sources_tar_gz                 - Path to tar.gz containing the data sources for Funcotator to create annotations.
#     transcript_selection_mode           - Method of detailed transcript selection.  This will select the transcript for detailed annotation (either `CANONICAL` or `BEST_EFFECT`).
#     transcript_selection_list           - Set of transcript IDs to use for annotation to override selected transcript.
#     annotation_defaults                 - Annotations to include in all annotated variants if the annotation is not specified in the data sources (in the format <ANNOTATION>:<VALUE>).  This will add the specified annotation to every annotated variant if it is not already present.
#     annotation_overrides                - Override values for annotations (in the format <ANNOTATION>:<VALUE>).  Replaces existing annotations of the given name with given values.
#     gatk4_jar_override                  - Override Jar file containing GATK 4.0.  Use this when overriding the docker JAR or when using a backend without docker.
#     funcotator_extra_args               - Extra command-line arguments to pass through to Funcotator.  (e.g. " --exclude-field foo_field --exclude-field bar_field ")
#
# This WDL needs to decide whether to use the ``gatk_jar`` or ``gatk_jar_override`` for the jar location.  As of cromwell-0.24,
# this logic *must* go into each task.  Therefore, there is a lot of duplicated code.  This allows users to specify a jar file
# independent of what is in the docker file.  See the README.md for more info.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

@jonn-smith jonn-smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

A couple of questions / requests.



echo "Removing $DATA_SOURCES_FOLDER"
${removing_untared_datasources}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why bother doing this? My understanding is that you pull the image every time, so this won't persist. Is that not true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are running on-prem compute or local, you will have un-tar'ed datasources throughout your output directories. These will not get removed unless you are on the cloud.

I am open to suggestions for a better solution. Without WDL Directory support, I only know of sub-optimal choices.

No action.

${extra_args_arg}


echo "Removing $DATA_SOURCES_FOLDER"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this echo and add it to the definition of removing_untared_datasources to make it less confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if (select_first([is_run_funcotator, false])) {
call CNVFuncotateSegments.CNVFuncotateSegmentsWorkflow as CNVFuncotateSegmentsWorkflow {
input:
input_seg_file = CallCopyRatioSegmentsTumor.called_copy_ratio_segments,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to expose any more input arguments here as possibilities now? It seems like it may be useful to add them all up-front.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

File funcotated_seg_simple_tsv = "${basename_input_seg_file}.funcotated.tsv"
File funcotated_gene_list_tsv = "${basename_input_seg_file}.funcotated.tsv.gene_list.txt"
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

java -jar ${CROMWELL_JAR} run /home/travis/build/broadinstitute/gatk/scripts/cnv_wdl/somatic/cnv_somatic_pair_workflow.wdl \
-i cnv_somatic_pair_wes_do-gc_workflow_funcotator_mod.json
#####
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#####
#####

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"CNVSomaticPairWorkflow.normal_bam_idx": "/home/travis/build/broadinstitute/gatk/src/test/resources/large/cnv_somatic_workflows_test_files/SM-74NEG-v1-chr20-downsampled.deduplicated.cram.crai",
"CNVSomaticPairWorkflow.read_count_pon": "/home/travis/build/broadinstitute/gatk/src/test/resources/large/cnv_somatic_workflows_test_files/wes-do-gc.pon.hdf5",
"CNVSomaticPairWorkflow.ref_fasta_dict": "/home/travis/build/broadinstitute/gatk/src/test/resources/large/cnv_somatic_workflows_test_files/human_g1k_v37.chr-20.truncated.dict",
"CNVSomaticPairWorkflow.ref_fasta_fai": "/home/travis/build/broadinstitute/gatk/src/test/resources/large/cnv_somatic_workflows_test_files/human_g1k_v37.chr-20.truncated.fasta.fai",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an observation, this is an odd reference to use for our data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just building off of existing infrastructure. We want this to run fast....

No action.

@@ -0,0 +1,15 @@
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is one test JSON enough? It doesn't have all the arguments in it that are supported, so I'm wondering if it makes sense to explicitly put them in (or have multiple test cases).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is generally blocked by an inability to easily examine the outputs of a given cromwell run. This is captured in #6013

No action.

Copy link
Contributor

@mwalker174 mwalker174 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks solid, just a few minor comments.

@@ -432,6 +451,23 @@ workflow CNVSomaticPairWorkflow {
preemptible_attempts = preemptible_attempts
}
}
if (select_first([is_run_funcotator, false])) {
call CNVFuncotateSegments.CNVFuncotateSegmentsWorkflow as CNVFuncotateSegmentsWorkflow {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add CNVFuncotateSegmentsWorkflow.funcotated_seg_simple_tsv and CNVFuncotateSegmentsWorkflow.funcotated_gene_list_tsv to the workflow outputs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also seem to be missing CNVOncotatorWorkflow outputs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I missing something? Below is from the output block in the cnv_somatic_pair_workflow

        File oncotated_called_file_tumor = select_first([CNVOncotatorWorkflow.oncotated_called_file, "null"])
        File oncotated_called_gene_list_file_tumor = select_first([CNVOncotatorWorkflow.oncotated_called_gene_list_file, "null"])

For Funcotator: Done.

String funcotator_ref_version
File? gatk4_jar_override
File? funcotator_data_sources_tar_gz
String? transcript_selection_mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add a comment listing valid values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in preamble as per @jonn-smith comment. No further action.

--ref-version ${funcotator_ref_version} \
--output-file-format SEG \
-R ${ref_fasta} \
--segments ${input_seg_file} \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra space here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@LeeTL1220
Copy link
Contributor Author

@mwalker174 @jonn-smith Back to you guys.

Copy link
Collaborator

@jonn-smith jonn-smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me.

@LeeTL1220 LeeTL1220 merged commit f7d6876 into master Jun 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants