Updating SimpleGermlineTagger and somatic CNV experimental post-processing workflow #5252

Merged (8 commits) on Oct 5, 2018

4 changes: 2 additions & 2 deletions scripts/cnv_wdl/germline/cnv_germline_case_workflow.wdl
@@ -5,8 +5,8 @@
#
# - The intervals argument is required for both WGS and WES workflows and accepts formats compatible with the
# GATK -L argument (see https://gatkforums.broadinstitute.org/gatk/discussion/11009/intervals-and-interval-lists).
-# These intervals will be padded on both sides by the amount specified by PreprocessIntervals.padding (default 250)
-# and split into bins of length specified by PreprocessIntervals.bin_length (default 1000; specify 0 to skip binning,
+# These intervals will be padded on both sides by the amount specified by padding (default 250)
+# and split into bins of length specified by bin_length (default 1000; specify 0 to skip binning,
# e.g., for WES). For WGS, the intervals should simply cover the chromosomes of interest.
#
# - Intervals can be blacklisted from coverage collection and all downstream steps by using the blacklist_intervals
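The padding and binning described in these header comments can be illustrated with a simplified Python sketch. It is not the PreprocessIntervals implementation: it assumes 1-based inclusive coordinates, clips padded starts at position 1, does not clip to contig lengths or merge overlapping padded intervals (which the real tool may handle), and lets the last bin of each padded interval be shorter than bin_length. The function name `pad_and_bin` is hypothetical.

```python
from typing import List, Tuple

# (contig, start, end), 1-based and inclusive (an assumption of this sketch)
Interval = Tuple[str, int, int]


def pad_and_bin(intervals: List[Interval], padding: int = 250,
                bin_length: int = 1000) -> List[Interval]:
    """Pad each interval on both sides, then split it into fixed-length bins.

    Hypothetical illustration only: starts are clipped at 1, contig ends are
    not clipped, overlapping padded intervals are not merged, and the final
    bin of each padded interval may be shorter than bin_length.
    """
    bins: List[Interval] = []
    for contig, start, end in intervals:
        padded_start = max(1, start - padding)
        padded_end = end + padding
        if bin_length == 0:
            # bin_length 0 means "skip binning" (e.g., WES): keep the padded interval whole
            bins.append((contig, padded_start, padded_end))
            continue
        bin_start = padded_start
        while bin_start <= padded_end:
            bin_end = min(bin_start + bin_length - 1, padded_end)
            bins.append((contig, bin_start, bin_end))
            bin_start = bin_end + 1
    return bins
```

For example, a single interval at 1:1001-2500 with the defaults becomes the padded span 1:751-2750, which is split into two 1000-bp bins; with `bin_length=0` the padded span is kept as one interval.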
4 changes: 2 additions & 2 deletions scripts/cnv_wdl/germline/cnv_germline_cohort_workflow.wdl
@@ -4,8 +4,8 @@
#
# - The intervals argument is required for both WGS and WES workflows and accepts formats compatible with the
# GATK -L argument (see https://gatkforums.broadinstitute.org/gatk/discussion/11009/intervals-and-interval-lists).
-# These intervals will be padded on both sides by the amount specified by PreprocessIntervals.padding (default 250)
-# and split into bins of length specified by PreprocessIntervals.bin_length (default 1000; specify 0 to skip binning,
+# These intervals will be padded on both sides by the amount specified by padding (default 250)
+# and split into bins of length specified by bin_length (default 1000; specify 0 to skip binning,
# e.g., for WES). For WGS, the intervals should simply cover the chromosomes of interest.
#
# - Intervals can be blacklisted from coverage collection and all downstream steps by using the blacklist_intervals
4 changes: 2 additions & 2 deletions scripts/cnv_wdl/somatic/cnv_somatic_pair_workflow.wdl
@@ -4,8 +4,8 @@
#
# - The intervals argument is required for both WGS and WES workflows and accepts formats compatible with the
# GATK -L argument (see https://gatkforums.broadinstitute.org/gatk/discussion/11009/intervals-and-interval-lists).
-# These intervals will be padded on both sides by the amount specified by PreprocessIntervals.padding (default 250)
-# and split into bins of length specified by PreprocessIntervals.bin_length (default 1000; specify 0 to skip binning,
+# These intervals will be padded on both sides by the amount specified by padding (default 250)
+# and split into bins of length specified by bin_length (default 1000; specify 0 to skip binning,
Review comment (Contributor):

Thanks for catching this. Can you do a find-and-replace on the other instances? There should be a few more in the other somatic/germline WDLs as well as in the docs. There may also be some instances of PreprocessIntervals.padding.

BTW, where are we with Cromwell exposing task-level parameters automatically for subworkflows? Do we still need to expose all parameters in this way?

Reply (Contributor Author):

Done, and done for padding as well.

Your comment here is related to broadinstitute/cromwell#2912; I have not heard of any update on it.

# e.g., for WES). For WGS, the intervals should simply cover the autosomal chromosomes (sex chromosomes may be
# included, but care should be taken to 1) avoid creating panels of mixed sex, and 2) denoise case samples only
# with panels containing only individuals of the same sex as the case samples).
4 changes: 2 additions & 2 deletions scripts/cnv_wdl/somatic/cnv_somatic_panel_workflow.wdl
@@ -4,8 +4,8 @@
#
# - The intervals argument is required for both WGS and WES workflows and accepts formats compatible with the
# GATK -L argument (see https://gatkforums.broadinstitute.org/gatk/discussion/11009/intervals-and-interval-lists).
-# These intervals will be padded on both sides by the amount specified by PreprocessIntervals.padding (default 250)
-# and split into bins of length specified by PreprocessIntervals.bin_length (default 1000; specify 0 to skip binning,
+# These intervals will be padded on both sides by the amount specified by padding (default 250)
+# and split into bins of length specified by bin_length (default 1000; specify 0 to skip binning,
# e.g., for WES). For WGS, the intervals should simply cover the autosomal chromosomes (sex chromosomes may be
# included, but care should be taken to 1) avoid creating panels of mixed sex, and 2) denoise case samples only
# with panels containing only individuals of the same sex as the case samples).
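When sex chromosomes are included for WGS, the two precautions in these header comments amount to a consistency check on sample metadata before building a panel or denoising a case sample. A minimal Python sketch, assuming a hypothetical user-supplied mapping from sample name to sex (for example "XX"/"XY"); this check is not part of the workflows shown here, and `check_panel_sex` is an illustrative name.

```python
from typing import Dict, Iterable


def check_panel_sex(panel_samples: Iterable[str], case_samples: Iterable[str],
                    sex_by_sample: Dict[str, str]) -> str:
    """Return the single sex shared by the panel, or raise if the guidance is violated.

    sex_by_sample is hypothetical metadata supplied by the user; the CNV
    workflows do not provide or consume it.
    """
    # Precaution 1: the panel must not be of mixed sex.
    panel_sexes = {sex_by_sample[s] for s in panel_samples}
    if len(panel_sexes) != 1:
        raise ValueError(f"panel is of mixed (or empty) sex: {sorted(panel_sexes)}")
    (panel_sex,) = panel_sexes
    # Precaution 2: case samples must match the panel's sex.
    mismatched = [s for s in case_samples if sex_by_sample[s] != panel_sex]
    if mismatched:
        raise ValueError(f"case samples differ in sex from the panel ({panel_sex}): {mismatched}")
    return panel_sex
```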
@@ -0,0 +1,60 @@
# Unsupported workflow that concatenates the IGV compatible files generated by multiple runs of combine_tracks.wdl
workflow AggregateCombinedTracksWorkflow {
    String group_id
    Array[File] tumor_with_germline_filtered_segs
    Array[File] normals_igv_compat
    Array[File] tumors_igv_compat

    call TsvCat as TsvCatTumorGermlinePruned {
        input:
            input_files = tumor_with_germline_filtered_segs,
            id = group_id + "_TumorGermlinePruned"
    }

    call TsvCat as TsvCatTumor {
        input:
            input_files = tumors_igv_compat,
            id = group_id + "_Tumor"
    }

    call TsvCat as TsvCatNormal {
        input:
            input_files = normals_igv_compat,
            id = group_id + "_Normal"
    }

    output {
        File cnv_postprocessing_aggregated_tumors_pre = TsvCatTumor.aggregated_tsv
        File cnv_postprocessing_aggregated_tumors_post = TsvCatTumorGermlinePruned.aggregated_tsv
        File cnv_postprocessing_aggregated_normals = TsvCatNormal.aggregated_tsv
    }
}


task TsvCat {

    String id
Review comment (Contributor):

There is some whitespace funkiness here and throughout the other WDLs.

Reply (Contributor Author):

Fixed it here. Will be on the lookout in other files.

    Array[File] input_files

    command <<<
        set -e

        head -1 ${input_files[0]} > ${id}.aggregated.seg

        for FILE in ${sep=" " input_files}
        do
            egrep -v "CONTIG|Chromosome" $FILE >> ${id}.aggregated.seg
        done
    >>>

    output {
        File aggregated_tsv="${id}.aggregated.seg"
    }

    runtime {
        docker: "ubuntu:16.04"
        memory: "2 GB"
        cpu: "1"
        disks: "local-disk 100 HDD"
    }
}
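The TsvCat command keeps the first line of the first input as the header, then appends every line of every input that does not contain CONTIG or Chromosome. The Python sketch below mirrors that logic so it can be checked outside Cromwell; `tsv_cat` and its `out_dir` parameter are hypothetical, and, like the egrep filter, it matches those words anywhere in a line, case-sensitively.

```python
import re
from pathlib import Path
from typing import List

# Same pattern as the egrep in TsvCat: any line containing CONTIG or Chromosome
# is treated as a header line and dropped from the concatenated body.
HEADER_PATTERN = re.compile(r"CONTIG|Chromosome")


def tsv_cat(input_files: List[str], out_id: str, out_dir: str = ".") -> Path:
    """Concatenate seg/TSV files into <out_id>.aggregated.seg with a single header."""
    out_path = Path(out_dir) / f"{out_id}.aggregated.seg"
    with out_path.open("w") as out:
        # Equivalent of `head -1` on the first input file.
        with open(input_files[0]) as first:
            out.write(first.readline())
        # Equivalent of `egrep -v "CONTIG|Chromosome" $FILE >> ...` for each input.
        for name in input_files:
            with open(name) as f:
                for line in f:
                    if not HEADER_PATTERN.search(line):
                        out.write(line)
    return out_path
```

In AggregateCombinedTracksWorkflow, the three calls therefore produce `<group_id>_Tumor.aggregated.seg` (cnv_postprocessing_aggregated_tumors_pre), `<group_id>_TumorGermlinePruned.aggregated.seg` (cnv_postprocessing_aggregated_tumors_post), and `<group_id>_Normal.aggregated.seg` (cnv_postprocessing_aggregated_normals). Because header detection is by substring, a data row containing either word would also be dropped.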