Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Updating SimpleGermlineTagger and somatic CNV experimental post-processing workflow (#5252)

Several experimental changes that improve the precision, and expand the possible evaluations, of GATK CNV:

- `combine_tracks.wdl` for post-processing somatic CNV calls. This WDL performs two operations:
  - Increase precision by removing:
    - germline segments. As a result, the WDL requires the matched normal segments.
    - areas of common germline activity or error from other cancer studies.
  - Convert the tumor model seg file to the same format as AllelicCapSeg, which can be read by ABSOLUTE. This is currently done inline in the WDL.
    - This is not a trivial conversion, since each segment must be called as balanced or not (i.e., whether its MAF is 0.5). The current algorithm relies on hard filtering and may need updating pending evaluation.
    - For more information about AllelicCapSeg and ABSOLUTE, see:
      - Carter, S.L., et al. *Absolute quantification of somatic DNA alterations in human cancer*. Nat Biotechnol. 2012 May; 30(5): 413–421.
      - https://software.broadinstitute.org/cancer/cga/absolute
      - Brastianos, P.K., Carter, S.L., et al. *Genomic Characterization of Brain Metastases Reveals Branched Evolution and Potential Therapeutic Targets*. Cancer Discovery (2015). PMID: 26410082.
- Changes to GATK tools to support the above:
  - `SimpleGermlineTagger` now uses reciprocal overlap in addition to breakpoint matching when determining a possible germline event. This greatly improved results in areas near centromeres.
  - Added tool `MergeAnnotatedRegionsByAnnotation`. This simple tool merges genomic regions (specified in a TSV) when the given annotations (columns) have exactly the same values in neighboring segments and the segments are within a specified maximum genomic distance.
- `multi_combine_tracks.wdl` and `aggregate_combined_tracks.wdl`, which run `combine_tracks.wdl` on multiple pairs and combine the results into one seg file for easy consumption by IGV.
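The reciprocal-overlap criterion added to `SimpleGermlineTagger` can be illustrated with a minimal Python sketch. This is not the GATK (Java) implementation. The `Segment` class, the 1-based inclusive coordinates, and the `min_fraction` threshold are assumptions made for illustration. The sketch also does not show how the tool combines this test with breakpoint matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A genomic interval on one contig; coordinates assumed 1-based and inclusive."""
    contig: str
    start: int
    end: int

    def length(self) -> int:
        return self.end - self.start + 1


def overlap_length(a: Segment, b: Segment) -> int:
    """Number of bases shared by a and b (0 if on different contigs or disjoint)."""
    if a.contig != b.contig:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def reciprocal_overlap(a: Segment, b: Segment, min_fraction: float) -> bool:
    """True if the shared bases cover at least min_fraction of BOTH segments.

    min_fraction is a hypothetical parameter for illustration; it is not a
    documented SimpleGermlineTagger argument.
    """
    shared = overlap_length(a, b)
    return shared >= min_fraction * a.length() and shared >= min_fraction * b.length()


# Breakpoints differ by 200 bp and 100 bp, so exact breakpoint matching fails,
# but the segments still share 801 bp (about 80% and 89% of their lengths).
tumor = Segment("1", 1000, 2000)
normal = Segment("1", 1200, 2100)
print(reciprocal_overlap(tumor, normal, min_fraction=0.5))  # True
```

The overlap must cover a fraction of both segments, not just one. This keeps a short tumor segment from being matched to a much larger normal segment that merely contains it. For example, 1:100-200 lies entirely inside 1:1-10000 but covers only about 1% of it.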
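The merge rule described for `MergeAnnotatedRegionsByAnnotation` can be sketched in Python as follows. This illustrates the stated rule under assumptions; it is not the tool's code. The following are all assumptions of this sketch:

- the `Region` class;
- sorted, non-overlapping input;
- 1-based inclusive coordinates;
- keeping the first region's remaining annotations after a merge.

The parameter names `annotation_names` and `max_merge_distance` are hypothetical and are not the tool's command-line arguments. The `CALL` column in the example is used only as an illustrative annotation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Region:
    """One row of an annotated-region TSV (hypothetical in-memory form)."""
    contig: str
    start: int  # 1-based, inclusive (assumed)
    end: int
    annotations: Dict[str, str] = field(default_factory=dict)


def merge_by_annotation(
    regions: Sequence[Region],
    annotation_names: Sequence[str],
    max_merge_distance: int,
) -> List[Region]:
    """Merge neighboring regions whose selected annotations match exactly.

    Assumes the input is sorted by contig and start and is non-overlapping.
    A region is folded into the previous (possibly already merged) region when
    both are on the same contig, the gap between them is at most
    max_merge_distance bases, and every annotation in annotation_names has the
    same value in both.
    """
    merged: List[Region] = []
    for region in regions:
        if merged:
            last = merged[-1]
            gap = region.start - last.end - 1  # 0 for directly adjacent regions
            same_values = all(
                last.annotations.get(name) == region.annotations.get(name)
                for name in annotation_names
            )
            if region.contig == last.contig and gap <= max_merge_distance and same_values:
                # Extend the merged region; its other annotations are kept from
                # the first region in the run (an assumption of this sketch).
                last.end = max(last.end, region.end)
                continue
        # Copy so the caller's input regions are never modified.
        merged.append(Region(region.contig, region.start, region.end, dict(region.annotations)))
    return merged


regions = [
    Region("1", 1, 100, {"CALL": "+"}),
    Region("1", 150, 300, {"CALL": "+"}),    # gap of 49 bp, same CALL: merged
    Region("1", 1000, 1100, {"CALL": "+"}),  # gap of 699 bp: too far
    Region("1", 1101, 1200, {"CALL": "-"}),  # adjacent, but CALL differs
    Region("2", 1, 50, {"CALL": "-"}),       # different contig
]
for r in merge_by_annotation(regions, ["CALL"], max_merge_distance=100):
    print(r.contig, r.start, r.end, r.annotations["CALL"])
```

A single linear pass is enough here because the input is assumed sorted. Each region only needs to be compared with the last region in the output.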
Showing 22 changed files with 1,500 additions and 185 deletions.
`scripts/unsupported/combine_tracks_postprocessing_cnv/aggregate_combined_tracks.wdl`
60 changes: 60 additions & 0 deletions
```wdl
# Unsupported workflow that concatenates the IGV compatible files generated by multiple runs of combine_tracks.wdl
workflow AggregateCombinedTracksWorkflow {
    String group_id
    Array[File] tumor_with_germline_filtered_segs
    Array[File] normals_igv_compat
    Array[File] tumors_igv_compat

    # Tumor segments after germline events have been pruned ("post" output)
    call TsvCat as TsvCatTumorGermlinePruned {
        input:
            input_files = tumor_with_germline_filtered_segs,
            id = group_id + "_TumorGermlinePruned"
    }

    # Tumor segments before germline pruning ("pre" output)
    call TsvCat as TsvCatTumor {
        input:
            input_files = tumors_igv_compat,
            id = group_id + "_Tumor"
    }

    # Matched normal segments
    call TsvCat as TsvCatNormal {
        input:
            input_files = normals_igv_compat,
            id = group_id + "_Normal"
    }

    output {
        File cnv_postprocessing_aggregated_tumors_pre = TsvCatTumor.aggregated_tsv
        File cnv_postprocessing_aggregated_tumors_post = TsvCatTumorGermlinePruned.aggregated_tsv
        File cnv_postprocessing_aggregated_normals = TsvCatNormal.aggregated_tsv
    }
}


# Concatenates seg/TSV files, writing the header line only once
task TsvCat {

    String id
    Array[File] input_files

    command <<<
        set -e

        # Write the header once, taken from the first input file
        head -1 ${input_files[0]} > ${id}.aggregated.seg

        # Append the data rows of every file, dropping header lines (those containing CONTIG or Chromosome).
        # grep exits with status 1 when it selects no lines (e.g. a header-only file), which would abort
        # the task under set -e, so status 1 is tolerated while real errors (status 2) still fail the task.
        for FILE in ${sep=" " input_files}
        do
            egrep -v "CONTIG|Chromosome" $FILE >> ${id}.aggregated.seg || [ $? -eq 1 ]
        done
    >>>

    output {
        File aggregated_tsv="${id}.aggregated.seg"
    }

    runtime {
        docker: "ubuntu:16.04"
        memory: "2 GB"
        cpu: "1"
        disks: "local-disk 100 HDD"
    }
}
```