
combine_tracks.wdl should do the pre-processing for input to GISTIC2 #5283

Closed
LeeTL1220 opened this issue Oct 5, 2018 · 0 comments

@LeeTL1220
Contributor

Feature request

Tool(s) or class(es) involved

combine_tracks.wdl

Description

For outputs from GATK CNV to be usable by GISTIC2, we need a conversion step. Here is some mostly untested WDL that should work:

#UNSUPPORTED -- simple conversion of a merged & pruned seg file (from the CNV postprocessing workflow) to the GISTIC2 format.
# No column headers are printed.  The columns are:
#
#(1)  Sample          (sample name)
#(2)  Chromosome      (chromosome number)
#(3)  Start Position  (segment start position, in bases)
#(4)  End Position    (segment end position, in bases)
#(5)  Num markers     (number of markers in segment)
#(6)  Seg.CN          (log2(copy number) - 1)
#
# This has barely been tested.
#
workflow ConvertMergedPrunedSegsToGistic2 {
    File cnv_postprocessing_tumor_with_tracks_pruned_merged_seg
    String docker
    call Gistic2Convert {
        input:
            input_file = cnv_postprocessing_tumor_with_tracks_pruned_merged_seg,
            docker = docker
    }

    output {
        File cnv_postprocessing_tumor_with_tracks_pruned_merged_seg_gistic2 = Gistic2Convert.output_file_gistic2
    }
}

task Gistic2Convert {
    File input_file
    String docker
    String output_file = basename(input_file) + ".gistic2.seg"

    command <<<
        set -e
        python <<'EOF'
import csv
input_file = "${input_file}"
output_file = "${output_file}"

"""
No header row is written.  The output columns are:

(1)  Sample          (sample name)
(2)  Chromosome      (chromosome number)
(3)  Start Position  (segment start position, in bases)
(4)  End Position    (segment end position, in bases)
(5)  Num markers     (number of markers in segment)
(6)  Seg.CN          (log2(copy number) - 1)
"""

if __name__ == "__main__":
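    # Python 3: open in text mode; newline='' lets the csv module handle line endings.
    with open(input_file, 'r', newline='') as tsvinfp, open(output_file, 'w', newline='') as tsvoutfp:
        tsvin = csv.DictReader(tsvinfp, delimiter='\t')
        tsvout = csv.writer(tsvoutfp, delimiter='\t', lineterminator='\n')
        for r in tsvin:
            # Marker counts may be serialized as floats (e.g. "12.0"); GISTIC2 expects integers.
            num_markers = str(int(float(r["NUM_POINTS_COPY_RATIO"])))
            outrow = [r["SAMPLE"], r["Chromosome"], r["Start"], r["End"], num_markers, r["MEAN_LOG2_COPY_RATIO"]]
            # Echo each row to stdout for debugging.
            print(outrow)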
            tsvout.writerow(outrow)

EOF
    >>>

    runtime {
        docker: docker
        memory: "2000 MB"
        disks: "local-disk 100 HDD"
        preemptible: 3
        cpu: 1
    }
    output {
        File output_file_gistic2 = "${output_file}"
    }
}

Sorry about the heredoc Python.
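To make the heredoc easier to test outside Cromwell, its core can be pulled out into a plain Python 3 function. This is a sketch, assuming the input seg file has a header row with the column names used in the WDL above (`SAMPLE`, `Chromosome`, `Start`, `End`, `NUM_POINTS_COPY_RATIO`, `MEAN_LOG2_COPY_RATIO`); the function name `convert_seg_to_gistic2` is hypothetical. Like the WDL, it passes `MEAN_LOG2_COPY_RATIO` through unchanged as Seg.CN, which matches GISTIC2's log2(copy number) - 1 convention only if a copy ratio of 1 corresponds to two copies.

```python
import csv
import io

# Hypothetical helper mirroring the heredoc in Gistic2Convert.
# Input columns are assumed to match the names used in the WDL above.
GISTIC2_SOURCE_COLUMNS = ["SAMPLE", "Chromosome", "Start", "End",
                          "NUM_POINTS_COPY_RATIO", "MEAN_LOG2_COPY_RATIO"]


def convert_seg_to_gistic2(in_handle, out_handle):
    """Read a tab-separated seg table (with header) and write header-less GISTIC2 rows.

    Returns the number of segments written.
    """
    reader = csv.DictReader(in_handle, delimiter="\t")
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    n_written = 0
    for row in reader:
        out_row = [row[col] for col in GISTIC2_SOURCE_COLUMNS]
        # Marker counts may be serialized as floats (e.g. "120.0"); GISTIC2 wants integers.
        out_row[4] = str(int(float(out_row[4])))
        writer.writerow(out_row)
        n_written += 1
    return n_written


if __name__ == "__main__":
    # Toy input for illustration only; not real data.
    example = (
        "SAMPLE\tChromosome\tStart\tEnd\tNUM_POINTS_COPY_RATIO\tMEAN_LOG2_COPY_RATIO\n"
        "tumor1\t1\t10000\t500000\t120.0\t0.25\n"
        "tumor1\t2\t20000\t900000\t85.0\t-0.40\n"
    )
    out = io.StringIO()
    count = convert_seg_to_gistic2(io.StringIO(example), out)
    print(count)
    print(out.getvalue(), end="")
```

Separating the I/O (file handles) from the conversion means the same function can be called on real files in the task or on in-memory strings in a quick local check.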

LeeTL1220 added a commit that referenced this issue Oct 9, 2018
There are no Java code changes in this PR.  Tests were done manually.  As a reminder, the modified files are still considered experimental.

Changes:
- combine_tracks.wdl:  Fixes a bug where a string was compared to a float.  Closes #5284 
- combine_tracks.wdl:  Converts the processed seg file into a format for GISTIC2.  This is a trivial conversion.  Closes #5283 
- Other changes in `aggregate_combine_tracks.wdl` to support the above, including aggregation of individual GISTIC2 seg files into a single GISTIC2 seg file.
- Added `gs://` URLs for the necessary auxiliary files in the documentation.
- Added multiple output types for the ABSOLUTE skew parameter to support heterogeneous execution configurations.  File, Float, and String.  All are the same value.
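The aggregation itself lives in `aggregate_combine_tracks.wdl`, which is not reproduced here. Because the per-sample GISTIC2 seg files have no header row, merging them can be as simple as concatenation. A minimal sketch, assuming every input is a header-less GISTIC2 seg file; `aggregate_gistic2_segs` is a hypothetical name, not the code from the PR:

```python
import os
import tempfile
from typing import Iterable


def aggregate_gistic2_segs(seg_paths: Iterable[str], output_path: str) -> int:
    """Concatenate header-less GISTIC2 seg files into one file.

    Blank lines are dropped, and a newline is appended to a file's last
    line if it is missing. Returns the number of lines written.
    """
    n_lines = 0
    with open(output_path, "w", newline="") as out:
        for path in seg_paths:
            with open(path, "r", newline="") as seg:
                for line in seg:
                    if not line.strip():
                        continue  # skip blank lines
                    if not line.endswith("\n"):
                        line += "\n"  # guard against a missing trailing newline
                    out.write(line)
                    n_lines += 1
    return n_lines


if __name__ == "__main__":
    # Toy inputs for illustration only.
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.gistic2.seg")
        b = os.path.join(tmp, "b.gistic2.seg")
        with open(a, "w") as f:
            f.write("s1\t1\t100\t200\t12\t0.1\n")
        with open(b, "w") as f:
            f.write("s2\t1\t100\t200\t9\t-0.3")  # no trailing newline
        merged = os.path.join(tmp, "all.gistic2.seg")
        print(aggregate_gistic2_segs([a, b], merged))
        with open(merged) as f:
            print(f.read(), end="")
```

If the inputs could ever carry a header row, this would need to skip it per file; the `Gistic2Convert` task above never writes one.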
EdwardDixon pushed a commit to EdwardDixon/gatk that referenced this issue Nov 9, 2018
…titute#5287)
