gCNV WDLs should clean up intermediate files and directories. #5382

vruano · 2018-10-31T16:56:51Z

Due to the way the calling process is sharded upstream ... the current WDL spends 80-90% of its time copying files over ... so in a job that takes around 1h, only 10 minutes are spent running the GATK tool; the rest is used to stage the input files.

For example, if the genome intervals were sharded into ~600 different batches, this results in ~3600 files transferred one by one, each using its own gsutil command. The reason these are not batched is in part that the files share names, and multi-file gsutil cp does not provide a means to indicate independent destination names for each input file. A recursive copy of a parent directory would drag in the information from all the samples, when each task deals with just one.
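To illustrate the naming problem, here is a minimal Python sketch of how same-named files from different shards could be given distinct destinations before any copying is batched. It is not what the WDL does: the bucket name, the `shard-NNN` directory layout, the file names, and the figure of 6 files per shard (simply 3600 / 600) are all hypothetical, and the sketch only plans paths without calling gsutil.

```python
from pathlib import PurePosixPath


def plan_staging(remote_paths, dest_root="staged"):
    """Map each remote path to a unique local destination.

    Files from different shards may share a basename, so the parent
    directory name (assumed here to identify the shard) is kept as a
    subdirectory of the destination.
    """
    plan = {}
    seen = set()
    for remote in remote_paths:
        # Drop the scheme (e.g. "gs://") and treat the rest as a POSIX path.
        path = PurePosixPath(remote.split("://", 1)[-1])
        dest = f"{dest_root}/{path.parent.name}/{path.name}"
        if dest in seen:
            raise ValueError(f"destination collision: {dest}")
        seen.add(dest)
        plan[remote] = dest
    return plan


# Hypothetical layout: 600 shards with 6 same-named files each.
remote_paths = [
    f"gs://example-bucket/shard-{shard:03d}/file-{i}.tsv"
    for shard in range(600)
    for i in range(6)
]
plan = plan_staging(remote_paths)
```

With a plan like this, each task could stage only its own shard's entries rather than recursively copying a parent directory that holds every sample.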

samuelklee · 2018-11-01T14:06:03Z

Yes, also recall that @asmirnov239 tried to address #4397, but that led to #5217, so we reverted. We can try to address all of these issues again correctly if it's low-hanging fruit (which it probably is) and if it'll bring the overall cost of the pipeline down significantly. However, for the most part, I think bringing down costs in the gCNV step will have more impact.

Thanks for diagnosing and pointing out these issues. You should feel free to open PRs against the gCNV code as well!

samuelklee · 2018-11-07T18:14:31Z

@vruano As pointed out by @sooheelee, the current WDL does not clean up the intermediate CALLS_* and MODEL_* directories. This is fine for running on the cloud, but we should clean them up when running locally. Can you take care of this as well?
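For local runs, the cleanup could look something like the following Python sketch. It only illustrates the idea of removing directories matching CALLS_* and MODEL_* from a working directory; the function name, the numeric suffixes, and the `out` directory in the demo are assumptions, and this is not the change that was eventually made to the WDL.

```python
import glob
import os
import shutil
import tempfile


def remove_intermediate_dirs(work_dir, patterns=("CALLS_*", "MODEL_*")):
    """Delete directories in work_dir matching the patterns; return their names."""
    removed = []
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(work_dir, pattern))):
            # Only remove directories; leave any same-named files alone.
            if os.path.isdir(path):
                shutil.rmtree(path)
                removed.append(os.path.basename(path))
    return removed


# Demo in a throwaway directory with hypothetical shard directories.
work_dir = tempfile.mkdtemp()
for name in ("CALLS_0", "CALLS_1", "MODEL_0", "out"):
    os.makedirs(os.path.join(work_dir, name))

removed = remove_intermediate_dirs(work_dir)
remaining = sorted(os.listdir(work_dir))
```

Anything that does not match the two patterns is left in place, so other output directories survive the cleanup.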

samuelklee · 2019-01-31T15:40:50Z

Dupe of #4397, but changing the name to reflect the issue mentioned in the previous comment.

samuelklee · 2019-02-15T13:50:12Z

CALLS_* and MODEL_* directories are actually cleaned up in #5414, but there are a few places where contig-ploidy calls are not cleaned up.

We could also clean up the out directories generated by DetermineGermlineContigPloidy and GermlineCNVCaller, since the contents of these are sliced and tarred, but it's arguably nice to have all of the output for each shard in a single directory.


* Cleaned up intermediate files in gCNV WDL and fixed miscellaneous typos. (#5382)
* Added output of MAD values as floats in somatic CNV WDL. (#5591)
* Exposed boot disk space for Oncotator in somatic CNV WDL. (#3566)
* Added check to skip outlier truncation if number of matrix elements exceeds Integer.MAX_VALUE in CreateReadCountPanelOfNormals. (#4734)
* Miscellaneous boy scout activities.
* Fixed some issues concerning intervals in DetermineGermlineContigPloidy documentation.
* Fixed non-kebab-case argument in CollectAllelicCountsSpark and other minor issues.
* Improved consistency of style and input/output validation across CNV tools. (#4825)

vruano added Germline CNV wdl enhancement performance labels Oct 31, 2018

vruano assigned asmirnov239 Oct 31, 2018

samuelklee added the Copy Number tools label Jan 31, 2019

samuelklee changed the title ~~PostprocessGermlineCNVCalls (the enclosing wdl): too slow in Firecloud/cromwell.~~ gCNV WDLs should clean up intermediate files and directories. Jan 31, 2019

samuelklee removed enhancement performance labels Jan 31, 2019

samuelklee assigned samuelklee and unassigned asmirnov239 Feb 1, 2019

samuelklee added a commit that referenced this issue Feb 15, 2019

Cleaned up intermediate files in gCNV WDL. (#5382)

eaaf66e

samuelklee added a commit that referenced this issue Feb 15, 2019

Cleaned up intermediate files in gCNV WDL. (#5382)

59bfa51

samuelklee added a commit that referenced this issue Feb 21, 2019

Cleaned up intermediate files in gCNV WDL. (#5382)

bdfce9c

samuelklee mentioned this issue Feb 21, 2019

Added some fixes for minor CNV issues. #5699

Merged

samuelklee added a commit that referenced this issue Feb 21, 2019

Cleaned up intermediate files in gCNV WDL. (#5382)

c66bbbd

samuelklee added a commit that referenced this issue Feb 27, 2019

Cleaned up intermediate files in gCNV WDL and fixed miscellaneous typ…

b61ee12

…os. (#5382)

samuelklee added a commit that referenced this issue Feb 27, 2019

Cleaned up intermediate files in gCNV WDL and fixed miscellaneous typ…

f10d081

…os. (#5382)

samuelklee added a commit that referenced this issue Feb 27, 2019

Cleaned up intermediate files in gCNV WDL and fixed miscellaneous typ…

ffd7ca8

…os. (#5382)

samuelklee added a commit that referenced this issue Feb 27, 2019

Cleaned up intermediate files in gCNV WDL and fixed miscellaneous typ…

b7f9b04

…os. (#5382)

samuelklee added a commit that referenced this issue Mar 2, 2019

Cleaned up intermediate files in gCNV WDL and fixed miscellaneous typ…

71bfff5

…os. (#5382)

samuelklee added a commit that referenced this issue Mar 4, 2019

Cleaned up intermediate files in gCNV WDL and fixed miscellaneous typ…

c65b2c2

…os. (#5382)

samuelklee added a commit that referenced this issue Mar 4, 2019

Cleaned up intermediate files in gCNV WDL and fixed miscellaneous typ…

f7f890d

…os. (#5382)

samuelklee added a commit that referenced this issue Mar 7, 2019

Cleaned up intermediate files in gCNV WDL and fixed miscellaneous typ…

125cb39

…os. (#5382)

samuelklee added a commit that referenced this issue Mar 8, 2019

Cleaned up intermediate files in gCNV WDL and fixed miscellaneous typ…

e72424d

…os. (#5382)

samuelklee added a commit that referenced this issue Mar 9, 2019

Cleaned up intermediate files in gCNV WDL and fixed miscellaneous typ…

0ff34ac

…os. (#5382)

samuelklee closed this as completed in #5699 Mar 12, 2019
