Standardized use of mem parameters in CNV WDLs. #4193

samuelklee · 2018-01-17T21:29:29Z

We use the machine_mem/command_mem framework for most tasks; others are unlikely to need any special memory considerations.

There were some tasks for which machine_mem and command_mem seemed to be switched in the original WDL. @jsotobroad was there any reason for this? I'm guessing they were just typos. I'm also not sure that some of the tasks would've actually run even if they were switched, since they would've resulted in non-integer -Xmx arguments. I changed everything over to MB to avoid this.
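
For illustration only, here is a minimal sketch of the machine_mem/command_mem pattern with everything kept in MB. The task name `ExampleTask`, the `echo` command, and the 1000 MB headroom are assumptions made for the sketch, not code taken from this PR.

```wdl
# Hypothetical task illustrating the pattern; not copied from the PR.
task ExampleTask {
    Int? mem
    String gatk_docker

    # The user-supplied mem is in GB; convert once to MB so that
    # all later arithmetic stays in integers.
    Int machine_mem = select_first([mem, 8]) * 1000
    # Leave some headroom between the JVM heap and the machine memory
    # (1000 MB is an assumed value).
    Int command_mem = machine_mem - 1000

    command <<<
        set -e
        # A real task would pass this flag to gatk via --java-options.
        echo "-Xmx${command_mem}m"
    >>>

    runtime {
        docker: "${gatk_docker}"
        memory: machine_mem + " MB"
    }
}
```

With a fractional value in GB (for example, a Float computed as machine_mem - 0.5), the interpolated flag would read something like -Xmx7.5g, which the JVM does not accept; integer MB values avoid that.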

Closes #4092.

samuelklee · 2018-01-17T21:36:22Z

@davidbenjamin mind reviewing? @LeeTL1220 take note for the style guide, if necessary.

codecov-io · 2018-01-18T14:36:02Z

Codecov Report

Merging #4193 into master will increase coverage by 0.013%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##              master     #4193       +/-   ##
===============================================
+ Coverage     78.475%   78.489%   +0.013%     
  Complexity     16645     16645               
===============================================
  Files           1061      1061               
  Lines          59866     59866               
  Branches        9756      9756               
===============================================
+ Hits           46980     46988        +8     
+ Misses          9103      9097        -6     
+ Partials        3783      3781        -2

| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| ...park/sv/discovery/alignment/AlignmentInterval.java | `90.038% <0%> (+0.383%)` | `74% <0%> (ø)` | ⬇️ |
| ...nder/utils/runtime/StreamingProcessController.java | `71.193% <0%> (+0.823%)` | `50% <0%> (ø)` | ⬇️ |
| ...oadinstitute/hellbender/utils/gcs/BucketUtils.java | `80% <0%> (+1.29%)` | `39% <0%> (ø)` | ⬇️ |
| ...e/hellbender/engine/spark/SparkContextFactory.java | `73.973% <0%> (+2.74%)` | `11% <0%> (ø)` | ⬇️ |
| ...utils/smithwaterman/SmithWatermanIntelAligner.java | `90% <0%> (+10%)` | `3% <0%> (ø)` | ⬇️ |

jsotobroad

Yay standardization

jsotobroad · 2018-01-18T14:32:34Z

scripts/cnv_wdl/cnv_common_tasks.wdl

@@ -92,6 +92,9 @@ task CollectCounts {
    Int? preemptible_attempts
    Int? disk_space_gb

+    Int machine_mem = select_first([mem, 8]) * 1000


As far as small optimizations go, if these tasks that are asking for 8000 MB by default can make do with 7500 MB, then you can save 25% on GCP compute - https://cloud.google.com/compute/pricing#machinetype - n1-standard-2 vs. n1-highmem-2

OK, good to know! No particular reason these need Xmx8G, so I'll change them.

jsotobroad · 2018-01-18T14:35:06Z

scripts/cnv_wdl/cnv_common_tasks.wdl

@@ -137,8 +140,7 @@ task CollectAllelicCounts {
    Int? preemptible_attempts
    Int? disk_space_gb

-    # Mem is in units of GB but our command and memory runtime values are in MB


Somewhere it should be made clear that the user-provided mem input should be defined in units of GB

Any objection to simply using *_mem_gb and *_mem_mb everywhere? I'd rather not duplicate this comment everywhere.
@LeeTL1220 @davidbenjamin?

@samuelklee Fine by me...

Any objection to simply using *_mem_gb and *_mem_mb everywhere?

That is sufficiently self-documenting for my tastes.

jsotobroad · 2018-01-18T14:50:42Z

scripts/cnv_wdl/cnv_common_tasks.wdl

@@ -110,7 +113,7 @@ task CollectCounts {

    runtime {
        docker: "${gatk_docker}"


As per @LeeTL1220's template, we should be putting in values for cpu even in the single-threaded case.

I didn't check to make sure all of the tasks adhered to the template just yet, since I'm assuming that it might still change. (Actually, I don't think any of the somatic CNV tasks specify cpu.) I'll focus on the mem changes in this PR and overhaul the rest later (perhaps once tasks are automatically generated), if you don't mind!

@samuelklee Fine by me.

samuelklee · 2018-01-18T15:23:02Z

Will merge when tests pass, unless there are further objections.

LeeTL1220 · 2018-01-18T15:30:00Z

@samuelklee From my perspective, merge away. Assuming that you are doing the cpu change later.

davidbenjamin

Consistency is the hobgoblin of my review.

davidbenjamin · 2018-01-18T15:32:12Z

scripts/cnv_wdl/cnv_common_tasks.wdl

    String gatk_docker
    Int? preemptible_attempts
    Int? disk_space_gb

+    Int machine_mem_mb = select_first([mem_gb  * 1000, 7500])
+    Int command_mem_mb = machine_mem_mb - 1000


Why don't you do this in the tasks above?

Thanks for catching this. Actually, I think this makes the tests fail---you can't perform the multiply operation inside select_first, apparently.
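
One way to keep the arithmetic outside select_first is sketched below, assuming mem_gb is declared as `Int?`. The task name `ExampleDefaults`, the `echo` command, and the `machine_mem_mb_alt` variable are hypothetical. The first form matches a declaration that appears elsewhere in this PR; the second reuses the `if defined(...)` idiom from the original WDL to keep a 7500 MB default.

```wdl
# Hypothetical task; only the declarations matter here.
task ExampleDefaults {
    Int? mem_gb

    # Form 1: resolve the optional to an Int first, then convert GB to MB.
    # The default is therefore a whole number of GB (7000 MB here, not 7500).
    Int machine_mem_mb = select_first([mem_gb, 7]) * 1000

    # Form 2: keep a 7500 MB default by branching on whether mem_gb was supplied.
    Int machine_mem_mb_alt = if defined(mem_gb) then select_first([mem_gb]) * 1000 else 7500

    # Assumed 1000 MB headroom between machine memory and the JVM heap.
    Int command_mem_mb = machine_mem_mb - 1000

    command {
        echo "-Xmx${command_mem_mb}m"
    }

    runtime {
        memory: machine_mem_mb + " MB"
    }
}
```

Both forms apply the multiplication to a plain Int rather than to the optional mem_gb, which appears to be what the failing expression was missing.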

davidbenjamin · 2018-01-18T15:33:06Z

scripts/cnv_wdl/cnv_common_tasks.wdl

@@ -69,7 +69,7 @@ task AnnotateIntervals {

    runtime {
        docker: "${gatk_docker}"
-        memory: select_first([mem, 5]) + " GB"
+        memory: select_first([mem_gb, 5]) + " GB"


This is strange because if mem_gb is supplied the machine memory and command memory are the same, but if it's not supplied they use different defaults.

Oops, good catch. I think for some of these more minor tasks we didn't do the whole machine_mem/command_mem thing, as noted above. I'll just go back and standardize everything.

davidbenjamin · 2018-01-18T15:33:53Z

scripts/cnv_wdl/cnv_common_tasks.wdl

@@ -21,7 +21,7 @@ task PreprocessIntervals {
        set -e
        export GATK_LOCAL_JAR=${default="/root/gatk.jar" gatk4_jar_override}

-        gatk --java-options "-Xmx${default="2" mem}g" PreprocessIntervals \
+        gatk --java-options "-Xmx${default="2" mem_gb}g" PreprocessIntervals \


I don't feel great about using Xmx_g in some places and Xmx_m in others.

davidbenjamin · 2018-01-18T15:35:07Z

scripts/cnv_wdl/cnv_common_tasks.wdl

@@ -137,8 +140,7 @@ task CollectAllelicCounts {
    Int? preemptible_attempts
    Int? disk_space_gb

-    # Mem is in units of GB but our command and memory runtime values are in MB


Any objection to simply using *_mem_gb and *_mem_mb everywhere?

That is sufficiently self-documenting for my tastes.

davidbenjamin · 2018-01-18T15:36:38Z

scripts/cnv_wdl/germline/cnv_germline_cohort_workflow.wdl

-    Int machine_mem = if defined(mem) then select_first([mem]) else 8
-    Float command_mem = machine_mem - 0.5
+    Int machine_mem_mb = select_first([mem_gb, 7]) * 1000
+    Int command_mem_mb = machine_mem_mb - 500


Elsewhere you have a 1GB difference.

davidbenjamin · 2018-01-18T15:39:09Z

scripts/cnv_wdl/somatic/cnv_somatic_pair_workflow.wdl

    # ModelSegments seems to need at least 3GB of overhead to run
-    Int command_mem = machine_mem - 3000
+    Int command_mem_mb = machine_mem_mb - 3000


Does this make sense to you? I mean, shouldn't the difference between command and machine memory just be the OS, hence independent of the GATK command?

Not sure, this was from @jsotobroad. I'd be fine with standardizing the difference everywhere if that works.

@jsotobroad can answer this, @davidbenjamin .

samuelklee · 2018-01-18T18:26:41Z

Responded to @davidbenjamin. Still have differences for machine_mem_mb - command_mem_mb = 500, 1000, and 3000---@jsotobroad any reason for this? Can we make everything the same?

LeeTL1220 · 2018-01-18T19:48:52Z

I'm good with this if @jsotobroad is good too.

davidbenjamin · 2018-01-18T20:01:34Z

Me too.

samuelklee · 2018-01-22T16:10:04Z

@jsotobroad I'm going to go ahead and merge this and file an issue for the difference in memory overhead.

…ons (#4915) `XsvTableFeature` no longer removes an extra column if start and end in the config file for a `LocatableXsv` data source are the same. Fixes #4193

samuelklee requested a review from davidbenjamin January 17, 2018 21:30

samuelklee assigned davidbenjamin Jan 17, 2018

samuelklee force-pushed the sl_standardize_mem_wdl branch from 785f93d to f9b2701 Compare January 17, 2018 23:23

jsotobroad reviewed Jan 18, 2018

samuelklee added 2 commits January 18, 2018 10:21

Standardized use of mem parameters in CNV WDLs.

c107aea

Addressed PR comments.

0bb9c95

samuelklee force-pushed the sl_standardize_mem_wdl branch from f9b2701 to 0bb9c95 Compare January 18, 2018 15:21

davidbenjamin requested changes Jan 18, 2018

More PR comments.

ffd6298

davidbenjamin approved these changes Jan 18, 2018

davidbenjamin removed their assignment Jan 18, 2018

samuelklee assigned jsotobroad Jan 19, 2018

samuelklee merged commit 32f25b9 into master Jan 22, 2018

samuelklee deleted the sl_standardize_mem_wdl branch January 22, 2018 16:10

jonn-smith added a commit that referenced this pull request Jun 19, 2018

Fixes #4193

5551ab2

`XsvTableFeature` no longer removes an extra column if start and end in the config file for a `LocatableXsv` data source are the same.

jonn-smith added a commit that referenced this pull request Jun 19, 2018

Fixes #4193

ab8c927

`XsvTableFeature` no longer removes an extra column if start and end in the config file for a `LocatableXsv` data source are the same.

jonn-smith mentioned this pull request Jun 19, 2018

XsvTableFeatures now always put out the right number of columns. #4915

Merged
