Improved memory usage in gCNV. #5781
Conversation
Prior to this, gCNV shard size and VM type had never been optimized. For WES cohort mode, we arbitrarily ran with:

37 shards of 200 samples by 5000 intervals on n1-standard-8s (8 CPU, 30GB memory, $0.08/hr), each taking ~2 hours = ~3 cents/sample.

This PR gets rid of a memory spike in the sampling of denoised copy ratios, fixes a memory leak by updating theano, and adds some theano flags that typically yield a factor of ~2 speedup (notably, the OpenMP elemwise flag, although we also get a slight boost from using numpy MKL). This allows us to run, e.g.:

2 shards of 50 samples by 100000 intervals on n1-standard-8s (8 CPU, 30GB memory, $0.08/hr), each taking ~5 hours = ~1.6 cents/sample.

For these runs we used a slightly larger interval list and 1/4 as many samples as in the first example, but because everything scales linearly, it's probably fair to compare the per-sample-and-interval costs; that works out to a factor of ~8 savings if we keep the shard size the same (see the cost sketch below). The cost was already satisfactory, but fixing the leak makes it easier to run scatters that are not so wide, which may be crucial for running the megaWDL. Adding the OpenMP flag also lets CPU scalability work as intended.

We can do a more systematic optimization for cost if desired, and we should also revalidate to make sure performance doesn't vary too much with shard size (from spot checking, it looks like marginal and/or single-bin calls may flicker on and off). Note that we still have not optimized inference for WES, although I believe @vruano has done some optimizations for WGS. @mwalker174 @vruano: for WGS with 2kb bins, I would expect the cost of the gCNV step to be ~10 cents per sample in cohort mode before inference optimizations, assuming we address #5716 to minimize disk costs.

@asmirnov239 can you review? And maybe you can address dCR output in PostprocessGermlineCNVCalls and expose the number of samples in a separate PR? We can make further changes to the dCR format there if needed.
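As a quick check on the per-sample cost arithmetic above, here is a minimal sketch. It assumes each shard processes the full sample set over its own interval subset (which is what the stated figures imply); the function name and the preemptible VM price are illustrative.

```python
# Sketch of the per-sample cost arithmetic from the description above.
# Assumes every shard processes the full sample set over its own interval subset.
def cost_in_cents_per_sample(num_shards: int, num_samples: int,
                             hours_per_shard: float,
                             vm_dollars_per_hour: float = 0.08) -> float:
    total_dollars = num_shards * hours_per_shard * vm_dollars_per_hour
    return 100.0 * total_dollars / num_samples

print(cost_in_cents_per_sample(37, 200, 2.0))  # ~3.0 cents/sample (old WES configuration)
print(cost_in_cents_per_sample(2, 50, 5.0))    # ~1.6 cents/sample (after this PR)
```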
@lucidtronix @cmnbroad @jamesemery also note that I went ahead and switched over to numpy MKL, I'm assuming you have no objections!
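For context, here is a minimal sketch of how the theano flags mentioned above (OpenMP elemwise and an MKL-backed BLAS) can be set via `THEANO_FLAGS`. The specific values are assumptions for illustration, not necessarily the configuration shipped in the GATK conda environment.

```python
# Illustrative THEANO_FLAGS setup; the values here are assumptions, not the exact
# GATK configuration. Flags must be set before theano is imported for the first time.
import os

os.environ["THEANO_FLAGS"] = ",".join([
    "device=cpu",
    "openmp=True",                   # allow OpenMP-parallelized elemwise ops
    "openmp_elemwise_minsize=200",   # hypothetical threshold for parallelizing elemwise
    "blas.ldflags=-lmkl_rt",         # link BLAS against MKL (e.g. numpy/MKL from conda defaults)
])

import theano  # noqa: E402  (imported only after the flags are set)
print(theano.config.openmp)
```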
Codecov Report
@@ Coverage Diff @@
## master #5781 +/- ##
===============================================
- Coverage 86.999% 86.998% -0.001%
Complexity 32110 32110
===============================================
Files 1974 1974
Lines 147249 147249
Branches 16218 16218
===============================================
- Hits 128105 128104 -1
Misses 13236 13236
- Partials 5908 5909 +1
@samuelklee Back to you!
            num_samples: number of samples to draw

        Returns:
            A generator that will yield `num_samples` samples from an approximation to a posterior
        """
        return (model_approx.sample()[model_var_name] for _ in range(num_samples))

        sample = stochastic_node_mean_symbolic(model_approx, node, size=1)
Can you try using approximation.sample_node method instead?
Reminder to self to recheck for memory spike as we discussed.
No spike, and the results are exactly the same when using sample_node. Sorry for not realizing earlier that the other method was cribbed from this one. Interestingly, I also checked whether compilation affects memory usage: it doesn't. Compilation can take a non-trivial amount of time for short-running shards, so we might experiment with distributing a precompiled model for fixed shard sizes in production. Not sure if this works in practice or what the effect of getting VMs on different architectures might be, though.
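For reference, a rough sketch (not the actual gcnvkernel change) of how pymc3's `Approximation.sample_node` can be used to draw samples one at a time from a single compiled function; the names `model_approx`, `node`, and `num_samples` follow the snippet under review.

```python
# Sketch of per-draw sampling via sample_node; not the exact code in this PR.
# `model_approx` is a pymc3 variational approximation (e.g. MeanField) and
# `node` is a symbolic quantity in the underlying model.
import theano
import pymc3 as pm


def sample_node_draws(model_approx: pm.MeanField, node, num_samples: int):
    """Yield `num_samples` draws of `node` from the approximate posterior."""
    sampled_node = model_approx.sample_node(node, size=1)  # symbolic single draw
    draw_fn = theano.function([], sampled_node)            # compile once, reuse per draw
    for _ in range(num_samples):
        yield draw_fn()[0]  # each call produces a fresh stochastic draw
```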
The branch was force-pushed from db9bad5 to f772b3b.
Thanks @asmirnov239, back to you!
Looks good @samuelklee! Thanks for fixing it.
- Changed sampling of denoised copy ratios to address memory spike and updated output formats and filenames. Partially addresses #5754.
- Updated theano version to 1.0.4 and changed numpy install source to conda defaults to enable MKL.
- Updated theano flags to use MKL and OpenMP elemwise.
Closes #5764.