
DifferentialEquation Op refactor #3634

Merged · 14 commits · Nov 3, 2019

Conversation

michaelosthege
Member

@michaelosthege michaelosthege commented Sep 25, 2019

Why

The current implementation of the DifferentialEquation Op involves a few "hacks":

  • a dictionary-based cache of simulation results
  • return value is a 1D tensor which the user must reshape
  • no support for compute_test_value

Changes (WIP)

  • strongly typed inputs/outputs
  • support for test values
  • optionally return sensitivities to the user
  • 2D return shapes of the Op
  • example notebook with a model having two state variables; different observations

The notebook ODE_API_WIP.ipynb implements a simple enzymatic reaction (S->P) model with sparse observations.
It also has a benchmark cell that we can use for future ODE performance testing.
Finally, in the computation graph, one can see (after zooming in a lot) that the modified implementation does not use a helper Op for the gradient, but directly uses the symbolic output from the forward pass.
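Since the notebook itself is not rendered here, a minimal numpy sketch of such an S->P model may help. This is a hedged illustration, not the notebook's actual code: the Michaelis-Menten rate law and the parameter names `vmax` and `K_S` are borrowed from the benchmark code further down, and `euler` is only a stand-in for the real ODE solver.

```python
import numpy as np

def reaction_rhs(y, t, p):
    # Michaelis-Menten rate law for the enzymatic reaction S -> P;
    # the exact model in ODE_API_WIP.ipynb may differ (this is a sketch)
    S, P = y
    vmax, K_S = p
    rate = vmax * S / (K_S + S)
    return np.array([-rate, +rate])

def euler(rhs, y0, ts, p):
    # fixed-step Euler integration as a stand-in for the actual solver
    ys = [np.asarray(y0, dtype=float)]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        ys.append(ys[-1] + (t1 - t0) * rhs(ys[-1], t0, p))
    return np.stack(ys)  # 2D return shape: (n_times, n_states)

ys = euler(reaction_rhs, y0=[10.0, 0.0], ts=np.linspace(0, 10, 101), p=(0.5, 2.0))
```

Note the 2D return shape `(n_times, n_states)`, which is what the refactored Op now hands back directly instead of a flat 1D tensor the user must reshape.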

In a previous commit on this branch, the new refactors were benchmarked against the existing implementation. There was no significant performance difference.

ToDo

  • notebooks were renamed
  • re-run notebooks on a fast machine
  • align existing tests
  • do we need more tests in this PR?
  • remove older implementation
  • align docstrings & code style (please review!)
  • updated changelog

+ full support for test_values
+ explicit input/output types
+ 2D return shape
+ optional return of sensitivities
+ gradient without helper Op

@michaelosthege michaelosthege marked this pull request as ready for review November 1, 2019 13:00
@codecov

codecov bot commented Nov 1, 2019

Codecov Report

Merging #3634 into master will decrease coverage by <.01%.
The diff coverage is 95.23%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3634      +/-   ##
==========================================
- Coverage   89.79%   89.78%   -0.01%     
==========================================
  Files         134      134              
  Lines       20050    20088      +38     
==========================================
+ Hits        18004    18037      +33     
- Misses       2046     2051       +5
| Impacted Files | Coverage Δ |
|---|---|
| pymc3/exceptions.py | 100% <100%> (ø) ⬆️ |
| pymc3/ode/utils.py | 100% <100%> (ø) ⬆️ |
| pymc3/tests/test_ode.py | 100% <100%> (ø) ⬆️ |
| pymc3/tests/test_util.py | 100% <100%> (ø) ⬆️ |
| pymc3/ode/ode.py | 93.75% <91.54%> (-3.66%) ⬇️ |
| pymc3/distributions/distribution.py | 95.02% <0%> (-0.3%) ⬇️ |

@michaelosthege michaelosthege changed the title [WIP] DifferentialEquation Op improvements DifferentialEquation Op refactor Nov 1, 2019
Member

@ColCarroll ColCarroll left a comment


Code changes look good to me!

I did not check for "scientific correctness", but it looks like it follows the old code pretty closely.

@ColCarroll
Member

I will try to also look at the notebooks and make sure they run on my machine later today, but I am ok merging before that happens (or if I do not get to it).

@michaelosthege michaelosthege added ready? and removed WIP labels Nov 2, 2019
@Dpananos
Contributor

Dpananos commented Nov 2, 2019

@michaelosthege This looks great! Thanks for all the effort

@ColCarroll
Member

Merging! Good work on this!

@ColCarroll ColCarroll merged commit 225ae82 into pymc-devs:master Nov 3, 2019
@fonnesbeck
Member

Sampling for the ODE models in the notebooks takes 4+ hours to run, even on my souped-up MBP. Is this expected (and is that why they are unrendered)?

@Dpananos
Contributor

Dpananos commented Nov 4, 2019

@fonnesbeck Both notebooks? Even the original one which doesn't use DifferentialEquation?

@fonnesbeck
Member

fonnesbeck commented Nov 4, 2019

I'm looking specifically at the one with the SIR and falling object examples.

@Dpananos
Contributor

Dpananos commented Nov 4, 2019

Hmm. Can confirm. Models say sampling will take approx 2.5 hours. The first example in that notebook shouldn't take longer than 10 minutes from what I remember.

@michaelosthege
Member Author

I can confirm as well. With NUTS I get ~5 s/it (1 chain, 1 core), whereas DEMetropolis runs at 1.5 it/s (12 chains, 1 core), so NUTS is about 40x slower.

From the relative speeds that sounds realistic, because NUTS evaluates the model several times per iteration.
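To illustrate why a large per-iteration cost gap is plausible: each NUTS iteration runs a leapfrog integrator that needs one gradient evaluation (and hence one ODE solve with sensitivities) per step, while a Metropolis-style proposal needs only a single logp evaluation. A toy sketch with a standard-normal target (all names hypothetical, not PyMC3 internals):

```python
import numpy as np

grad_evals = 0

def grad_logp(q):
    # gradient of a standard normal log-density; the counter stands in
    # for one ODE solve + sensitivity computation per call
    global grad_evals
    grad_evals += 1
    return -q

def leapfrog(q, p, step_size, n_steps):
    # standard leapfrog: half step in momentum, full steps, half step
    p = p + 0.5 * step_size * grad_logp(q)
    for _ in range(n_steps - 1):
        q = q + step_size * p
        p = p + step_size * grad_logp(q)
    q = q + step_size * p
    p = p + 0.5 * step_size * grad_logp(q)
    return q, p

q, p = leapfrog(np.array([1.0]), np.array([0.5]), step_size=0.1, n_steps=10)
# a single 10-step trajectory already costs 11 gradient evaluations
```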

The absolute speed however is quite slow indeed.

I will try to check out the previous version and run the same comparison.

@michaelosthege
Member Author

michaelosthege commented Nov 4, 2019

So here is the puzzle:

  • The original implementation (by Demetri) runs NUTS 20x faster than my refactor of it
  • DEMetropolis runs with the same speed on both versions
  • My benchmark (notebook) gives approximately the same speed for both versions

Here are two hypotheses:

  1. something with the caching gives NUTS an advantage?
  2. If I remember correctly, my refactor moves the vector-sensitivities product from numpy to theano. Could this explain why only NUTS has a performance difference?
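For context on hypothesis 2, the product in question is (in numpy terms) a contraction of the upstream output gradient with the sensitivity array. The shapes below are assumptions for illustration, not the Op's actual internals:

```python
import numpy as np

# assumed shapes: sensitivities dy/dtheta is (n_times, n_states, n_params),
# the upstream gradient g has the shape of the states, (n_times, n_states)
rng = np.random.default_rng(42)
n_times, n_states, n_params = 101, 2, 3
sens = rng.normal(size=(n_times, n_states, n_params))
g = rng.normal(size=(n_times, n_states))

# the vector-sensitivities product: one contraction over times and states;
# the refactor builds the equivalent tensordot inside the theano graph
grad_theta = np.einsum('ts,tsp->p', g, sens)
```

Whether this contraction runs as a single numpy call or as extra nodes in the theano graph could plausibly matter for a gradient that is evaluated many times per NUTS iteration.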

For reference, the approximate seconds/it/chain:

|              | original | refactor |
|--------------|----------|----------|
| NUTS         | 0.1      | 5        |
| DEMetropolis | 0.05     | 0.05     |

Maybe my benchmark did not capture the relevant parts of the graph?

# imports assumed from the notebook context; get_model() is defined there
import numpy as np
import theano
import theano.tensor as tt

def make_benchmark():
    pmodel = get_model()

    # benchmark using the full gradient
    test_t = tt.grad(pmodel.logpt, [pmodel.sigma, pmodel.vmax, pmodel.K_S, pmodel.red_0])
    # compile a function to evaluate the gradient
    test_f = theano.function(inputs=pmodel.cont_vars, outputs=test_t)

    # test the compiled function with the true parameters
    print('Test gradient:')
    print(test_f(0.2, 0.5, 2, 10))

    # make a benchmarking function that uses random inputs
    # - to avoid cheating by caching
    # - to get a more realistic distribution over simulation times
    def bm():
        test_f(
            np.log(np.random.uniform(0.1, 0.2)),
            np.log(np.random.uniform(0.4, 0.6)),
            np.log(np.random.uniform(1.9, 2.1)),
            np.random.uniform(9, 11)
        )
    return pmodel, bm

@michaelosthege
Member Author

I found two things that look suspicious:

1.: Benchmark gradient is not representative

The benchmark (ODE_API_shapes_and_benchmarking notebook) computes a different gradient than the HMC implementation:

# benchmark using the full gradient
test_t = tt.grad(pmodel.logpt, [pmodel.sigma, pmodel.vmax, pmodel.K_S, pmodel.red_0])
# compile a function to evaluate the gradient
test_f = theano.function(inputs=pmodel.cont_vars, outputs=test_t)

compared to model.py:734 and model.py:467

2.: Symbolic sensitivity handover does not work when using NUTS

If you take ODE_API_shapes_and_benchmarking.ipynb and append the following two cells at the bottom, you can get the debug output of the graph construction:

with get_model() as model:
    pm.find_MAP()
# Output:
#Applied log-transform to sigma and added transformed sigma_log__ to model.
#Applied log-transform to vmax and added transformed vmax_log__ to model.
#Applied log-transform to K_S and added transformed K_S_log__ to model.
#make_node for inputs 726405093253646730
#grad w.r.t. inputs 726405093253646730
#grad w.r.t. inputs 726405093253646730
#grad w.r.t. inputs 726405093253646730
#logp = -37.589, ||grad|| = 21.255: 100%|███████████████████| 27/27 [00:01<00:00, 15.57it/s]

The output above shows that for find_MAP, the grad method of the Op is called 3 times with the same inputs that were previously fed to make_node (same hash). This way the cached sensitivity Tensor is used, avoiding duplicate computation.
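The caching idea can be sketched in pure Python. The class and method names below are illustrative; only `_output_sensitivities` mirrors the attribute mentioned in this thread. `make_node` stores a placeholder for the symbolic sensitivity output under a hash of its inputs, and `grad` reuses it on a hash match:

```python
class CachingOp:
    # illustrative sketch of the sensitivity cache, not the real Op
    def __init__(self):
        self._output_sensitivities = {}

    def make_node(self, inputs):
        key = hash(tuple(inputs))
        # placeholder standing in for the symbolic sensitivity tensor
        sens = ('sensitivities', key)
        self._output_sensitivities[key] = sens
        return sens

    def grad(self, inputs):
        key = hash(tuple(inputs))
        if key in self._output_sensitivities:
            # cache hit: reuse the tensor built during make_node
            return self._output_sensitivities[key]
        # cache miss: grad was called with different input tensors
        # (as with logp_dlogp_function), so everything is rebuilt
        return self.make_node(inputs)

op = CachingOp()
built = op.make_node([1, 2])
reused = op.grad([1, 2])    # hit: same object as `built`
rebuilt = op.grad([3, 4])   # miss: triggers a second make_node
```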

The following output shows that logp_dlogp_function, which (I think, @ColCarroll?) is also used by HMC, calls grad with different inputs, rendering the sensitivity caching ineffective.

with model:
    f = model.logp_dlogp_function(model.cont_vars)
# Output:
#grad w.r.t. inputs 5369686199239769812
#[MakeVector{dtype='float64'}.0, MakeVector{dtype='float64'}.0]      <---- _log.debug(inputs) in ode.py:197
#No cached sensitivities found!
#make_node for inputs 5369686199239769812

Finally, running NUTS gives a similar output to the one above:

with model:
    trace = pm.sample(
        draws=1000, tune=500,
        chains=1, cores=1,
        step=pm.NUTS()
    )
# Output:
#grad w.r.t. inputs -1522734632884762703
#[MakeVector{dtype='float64'}.0, MakeVector{dtype='float64'}.0]
#No cached sensitivities found!
#make_node for inputs -1522734632884762703
#Sequential sampling (1 chains in 1 job)
#NUTS: [red_0, K_S, vmax, sigma]
#Sampling chain 0, 0 divergences: .......

Conclusion & Actions

The failure to cache symbolic sensitivities could explain a 2x slowdown.

Maybe the internalization of the sensitivity-vector product from numpy into theano could explain the rest?

@ColCarroll @aseyboldt do you have an idea how I could repair the symbolic sensitivity cache?

If the sensitivity cache trick does not work with HMC gradients or the product in theano is significantly slower, I could bring back the ODEGradOp.

For re-use of DifferentialEquation Op instances, either self._output_sensitivities Tensors, or self._grad_ops must be cached in dicts using something like hash(inputs) keys.

Should this be impossible, Op instance re-use will cause a 2x slowdown.

@michaelosthege
Member Author

I'm continuing the discussion over in #3676

@michaelosthege michaelosthege deleted the ode-shapes-and-caching branch August 7, 2021 11:22