
Sl/numba tauchen #227

Merged 2 commits into master from sl/numba-tauchen on Feb 2, 2016
Conversation

@sglyon (Member) commented Jan 31, 2016

I needed a faster version of tauchen, so I dropped the loops into numba.

Here are the before timings (on the actual size problem I'm working on in the application):

In [21]: timeit tauchen(0.95, 0.1, 3, 451)
1 loops, best of 3: 32.4 s per loop

After timings:

In [4]: timeit qe.tauchen(0.95, 0.1, 3, 451)
10 loops, best of 3: 65.4 ms per loop

Not quite as quick as the Julia version, but much closer (Julia timings below).

julia> @benchmark tauchen(451, 0.95, 0.1, 0.0, 3)
================ Benchmark Results ========================
     Time per evaluation: 28.28 ms [27.11 ms, 29.45 ms]
Proportion of time in GC: 0.51% [0.00%, 1.71%]
        Memory allocated: 1.55 mb
   Number of allocations: 3 allocations
       Number of samples: 100
   Number of evaluations: 100
 Time spent benchmarking: 3.04 s
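
For reference, the jitted kernel presumably looks roughly like the sketch below (reconstructed from the discussion rather than copied from the diff; std_norm_cdf is assumed to be an erfc-based helper):

import math
from numba import jit

@jit(nopython=True)
def std_norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

@jit(nopython=True)
def _fill_tauchen(x, P, n, rho, sigma, half_step):
    # Fill row i with the Tauchen (1986) transition probabilities from state x[i]
    for i in range(n):
        P[i, 0] = std_norm_cdf((x[0] - rho * x[i] + half_step) / sigma)
        P[i, n - 1] = 1.0 - std_norm_cdf((x[n - 1] - rho * x[i] - half_step) / sigma)
        for j in range(1, n - 1):
            z = x[j] - rho * x[i]
            P[i, j] = (std_norm_cdf((z + half_step) / sigma)
                       - std_norm_cdf((z - half_step) / sigma))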

@oyamad (Member) commented Jan 31, 2016

The speedup is very impressive.

I have a not-so-important comment. In the loop, x[j] - rho * x[i] +/- half_step is computed many times, but isn't it enough to compute it once and then increment by step each time?

Something like:

def _fill_tauchen(x, P, n, rho, sigma, step):
    for i in range(n):
        # w tracks the current bin edge, (x[j] - rho * x[i]) + step / 2
        w = x[0] - rho * x[i] + step / 2
        P[i, 0] = std_norm_cdf(w / sigma)

        for j in range(1, n - 1):
            P[i, j] = -std_norm_cdf(w / sigma)   # subtract cdf at the lower edge
            w += step
            P[i, j] += std_norm_cdf(w / sigma)   # add cdf at the upper edge

        P[i, n-1] = 1 - std_norm_cdf(w / sigma)

or

def _fill_tauchen(x, P, n, rho, sigma, step):
    step_sigma = step / sigma
    for i in range(n):
        w = (x[0] - rho * x[i] + step / 2) / sigma
        P[i, 0] = std_norm_cdf(w)

        for j in range(1, n - 1):
            P[i, j] = -std_norm_cdf(w)
            w += step_sigma
            P[i, j] += std_norm_cdf(w)

        P[i, n-1] = 1 - std_norm_cdf(w)

(Speedup may be negligible though.)
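
A quick sanity check for the incremental variants above, under the assumption that std_norm_cdf is the erfc-based helper (re-created here since it is not shown in this thread); the grid values are illustrative:

import math
import numpy as np

def std_norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

rho, sigma, n, m = 0.95, 0.1, 9, 3
sigma_y = sigma / math.sqrt(1 - rho**2)           # unconditional std dev of the AR(1)
x, step = np.linspace(-m * sigma_y, m * sigma_y, n, retstep=True)
P = np.empty((n, n))
_fill_tauchen(x, P, n, rho, sigma, step)          # either variant above
print(np.allclose(P.sum(axis=1), 1.0))            # transition rows should sum to one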

@albop (Contributor) commented Jan 31, 2016

The performance gains indeed seem quite good. There is probably no point in looking for further improvements, since it is unlikely that tauchen will be repeatedly called (you can always scale the normal).

I still did a bit of digging. Most of the performance cost seems to come from the erfc function (I replaced it with the identity to check). Maybe the difference in the implementations of this routine explains the gap. In numba the implementation of erfc is the one from CPython's C source (https://github.com/numba/numba/blob/81c83aaacf1b8d871d4e31719efd670bfee50c9c/numba/_helperlib.c). The one used in Julia is apparently the same as in glibc.
LLVM 3.7, to which numba is currently switching, has incorporated erfc, so the situation may be easy to improve on the numba side.
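
For reference, the kind of check described above (swapping erfc out to isolate its cost) might look roughly like this; sum_erfc and sum_identity are hypothetical helpers, not code from the PR:

import math
import numpy as np
from numba import jit

@jit(nopython=True)
def sum_erfc(x):
    # same loop shape as the kernel, with the special function in place
    total = 0.0
    for i in range(x.shape[0]):
        total += math.erfc(x[i])
    return total

@jit(nopython=True)
def sum_identity(x):
    # erfc replaced by the identity, to see how much of the runtime it accounts for
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i]
    return total

x = np.linspace(-3.0, 3.0, 10**6)
# timeit sum_erfc(x)   vs.   timeit sum_identity(x)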

@cc7768 (Member) commented Feb 1, 2016

I agree with @albop that it is probably futile to look for more performance gains, but FWIW this was faster on my computer (though only marginally, and we lose the benefit of the function being jitted).

import numpy as np
import scipy.stats as st


def tauchen(n, m, ybar, sigma, rho=0.0):
    """
    Takes as inputs n, m, ybar, sigma and produces a Markov chain
    approximation to the process:
    y_t = \bar{y} + \rho y_{t-1} + \varepsilon_t
    where \varepsilon_t is i.i.d. normal with mean 0 and std dev sigma

    Parameters
    ----------
    n : int
        The number of points to approximate the distribution

    m : int
        The number of standard deviations to go away from the mean
        in each direction

    ybar : float
        The value \bar{y} in the process.  Note that the mean of this
        AR(1) process, y, is simply ybar/(1 - rho)

    sigma : float
        The value of the standard deviation of the \varepsilon process

    rho : float
        By default this will be 0, but if you are approximating an AR(1)
        process then this is the autocorrelation across periods

    Returns
    -------
    bar : 1d array of floats
        The possible states for the markov chain

    pi : 2d array of floats
        The transition probabilities across states

    """

    # Get the standard deviation of y
    y_sd = np.sqrt(sigma**2/(1 - rho**2))


    # Go out m standard deviations and create equally spaced points
    # from -m*y_sd to m*y_sd; note d is the distance between points
    ubar = m*y_sd
    lbar = -ubar
    bar, d = np.linspace(lbar, ubar, n, retstep=True)

    # Create matrix of values
    barmat = np.repeat(bar, n).reshape(n, n)

    barup = (barmat.T + d/2. - rho*barmat)/sigma
    bardo = (barmat.T - d/2. - rho*barmat)/sigma

    pi = st.norm.cdf(barup) - st.norm.cdf(bardo)
    pi[:, 0] = st.norm.cdf(barup[:, 0])
    pi[:, -1] = 1. - st.norm.cdf(bardo[:, -1])

    bar += ybar/(1. - rho)

    return bar, pi
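
A minimal usage sketch of this alternative (illustrative parameter values, not from the PR):

# 9 grid points, +/- 3 standard deviations, ybar = 0, shock std dev 0.1, rho = 0.95
states, probs = tauchen(9, 3, 0.0, 0.1, rho=0.95)
print(states)                 # the grid of y values
print(probs.sum(axis=1))      # each row of the transition matrix sums to one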

@sglyon (Member, Author) commented Feb 1, 2016

The erfc function must be quite a bit slower in Python than whatever Julia is using, because the algorithm is exactly the same in both languages: fill in one row at a time. That access pattern isn't consistent with Julia's column-major arrays, so the Julia version is probably even taking a performance hit from not "transposing" the algorithm.

Anyway, I would be happy merging either this or Chase's function. I'm indifferent between the two.

Also agree with @albop that searching for milliseconds here probably doesn't need to be a priority. I just didn't like waiting 35+ seconds to initialize a model when I knew it could take under a second ;)

@cc7768 (Member) commented Feb 1, 2016

I have a slight preference for the numbaized function. It's cleaner, the difference is pretty negligible, and it sounds like it will get better for free.

@albop (Contributor) commented Feb 1, 2016

Yes, @cc7768, it will get better over time.
Just for fun: one can use the libm function directly using:

from cffi import FFI
ffi = FFI()
ffi.cdef('double erfc(double x);')
libm = ffi.dlopen("m")
erfc = libm.erfc

On my machine, this test produces the following timings:

cpython: 2.5553548336029053
libm: 0.09564900398254395

It is a bit more than 20 times faster than the CPython version!

@albop (Contributor) commented Feb 1, 2016

With the libm version of erfc, the numbaized version goes from 88 ms to 25 ms on my computer.

Just to be clear: my vote is to keep the numbaized version exactly as it is and not bother with more optimization.
This is essentially a numba "bug". There has been an ongoing discussion about the relative performance of math.* and numpy.* functions in numba/numba#1616 (see also: https://williamjshipman.wordpress.com/2016/01/12/whats-faster-in-numba-jit-functions-numpy-or-the-math-package/). We should just make sure that they are aware of the potential differences with libm.
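
Presumably the libm erfc was swapped into the jitted loop along these lines (a sketch, not the actual code from the test; Numba can call in-line, ABI-mode cffi functions in nopython mode):

import numpy as np
from cffi import FFI
from numba import jit

ffi = FFI()
ffi.cdef('double erfc(double x);')
libm = ffi.dlopen("m")
erfc_libm = libm.erfc              # ABI-mode cffi function

@jit(nopython=True)
def sum_erfc_libm(x):
    total = 0.0
    for i in range(x.shape[0]):
        total += erfc_libm(x[i])
    return total

x = np.linspace(-3.0, 3.0, 10**6)
sum_erfc_libm(x)                   # compare its timing against the math.erfc loop above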

@sglyon (Member, Author) commented Feb 1, 2016

Thanks for checking out those benchmarks.

I think that as long as @jstac is OK with this, we can merge as is.

@jstac (Contributor) commented Feb 1, 2016

Thanks all for the discussion. I've been following and I've learned from reading it. It makes sense to Numba-fy the routine and I agree that we don't need to search for the last drop of speed here.

As a general comment for this PR and others, you guys are in charge of QE.py and QE.jl so please feel free to merge without my input. They are really your libraries now, and you are better equipped to take the lead than I am.

If it's a change that affects the lectures, I'd appreciate it if you let me know (although it's not actually your responsibility, and in any case I follow the discussions in the PRs). It will be the job of Tom and me to keep the lectures up to date with changes that you guys make to the libraries.

For QuantEcon.applications, Tom and I still need to take charge of changes there, since that repo just serves the lectures at this stage.

sglyon added a commit that referenced this pull request on Feb 2, 2016
@sglyon merged commit 9ff954c into master on Feb 2, 2016
@sglyon (Member, Author) commented Feb 2, 2016

Sounds great, thanks @jstac!

@mmcky (Contributor) commented Feb 2, 2016

@spencerlyon2 thanks for this update. Do you mind if we delete the branch?

@sglyon (Member, Author) commented Feb 2, 2016

Not at all. Let's keep things tidy


@mmcky deleted the sl/numba-tauchen branch on February 2, 2016 at 13:51