
Sl/numba tauchen #227

Merged 2 commits into master from sl/numba-tauchen on Feb 2, 2016
Conversation

@sglyon (Member) commented Jan 31, 2016

I needed a faster version of tauchen, so I dropped the loops into numba.

Here are the before timings (on the actual size problem I'm working on in the application):

In [21]: timeit tauchen(0.95, 0.1, 3, 451)
1 loops, best of 3: 32.4 s per loop

After timings:

In [4]: timeit qe.tauchen(0.95, 0.1, 3, 451)
10 loops, best of 3: 65.4 ms per loop

Not quite as quick as the Julia version, but much closer (Julia timings below).

julia> @benchmark tauchen(451, 0.95, 0.1, 0.0, 3)
================ Benchmark Results ========================
     Time per evaluation: 28.28 ms [27.11 ms, 29.45 ms]
Proportion of time in GC: 0.51% [0.00%, 1.71%]
        Memory allocated: 1.55 mb
   Number of allocations: 3 allocations
       Number of samples: 100
   Number of evaluations: 100
 Time spent benchmarking: 3.04 s
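
For reference, the jitted kernel presumably looks roughly like the sketch below (reconstructed from the discussion rather than copied from the diff; std_norm_cdf is assumed to be an erfc-based helper):

import math
from numba import jit

@jit(nopython=True)
def std_norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

@jit(nopython=True)
def _fill_tauchen(x, P, n, rho, sigma, half_step):
    # Fill row i with the Tauchen (1986) transition probabilities from state x[i]
    for i in range(n):
        P[i, 0] = std_norm_cdf((x[0] - rho * x[i] + half_step) / sigma)
        P[i, n - 1] = 1.0 - std_norm_cdf((x[n - 1] - rho * x[i] - half_step) / sigma)
        for j in range(1, n - 1):
            z = x[j] - rho * x[i]
            P[i, j] = (std_norm_cdf((z + half_step) / sigma)
                       - std_norm_cdf((z - half_step) / sigma))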

@oyamad (Member) commented Jan 31, 2016

The speedup is very impressive.

I have a not-so-important comment. In the loop, x[j] - rho * x[i] +/- half_step is computed many times, but isn't it enough to compute it once and then increment by step each time?

Something like:

def _fill_tauchen(x, P, n, rho, sigma, step):
    for i in range(n):
        # w tracks the current bin edge, (x[j] - rho * x[i]) + step / 2
        w = x[0] - rho * x[i] + step / 2
        P[i, 0] = std_norm_cdf(w / sigma)

        for j in range(1, n - 1):
            P[i, j] = -std_norm_cdf(w / sigma)   # subtract cdf at the lower edge
            w += step
            P[i, j] += std_norm_cdf(w / sigma)   # add cdf at the upper edge

        P[i, n-1] = 1 - std_norm_cdf(w / sigma)

or

def _fill_tauchen(x, P, n, rho, sigma, step):
    step_sigma = step / sigma
    for i in range(n):
        w = (x[0] - rho * x[i] + step / 2) / sigma
        P[i, 0] = std_norm_cdf(w)

        for j in range(1, n - 1):
            P[i, j] = -std_norm_cdf(w)
            w += step_sigma
            P[i, j] += std_norm_cdf(w)

        P[i, n-1] = 1 - std_norm_cdf(w)

(Speedup may be negligible though.)
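
A quick sanity check for the incremental variants above, under the assumption that std_norm_cdf is the erfc-based helper (re-created here since it is not shown in this thread); the grid values are illustrative:

import math
import numpy as np

def std_norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

rho, sigma, n, m = 0.95, 0.1, 9, 3
sigma_y = sigma / math.sqrt(1 - rho**2)           # unconditional std dev of the AR(1)
x, step = np.linspace(-m * sigma_y, m * sigma_y, n, retstep=True)
P = np.empty((n, n))
_fill_tauchen(x, P, n, rho, sigma, step)          # either variant above
print(np.allclose(P.sum(axis=1), 1.0))            # transition rows should sum to one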

@albop (Contributor) commented Jan 31, 2016

The performance gains indeed seem quite good. There is probably no point in looking for further improvements, since it is unlikely that tauchen will be repeatedly called (you can always scale the normal).

I still did a bit of digging. Most of the performance cost seems to come from the erfc function (I replaced it with the identity to check). Maybe the difference in the implementations of this routine explains the gap. In numba the implementation of erfc is the one from CPython's C source (https://github.com/numba/numba/blob/81c83aaacf1b8d871d4e31719efd670bfee50c9c/numba/_helperlib.c). The one used in Julia is apparently the same as in glibc.
LLVM 3.7, to which numba is currently switching, has incorporated erfc, so the situation may be easy to improve on the numba side.
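
For reference, the kind of check described above (swapping erfc out to isolate its cost) might look roughly like this; sum_erfc and sum_identity are hypothetical helpers, not code from the PR:

import math
import numpy as np
from numba import jit

@jit(nopython=True)
def sum_erfc(x):
    # same loop shape as the kernel, with the special function in place
    total = 0.0
    for i in range(x.shape[0]):
        total += math.erfc(x[i])
    return total

@jit(nopython=True)
def sum_identity(x):
    # erfc replaced by the identity, to see how much of the runtime it accounts for
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i]
    return total

x = np.linspace(-3.0, 3.0, 10**6)
# timeit sum_erfc(x)   vs.   timeit sum_identity(x)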

@cc7768 (Member) commented Feb 1, 2016

I agree with @albop that it is probably futile to look for more performance gains, but FWIW this was faster on my computer (though only marginally, and we lose the benefit of the function being jitted).

import numpy as np
import scipy.stats as st


def tauchen(n, m, ybar, sigma, rho=0.0):
    """
    Takes as inputs n, m, ybar, sigma and produces a Markov chain
    approximation to the process:
    y_t = \bar{y} + \rho y_{t-1} + \varepsilon_t
    where \varepsilon_t is i.i.d. normal with mean 0 and std dev sigma

    Parameters
    ----------
    n : int
        The number of points to approximate the distribution

    m : int
        The number of standard deviations to go away from the mean
        in each direction

    ybar : float
        The value \bar{y} in the process.  Note that the mean of this
        AR(1) process, y, is simply ybar/(1 - rho)

    sigma : float
        The value of the standard deviation of the \varepsilon process

    rho : float
        By default this will be 0, but if you are approximating an AR(1)
        process then this is the autocorrelation across periods

    Returns
    -------
    bar : 1d array of floats
        The possible states for the markov chain

    pi : 2d array of floats
        The transition probabilities across states

    """

    # Get the standard deviation of y
    y_sd = np.sqrt(sigma**2/(1 - rho**2))


    # Go out m standard deviations and create equally spaced points
    # from -m*y_sd to m*y_sd; note d is the distance between points
    ubar = m*y_sd
    lbar = -ubar
    bar, d = np.linspace(lbar, ubar, n, retstep=True)

    # Create matrix of values
    barmat = np.repeat(bar, n).reshape(n, n)

    barup = (barmat.T + d/2. - rho*barmat)/sigma
    bardo = (barmat.T - d/2. - rho*barmat)/sigma

    pi = st.norm.cdf(barup) - st.norm.cdf(bardo)
    pi[:, 0] = st.norm.cdf(barup[:, 0])
    pi[:, -1] = 1. - st.norm.cdf(bardo[:, -1])

    bar += ybar/(1. - rho)

    return bar, pi
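
A minimal usage sketch of this alternative (illustrative parameter values, not from the PR):

# 9 grid points, +/- 3 standard deviations, ybar = 0, shock std dev 0.1, rho = 0.95
states, probs = tauchen(9, 3, 0.0, 0.1, rho=0.95)
print(states)                 # the grid of y values
print(probs.sum(axis=1))      # each row of the transition matrix sums to one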

@sglyon (Member, Author) commented Feb 1, 2016

The erfc function must be quite a bit slower in Python than whatever Julia is using, because the algorithm is exactly the same in both languages: fill in one row at a time. That access pattern isn't consistent with Julia's column-major arrays, so the Julia version is probably even taking a performance hit from not "transposing" the algorithm.

Anyway, I would be happy merging either this or Chase's function. I'm indifferent between the two.

Also agree with @albop that searching for milliseconds here probably doesn't need to be a priority. I just didn't like waiting 35+ seconds to initialize a model when I knew it could take under a second ;)

@cc7768 (Member) commented Feb 1, 2016

I have a slight preference for the numbaized function. It's cleaner, the difference is pretty negligible, and it sounds like it will get better for free.

@albop (Contributor) commented Feb 1, 2016

Yes, @cc7768, it will get better over time.
Just for fun: one can use the libm function directly using:

from cffi import FFI
ffi = FFI()
ffi.cdef('double erfc(double x);')
libm = ffi.dlopen("m")
erfc = libm.erfc

On my machine, this test produces the following timings:

cpython: 2.5553548336029053
libm: 0.09564900398254395

It is a bit more than 20 times faster than the CPython version!

@albop (Contributor) commented Feb 1, 2016

With the libm version of erfc, the numbaized version goes from 88 ms to 25 ms on my computer.

Just to be clear: my vote is to keep the numbaized version exactly as it is and not bother with more optimization.
This is essentially a numba "bug". There has been an ongoing discussion about the relative performance of math.* and numpy.* functions in numba/numba#1616 (see also: https://williamjshipman.wordpress.com/2016/01/12/whats-faster-in-numba-jit-functions-numpy-or-the-math-package/). We should just make sure that they are aware of the potential differences with libm.
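
Presumably the libm erfc was swapped into the jitted loop along these lines (a sketch, not the actual code from the test; Numba can call in-line, ABI-mode cffi functions in nopython mode):

import numpy as np
from cffi import FFI
from numba import jit

ffi = FFI()
ffi.cdef('double erfc(double x);')
libm = ffi.dlopen("m")
erfc_libm = libm.erfc              # ABI-mode cffi function

@jit(nopython=True)
def sum_erfc_libm(x):
    total = 0.0
    for i in range(x.shape[0]):
        total += erfc_libm(x[i])
    return total

x = np.linspace(-3.0, 3.0, 10**6)
sum_erfc_libm(x)                   # compare its timing against the math.erfc loop above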

@sglyon (Member, Author) commented Feb 1, 2016

Thanks for checking out those benchmarks.

I think that as long as @jstac is OK with this, we can merge as is.

@jstac (Contributor) commented Feb 1, 2016

Thanks all for the discussion. I've been following and I've learned from reading it. It makes sense to Numba-fy the routine and I agree that we don't need to search for the last drop of speed here.

As a general comment for this PR and others, you guys are in charge of QE.py and QE.jl so please feel free to merge without my input. They are really your libraries now, and you are better equipped to take the lead than I am.

If it's a change that affects the lectures, I'd appreciate it if you let me know (although it's not actually your responsibility, and in any case I follow the discussions in the PRs). It will be the job of Tom and me to keep the lectures up to date with changes that you guys make to the libraries.

For QuantEcon.applications, Tom and I still need to take charge of changes there, since that repo just serves the lectures at this stage.

sglyon added a commit that referenced this pull request on Feb 2, 2016
@sglyon merged commit 9ff954c into master on Feb 2, 2016
@sglyon (Member, Author) commented Feb 2, 2016

Sounds great, thanks @jstac!

@mmcky (Contributor) commented Feb 2, 2016

@spencerlyon2 thanks for this update. Do you mind if we delete the branch?

@sglyon (Member, Author) commented Feb 2, 2016

Not at all. Let's keep things tidy


@mmcky deleted the sl/numba-tauchen branch on February 2, 2016 at 13:51