Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numba improvements #144

Merged
merged 35 commits into from
May 7, 2015
Merged

Numba improvements #144

merged 35 commits into from
May 7, 2015

Conversation

sglyon
Copy link
Member

@sglyon sglyon commented May 4, 2015

I'm just opening this as a PR so that it is easy to see what has been changed. We can still push to the branch to make more changes to the PR.

I have no intention of merging this myself.

oyamad and others added 30 commits November 28, 2014 12:34
`numba_installed` flag introduced
Following the issue and fix numba/numba#1103 and numba/numba#1104
Test should fail for this commit
For use with Numba <= 0.18.2
@coveralls
Copy link

Coverage Status

Coverage decreased (-1.06%) to 89.37% when pulling 89302ec on numba_improvements into 8a19047 on master.

@oyamad
Copy link
Member

oyamad commented May 4, 2015

@spencerlyon2 Thanks for issuing the PR!

Summary of changes and issues to discuss:

  1. gth_solve.py: The main part is jitted in a new function called _gth_solve_jit with nopython mode; if not numba_installed, the previous NumPy version is executed. The test does not cover the NumPy version. See GTH_SOLVE: Numba version #103 for details (this in particular).
  2. mc_tools.py: mc_sample_path is autojitted if numba_installed; a NumPy version mc_sample_path_numpy is also implemented, which overwrites mc_sample_path if not numba_installed. The only difference between the two version is that the former uses the custom searchsorted, which is contained in utilities.py, while the latter uses np.searchsorted. See Numba version of mc_sample_path #137 for details.
  3. utilities.py: Contains searchsorted, which is a jitted and simplified version of np.searchsorted; it is an alias of np.searchsorted if not numba_installed. This function will no longer be need and should be discarded once np.searchsorted is supported by Numba.
  4. numba_import_fail_message is in common_messages.py; numba_installed is defined with try from Numba import jit except ... in external.py.
  5. Issues on cartesian are to be discussed in [Update] Cartesian Module  #143.
  6. The issue of keeping/preparing a NumPy counterpart for each routine implemented with Numba, one of the main issues in Numpy version to fall back on in lss.py #132. Will we drop support for NumPy version sometime in the future, and if so, at what occasion? (I feel somehow uneasy when I see the duplications...)
  7. These may belong to other PRs:
    1. Possible changes in the API of MarkovChain/mc_sample_path: see this and this.
    2. Support for sparse matrices in MarkovChain (I have code for mc_sample_path but not yet for gth_solve).
  8. This is not really important, but the "coverage" as displayed on the top page gets decreased as we write more Numba implemented code, since the coverage module inspects the jitted functions to not covered by the tests. See Code coverage of jitted functions - Google Groups.

@mmcky
Copy link
Contributor

mmcky commented May 5, 2015

@oyamad Excellent summary. I think most of these discussions points are captured in other open issues now - so I don't see any reason not to merge this in. Do you?

Point 7 (i). I have opened an issue to track this (#146) based on content in #137
Point 7 (ii). I have opened an issue to track this (#145).

Point 8. Interesting issue - I agree not entirely important at this stage - but good to be aware of. Thanks for pointing this out.

@oyamad
Copy link
Member

oyamad commented May 5, 2015

@mmcky thanks.

I still have a concern about the duplication in mc_tools.py. I think mc_sample_path_numpy is not necessary and the following is sufficient:

def mc_sample_path(P, init=0, sample_size=1000):
    """
    See Section: DocStrings below
    """
    n = len(P)

    # CDFs, one for each row of P
    cdfs = np.empty((n, n), order='C')  # see issue #137#issuecomment-96128186
    np.cumsum(P, axis=-1, out=cdfs)

    # Random values, uniformly sampled from [0, 1)
    u = np.random.random(size=sample_size)

    # === set up array to store output === #
    X = np.empty(sample_size, dtype=int)
    if isinstance(init, int):
        X[0] = init
    else:
        cdf0 = np.cumsum(init)
        X[0] = searchsorted(cdf0, u[0])

    # === generate the sample path === #
    for t in range(sample_size-1):
        X[t+1] = searchsorted(cdfs[X[t]], u[t+1])

    return X

if numba_installed:
    mc_sample_path = jit(mc_sample_path)

If not numba_installed, searchsorted works as np.searchsorted (with side='right'), as defined in utilities.py.

@mmcky
Copy link
Contributor

mmcky commented May 5, 2015

@oyamad I see what you are saying now - thanks. I think this is a good idea. Effectively it boils down to moving the numpy vs numba comparison down a level to the now internal searchsorted utility functions. So we should update the test to compare jitted searchsorted with the numpy searchsorted (with side="right") to ensure both give the same result over time. In your implementation above the rest of mc_sample_path is exactly the same logically jitted or not.

The only downside I see is in this implementation we won't be able to check the function mc_sample_path final output between the jit version and the non jit version easily - but that would only effectively be checking for jit errors which are probably less of an issue versus logic errors etc.

@jstac I will leave the final decision with you. But @oyamad suggestion does reduce duplication in this special case.

Performance:
I had a quick look at performance (http://nbviewer.ipython.org/github/mmcky/work-notebooks/blob/master/quantecon/PerformanceCheckMCTools.ipynb). Sorry didn't suppress %timeit text but the graphs in Out[7] and Out[15] suggests that numba function is best for anything greater than a sample size of ~70. localB is what would happen in the case of no numba installed and is the slowest. Interestingly the non-numpy looped version of searchsorted runs faster even in pure python which is localA. I am not sure why this is.

@oyamad
Copy link
Member

oyamad commented May 5, 2015

@mmcky Thank you for the performance comparison. It is very interesting.
I suspect that qe.numpy and localB would be faster than localA for matrices of larger size.

@mmcky
Copy link
Contributor

mmcky commented May 5, 2015

@oyamad Yeah I think you are right. Last night I had a quick look in our library for a simple utility for producing a Transition Matrix of size n. I didn't see one - I could produce some performance checks through the matrix dimensions this afternoon.

@mmcky
Copy link
Contributor

mmcky commented May 6, 2015

@oyamad Indeed. I added a really simple test which is just using a random matrix P with each row normalised to 1. localB is much better than localA. (http://nbviewer.ipython.org/github/mmcky/work-notebooks/blob/master/quantecon/PerformanceCheckMCTools.ipynb)

@jstac
Copy link
Contributor

jstac commented May 6, 2015

@oyamad @mmcky You've done a lot of work. It's looking really good. I like the look of the code and I'm keen to merge this.

I agree with @oyamad 's suggestion about cutting mc_sample_path_numpy. There's no need for it. As long as the home grown version of searchsorted is properly tested --- which it is --- there's no need to keep mc_sample_path_numpy just for the sake of testing.

So I propose that we adopt @oyamad 's suggestion, cut this function, and then merge.

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.05%) to 89.38% when pulling 008231c on numba_improvements into 8a19047 on master.

@oyamad
Copy link
Member

oyamad commented May 6, 2015

I deleted mc_sample_path_numpy and test_mc_sample_path_functions. Now no test is there for mc_sample_path.

@jstac
Copy link
Contributor

jstac commented May 7, 2015

Thanks for all this work. I'll merge now and at the same time open an issue to add a test for mc_sample_path. Great to have this merged!

jstac added a commit that referenced this pull request May 7, 2015
@jstac jstac merged commit 4ee517e into master May 7, 2015
@jstac jstac deleted the numba_improvements branch May 7, 2015 07:24
@sglyon
Copy link
Member Author

sglyon commented May 7, 2015

Alt Text

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants