New branch focussing on using in-place memory operations where possible? #3
Thank you for the tip; sure, go ahead!
Note that this is a long-term thing, to be tackled once Sten, Peter and I have sorted out some of the more pressing Loom issues ;) Here's an article with optimisation tips for Numba and Numpy; it might contain some useful stuff:
So I have been educating myself on the various profiler tools for Python, to see where the bottlenecks of the current BackSPIN implementation might be. The resulting call graph is very wide, with many irrelevant function calls, so I'll lift out the main bottlenecks: feature selection and the CEF tools eat up a good chunk of the time, perhaps something worth looking into as well. What's interesting is that …
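For reference, this kind of whole-program profile can be generated with the standard library's `cProfile` and inspected with `pstats`; the script/module name below is hypothetical:

```python
import cProfile
import pstats

# Equivalent to: python -m cProfile -o backspin.prof backspin.py
cProfile.run("import backspin", filename="backspin.prof")  # hypothetical module name

# The functions with the largest cumulative time are the bottlenecks
# worth a closer, line-by-line look.
stats = pstats.Stats("backspin.prof")
stats.sort_stats("cumulative").print_stats(10)
```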
So now that we've got the big bottlenecks, let's look at those functions in detail. This is a good introductory article on profiling a Python program line by line, although I couldn't get meaningful results out of the memory profiler. This text file has the full results of running the …
From this we learn the following:
So now we know where to start looking for faster alternatives!
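The line-by-line profiling mentioned above can be reproduced with the third-party `line_profiler` package (`pip install line_profiler`); the function here is a made-up stand-in, not one of the actual BackSPIN bottlenecks:

```python
import numpy as np
from line_profiler import LineProfiler

def hot_function(data):
    # Stand-in for a suspected bottleneck.
    centered = data - data.mean(axis=0)
    return (centered ** 2).sum(axis=1)

lp = LineProfiler()
wrapped = lp(hot_function)          # wrap the function we want timed
wrapped(np.random.rand(500, 500))   # run it under the profiler
lp.print_stats()                    # per-line hit counts and timings
```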
First attempts:
Results:
TL;DR: the Python implementation of the BackSPIN algorithm is 40% faster! (Not counting feature selection and reading in the CEF files.) Important: the results will be different after this rewrite! The reason is that I use a basic floating point optimisation trick: instead of dividing a matrix by a scalar value, I multiply it by the precomputed reciprocal of that scalar, which is faster but rounds slightly differently.
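The trick itself, as a minimal sketch (the array and scalar here are made up, not the actual variables in the patch):

```python
import numpy as np

data = np.random.rand(2000, 2000)
s = 7.3  # some scalar divisor

slow = data / s          # one floating-point division per element

recip = 1.0 / s          # divide once...
fast = data * recip      # ...then multiply, which is cheaper than division

# The rounding happens in a different place, so the two results are
# close but not guaranteed to be bit-identical; hence the changed output.
assert np.allclose(slow, fast)
```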
Next up, I'll look at readCEF.
This one magic function in Numpy might fix the readCEF troubles: http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.loadtxt.html#numpy.loadtxt EDIT: a bit of digging shows that we want to use …
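A minimal sketch of what that could look like; the filename, header layout, and column selection are assumptions, since real CEF files carry header and annotation rows that would need `skiprows`/`usecols` tuning:

```python
import numpy as np

# Hypothetical tab-separated dump: one header row, first column is a
# gene name, the remaining 100 columns are expression values.
data = np.loadtxt("expression.cef", delimiter="\t",
                  skiprows=1, usecols=range(1, 101))
print(data.shape, data.dtype)
```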
@gioelelm, I just had a look at this again because of that image unshredding; could you do a quick test to see whether my version still produces sensible results and, if so, merge it? I already pulled in your changes to use centre of mass instead of means, so there should be no conflicts. In my tests it's 20% faster, so it's probably worth the five minutes required to check and merge it.
Hey Gio/Sten,
Thanks for writing this; it explains a lot to me that the four lines of maths code in the SPIN paper did not.
However, I noticed that a lot, if not all, of the matrix operations in the code produce new matrices that are used only once. That is wasteful, especially when the operations are simple enough to be memory bound (multiplication, addition). The effect also grows with the size of the matrices involved, which I guess is relevant for the new dataset.
For example, the code is full of lines such as:
This is in lines 130-131, and inside an inner loop. Per the documentation:
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing
Since I'm not entirely sure what those lines do at the moment, I'm not sure how to rewrite them, but I am sure they can be rewritten to make good use of numpy views and to reduce memory allocation/garbage collection.
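The distinction in a nutshell, on a toy array rather than the actual BackSPIN matrices: basic slicing gives a view that shares memory with the original, while advanced (integer-array) indexing always allocates a fresh copy.

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4)

view = a[:, 1:3]        # basic slice: a view, no data copied
view[0, 0] = -1.0       # writes through to `a`
assert a[0, 1] == -1.0

order = np.array([2, 0, 1])
shuffled = a[order, :]  # advanced indexing: allocates a new array
shuffled[0, 0] = 99.0   # `a` is untouched
assert a[2, 0] != 99.0
```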
Another example is on line 97:
Per the numpy docs this produces a new matrix each time; if we pre-allocate `mismatch_score` and reuse it (assuming the dimensions are constant, and they appear to be), we can do something along the lines of the sketch below. This would save the repeated memory allocation (and eventual garbage collection); since this function is also called inside an inner loop and the comment above it complains that it gets quite slow, this might have quite a big effect.
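A sketch of that pattern; the operands and the computation are hypothetical stand-ins, since line 97 itself isn't quoted here. Most NumPy ufuncs take an `out=` argument that writes into an existing array instead of allocating a new one:

```python
import numpy as np

a = np.random.rand(1000, 1000)
b = np.random.rand(1000, 1000)

mismatch_score = np.empty_like(a)  # allocate once, outside the loop

for _ in range(100):
    # Stand-in for the real line-97 computation: each call writes into
    # the pre-allocated buffer instead of creating a fresh matrix.
    np.subtract(a, b, out=mismatch_score)
    np.abs(mismatch_score, out=mismatch_score)
```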
One last example from lines 62-66:
I haven't tested this, but based on the documentation I think it can be rewritten with in-place operations (a sketch of the idea follows below).
This would reduce the number of allocated matrices from six to two. The biggest downside is that the new code is less readable, so I'd annotate it to explain the unoptimised operations in the comments.
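To give the flavour of the rewrite on a made-up computation (this is not the actual code from lines 62-66): the straightforward version allocates six full-size matrices, the in-place version only two.

```python
import numpy as np

data1 = np.random.rand(1000, 1000)
data2 = np.random.rand(1000, 1000)

# Straightforward: six full-size temporaries.
t1 = data1 - data2
t2 = t1 ** 2
t3 = data1 + data2
t4 = t3 / 2.0
t5 = t2 / t4
result = np.sqrt(t5)

# In-place: the same result with two allocations.
num = data1 - data2      # allocation 1
num **= 2                # in place
den = data1 + data2      # allocation 2
den /= 2.0               # in place
num /= den               # in place
np.sqrt(num, out=num)    # in place; `num` now holds the result

assert np.allclose(result, num)
```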
If you think it's a good use of my time I would like to make an "in-place operations" branch where I'll try to optimise all of the low-hanging fruit here - I suspect it could speed up the code quite a bit, and even if it turns out not to make too much of a difference it will help me get a feeling for the BackSPIN algorithm.