WIP: Naive cythonization for performance #606
@PCManticore if you could provide some feedback on the questions above (I'll update them with answers as they're provided), and maybe CC any other relevant people for their ideas, that'd be great.
Any comments @PCManticore? I was awaiting some feedback/direction before proceeding.
Going to check this PR this week, sorry for the wait! Feel free to proceed for now.
I've been meaning to take a look at this, but unfortunately I'm very busy just right now. I have three preliminary thoughts:
Thanks @PCManticore - I'm still unsure of what direction to take, so I'll await your comments. @ceridwen FYI that it's very little code, so nothing really to review. I've basically just compiled the existing python-only code with cython - no C specifics.
Agreed - see my last point in the PR. FYI Cython code can just be pure python (which is this PR - the code is still python - except for some minor tweaks like dynamic method allocation and removing Also - how many people contribute to the core of astroid already? (I.e. if not many people make major changes then it's less of an argument.)
Good point. How many people use PyPy for running this? Seems like a relatively small minority. And if they're only using it to run pylint then it's not a major - that's a post-processing task etc. If they're only using PyPy for performance, then they might not need to anyway if this approach is faster anyway.
Totally agreed - see my comment in the original PR.
Nope - I didn't know about that.
This is somewhat in favor of using Cython, as in the last paragraph of my original PR comment. Current performance optimizations have to be rather clever and often sacrifice maintainability/interpretability for (often minor) speed improvements. With Cython, we wouldn't have to worry about that. Specifically to your point on this PR (in its current state) - it's very unlikely there will be regressions with the PR as it is, since its performance depends only on Cython (which is well tested etc.) and not on code changes.
Me too. But a 20% gain without me even understanding the code is pretty nice. I'll probably leave the smarter stuff to those more capable.
No idea - moving to Cython is a big change, so this PR is more of a "look what's possible - what do you think we should do?" piece. I'm not the best person to make these decisions (by a long way) - I'm just hopefully proposing something which others can build on.
I'm not familiar with Cython. What exactly is it doing to get the speedup? Will it require messing with the code in a substantial way? In any case, this is a cool approach. A while back I said "It would certainly be nice if a bird flew by and dropped Pylint-rewritten-in-C into our hands, but short of that I doubt it will ever happen", so it's nice to see that it happened 🐈
No idea sorry - I'm no expert in Cython. If you want, you can check out the C code it generates ...
Check out the PR - there are very few code changes. Otherwise see my comments in the original PR around how much the code can/should change.
I remember reading this. Note that it's not quite true - I haven't rewritten anything in C. Nearly all the files will run with normal python just fine. I could even use the more traditional method for dynamically setting methods on classes, and then all the code would still run in pure python. However, if you do compile it with Cython (which does convert it to C along the way), it'll run faster.
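For readers unfamiliar with the approach, a minimal sketch of what "compile the existing Python with Cython" can look like in a build script is given here. It is not the code from this PR: the helper names `find_sources` and `build_extensions` are hypothetical, and the `"astroid"` directory in the usage note is assumed. `cythonize` and its `compiler_directives` option are part of Cython's documented build API.

```python
from pathlib import Path


def find_sources(package_dir):
    """Collect every .py file under package_dir as sorted POSIX-style paths.

    Hypothetical helper: the sources stay ordinary Python, so the same files
    can still be imported without compiling anything.
    """
    return sorted(p.as_posix() for p in Path(package_dir).rglob("*.py"))


def build_extensions(sources):
    """Turn plain Python sources into compiled extension modules.

    Cython is imported lazily so this file can still be imported (and the
    package used in pure-Python form) when Cython is not installed.
    """
    from Cython.Build import cythonize

    return cythonize(sources, compiler_directives={"language_level": "3"})
```

In a `setup.py`, the result would typically be passed as `ext_modules=build_extensions(find_sources("astroid"))` to `setuptools.setup`; whether every module compiles cleanly is something only an actual build can confirm.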
@PCManticore did you ever get a chance to review this? I could possibly close this otherwise.
Thanks @kodonnell I haven't had a chance to review this closely up until now. But now I am not entirely convinced that this is the way to go in terms of improving the performance of astroid. I agree with the general sentiment of @ceridwen that this will mean both fewer contributors than what we already have and dropping support for Python, for a somewhat minor increase in performance. We also don't have that much experience with Cython and I am wary of jumping on magic performance improvement tools without a modicum of experience to be able to debug any unexpected side effect. And finally, the proposed future changes seem to require a lot of work in both implementing and maintaining astroid post cythonization, which I seriously doubt we have time for. All in all, I think it's best to leave astroid pure Python for now and add corresponding performance improvements with Python code.
Thanks @PCManticore. I tend to disagree that small/minor cythonization will affect contributions, or that it's a "magic ... tool" (as it's used in quite a few popular projects), or that the performance may be minor (as I didn't even try to optimise anything ... it might be the case that we could get a 10x performance increase by tweaking a few lines of code). However, I'm happy to close this as it probably doesn't make much sense to pursue this without at least a core contributor being more familiar with Cython. For anyone reading this later - if you are a Cython ninja, I'd be really interested to see what some slightly-more-than-naive Cythonizing could do.
WIP - mainly for my testing at the moment, and submitted as a PR to allow for discussion/review etc. Do not expect the code to be pretty and up to standard as yet, etc.
As proposed and validated in the pylint issue, it could be possible to 'naively' cythonize the astroid codebase for performance. With the changes up to c9fef18, the results are as follows:
master
naive_cython
So roughly going from 6s to 5s. That's not bad for 'free' (especially on something that can run for hours). NB:
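As a quick sanity check on those rough figures, the snippet below works out what "6s to 5s" means in relative terms. It uses only the two rounded numbers quoted above, so the results are approximate; the function name `relative_change` is just for illustration.

```python
def relative_change(before, after):
    """Return (fraction of wall time saved, speedup factor) for two timings."""
    time_saved = (before - after) / before
    speedup = before / after
    return time_saved, speedup


time_saved, speedup = relative_change(6.0, 5.0)
# Prints: 16.7% less wall time, 1.20x speedup
print(f"{time_saved:.1%} less wall time, {speedup:.2f}x speedup")
```

So the "20%" mentioned elsewhere in this thread matches the 1.2x speedup factor, while the wall-clock time itself drops by about 17%.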
Tasks:
`cdef int i` etc. scattered around the show. If we could get a Cython expert to run their eye over things, they'll probably find a bunch of other easy naive optimizations. My feeling is that if we're moving to Cython anyway, we might as well make a reasonably compelling case for it, and get the low-hanging fruit before we release it.
`*.py` to `*.pyx`
`pylint` on a bunch of sample projects, to ensure it's behaving the same. We could compare performance too. (NB: this could then form the basis of a performance regression suite etc.)
Future:
`cdef`s for functions etc. which will be faster ... though possibly not sufficiently so to justify it)
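The idea of running `pylint` over sample projects to check behaviour and compare speed could be sketched roughly as follows. This is only an illustration under stated assumptions: `run_timed` and `compare_builds` are hypothetical helpers, and the two commands would be whatever invokes `pylint` against a pure-Python and a cythonized install (for example, from two separate virtualenvs).

```python
import subprocess
import time


def run_timed(cmd):
    """Run a command, returning its stdout and the wall time in seconds."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return proc.stdout, elapsed


def compare_builds(baseline_cmd, candidate_cmd):
    """Check that two commands emit identical output and report both timings."""
    base_out, base_seconds = run_timed(baseline_cmd)
    cand_out, cand_seconds = run_timed(candidate_cmd)
    return {
        "same_output": base_out == cand_out,
        "baseline_seconds": base_seconds,
        "candidate_seconds": cand_seconds,
    }
```

A single run per command is noisy, so a real regression suite would likely repeat each measurement several times and compare the best or median timing.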