Current state of performance #610

Open
nickdrozd opened this issue Aug 28, 2018 · 6 comments

Comments

@nickdrozd
Contributor

Here is a yappi graph of Pylint running against pycodestyle.py:

[yappi call graph image]

The numbers in the boxes are (1) the percentage of time spent in that function including its subcalls, (2) the percentage of time spent in that function excluding subcalls, and (3) the number of times the function is called.

The boxes are colored according to (2), with "hotter" colors taking more time and "cooler" colors taking less. Red is bad!

Note that the graph will be influenced to some degree by the targeted file, and could be different for different targets (larger repos, code with heavy use of certain features, etc).
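The graph above was produced with yappi; a similar call graph can be generated with the stdlib cProfile module, whose stats format gprof2dot also understands. This is a minimal sketch: `workload` is a placeholder for a real pylint invocation, and the file/graph names are illustrative.

```python
# Profile a workload and dump stats in a format gprof2dot can read.
import cProfile
import pstats

def workload():
    # Placeholder for an actual pylint run against a target file.
    return sum(i * i for i in range(10_000))

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Save raw stats for later visualisation, and print the top entries.
profiler.dump_stats("profile.pstats")
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(5)
# Visualise: gprof2dot -f pstats profile.pstats | dot -Tpng -o graph.png
```

The percentages and call counts described above correspond to the cumulative time, internal time, and call count columns in the dumped stats.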

This isn't a specific issue, but I didn't want to hijack another issue with this, and it seems like something that would be of interest to everyone.

@PCManticore
Contributor

PCManticore commented Aug 31, 2018

Thanks for doing this @nickdrozd, I appreciate it.
The two hot boxes seem to be `_get_return_nodes_skip_functions`, which I wonder whether we can optimise further, and `wrapper`, which actually comes from the `path_wrapper` decorator used by inference.

Also, this is probably a good issue for keeping track of performance planning and status. We already have a performance-related project on pylint's side, but we don't yet have a place to discuss potential avenues for improving astroid's and pylint's performance.
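One generic avenue for decorator overhead like this is caching pure, repeatedly-computed results. The sketch below uses stdlib `functools.lru_cache` purely as an illustration; it is not astroid's actual mechanism (inference caching there is complicated by context objects), and `expensive_lookup` is a made-up stand-in.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_lookup(name: str) -> int:
    # Stand-in for a costly, side-effect-free computation.
    return sum(ord(c) for c in name)

expensive_lookup("astroid")   # computed
expensive_lookup("astroid")   # served from cache
print(expensive_lookup.cache_info().hits)  # → 1
```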

@kodonnell

Nice @nickdrozd - can you give the exact programs/commands used? I think I tried to replicate your graphs for some of the stuff I was working on, but couldn't figure out the visualization step.

Also (as I'm sure you're both aware), it'd be great to integrate this into a build pipeline, with the graphs published publicly somewhere.

@nickdrozd
Contributor Author

@kodonnell Take a look at nickdrozd/pylint@dad6955

I don't know if that's the best way to do it, but it works reasonably well.

Here's an unsolicited piece of advice for doing performance work: DO NOT MEASURE TIMES WITH THE PROFILER RUNNING. The profiler makes everything run a lot slower, so the times measured will be way off. I've made this mistake! Heed my warning!

(I realize that there is a time command in that yappi.sh script, but that's just for personal reference. That time should not be used for anything important, like reporting on changes.)
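The warning above can be sketched as two separate runs: one plain run whose wall time you report, and one profiled run whose output you use only for call graphs. `workload` is again a placeholder for a real pylint invocation.

```python
import cProfile
import time

def workload():
    # Placeholder for an actual pylint run.
    return sum(i * i for i in range(200_000))

# Timing run: no profiler attached; this is the number to report.
start = time.perf_counter()
workload()
plain = time.perf_counter() - start

# Profiling run: use its stats for call graphs, NOT for timings,
# because the profiler's per-call bookkeeping inflates them.
profiler = cProfile.Profile()
start = time.perf_counter()
profiler.runcall(workload)
profiled = time.perf_counter() - start

print(f"plain: {plain:.4f}s, under profiler: {profiled:.4f}s")
```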

@kodonnell

kodonnell commented May 11, 2019

I keep coming back to performance, so here are some ideas and summaries, since that's what this issue is for.

However, the thing that makes this hard is quantifying performance: if someone wants to try to improve it, they also have to figure out how to measure it themselves. So, I'd like two things:

  • An easy script for people to, e.g., create @nickdrozd's graphs and the like. This would also require some standardised test snippets of code (which might get messy if we want to test external libraries, etc.).
  • Something like airspeed velocity for tracking the performance of a project over time. I don't know much about it, but it's how e.g. numpy tracks this sort of thing.

@nickdrozd @PCManticore - does the above approach seem reasonable? It needs a bit more fleshing out (e.g. which python version and architecture etc.) but if you approve of the general idea, then it seems worthwhile to me.

EDIT: finally remembered airspeed velocity, and since it's the better tool, removed reference to pytest-benchmark.
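The airspeed velocity idea could look roughly like the file below: asv discovers benchmark classes and times every `time_*` method, recording results per commit. The file name and benchmark names are hypothetical, and stdlib `ast` stands in for the real astroid workload purely to keep the sketch self-contained.

```python
# Hypothetical asv benchmark file, e.g. benchmarks/bench_parse.py.
import ast

SOURCE = "def f(x):\n    return x + 1\n\ny = f(2)\n"

class ParseSuite:
    # asv calls setup() before each timed method.
    def setup(self):
        self.tree = ast.parse(SOURCE)

    def time_parse(self):
        # Measures parse cost of the standardised snippet.
        ast.parse(SOURCE)

    def time_dump(self):
        # Measures traversal/serialisation cost of the parsed tree.
        ast.dump(self.tree)
```

With an `asv.conf.json` pointing at the repo, `asv run` would then track these numbers over time, much as numpy does.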

@kodonnell

(Also, it might pay to cross-post the above into pylint as well, as that's generally where users care most about performance of astroid.)

@kodonnell

Another question: has anyone considered parallelism for performance? Since we're largely CPU-bound (?), I don't think we'd get much benefit from standard threading (GIL, etc.), so we'd have to do something else. This feels potentially 'easy' (with a queue of dependent inference tasks, etc.), but that's probably very naive.
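For the CPU-bound case, process-based parallelism sidesteps the GIL where threads would not. A minimal sketch of farming independent files out to worker processes, where `check_file` is a made-up placeholder for real per-file analysis (the hard part the comment above hints at, cross-file inference dependencies, is deliberately ignored here):

```python
from concurrent.futures import ProcessPoolExecutor

def check_file(source: str) -> int:
    # Placeholder "analysis": count lines longer than 79 characters.
    return sum(1 for line in source.splitlines() if len(line) > 79)

def check_all(sources):
    # One worker process per core by default; map preserves input order.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(check_file, sources))

if __name__ == "__main__":
    files = ["x = 1\n", "y = '" + "a" * 100 + "'\n"]
    print(check_all(files))  # → [0, 1]
```

The payoff only materialises when files are independent; a queue of dependent inference tasks, as suggested above, would need explicit ordering or shared caches between workers.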
