Current state of performance #610

Open
nickdrozd opened this issue Aug 28, 2018 · 6 comments

Comments

@nickdrozd
Contributor

Here is a yappi graph of Pylint running against pycodestyle.py:

[yappi call graph image]

The numbers in the boxes are (1) the percentage of time spent in that function including its subcalls, (2) the percentage of time spent in that function excluding subcalls, and (3) the number of times the function is called.

The boxes are colored according to (2), with "hotter" colors taking more time and "cooler" colors taking less. Red is bad!

Note that the graph will be influenced to some degree by the targeted file, and could be different for different targets (larger repos, code with heavy use of certain features, etc).
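The graph above was produced with yappi; a similar call graph can be generated with the stdlib cProfile module, whose stats format gprof2dot also understands. This is a minimal sketch: `workload` is a placeholder for a real pylint invocation, and the file/graph names are illustrative.

```python
# Profile a workload and dump stats in a format gprof2dot can read.
import cProfile
import pstats

def workload():
    # Placeholder for an actual pylint run against a target file.
    return sum(i * i for i in range(10_000))

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Save raw stats for later visualisation, and print the top entries.
profiler.dump_stats("profile.pstats")
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(5)
# Visualise: gprof2dot -f pstats profile.pstats | dot -Tpng -o graph.png
```

The percentages and call counts described above correspond to the cumulative time, internal time, and call count columns in the dumped stats.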

This isn't a specific issue, but I didn't want to hijack another issue with this, and it seems like something that would be of interest to everyone.

@PCManticore
Contributor

PCManticore commented Aug 31, 2018

Thanks for doing this @nickdrozd, I appreciate it.
The two hot boxes seem to be `_get_return_nodes_skip_functions`, which I wonder whether we can optimise further, and `wrapper`, which actually comes from the `path_wrapper` decorator used by inference.

Also, this is probably a good issue for keeping track of performance planning and status. We already have a performance-related project on pylint's side, but we don't yet have a place to discuss potential avenues for improving astroid's and pylint's performance.
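One generic avenue for decorator overhead like this is caching pure, repeatedly-computed results. The sketch below uses stdlib `functools.lru_cache` purely as an illustration; it is not astroid's actual mechanism (inference caching there is complicated by context objects), and `expensive_lookup` is a made-up stand-in.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_lookup(name: str) -> int:
    # Stand-in for a costly, side-effect-free computation.
    return sum(ord(c) for c in name)

expensive_lookup("astroid")   # computed
expensive_lookup("astroid")   # served from cache
print(expensive_lookup.cache_info().hits)  # → 1
```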

@kodonnell

Nice @nickdrozd - can you give the exact programs/commands used? I think I tried to replicate your graphs for some of the stuff I was working on, but couldn't figure out the visualization step.

Also (as I'm sure you're both aware), it'd be great to integrate this into a build pipeline, with the graphs published publicly somewhere.

@nickdrozd
Contributor Author

@kodonnell Take a look at nickdrozd/pylint@dad6955

I don't know if that's the best way to do it, but it works reasonably well.

Here's an unsolicited piece of advice for doing performance work: DO NOT MEASURE TIMES WITH THE PROFILER RUNNING. The profiler makes everything run a lot slower, so the times measured will be way off. I've made this mistake! Heed my warning!

(I realize that there is a time command in that yappi.sh script, but that's just for personal reference. That time should not be used for anything important, like reporting on changes.)
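The warning above can be sketched as two separate runs: one plain run whose wall time you report, and one profiled run whose output you use only for call graphs. `workload` is again a placeholder for a real pylint invocation.

```python
import cProfile
import time

def workload():
    # Placeholder for an actual pylint run.
    return sum(i * i for i in range(200_000))

# Timing run: no profiler attached; this is the number to report.
start = time.perf_counter()
workload()
plain = time.perf_counter() - start

# Profiling run: use its stats for call graphs, NOT for timings,
# because the profiler's per-call bookkeeping inflates them.
profiler = cProfile.Profile()
start = time.perf_counter()
profiler.runcall(workload)
profiled = time.perf_counter() - start

print(f"plain: {plain:.4f}s, under profiler: {profiled:.4f}s")
```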

@kodonnell

kodonnell commented May 11, 2019

I keep coming back to performance, so here are some ideas and summaries, since that's what this issue is for.

However, the thing that makes this hard is quantifying performance: if someone wants to try to improve it, they also have to figure out how to measure it themselves. So, I'd like two things:

  • An easy script for people to, e.g., create @nickdrozd's graphs and the like. This would also require some standardised test snippets of code (which might get messy if we want to test external libraries, etc.).
  • Something like airspeed velocity for tracking the performance of a project over time. I don't know much about it, but it's how e.g. numpy tracks this sort of thing.

@nickdrozd @PCManticore - does the above approach seem reasonable? It needs a bit more fleshing out (e.g. which python version and architecture etc.) but if you approve of the general idea, then it seems worthwhile to me.

EDIT: finally remembered airspeed velocity, and since it's the better tool, removed reference to pytest-benchmark.
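The airspeed velocity idea could look roughly like the file below: asv discovers benchmark classes and times every `time_*` method, recording results per commit. The file name and benchmark names are hypothetical, and stdlib `ast` stands in for the real astroid workload purely to keep the sketch self-contained.

```python
# Hypothetical asv benchmark file, e.g. benchmarks/bench_parse.py.
import ast

SOURCE = "def f(x):\n    return x + 1\n\ny = f(2)\n"

class ParseSuite:
    # asv calls setup() before each timed method.
    def setup(self):
        self.tree = ast.parse(SOURCE)

    def time_parse(self):
        # Measures parse cost of the standardised snippet.
        ast.parse(SOURCE)

    def time_dump(self):
        # Measures traversal/serialisation cost of the parsed tree.
        ast.dump(self.tree)
```

With an `asv.conf.json` pointing at the repo, `asv run` would then track these numbers over time, much as numpy does.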

@kodonnell

(Also, it might pay to cross-post the above into pylint as well, as that's generally where users care most about performance of astroid.)

@kodonnell

Another question: has anyone considered parallelism for performance? Since we're largely CPU-bound (?), I don't think we'd get much benefit from standard threading (GIL, etc.), so we'd have to do something else. This feels potentially 'easy' (with a queue of dependent inference tasks, etc.), but that's probably very naive.
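For the CPU-bound case, process-based parallelism sidesteps the GIL where threads would not. A minimal sketch of farming independent files out to worker processes, where `check_file` is a made-up placeholder for real per-file analysis (the hard part the comment above hints at, cross-file inference dependencies, is deliberately ignored here):

```python
from concurrent.futures import ProcessPoolExecutor

def check_file(source: str) -> int:
    # Placeholder "analysis": count lines longer than 79 characters.
    return sum(1 for line in source.splitlines() if len(line) > 79)

def check_all(sources):
    # One worker process per core by default; map preserves input order.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(check_file, sources))

if __name__ == "__main__":
    files = ["x = 1\n", "y = '" + "a" * 100 + "'\n"]
    print(check_all(files))  # → [0, 1]
```

The payoff only materialises when files are independent; a queue of dependent inference tasks, as suggested above, would need explicit ordering or shared caches between workers.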
