Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: load plotting entrypoint only when necessary #41503

Merged
merged 10 commits into from
Jun 4, 2021

Conversation

TLouf
Copy link
Contributor

@TLouf TLouf commented May 16, 2021

pandas/plotting/_core.py Outdated Show resolved Hide resolved
pandas/plotting/_core.py Outdated Show resolved Hide resolved
@simonjayhawkins simonjayhawkins added Performance Memory or execution speed performance Visualization plotting labels May 17, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there some way to actually test this?

pandas/plotting/_core.py Outdated Show resolved Hide resolved
pandas/plotting/_core.py Outdated Show resolved Hide resolved
pandas/plotting/_core.py Outdated Show resolved Hide resolved
pandas/plotting/_core.py Outdated Show resolved Hide resolved
pandas/plotting/_core.py Outdated Show resolved Hide resolved
@TLouf
Copy link
Contributor Author

TLouf commented May 17, 2021

is there some way to actually test this?

A first thing I see in test_backend is that test_setting_backend_without_plot_raises does not test _get_plot_backend directly but via set_option, and does not test with an entry point without a .plot method, only for unregistered modules, because it wasn't checked in the code either.

A second is to test whether _get_plot_backend("matplotlib") returns pandas.plotting._matplotlib.

@jreback
Copy link
Contributor

jreback commented May 17, 2021

how do we know that this is better perf wise? we don't have any registered backends in pandas tests (I think). I am very leary of changing this as last time this broke things for a while as its pretty fragile. If we have some testsing & perf benchmarks would feel better here.

@TLouf
Copy link
Contributor Author

TLouf commented May 18, 2021

I've added a benchmark that runs _get_plot_backend with many (1000) dummy backends added in the setup. That's the way I found to have some loading time, another would be to add a dummy but slow to load backend, which I don't know how to make.

Here's the output from asv continuous -f 1.1 master entry-pt-lazy-load -b plotting.BackendLoading:

       before           after         ratio
     [b2a36bdf]       [76a912ec]
     <master>         <entry-pt-lazy-load>
-      2.20±0.6μs        159±0.5ns     0.07  plotting.BackendLoading.time_get_plot_backend`

I'm surprised by how low the measured times are, when I run the same thing outside of asv I get times of the order of ~100ms. I'm very new to benchmarking so let me know if you can explain this, especially if it's because I did something wrong.

Edit: had already tried setting repeat and number to 1 and saw no change, but I wasn't aware of warmup_time! Here are the new results in 6105c6f, which make much more sense:

       before           after         ratio
     [b2a36bdf]       [47130fad]
     <master>         <entry-pt-lazy-load>
-       232±0.2ms       4.00±0.3ms     0.02  plotting.BackendLoading.time_get_plot_backend

@jreback
Copy link
Contributor

jreback commented May 18, 2021

@TLouf 1000 is not realistic, let's see 10

@TLouf
Copy link
Contributor Author

TLouf commented May 18, 2021

It's not realistic in terms of number of backends, but these are also extremely fast to load, as opposed to some real plotting backends. As I mentioned in the issue, hvPlot takes 2s to load on my machine, and in the current implementation, if it is present in the environment, it will be loaded, even if the user plots with matplotlib. So the performance improvement highly depends on what backends have been installed.

Anyway, here's the result with 10:

       before           after         ratio
     [b2a36bdf]       [47130fad]
     <master>         <entry-pt-lazy-load>
-       10.9±0.1ms       3.29±0.5ms     0.30  plotting.BackendLoading.time_get_plot_backend

asv_bench/benchmarks/plotting.py Outdated Show resolved Hide resolved
pandas/plotting/_core.py Outdated Show resolved Hide resolved
pandas/plotting/_core.py Outdated Show resolved Hide resolved
pandas/plotting/_core.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few comments, pls rebase as well

asv_bench/benchmarks/plotting.py Outdated Show resolved Hide resolved
pandas/plotting/_core.py Outdated Show resolved Hide resolved
@TLouf
Copy link
Contributor Author

TLouf commented Jun 2, 2021

Done. I had to add a boolean found_backend in _load_backend to make mypy happy, because initialising module to None was incompatible with the return type, which is necessarily a module, as the function raises when no backend is found.

pandas/plotting/_core.py Show resolved Hide resolved
# Fall back to unregistered, module name approach.
try:
module = importlib.import_module(backend)
found_backend = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what i meant is that you can move all of this code here to L1766 (e.g. just load and import there) right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, I first have to iterate through all entry points before falling back to importing a module. That's because for instance, the hvPlot backend is an entry point whose name is "holoviews" (see https://github.com/holoviz/hvplot/blob/master/setup.py#L134). So if I move up the unregistered entry point approach, in that case I might import holoviews, which has no top level plot method, thus raising when the entry point is actually available.

@jreback jreback added this to the 1.3 milestone Jun 4, 2021
@jreback jreback merged commit e602f7b into pandas-dev:master Jun 4, 2021
@jreback
Copy link
Contributor

jreback commented Jun 4, 2021

thanks @TLouf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance Visualization plotting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: load plotting entry point only when necessary
3 participants