API: Define API for pandas plotting backends #26747

Open
datapythonista opened this issue Jun 9, 2019 · 44 comments
Labels
API Design Needs Discussion Requires discussion from core team before further action Visualization plotting

Comments

@datapythonista
Member

In #26414 we split the pandas plotting module into a general plotting framework able to call different backends, and the current matplotlib backend. The idea is that other backends can be implemented more simply, and be used through a common API by pandas users.

The API defined by the current matplotlib backend includes the objects listed next, but this API can probably be simplified. Here is the list with questions/proposals:

Non-controversial methods to keep in the API (They provide the Series.plot(kind='line')... functionality):

  • LinePlot
  • BarPlot
  • BarhPlot
  • HistPlot
  • BoxPlot
  • KdePlot
  • AreaPlot
  • PiePlot
  • ScatterPlot
  • HexBinPlot

Plotting functions provided in pandas (e.g. pandas.plotting.andrews_curves(df))

  • andrews_curves
  • autocorrelation_plot
  • bootstrap_plot
  • lag_plot
  • parallel_coordinates
  • radviz
  • scatter_matrix
  • table

Should those be part of the API, so that other backends also implement them? Would it make sense to convert them to the .plot format (e.g. DataFrame.plot(kind='autocorrelation')...)? Or does it make sense to keep them out of the API, or move them to a third-party module?

Redundant methods that can possibly be removed:

  • hist_series
  • hist_frame
  • boxplot
  • boxplot_frame
  • boxplot_frame_groupby

In the case of boxplot, we currently have several ways of generating a plot (calling mainly the same code):

  1. DataFrame.plot.boxplot()
  2. DataFrame.plot(kind='box')
  3. DataFrame.boxplot()
  4. pandas.plotting.boxplot(df)

Personally, I'd deprecate number 4, and for number 3, deprecate or at least not require a separate boxplot_frame method in the backend, but try to reuse BoxPlot (for number 3 comments, same applies to hist).

For boxplot_frame_groupby, didn't check in detail, but not sure if BoxPlot could be reused for this?

Functions to register converters:

  • register
  • deregister

Do those make sense for other backends?

Deprecated in pandas 0.23, to be removed:

  • tsplot

To see what each of these functions does in practice, this notebook by @liirusuk may be useful: https://github.com/python-sprints/pandas_plotting_library/blob/master/AllPlottingExamples.ipynb

CC: @pandas-dev/pandas-core @tacaswell, @jakevdp, @philippjfr, @PatrikHlobil

@datapythonista datapythonista added Visualization plotting API Design Clean Needs Discussion Requires discussion from core team before further action labels Jun 9, 2019
@TomAugspurger
Contributor

I think we should keep things like autocorrelation out of the swappable backend API.

I think we’ve left things like df.boxplot and hist around because they have slightly different behavior than the .plot API. I wouldn’t recommend making them part of the backend API.

@TomAugspurger
Contributor

Here’s my start on a proposed backend API from a few months ago: TomAugspurger@b07aba2

@datapythonista
Member Author

I think it's worth mentioning that at least hvplot (didn't check the rest) does already provide the functions like andrews_curves, scatter_matrix, lag_plot,...

Maybe, if we don't want to force all backends to implement those, we can check whether the selected backend implements them, and default to the matplotlib plots otherwise?
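A minimal sketch of that fallback idea, assuming a backend is just an object with plotting functions as attributes. The two backends below are hypothetical stand-ins, and `call_with_fallback` is an illustrative name, not a pandas function.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for backend modules (not real packages).
matplotlib_backend = SimpleNamespace(
    plot=lambda data, kind: f"matplotlib {kind} plot",
    andrews_curves=lambda data: "matplotlib andrews_curves",
)
minimal_backend = SimpleNamespace(
    plot=lambda data, kind: f"minimal {kind} plot",
)


def call_with_fallback(selected, default, name, *args, **kwargs):
    """Call `name` on the selected backend, or on the default if it is missing."""
    func = getattr(selected, name, None)
    if func is None:
        # The selected backend does not implement this plot: use the default.
        func = getattr(default, name)
    return func(*args, **kwargs)


result_plot = call_with_fallback(
    minimal_backend, matplotlib_backend, "plot", [1, 2, 3], kind="line"
)
result_misc = call_with_fallback(
    minimal_backend, matplotlib_backend, "andrews_curves", [1, 2, 3]
)
```

Here `result_plot` comes from the selected backend, while `result_misc` silently comes from the default one, which is exactly the behaviour users would need to be aware of.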

I assumed boxplot and hist behaved exactly the same, but just had shortcuts Series.hist() for Series.plot.hist(). The "shortcut" shows the plot grid, but other than that I haven't seen any difference.

@TomAugspurger
Contributor

TomAugspurger commented Jun 10, 2019 via email

@datapythonista
Member Author

I think that makes sense, but if we do that, I think we should move them to pandas.plotting.matplotlib.andrews_curves, instead of pandas.plotting.andrews_curves.

@TomAugspurger I need to check in more detail, but I think the API you implemented in TomAugspurger@b07aba2 is the one that makes more sense. I'll work on it once I finish #26753. I'll also experiment on whether it's feasible to move andrews_curves, scatter_matrix... to the .plot() syntax, I think that will make things simpler and easier for everyone (us, third-party libraries, and users).

@jakevdp
Contributor

jakevdp commented Jun 10, 2019

What's the intention here regarding extra kwargs passed to plotting functions? Should additional backends attempt to duplicate the functionality of all matplotlib-style plot customizations, or should they allow keywords to be passed that correspond to those used by the particular backend?

The first option would be nice in theory, but would require every non-matplotlib plotting backend to essentially implement its own matplotlib conversion layer with a long tail of incompatibilities that would essentially never be complete (speaking from experience as someone who tried to create mpld3 some years back).

The second option is not as nice from the perspective of interchangeability, but would allow other backends to be added with a more reasonable set of expectations.

@TomAugspurger
Contributor

TomAugspurger commented Jun 10, 2019 via email

@ghost

ghost commented Jun 14, 2019

I'm sorry if this is a stupid question, but if you define a plotting "API" which is basically a group of canned plots, wouldn't every backend produce more or less the same output? What new capability is this meant to enable? Something like a pandas to vega exporter perhaps?

@jakevdp
Contributor

jakevdp commented Jun 14, 2019

I don't think it's correct to say that every backend produces more or less the same output.

For example, matplotlib is really good at static charts, but not great at producing portable interactive charts.

On the other hand, bokeh, altair, et al. are great for interactive charts, but aren't quite as mature as matplotlib for static charts.

Being able to produce both with the same API would be a big win.

@tacaswell
Contributor

The first option would be nice in theory, but would require every non-matplotlib plotting backend to essentially implement its own matplotlib conversion layer with a long tail of incompatibilities that would essentially never be complete (speaking from experience as someone who tried to create mpld3 some years back).

and also pins Matplotlib down even more than we already are API wise. I think it makes sense for pandas to declare what style knobs it wants to expose and expect the backend implementations to sort out what that means. This may mean not blindly passing **kwargs through and instead ensuring that the returned objects are "the right thing" for the given backend to be able to do after-the-fact style customization.
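One way to read "declare what style knobs to expose" is sketched below. The knob names, the translation table, and both function names are assumptions made for illustration; nothing here is an agreed pandas API.

```python
# Hypothetical set of style options that pandas would promise to support.
STYLE_KNOBS = {"color", "title", "legend"}


def validate_style_kwargs(kwargs):
    """Return the kwargs unchanged if all are declared knobs, else raise."""
    unknown = sorted(set(kwargs) - STYLE_KNOBS)
    if unknown:
        # Reject instead of blindly passing **kwargs through to the backend.
        raise TypeError(f"unsupported plot options: {unknown}")
    return dict(kwargs)


def to_backend_options(style, translation):
    """Rename the declared knobs to the spelling one backend expects."""
    return {translation.get(key, key): value for key, value in style.items()}


# Assumed translation table for an imaginary backend.
imaginary_translation = {"color": "line_color", "title": "heading"}

style = validate_style_kwargs({"color": "red", "title": "Sales"})
options = to_backend_options(style, imaginary_translation)
```

Each backend would own its translation table, so pandas only has to document the small declared set.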

@ghost

ghost commented Jun 15, 2019

For example, matplotlib is really good at static charts, but not great at producing portable interactive charts.

Thanks @jakevdp, yes, supporting interactive charts is a good goal.

Before things go too far down this particular avenue, here's an alternative solution.

Instead of proclaiming the pandas plotting API to now be a specification, and asking viz packages to implement it specifically, why not generate an intermediate representation (like a vega JSON file) of the plot, and encourage backends to target that as their input.

Advantages include:

  1. Not being tied to the expressive power of a reified pandas API, which wasn't designed as a specification.
  2. The work done by plotting packages to support pandas, becomes available to other pydata packages which generate IR.
  3. Promoting a common language for interchange visualization in the pydata space
  4. Which makes new tools more powerful because they are more widely applicable
  5. Which makes the effort of writing them more reasonable. Basically, improved incentives.

Vega/Vega-lite, as a modern, established, open, and JSON-based viz specification language, with several man-years put into its design and implementation and existing tools built around it, seems like it was created expressly for this purpose. (just please don't).

You know, frontend->IR->backend, like compilers are designed.
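The frontend->IR->backend idea can be sketched as follows. The dictionary is a simplified, Vega-Lite-style description written by hand for illustration (it is not validated against the real schema), and `line_chart_spec` and `render_as_text` are hypothetical names.

```python
import json


def line_chart_spec(records, x, y):
    """Frontend: describe a line chart as a plain, JSON-serializable dict."""
    return {
        "data": {"values": list(records)},
        "mark": "line",
        "encoding": {
            "x": {"field": x, "type": "quantitative"},
            "y": {"field": y, "type": "quantitative"},
        },
    }


def render_as_text(spec):
    """Toy backend: consumes only the spec, never the original data object."""
    encoding = spec["encoding"]
    n_points = len(spec["data"]["values"])
    return (
        f"{spec['mark']} chart of {encoding['y']['field']} "
        f"vs {encoding['x']['field']} ({n_points} points)"
    )


records = [{"day": 1, "sales": 10}, {"day": 2, "sales": 14}, {"day": 3, "sales": 9}]
spec = line_chart_spec(records, x="day", y="sales")

# The IR survives a round trip through JSON, so frontend and backend
# do not need to share any Python objects.
serialized = json.dumps(spec)
summary = render_as_text(json.loads(serialized))
```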

@TomAugspurger
Contributor

TomAugspurger commented Jun 15, 2019 via email

@datapythonista
Member Author

We now merged #26753, and the plotting backend can be changed from pandas. When we split the matplotlib code we left the SeriesPlotMethods and FramePlotMethods in the pandas (not matplotlib) side. That was mainly to leave the docstrings in the pandas side.

But I see that what backends did was to reimplement those classes. So, currently we expect the backends to have one class per plot (e.g. LinePlot, BarPlot), but instead they implement a class with one method per plot (e.g. hvPlot, or the same names as pandas for pdvega).

What I think makes sense, at least as a first version, is that we implement the API as hvplot and pdvega did. I'd just create an abstract class in pandas, that backends inherit from.

If that makes sense for everyone, I'll start by creating the abstract class and adapting the matplotlib backend we have in pandas, and once this is done, we adapt hvplot and pdvega (the changes there should be quite small).

Thoughts?
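A rough sketch of what such an abstract class could look like, with one method per plot kind. The class name `PlotBackend`, the two kinds shown, and the toy `TextBackend` are assumptions for illustration only.

```python
from abc import ABC, abstractmethod


class PlotBackend(ABC):
    """Hypothetical base class that a backend would inherit from."""

    # Plot kinds that this sketch treats as part of the API.
    kinds = ("line", "bar")

    def __init__(self, data):
        self.data = data

    def __call__(self, kind="line", **kwargs):
        # Validate the kind before dispatching to the method of the same name.
        if kind not in self.kinds:
            raise ValueError(f"{kind!r} is not a valid plot kind")
        return getattr(self, kind)(**kwargs)

    @abstractmethod
    def line(self, **kwargs):
        ...

    @abstractmethod
    def bar(self, **kwargs):
        ...


class TextBackend(PlotBackend):
    """Toy backend that returns a description instead of drawing."""

    def _describe(self, kind):
        # Shared code lives in a plain helper method.
        return f"{kind} plot of {len(self.data)} points"

    def line(self, **kwargs):
        return self._describe("line")

    def bar(self, **kwargs):
        return self._describe("bar")


description = TextBackend([1, 2, 3])(kind="bar")
```

A backend that forgets to implement one of the abstract methods cannot be instantiated, which gives an early error instead of a failure at plot time.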

@philippjfr

What I think makes sense, at least as a first version, is that we implement the API as hvplot and pdvega did. I'd just create an abstract class in pandas, that backends inherit from.

I think that on balance this approach will be cleaner. I can't speak to other plotting backends but at least in hvPlot different plot methods share quite a bit of code, e.g. scatter, line and area are largely analogous, and I'd prefer not to rely on subclassing to share code between them. Additionally, I think different backends should have the option to add additional plot types and exposing those as additional public methods seems like the simplest, most natural approach.

@datapythonista
Member Author

Just to make sure I understand, when you say I'd prefer not to rely on subclassing to share code between them you mean like in class LinePlot(MPLPlot), right? And not that you think it's a bad idea to inherit from an abstract base class?

I think I'm +1 on letting backends define plot types not in pandas. But I probably won't implement it right now. We're planning to release pandas in around one week, and I think this will require a bit more thought than blindly calling the backend's method foo when the user provides kind='foo' (for example, parameter validation; also, some kinds would be in the documentation and some not).

@philippjfr

Just to make sure I understand, when you say I'd prefer not to rely on subclassing to share code between them you mean like in class LinePlot(MPLPlot), right? And not that you think it's a bad idea to inherit from an abstract base class?

Yes, that's right. More concretely I'd prefer not to have to do this kind of thing:

class MPL1dPlot(MPLPlot):
    # Intermediate base class that exists only to hold shared code.

    def _some_shared_method(self, *args, **kwargs):
        ...

class LinePlot(MPL1dPlot):
    ...

class AreaPlot(MPL1dPlot):
    ...

Sorry if that was not clear.

@jorisvandenbossche
Member

jorisvandenbossche commented Jun 26, 2019

Very much in favor of a simpler API that is publicly exposed as the single function instead of the classes as now proposed in #27009.

A general question/remark on how the backend option now works. Assume I am the pdvega developer and make this backend available. Does that mean that if users do pd.options.plotting.backend = 'pdvega', the pdvega library needs to have a top-level plot function?
1) As a library author, that's not necessarily the function you want to expose publicly (the top-level plot function required by pandas is not necessarily the API you want your own users to call directly), and 2) for this case you might actually want to be able to do pd.options.plotting.backend = 'altair' (in case the altair developers are fine with that).
So basically my question is: does there need to be an exact 1:1 mapping between the backend name and what is imported? (That is needed now, since it simply imports the provided backend string.)

EDIT: I see that actually something similar was discussed in the PR #26753

@datapythonista
Member Author

If we make the decision that pandas doesn't know/limit which backends can be used (which I'm strongly in favor of making), we need to decide on how/what to call in the backends.

What has been implemented and proposed in the PR I'm working on is that the option plotting.backend is a module (it can be pdvega, altair, altair.pandas, or whatever), and that module must have a public plot function, which is what we will call.

We can consider other options, like if the option is pdvega, we import pdvega.pandas, or we can name the function plot_pandas or whatever. I think the proposed way is the simplest, but if there are other proposals that make more sense, I'm happy to change it.
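The module-based lookup described above can be sketched like this. The module `my_fake_backend` is built in memory purely so the example runs without installing anything, and `get_backend` is an illustrative name rather than the pandas implementation.

```python
import importlib
import sys
import types

# Build a throwaway module in memory to stand in for a third-party backend.
fake_backend = types.ModuleType("my_fake_backend")


def _plot(data, kind, **kwargs):
    return f"my_fake_backend drew a {kind} plot of {len(data)} values"


fake_backend.plot = _plot
sys.modules["my_fake_backend"] = fake_backend


def get_backend(name):
    """Import the module named by the option and check it exposes plot()."""
    module = importlib.import_module(name)
    if not callable(getattr(module, "plot", None)):
        raise ValueError(f"{name!r} does not define a public plot function")
    return module


message = get_backend("my_fake_backend").plot([1, 2, 3], kind="line")
```

Because the option value is passed straight to the import machinery, a dotted name such as a submodule works the same way as a top-level package name.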

Another discussion is if we want to force the users to import the backends manually:

import pandas
import hvplot

pandas.Series([1, 2, 3]).plot()

If we do that, the modules can register themselves, they can also register aliases (so set_option can understand other names than the name of the module). They can also implement custom functions or machinery (e.g. context managers) to plot with certain backends,... Personally I think the simpler we keep things the better.

And while it could be nice to do pandas.set_option('plotting.backend', 'bokeh') to plot in bokeh, I think that implies two things I personally don't like:

  • pandas.set_option('plotting.backend', 'bokeh') will only work if import pandas_bokeh has been called, and will be confusing for the users.
  • It also implies that there is only one module to plot in bokeh. Which doesn't need to be true, and gives the wrong impression to users that you're plotting directly with bokeh, and not with a pandas plotting backend for bokeh.
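To make the first concern concrete, here is a minimal sketch of an import-time registry with aliases. The registry, `register_backend`, and `resolve_backend` are hypothetical, and the `pandas_bokeh` object is a stand-in, not the real package.

```python
from types import SimpleNamespace

_registry = {}


def register_backend(module, name, *aliases):
    """Record a backend under its name and any extra aliases."""
    for key in (name, *aliases):
        _registry[key] = module


def resolve_backend(name):
    """Look a backend up by name; fails if nothing has registered it yet."""
    try:
        return _registry[name]
    except KeyError:
        raise ValueError(f"backend {name!r} is not registered") from None


# Before the backend package has been imported, the short name is unknown.
try:
    resolve_backend("bokeh")
    found_before_import = True
except ValueError:
    found_before_import = False

# Stand-in for what a package like pandas_bokeh might run at import time.
pandas_bokeh = SimpleNamespace(plot=lambda data, kind: f"bokeh-based {kind} plot")
register_backend(pandas_bokeh, "pandas_bokeh", "bokeh")

found_after_import = resolve_backend("bokeh") is pandas_bokeh
```

The same option value fails or succeeds depending on whether the import has already happened, which is the ordering problem described in the first bullet.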

@jreback jreback added the Blocker Blocking issue or pull request for an upcoming release label Jun 28, 2019
@jorisvandenbossche jorisvandenbossche added this to the 0.25.0 milestone Jun 30, 2019
@jorisvandenbossche
Member

@datapythonista thanks for the detailed answer. I am fine with keeping it now as is for the initial release (possibility for alias can always be added later).

If users want hvplot's Andrew's curve plot, they should import the function from hvplot and pass the dataframe there.

+1, I would also not expose all the additional plotting functions through the backend.

But about moving them to pandas.plotting.matplotlib, that seems like an unnecessary backwards incompatible break to me (assuming you meant not only moving the implementation).

@jakevdp
Contributor

jakevdp commented Jul 1, 2019

pandas.set_option('plotting.backend', 'bokeh') will only work if import pandas_bokeh has been called, and will be confusing for the users.

If we use entrypoints to register extensions, then this does not have to be the case: having the package installed on the system will register the entrypoint and make it visible to pandas. For example, this is what Altair uses to detect various renderers that the user might have installed.

@jakevdp
Contributor

jakevdp commented Jul 1, 2019

Also, for what it's worth, once this goes in I think I'd probably deprecate pdvega and move the relevant code over to a new package named pandas_altair or something similar.

@datapythonista
Member Author

Just to explain a bit why things are the way they are now. It's relevant because I'm not quite sure how to implement the changes you propose, or not exposing things in general. Not saying here that it can't be done in a different way, it's just to enrich the discussion.

The first decision was to move all the code using matplotlib to a separate module (pandas.plotting._matplotlib). By doing that, that module somehow became the matplotlib backend.

Everything that was public in pandas.plotting has been kept public there. And to make things as simple as possible, every one of these functions, once called, loads the backend (via a call to _get_plot_backend) and calls the function there.

The public API for the user has no change at all, users still have the same methods and functions available. We're not exposing anything new.

As I understand things, if we decide that an existing plot like andrews_curves is not delegated to the backend, this implies that instead of getting the backend selected by the user, we will still select the matplotlib backend. Given that at least hvplot is already implementing andrews_curves, I personally don't see the point. If the user wants an andrews_curves plot in matplotlib, it is as easy as not changing the backend (or setting it again if it's been changed). So, with the change, all we'd do is make users' lives much harder, by adding extra complexity to pandas.

If we want to be nice with backend developers and not force them to implement plots that may not be so mainstream (I guess that's one of the reasons?), maybe we can default to the matplotlib backend for anything that is missing in the selected backend?

About delegating any unknown kind of plot to the backend, I'm -1 on doing it right now. Surely it can make sense eventually. But I think having several plot kinds documented in pandas, and having extra ones that we don't document, feels a bit hacky. I think it can wait for the next version, after we have feedback on how having different backends works for users, and we have more time to discuss and analyze in detail.

@jorisvandenbossche
Member

If the user wants an andrews_curves plot in matplotlib, it is as easy as not changing the backend (or setting it again if it's been changed). So, with the change, all we'd do is make users' lives much harder, by adding extra complexity to pandas.

I don't think we would be making the user's life harder. Instead of importing it from pandas.plotting, if they want a hvplot's version, they can simply import it from there. Which is something not possible for the DataFrame.plot method, as that is defined on the object. For me that is the main reason for the plotting backend.

If we want to be nice with backend developers and not force them to implement plots that may not be so mainstream

For me it is not about being nice or that implementing everything would be required (it is totally fine if a backend does not support all plotting types, IMO), but rather an unnecessary expansion of the plotting backend API, which also ties ourselves to it.
If we would restart pandas from scratch, I don't think those misc plotting types would be included. But with the plotting backend API we are in some way starting something new.

Any other opinions about this?

@TomAugspurger
Contributor

Agreed with @jorisvandenbossche.


Just to make sure this isn't lost, I think @jakevdp's suggestion to use setuptools' entry points is worth considering to solve the import order registration issue: #26747 (comment)

@datapythonista
Member Author

@jorisvandenbossche how would you change that in the code? Instead of getting the backend defined in the settings for those methods, get the matplotlib backend? I think this is wrong conceptually, but I'm ok with it if there is agreement. Anything that reverts the decoupling of the matplotlib code from the rest I'm -1.

Since you mention that in a pandas from scratch we wouldn't include those plots, should we deprecate them? I'm +1 on moving all the plots that are not methods of Series or DataFrame to a third-party package. Or if any is important enough to be kept, to move it to be called with .plot() as the others.

@jreback
Contributor

jreback commented Jul 17, 2019

i would deprecate the non standard plots in pandas
and move to an external package

@TomAugspurger
Contributor

Joris is offline for a bit.

I think when we’ve discussed this in the past, his and my position on these is to just leave them untouched until they become a maintenance burden.

@datapythonista
Member Author

Just so we are on the same page, this is a summary of what we have, and my understanding of the state of the discussion:

Used as methods of Series and DataFrame (afaik we're all happy to keep them as they are, delegated to the selected backend):

  • PlotAccessor
  • boxplot_frame
  • boxplot_frame_groupby
  • hist_frame
  • hist_series

Other plots (under discussion whether they should be deprecated, delegated to the matplotlib backend, or delegated to the selected backend):

  • boxplot
  • scatter_matrix
  • radviz
  • andrews_curves
  • bootstrap_plot
  • parallel_coordinates
  • lag_plot
  • autocorrelation_plot
  • table

Other public stuff in pandas.plotting (under discussion too):

  • plot_params
  • register_matplotlib_converters
  • deregister_matplotlib_converters

For the Other plots section, I personally think they are a maintenance burden at this point, and I'm +1 on moving them out of pandas, and deprecate them in 0.25.

For the converters and the other stuff, what we have now is surely not correct, since register_matplotlib_converters delegates to the selected backend, which may not be matplotlib. The options that I guess we can consider are:

  • Rename them to register_converters/deregister_converters, deprecate the current ones, and keep delegating to the backend
  • Move them from pandas.plotting to pandas.plotting.matplotlib (which would imply making the matplotlib backend public, so I wouldn't)
  • Leave them as they are, and delegate to the matplotlib backend instead of the selected backend (I see this more as a hack than a good design decision, I'd prefer to keep pandas.plotting agnostic of which backends exist)

@TomAugspurger
Contributor

For the Other plots section, I personally think they are a maintenance burden at this point, and I'm +1 on moving them out of pandas, and deprecate them in 0.25.

How do you find the "other plots" to be a maintenance burden? Looking at the history for the "misc" plots: https://github.com/pandas-dev/pandas/commits/0.24.x/pandas/plotting/_misc.py, we have ~10-15 commits since 2017. The majority are global cleanups applied to the entire codebase (so a small marginal burden). I only see 1-2 commits changing docs, and no commits changing functionality.

Rename them to register_converters/deregister_converters, deprecate the current ones, and keep delegating to the backend

I don't think this would make sense. There are matplotlib-specific converters that we've written for matplotlib. Other backends won't have them. It probably shouldn't be part of the backend API.

@datapythonista
Member Author

I didn't mean those plots are a burden because of the amount of maintenance we've had in the last months or years, but because of the problem they now pose for having a consistent and intuitive API for users, and a good modular code design for us.

Regarding the converters, I don't know if backend authors may want to implement the equivalent of those for matplotlib in some cases. But it doesn't seem a problem if they don't, and those functions do nothing for some or all of the other backends. I'm also ok with option 2, but I don't find it as neat.

@TomAugspurger
Contributor

but because of the problem they now pose for having a consistent and intuitive API for users, and a good modular code design for us.

They're already somewhat inconsistent with DataFrame.plot, though. The name "misc" implies that :) Does having a swappable backend make that any worse? To the extent that it's worth the churn on user code? I don't think so.

I don't know if backend authors may want to implement the equivalent of those for matplotlib in some cases.

I don't think so. The point of those converters is to teach matplotlib about pandas objects. Libraries implementing the backend won't have that problem, since they already depend on pandas.

@datapythonista
Member Author

Personally I think about it mainly in terms of managing complexity. Having a standard plotting API that is delegated to the backend via a single API is easy to understand, and to maintain. Users and maintainers just need to learn that there is a plot function with a kind argument, and that this will be executed in the selected backend.

Having a set of heterogeneous plots that, besides not following the same API, use a backend that is not the one selected for the other plots but always the Matplotlib one, adds too much complexity for everyone IMHO.

And the cost of moving them seems small to me, my guess is that not a big proportion of our users even know about those plots. And for the ones who do, they'll just need to install an extra conda package and use import pandas_plotting; pandas_plotting.andrews_curves(df) instead of pandas.plotting.andrews_curves(df).

To me it seems there is a lot to gain at a small cost, but of course it's just an opinion.

@TomAugspurger
Contributor

Can we document that the swappable backend is just for Series/DataFrame.plot? That seems like a pretty simple rule.

@datapythonista
Member Author

Feels like a hack that adds unnecessary complexity to me; I don't think explaining it in the documentation makes it less counter-intuitive.

But anyway, not a big deal. If that's the preferred option, this is how I'd implement it, at least the increase in code complexity is minimal: #27432

@WillAyd WillAyd mentioned this issue Jul 18, 2019
@WillAyd WillAyd removed this from the 0.25.0 milestone Jul 18, 2019
@jakevdp
Contributor

jakevdp commented Jul 19, 2019

Looking more closely at this now: if I understand correctly, the way that the plotting backend will be set is using:

pd.set_option('plotting.backend', 'name_of_module')

My understanding, then, is that if I want to make the following work:

pd.set_option('plotting.backend', 'altair')

then I will need the top-level altair package to define all the functions in https://github.com/pandas-dev/pandas/blob/master/pandas/plotting/_core.py. I would prefer not to pollute Altair's top-level namespace with all these additional APIs that are not meant to actually be used by Altair users. In fact, I would prefer for altair's pandas extension to live in a separate package, so it's not tied to the release cadence of Altair itself.

If I understand correctly, this means that there's no way for me to make pd.set_option('plotting.backend', 'altair') work correctly without hard-coding the altair package in pandas the way matplotlib is currently hard-coded, is that correct?

if backend_str == "matplotlib":
    backend_str = "pandas.plotting._matplotlib"

If so, I would strongly advise rethinking the means by which this API is exposed in third-party packages.

My suggested solution would be to adopt an entrypoint-based framework that would let me, for example, create a package like altair_pandas that registers the altair entrypoint to implement the API. Otherwise users will forever be confused that pd.set_option('plotting.backend', 'altair') doesn't do what they expect.

@TomAugspurger
Contributor

TomAugspurger commented Jul 19, 2019 via email

@datapythonista
Member Author

There was a point in time where what you say was mostly correct, but that's not the case anymore.

If you want pandas.options.plotting.backend = 'altair', in 0.25 you just need to have a function altair.plot(). At some point I thought it would be better to call the function pandas_plot instead of simply plot, so it was more specific in a backend that had other things, but in the end we didn't make the change.

If creating the plot function in the top level of altair is a problem, we can rename it in a future version, or you can also have altair.pandas.plot, but then users will have to set pandas.options.plotting.backend = 'altair.pandas'.

You can surely change the option yourself once users do an import altair. And we could implement a registry of backends. But I think it'd be confusing for users if they set pandas.options.plotting.backend = 'altair' and it fails because they forgot to import altair first.

One last thing is to consider that we could possibly have more than one pandas backend implemented for altair (or any other visualization library). So, for me, that the name of the backend is not altair, is not necessarily a bad thing.

@TomAugspurger
Contributor

TomAugspurger commented Jul 19, 2019

Here's an entry-points based implementation

diff --git a/pandas/plotting/_core.py b/pandas/plotting/_core.py
index 0610780ed..c8ac12901 100644
--- a/pandas/plotting/_core.py
+++ b/pandas/plotting/_core.py
@@ -1532,8 +1532,10 @@ class PlotAccessor(PandasObject):
 
         return self(kind="hexbin", x=x, y=y, C=C, **kwargs)
 
+_backends = {}
 
-def _get_plot_backend(backend=None):
+
+def _get_plot_backend(backend="matplotlib"):
     """
     Return the plotting backend to use (e.g. `pandas.plotting._matplotlib`).
 
@@ -1546,7 +1548,14 @@ def _get_plot_backend(backend=None):
     The backend is imported lazily, as matplotlib is a soft dependency, and
     pandas can be used without it being installed.
     """
-    backend_str = backend or pandas.get_option("plotting.backend")
-    if backend_str == "matplotlib":
-        backend_str = "pandas.plotting._matplotlib"
-    return importlib.import_module(backend_str)
+    import pkg_resources  # slow import. Delay
+    if backend in _backends:
+        return _backends[backend]
+
+    for entry_point in pkg_resources.iter_entry_points("pandas_plotting_backends"):
+        _backends[entry_point.name] = entry_point.load()
+
+    try:
+        return _backends[backend]
+    except KeyError:
+        raise ValueError("No backend {}".format(backend))
diff --git a/setup.py b/setup.py
index 53e12da53..d2c6b18b8 100755
--- a/setup.py
+++ b/setup.py
@@ -830,5 +830,10 @@ setup(
             "hypothesis>=3.58",
         ]
     },
+    entry_points={
+        "pandas_plotting_backends": [
+            "matplotlib = pandas:plotting._matplotlib",
+        ],
+    },
     **setuptools_kwargs
 )

I think it's quite nice. 3rd party packages will modify their setup.py (or pyproject.toml) to include something like

entry_points={
    "pandas_plotting_backends": ["altair = pdvega._pandas_plotting_backend"]
}

I like that it breaks the tight coupling between naming and implementation.
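For comparison, installed entry points can also be listed with the standard library's importlib.metadata (available since Python 3.8) instead of pkg_resources. This is only a sketch: it reuses the group name from the diff above, `list_backends` is a hypothetical helper, and it deliberately lists entry points without loading them.

```python
import sys
from importlib import metadata


def list_backends(group="pandas_plotting_backends"):
    """Map entry point names in `group` to their 'module:attr' targets."""
    if sys.version_info >= (3, 10):
        # Newer Python versions support selecting a group directly.
        entry_points = metadata.entry_points(group=group)
    else:
        # On Python 3.8 and 3.9, entry_points() returns a dict keyed by group.
        entry_points = metadata.entry_points().get(group, [])
    # Nothing is imported here; ep.load() would import the target.
    return {ep.name: ep.value for ep in entry_points}


available = list_backends()
```

The result depends on what is installed in the environment, and it is an empty dict when no package declares that group.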

@datapythonista
Member Author

I haven't worked with entry points; are they like a global registry of the Python environment? Being new to them I don't love the idea, but I guess that would be a reasonable way to do it then.

I'd still like to have both options, so if the user does pandas.options.plotting.backend = 'my_own_project.my_custom_small_backend' it works, and doesn't require creating a package and setting entry points.

@TomAugspurger
Contributor

I haven't worked with entry points; are they like a global registry of the Python environment?

I haven't used them either, but I think that's the idea. From what I understand, they're from setuptools (but packages like flit hook into them?). So they aren't part of the standard library, but setuptools is what everyone uses anyway.

I'd still like to have both options

Falling back to import_module(backend_name) seems reasonable.
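A sketch of how the two lookups could be combined: names found in a registry (for example one filled from entry points) win, and anything else is treated as an importable module path. The function name and the registry contents are assumptions for illustration.

```python
import importlib


def find_backend(name, registry):
    """Return a registered backend, else try to import `name` as a module."""
    if name in registry:
        return registry[name]
    try:
        # Fallback: treat the option value as a dotted module path.
        return importlib.import_module(name)
    except ImportError:
        raise ValueError(f"could not find plotting backend {name!r}") from None


# Hypothetical registry, e.g. as it might look after scanning entry points.
registry = {"short_name": "registered backend object"}

from_registry = find_backend("short_name", registry)
# The stdlib json module is used only as a stand-in for an importable backend.
from_import = find_backend("json", registry)
```

With this order, a small project-local backend works without packaging, while installed packages can still offer short aliases.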

9 participants