Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dataset line plot #4820

Draft
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

Illviljan
Copy link
Contributor

@Illviljan Illviljan commented Jan 17, 2021

  • Closes Dataset plot line #4235
  • Tests added
  • Passes pre-commit run --all-files
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

TODO:

  • markersize/linewidth should work, easy if linewidth is only defined. But the hue/linewidth combo is trickier.
    • Possible, see post further down for details. Should be done in dataarray version in a future PR.
  • hue only works with coordinates. Should it work with variables? How to do that? groupby?
    • Leave it for coordinates only for now.
  • Should lines always be sorted to avoid weird lines?
    • Handled in dataarray version
  • linewidth not shown anywhere
    • line plots doesn't have the nice legend_elements function that scatter has which makes it a little bit more tricky. It can still be shown in the legend though.

Adds ds.plot.line. The function wraps ax.plot which should feel pretty homely for most matplotlib users.

It was a mess getting the colorbar to work because scatter returns a mappable that colorbar can handle but plot returns a list of line2ds that it can't handle. You also can't add cmaps directly to ax.plot, so to avoid for loops I changed the default color settings instead.

Feel free to break it.

Example:
bild

@mathause mathause marked this pull request as draft January 22, 2021 00:11
@mathause
Copy link
Collaborator

This looks like a good start but is also still very preliminary so I converted to a "work in progress".

@Illviljan
Copy link
Contributor Author

I've added a little todo list of things I've noticed, @mathause. Did you have something else in mind?

I wouldn't mind some more examples and matplotlib solutions if anyone has done any nice looking line plots with xarray before.

@mathause
Copy link
Collaborator

Not really. It just looked very drafty ;-)

My inclination would be to not allow setting lw and ms via a variable or coordinates - you should then probably use a scatterplot. That would also take care of your 4th point.

hue only works with coordinates.

I think that's fine. It's always possible to extend later.

@dcherian
Copy link
Contributor

I wonder if it would be easier to construct a DataArray and then use DataArray.plot.line so everything is going through the same code path

linewidth and hue doesn't work together though, separate uses only.
@Illviljan
Copy link
Contributor Author

Added a variant using dataarray, it's much cleaner although the wrappers had to be redone.
I think we can move ds.plot.scatter to the dataarray side as well.

@Illviljan
Copy link
Contributor Author

Illviljan commented Feb 8, 2021

I think making the dataset plots a thin wrapper for the dataarray plot functions is the way to go. I think most things for the lineplot works as intended now.
I'm not sure about the decorator though. It's a bit too simple compared to _dsplot at the moment.

Supporting linewidth in plots should be possible but this is for a future pr I think. I did a small proof of concept here:

def _lineplot(ds, x, y, hue, linewidth):
    if len(ds[y].dims) > 3:
        raise NotImplementedError("too many dims.")

    fig, ax = plt.subplots(1, 1)
    for i, width in enumerate(ds[linewidth]):
        # Filter along linewidth:
        dsi = ds.isel(**{linewidth: i})

        # Values to plot:
        xplt = dsi[x]

        # if xplt has no dims:
        (xdim,) = xplt.dims
        (huedim,) = dsi[hue].dims
        yplt = dsi[y].transpose(..., xdim, huedim)

        # Set plot properties:
        len_lines = len(ds[hue])
        cmap = plt.get_cmap("viridis", len_lines)
        colors = plt.cycler(color=cmap(np.arange(len_lines)))
        lw = plt.cycler(lw=[1 + i * 3])
        ax.set_prop_cycle(colors * lw)

        # p = xr.broadcast(xplt, yplt)
        plt.plot(xplt, yplt, label=width.values)
        plt.legend()

    # ax.plot doesn't return a mappable that fig.colorbar can parse. Create
    # one and return that one instead:
    norm = plt.Normalize(vmin=ds[hue].min(), vmax=ds[hue].max())
    primitive = plt.cm.ScalarMappable(cmap=plt.get_cmap("viridis"), norm=norm)
    fig.colorbar(mappable=primitive)

    return fig, ax

ds = xr.tutorial.scatter_example_dataset()
ds1 = ds.sel(z=0)
_lineplot(ds=ds1, x="y", y="A", hue="w", linewidth="x")

@Illviljan Illviljan closed this Feb 11, 2021
@Illviljan Illviljan reopened this Feb 11, 2021
@Illviljan Illviljan closed this Feb 11, 2021
@Illviljan Illviljan reopened this Feb 11, 2021
@Illviljan Illviljan closed this Feb 12, 2021
@Illviljan Illviljan reopened this Feb 12, 2021
@keewis
Copy link
Collaborator

keewis commented Feb 12, 2021

the reason for the docs failure is a intersphinx issue. We fixed that some time ago so you should be able to merge in master to get the build to pass.

@pep8speaks
Copy link

pep8speaks commented Feb 13, 2021

Hello @Illviljan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-07-24 22:33:10 UTC

@Illviljan Illviljan marked this pull request as ready for review February 13, 2021 10:18
@Illviljan
Copy link
Contributor Author

I think this is ready now.

I'm not super happy with the docs, I think copying makes sense but it doesn't match completely at the moment. I was thinking that can be solved by rewriting the dataarray version in a smarter way.

Comment on lines +470 to +475
plotfunc.__doc__ = (
f" {plotfunc.__doc__}\n\n"
" The y DataArray will be used as base,"
" any other variables are added as coords.\n\n"
f"{commondoc}"
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding an explanation here how the dataset becomes a dataarray should help making sense of the autogenerated docs I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the _dsplot decorator not work for this case? It would help minimize the changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that the line plot comes from the dataarray side you don't have to do all the additional things _dsplot is doing. All that's really needed then is to attach to the plot class and then do this tweak to the docs so that they make more sense.

My end goal here is to move all the dataset specific plots to the dataarray side and use this thin wrapper for dataset plotting. Back when I started this PR in january there was only scatter so it made more sense back then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Illviljan Can you add this to the docstring of _attach_to_plot_class

Let's also see what @mathause thinks...

@dcherian dcherian requested a review from mathause May 13, 2021 20:16
@github-actions
Copy link
Contributor

github-actions bot commented Jul 18, 2021

Unit Test Results

         6 files           6 suites   50m 26s ⏱️
16 200 tests 14 465 ✔️ 1 735 💤 0 ❌
90 396 runs  82 221 ✔️ 8 175 💤 0 ❌

Results for commit 1dfad9f.

♻️ This comment has been updated with latest results.

@Illviljan Illviljan marked this pull request as draft July 25, 2021 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dataset plot line
5 participants