Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: make plotly-express dataframe agnostic via narwhals #4790

Merged
merged 131 commits into from
Nov 13, 2024

Conversation

FBruzzesi
Copy link
Contributor

@FBruzzesi FBruzzesi commented Oct 9, 2024

Description

This PR migrates plotly-express module logic from pandas to narwhals. In this way, pandas is not a required dependency for plotly-express (or at least for its entirety - e.g. trendlines will still require pandas for now) and users coming with polars,
pyarrow or other eager dataframes supported in narwhals do not need to depend on pandas in the first place.

Closes #4749

Code PR

  • I have read through the contributing notes and understand the structure of the package. In particular, if my PR modifies code of plotly.graph_objects, my modifications concern the codegen files and not generated files.
  • I have added tests (if submitting a new feature or correcting a bug) or
    modified existing tests.
  • For a new feature, I have added documentation examples in an existing or
    new tutorial notebook (please see the doc checklist as well).
  • I have added a CHANGELOG entry if fixing/changing/adding anything substantial.
  • For a new feature or a change in behaviour, I have updated the relevant docstrings in the code to describe the feature or behaviour (please see the doc checklist as well).

Out of scope for the PR

  • No documentation change has been done so far
  • Adapt plotly data accordingly included in this PR
  • timeseries/trendlines

cc: @MarcoGorelli @LiamConnors

@@ -454,7 +532,7 @@ def make_trace_kwargs(args, trace_spec, trace_data, mapping_labels, sizeref):
mapping = args["color_discrete_map"].copy()
else:
mapping = {}
for cat in trace_data[attr_value]:
for cat in trace_data.get_column(attr_value).to_list():
Copy link
Contributor Author

@FBruzzesi FBruzzesi Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context, although trace_data.get_column(attr_value) is a Narwhals Series, which is an iterable, explicitly calling a to_list() makes sure that the elements we loop over are python objects in all cases.
Calling to_py_scalar for each element is significantly more time consuming for the cases in which elements are not python scalars (such as pyarrow).

I will make sure to add an in-line comment

@LiamConnors
Copy link
Member

LiamConnors commented Nov 6, 2024

This is an example of a graph that renders differently each time in polars. Looks like the order for the values of x, color and facet_col change.

import plotly.express as px
df = px.data.tips(return_type="polars")
fig = px.bar(df, x="day", y="total_bill", color="smoker", barmode="group", facet_col="sex",
            )
fig.show()

But for pandas the order is consistent and seems to depend on the order of the data. Would it be possible to make it consistent for polars too. cc @emilykl

@MarcoGorelli
Copy link
Contributor

But for pandas the order is consistent and seems to depend on the order of the data. Would it be possible to make it consistent for polars too. cc @emilykl

yup, addressed in the latest commit 👌

latest timings: https://www.kaggle.com/code/marcogorelli/visualise-timings?scriptVersionId=205986199

Copy link
Contributor

@emilykl emilykl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FBruzzesi @MarcoGorelli All looks good from our side. You can go ahead and merge when ready.

This was a massive effort, thank you very much for the contribution! 🚀

@FBruzzesi
Copy link
Contributor Author

FBruzzesi commented Nov 13, 2024

Hey @emilykl thanks a ton! We really appreciated the collaborative effort! As mention on a few meetings, it led us to investigate deeper and improve narwhals codebase significantly as well, it has been such a win-win!

Regarding merging, the branch keeps going out of sync with master, and I will need another approval to merge it myself (and hopefully I need to time it correctly 😇) or you can approve and merge whenever you want. There are no other changes on this PR from our side 🚀

@emilykl emilykl merged commit 5898816 into plotly:master Nov 13, 2024
4 checks passed
@emilykl
Copy link
Contributor

emilykl commented Nov 13, 2024

@FBruzzesi @MarcoGorelli Merged! 🚀 🎉 💥

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community community contribution feature something new P1 needed for current cycle
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Explore using Narwhals in Plotly Express
7 participants