Support explicitly setting a dimension order with to_dataframe() #4333

Thomas-Z · 2020-08-11T08:46:45Z

Closes Support explicitly setting a dimension order with to_dataframe() #4331
Tests added
Passes isort . && black . && mypy . && flake8
User visible changes (including notable bug fixes) are documented in whats-new.rst

max-sixty · 2020-08-11T13:45:33Z

Great, thanks @Thomas-Z , LGTM

Does anyone have thoughts on the kwarg name? dim_order seems reasonable, though IIRC we don't use it elsewhere

dcherian · 2020-08-11T15:42:53Z

How about dims: Iterable[Hashable]?

Thomas-Z · 2020-08-11T15:51:02Z

Hello,

I actually followed @shoyer's suggestion to reuse the to_dask_dataframe parameter name.

And I just realized I only did half the work. I'll add this parameter to DataArray.to_dataframe if you approve this name.

dcherian · 2020-08-11T16:10:24Z

OK consistency with to_dask_dataframe is a good idea. Maybe stick with dim_order for now and we can change both later if someone really wants it?

shoyer · 2020-08-11T16:44:23Z

I like dim_order a little bit better than dims because it's clear that you need to supply all the dimensions, not just a subset (which is usually the case for dims)

shoyer · 2020-08-11T16:45:53Z

xarray/core/dataset.py

+        if dim_order is None:
+            dim_order = list(self.dims)
+        elif set(dim_order) != set(self.dims):
+            raise ValueError(
+                "dim_order {} does not match the set of dimensions on this "
+                "Dataset: {}".format(dim_order, list(self.dims))
+            )


Maybe we could make a small helper method _normalize_dim_order() that we could use both here and in to_dask_dataframe?
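A minimal sketch of what such a shared helper could look like, written as a standalone function so it runs without xarray. The name `normalize_dim_order`, its signature, and the extra duplicate check are assumptions made for illustration; this is not the code that was merged.

```python
from typing import Hashable, Iterable, List, Mapping, Optional


def normalize_dim_order(
    dims: Mapping[Hashable, int],
    dim_order: Optional[Iterable[Hashable]] = None,
) -> List[Hashable]:
    """Return a validated list of dimension names (hypothetical helper)."""
    if dim_order is None:
        # Default, as in the diff above: keep the stored order of the dimensions.
        return list(dims)
    dim_order = list(dim_order)
    # The order must name every dimension, not just a subset.
    # The length comparison is an extra assumption here: it rejects duplicates.
    if set(dim_order) != set(dims) or len(dim_order) != len(dims):
        raise ValueError(
            "dim_order {} does not match the set of dimensions: {}".format(
                dim_order, list(dims)
            )
        )
    return dim_order


sizes = {"x": 2, "y": 3}  # illustrative dimension sizes
print(normalize_dim_order(sizes))              # ['x', 'y']
print(normalize_dim_order(sizes, ["y", "x"]))  # ['y', 'x']
```

Both to_dataframe and to_dask_dataframe could then call one function of this shape instead of repeating the check.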

Thomas-Z · 2020-08-11T16:49:19Z

Do we want DataArray.to_dataframe to be consistent with Dataset.to_dataframe regarding the default dimension ordering (i.e. alphabetically) or do we want to keep the current behavior (DataArray.dims order)?

shoyer · 2020-08-11T16:53:59Z

> Do we want DataArray.to_dataframe to be consistent with Dataset.to_dataframe regarding the default dimension ordering (i.e. alphabetically) or do we want to keep the current behavior (DataArray.dims order)?

DataArray.to_dataframe() should keep the current behavior based on the dimension order.
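To see why the default matters, here is a rough illustration of how the chosen dimension order changes the row order of the resulting index. It uses only itertools.product as a stand-in for building the index and is not xarray's implementation; the coordinates and the `row_labels` function are hypothetical.

```python
from itertools import product

# Hypothetical coordinates, chosen only for illustration.
coords = {"y": [10, 20], "x": ["a", "b", "c"]}


def row_labels(coords, dim_order):
    """List index tuples in row order; the last dimension varies fastest."""
    return list(product(*(coords[dim] for dim in dim_order)))


native = list(coords)          # order the dimensions were declared in: ['y', 'x']
alphabetical = sorted(coords)  # ['x', 'y']

print(row_labels(coords, native)[:3])        # [(10, 'a'), (10, 'b'), (10, 'c')]
print(row_labels(coords, alphabetical)[:3])  # [('a', 10), ('a', 20), ('b', 10)]
```

The same data ends up in a different row order depending on which default is used, which is why changing the DataArray default would be visible to existing users.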

Refactoring some code, fixing some docstring.

dcherian

Thanks @Thomas-Z; just one minor comment. Can you add a note to whats-new.rst?

xarray/core/dataarray.py

dcherian · 2020-08-14T18:28:23Z

Thanks @Thomas-Z . I see this is your first PR here. Welcome to xarray!

max-sixty · 2020-08-14T21:03:54Z

Thanks @Thomas-Z ! Great to have you as a contributor

Thomas-Z · 2020-08-15T07:20:53Z

My pleasure.
I've been a user for a few years now, and I'll gladly give something back whenever I can.

@mathause

* upstream/master: (34 commits)
  Fix bug in computing means of cftime.datetime arrays (pydata#4344)
  fix some str accessor inconsistencies (pydata#4339)
  pin matplotlib in ci/requirements/doc.yml (pydata#4340)
  Clarify drop_vars return value. (pydata#4244)
  Support explicitly setting a dimension order with to_dataframe() (pydata#4333)
  Increase support window of all dependencies (pydata#4296)
  Implement interp for interpolating between chunks of data (dask) (pydata#4155)
  Add @mathause to current core developers. (pydata#4335)
  install sphinx-autosummary-accessors from conda-forge (pydata#4332)
  Use sphinx-accessors-autosummary (pydata#4323)
  ndrolling fixes (pydata#4329)
  DOC: fix typo argmin -> argmax in DataArray.argmax docstring (pydata#4327)
  pin sphinx to 3.1(pydata#4326)
  nd-rolling (pydata#4219)
  Implicit dask import 4164 (pydata#4318)
  allow customizing the inline repr of a duck array (pydata#4248)
  silence the known docs CI issues (pydata#4316)
  enh: fixed pydata#4302 (pydata#4315)
  Remove all unused and warn-raising methods from AbstractDataStore (pydata#4310)
  Fix map_blocks example (pydata#4305)
  ...


hammer · 2020-08-19T20:37:37Z

Just noting for GitHub metadata purposes that this PR addresses the Dataset.to_dataframe annotation request in #4238

tzilio added 2 commits August 11, 2020 10:40

#4331: Adding dim_order parameter to Dataset.to_dataframe

cfe4365

#4331: Typo

665c1aa

shoyer reviewed Aug 11, 2020

#4331: Adding dim_order parameter to DataArray.to_dataframe.

c5b2c17

Refactoring some code, fixing some docstring.

dcherian reviewed Aug 14, 2020

shoyer approved these changes Aug 14, 2020

tzilio added 3 commits August 14, 2020 19:21

Merge branch 'master' into to_dataframe_dimensions_order

352636a

#4331: Updating whats-new.rst

9acb3cf

#4331: Updating whats-new.rst (bis)

b24d306

dcherian approved these changes Aug 14, 2020

dcherian merged commit 1f45bca into pydata:master Aug 14, 2020

mathause mentioned this pull request Aug 16, 2020

annotate concat #4346

Merged

This was referenced Oct 21, 2024

Dataset.to_dataframe() dimension order is not alphabetically sorted by default #9653

Open

Update to_dataframe doc to match current behavior #9662

Open
