Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support explicitly setting a dimension order with to_dataframe() #4333

Merged
merged 6 commits into from
Aug 14, 2020
Merged

Support explicitly setting a dimension order with to_dataframe() #4333

merged 6 commits into from
Aug 14, 2020

Conversation

Thomas-Z
Copy link
Contributor

@Thomas-Z Thomas-Z commented Aug 11, 2020

@max-sixty
Copy link
Collaborator

Great, thanks @Thomas-Z , LGTM

Does anyone have thoughts on the kwarg name? dim_order seems reasonable, though IIRC we don't use it elsewhere

@dcherian
Copy link
Contributor

How about dims: Iterable[Hashable]?

@Thomas-Z
Copy link
Contributor Author

Thomas-Z commented Aug 11, 2020

Hello,

I actually followed @shoyer suggestion to use to_dask_dataframe parameter name.

And I just realized I only did half the work. I'll add this parameter to DataArray.to_dataframe if you validate this name.

@dcherian
Copy link
Contributor

OK consistency with to_dask_dataframe is a good idea. Maybe stick with dim_order for now and we can change both later if someone really wants it?

@shoyer
Copy link
Member

shoyer commented Aug 11, 2020

I like dim_order a little bit better than dims because it's clear that you need supply all the dimensions, not just a subset (which is usually the case for dims)

Comment on lines 4558 to 4564
if dim_order is None:
dim_order = list(self.dims)
elif set(dim_order) != set(self.dims):
raise ValueError(
"dim_order {} does not match the set of dimensions on this "
"Dataset: {}".format(dim_order, list(self.dims))
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could make a small helper method _normalize_dim_order() that we could use both here and in to_dask_dataframe?

@Thomas-Z
Copy link
Contributor Author

Do we want DataArray.to_dataframe to be consistent with Dataset.to_dataframe regarding the default dimension ordering (i.e. alphabetically) or do we want to keep the current behavior (DataArray.dims order)?

@shoyer
Copy link
Member

shoyer commented Aug 11, 2020

Do we want DataArray.to_dataframe to be consistent with Dataset.to_dataframe regarding the default dimension ordering (i.e. alphabetically) or do we want to keep the current behavior (DataArray.dims order)?

DataArray.to_dataframe() should keep the current behavior based on the dimension order.

Refactoring some code, fixing some docstring.
Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Thomas-Z; just one minor comment. Can you add a note to whats-new.rst

xarray/core/dataarray.py Show resolved Hide resolved
xarray/core/dataarray.py Show resolved Hide resolved
@dcherian
Copy link
Contributor

Thanks @Thomas-Z . I see this is your first PR here. Welcome to xarray!

@dcherian dcherian merged commit 1f45bca into pydata:master Aug 14, 2020
@max-sixty
Copy link
Collaborator

Thanks @Thomas-Z ! Great to have you as a contributor

@Thomas-Z
Copy link
Contributor Author

My pleasure.
I've been a user for a few years now, I'll gladly give something back whenever I can.

dcherian added a commit to dcherian/xarray that referenced this pull request Aug 15, 2020
* upstream/master: (34 commits)
  Fix bug in computing means of cftime.datetime arrays (pydata#4344)
  fix some str accessor inconsistencies (pydata#4339)
  pin matplotlib in ci/requirements/doc.yml (pydata#4340)
  Clarify drop_vars return value. (pydata#4244)
  Support explicitly setting a dimension order with to_dataframe() (pydata#4333)
  Increase support window of all dependencies (pydata#4296)
  Implement interp for interpolating between chunks of data (dask) (pydata#4155)
  Add @mathause to current core developers. (pydata#4335)
  install sphinx-autosummary-accessors from conda-forge (pydata#4332)
  Use sphinx-accessors-autosummary (pydata#4323)
  ndrolling fixes (pydata#4329)
  DOC: fix typo argmin -> argmax in DataArray.argmax docstring (pydata#4327)
  pin sphinx to 3.1(pydata#4326)
  nd-rolling (pydata#4219)
  Implicit dask import 4164 (pydata#4318)
  allow customizing the inline repr of a duck array (pydata#4248)
  silence the known docs CI issues (pydata#4316)
  enh: fixed pydata#4302 (pydata#4315)
  Remove all unused and warn-raising methods from AbstractDataStore (pydata#4310)
  Fix map_blocks example (pydata#4305)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Aug 16, 2020
* upstream/master: (40 commits)
  Fix bug in computing means of cftime.datetime arrays (pydata#4344)
  fix some str accessor inconsistencies (pydata#4339)
  pin matplotlib in ci/requirements/doc.yml (pydata#4340)
  Clarify drop_vars return value. (pydata#4244)
  Support explicitly setting a dimension order with to_dataframe() (pydata#4333)
  Increase support window of all dependencies (pydata#4296)
  Implement interp for interpolating between chunks of data (dask) (pydata#4155)
  Add @mathause to current core developers. (pydata#4335)
  install sphinx-autosummary-accessors from conda-forge (pydata#4332)
  Use sphinx-accessors-autosummary (pydata#4323)
  ndrolling fixes (pydata#4329)
  DOC: fix typo argmin -> argmax in DataArray.argmax docstring (pydata#4327)
  pin sphinx to 3.1(pydata#4326)
  nd-rolling (pydata#4219)
  Implicit dask import 4164 (pydata#4318)
  allow customizing the inline repr of a duck array (pydata#4248)
  silence the known docs CI issues (pydata#4316)
  enh: fixed pydata#4302 (pydata#4315)
  Remove all unused and warn-raising methods from AbstractDataStore (pydata#4310)
  Fix map_blocks example (pydata#4305)
  ...
@mathause mathause mentioned this pull request Aug 16, 2020
3 tasks
@hammer
Copy link

hammer commented Aug 19, 2020

Just noting for GitHub metadata purposes that this PR addresses the Dataset.to_dataframe annotation request in #4238

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support explicitly setting a dimension order with to_dataframe()
5 participants