Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve naming and standardize functionality of .reset_index() and .reset_coords() methods #4366

Open
rchurt opened this issue Aug 21, 2020 · 3 comments

Comments

@rchurt
Copy link

rchurt commented Aug 21, 2020

Is your feature request related to a problem? Please describe.
I think that the behavior of the .reset_index() and .reset_coords() methods is a bit non-intuitive, especially when coming to xarray from Pandas.

Describe the solution you'd like
One proposal is this:

  • Change the name of .reset_index() to .index_to_coords(), because it seems like this is what the method really does
  • Change the name of .reset_coords() to .coords_to_vars(), because it seems like this is what the method really does
  • Make a new method called .reset_index() that has the same function as Pandas .reset_index(), namely to make an index of ints from 0 to nrows
  • Make a new method called .reset_coords() that has an analagous function to Pandas .reset_index(), but for coords, namely to make an coordinate of ints from 0 to the length of that coordinate
@shoyer
Copy link
Member

shoyer commented Aug 23, 2020

@rchurt thanks for raising this issue! I agree, the existing methods are poorly named.

@benbovy is going to be working on refactoring indexing over the next months, and this sounds like something that could be worth addressing as part of that.

@benbovy
Copy link
Member

benbovy commented Sep 15, 2021

I agree too that the behavior of those methods is currently quite counter-intuitive in some cases.

There's indeed an opportunity to address that with the indexes refactor, see for example some related suggestions in those comments: #4825 (comment) and #5692 (comment).

It's not easy to find good names and clear API, though. More ideas like yours @rchurt are welcome.


Some comments on your proposal:

Change the name of .reset_coords() to .coords_to_vars(), because it seems like this is what the method really does

Agreed, .coords_to_vars() is a better name than .reset_coords(). EDIT: .coords_to_data_vars() would be even more meaningful although a bit long.

Change the name of .reset_index() to .index_to_coords(), because it seems like this is what the method really does

.index_to_coords() is not more accurate: contrary to the "data variable" vs "coordinate" concepts, the "index" vs. "coordinate" concepts are not mutually exclusive in xarray's data model (with the index refactor the two latter concepts become fully orthogonal).

Make a new method called .reset_index() that has the same function as Pandas .reset_index(), namely to make an index of ints from 0 to nrows

Pandas' .reset_index() is actually a little bit more complex than that, and xarray's version is quite close to pandas' version regarding how multi-indexes are handled.

Resetting to a new index of incremental integers might be useful but this means replacing the existing coordinate labels, which is not very intuitive for a method called .reset_index(). Similarly to pandas, we could keep the old coordinate as a data variable but we would also have to rename it to avoid name conflicts... in the end it's a lot of implicit and unclear logic.

Make a new method called .reset_coords() that has an analagous function to Pandas .reset_index(), but for coords, namely to make an coordinate of ints from 0 to the length of that coordinate

I'm not sure about the advantages of having a specific method over doing something like:

ds["x"] = np.arange(ds.x.size)

If we eventually move away from treating pandas (multi-)indexes as a special case in xarray, another suggestion would be to intentionally depart from pandas' API and have something like:

  • a .drop_indexes(coord_names, drop_coords=False) method that removes the index(es) assigned to one or more coordinates (and optionally removes all coordinates for which those indexes are built those coordinates)
  • a .reset_indexes() method (with no argument) that would reset all indexes in a DataArray /Dataset to the default indexes (i.e., a pandas.Index for each 1-d coordinates with a name matching their dimension name) as if it was newly created or freshly loaded from a store.

@shoyer
Copy link
Member

shoyer commented Sep 15, 2021

a .drop_indexes(coord_names, drop_coords=False) method that removes the index(es) assigned to one or more coordinates (and optionally removes all coordinates for which those indexes are built those coordinates)

I am inclined to deprecate reset_index() in xarray entirely, in favor of something like drop_indexes. I don't think we need a drop_coords=True option -- that is subsumed by the drop_vars() method, which should also drop associated indexes.

I agree that set_coords() and reset_coords() are somewhat confusingly named, but they do serve an important purpose: converting coordinates into data variables and vise-versa. These methods could be perhaps renamed but that's all I would change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants