Implement idxmax and idxmin functions #3871

toddrjen · 2020-03-20T04:27:32Z

This implements idxmax and idxmin functions similar to their pandas equivalents.

This is my first time contributing to the project so I am not certain the structure or approach is the best. Please let me know if there is a better way to implement this.

This also includes two other changes.

First, it drops some code for backwards-compatibility with numpy 1.12, which isn't supported. This code was hiding an error I needed to have access to in order to get the function working.

Second, it adds an option to Dataset.map to let you map DataArray methods by name. I used this to implement the Dataset versions of idxmax and idxmin.

Closes Implement DataArray.idxmax() #60
Tests added
Passes isort -rc . && black . && mypy . && flake8
Fully documented, including whats-new.rst for all changes and api.rst for new API
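
For readers unfamiliar with the pandas functions, the semantics can be sketched without any array library: argmax gives the integer position of the maximum, while idxmax gives the coordinate label at that position. The helper names and sample data below are illustrative assumptions, not code from this PR.

```python
def argmax(values):
    """Return the integer position of the first maximum."""
    return max(range(len(values)), key=values.__getitem__)


def idxmax(values, coord):
    """Return the coordinate label at the position of the first maximum."""
    return coord[argmax(values)]


def idxmin(values, coord):
    """Return the coordinate label at the position of the first minimum."""
    return coord[min(range(len(values)), key=values.__getitem__)]


temperatures = [11.5, 14.2, 9.8]
days = ["mon", "tue", "wed"]

print(argmax(temperatures))        # integer position of the maximum
print(idxmax(temperatures, days))  # coordinate label of the maximum
print(idxmin(temperatures, days))  # coordinate label of the minimum
```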

max-sixty · 2020-03-20T17:12:46Z

This looks extremely good and thorough! Thanks @toddrjen !

I'll have a proper look through later. I see a couple of minor questions I'll add in too. Others feel free to get ahead of me!

keewis

I agree with @max-sixty: this looks really good.

I will also have a proper look later, but for now I do think you don't have to change map to accept method names: we can pass callables that do the lookup for us or use unbound methods (something like cls.method instead of obj.method). I'd prefer unbound methods.
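
The two alternatives mentioned here can be sketched with toy classes. Item, Bag and Bag.map are hypothetical stand-ins for DataArray, Dataset and Dataset.map, and operator.methodcaller from the standard library is just one possible "callable that does the lookup"; none of this is xarray's actual code.

```python
from operator import methodcaller


class Item:
    """Hypothetical stand-in for DataArray."""

    def __init__(self, values):
        self.values = values

    def largest(self):
        return max(self.values)


class Bag:
    """Hypothetical stand-in for Dataset: a mapping of names to items."""

    def __init__(self, items):
        self.items = items

    def map(self, func):
        # Apply func to every contained item and keep the names.
        return {name: func(item) for name, item in self.items.items()}


bag = Bag({"a": Item([1, 5, 2]), "b": Item([7, 3])})

by_unbound = bag.map(Item.largest)            # unbound method: cls.method
by_lookup = bag.map(methodcaller("largest"))  # callable that looks up the name
```

Both calls give the same result here, so map itself needs no special handling of method names.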

xarray/core/dataarray.py

xarray/core/dataset.py

max-sixty

Added a couple of nits. Overall this does look very good, and thanks for adding lots of tests including those outside these functions.

To what extent should this support non-index coordinates?

Any other thoughts?

xarray/core/dataarray.py

xarray/core/dataset.py

toddrjen · 2020-03-20T23:06:45Z

@keewis @max-sixty The map thing is purely a convenience function. I know there are other ways to do it, but since I thought this would be a useful feature for users in its own right, I did it that way. But of course I can do it another way if you disagree.

The one complication is that using DataArray.idxmax and DataArray.idxmin assumes that the Dataset would only ever contain DataArray objects. That may be mostly the case now, but I didn't want to bake that into the code. I could do it using a lambda or nested function, but as I said I thought this approach had other benefits to users.
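
The difference between the two approaches shows up in plain Python once more than one type is involved: an unbound method always runs the named class's implementation, while a lambda looks the method up on each object. The class names below are hypothetical and only illustrate the concern.

```python
class Base:
    def describe(self):
        return "base"


class Special(Base):
    def describe(self):
        return "special"


objects = [Base(), Special()]

# Unbound method: always runs Base.describe, even for the subclass instance.
via_unbound = list(map(Base.describe, objects))

# Lambda: looks the method up on each object, so overrides are respected.
via_lambda = list(map(lambda obj: obj.describe(), objects))
```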

I will address the other comments inline.

toddrjen · 2020-03-21T02:16:27Z

@keewis @max-sixty The new commit with the requested changes has been pushed to this branch (except for the map one, pending ongoing discussion). Please take a look.

toddrjen · 2020-03-21T02:41:23Z

@max-sixty

To what extent should this support non-index coordinates?

I am not familiar with non-index coordinates, what are those?

Do you mean non-dimension coordinates? Does that even make sense in a general way? If they are 1D and tied to just one dimension coordinate that could be done, but if they are not tied to any dimension or tied to multiple dimensions or otherwise not 1D I am not sure what it would mean to take the idxmin/idxmax of them.

xarray/core/dataarray.py

max-sixty · 2020-03-21T03:57:00Z

The one complication is that using DataArray.idxmax and DataArray.idxmin assumes that the Dataset would only ever contain DataArray objects

I hear you and share the impulse that baking this in seems not ideal. Though I think it's a reasonable compromise to make, and there are no plans to deviate from it.

(Ideally maybe we have a ._contained_class attribute on the dataset which is almost always DataArray)

I think having a lambda is fine too.

max-sixty · 2020-03-21T04:13:32Z

Do you mean non-dimension coordinates?

Yes, thanks for clarifying

Does that even make sense in a general way? If they are 1D and tied to just one dimension coordinate that could be done, but if they are not tied to any dimension or tied to multiple dimensions or otherwise not 1D I am not sure what it would mean to take the idxmin/idxmax of them.

Yes, it wouldn't work in all cases, fair point. There are some cases in which it would work though, I'm unsure if it would be too complicated an interface to return them depending on whether it would work.

(and completely fine to contemplate these later)
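
The case that would work can be sketched in plain Python: the position of the maximum along a dimension can index any 1D coordinate attached to that same dimension, not only the dimension coordinate. The names and data are hypothetical, and this is not something the PR implements.

```python
values = [0.2, 0.9, 0.4]              # data along a single dimension "x"
x = [10, 20, 30]                      # dimension coordinate for "x"
station = ["north", "east", "south"]  # 1D non-dimension coordinate on "x"

# Position of the maximum along "x".
pos = max(range(len(values)), key=values.__getitem__)

print(x[pos])        # label from the dimension coordinate
print(station[pos])  # label from the non-dimension coordinate
```

A coordinate that spans two dimensions has no single entry per position along "x", which is the ambiguity raised above.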

max-sixty

🚀

xarray/core/dataset.py

xarray/core/dataarray.py

toddrjen · 2020-03-22T01:45:05Z

I fixed the extra space in the docstring and moved the business logic to computation.py.

max-sixty · 2020-03-22T02:32:53Z

LGTM! Any other thoughts before we merge?

xarray/core/computation.py

xarray/core/dataarray.py

xarray/core/dataset.py

xarray/core/computation.py

shoyer · 2020-03-22T22:37:40Z

In general this is really nicely put together. My main asks:

1. Remove the unrelated API change to map.
2. Think about the alternative of returning an arbitrary value rather than promoting or raising an error.

For details see the comments above. If we want to support (2), then it might make sense to use a string for selecting values of promote rather than True/False/None (e.g., so we can include the option to return an arbitrary coordinate value).

xarray/core/computation.py

toddrjen · 2020-03-25T15:19:01Z

That could work. The corner case we would need to decide on is again promotion. What happens if the fill value is a "higher" type in the numeric tower than the original type? What if it is lower?

1. We could try to always convert to the fill dtype (or more often the dtype equivalent to the Python native type), and raise an exception if it doesn't work.
2. We could promote the fill value or original data, whichever is "lower".

What if someone tries to use a string type for numeric data or vice versa? If we do option 1 that is easy. Otherwise we probably need to use numpy casting rules? What about an object dtype fill value? What about a date/time-related dtype?


On Mon, Mar 23, 2020, 23:49 Stephan Hoyer wrote:

In xarray/core/dataset.py:

@@ -5914,5 +5921,169 @@ def pad(
         return self._replace_vars_and_dims(variables)
+
+    def idxmin(
+        self,
+        dim: Hashable = None,
+        axis: int = None,
+        skipna: bool = None,
+        promote: bool = None,

Just to throw out another API option: what about having a fill_value argument instead of promote?

The default (fill_value=dtypes.NA) would do type promotion for integer dtypes and always fill with NA. Other values (e.g., fill_value=0) could be used to avoid type promotion with an integer coordinate.

Advantages:

- No special cases to keep track of.
- Consistent with other xarray methods that take a fill_value argument.

Disadvantages:

- No built-in way to raise an error instead of promotion (but users could do this themselves pretty easily)
- No built-in way to "only promote if necessary" (but this is a weird non-type stable API that doesn't work great with Dask, anyways)
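
The trade-off behind fill_value can be sketched in plain Python. A slice that is entirely NaN has no maximum, so something has to be returned in its place: filling with NaN forces an integer coordinate to become float, while an integer fill value keeps the result integer. The function name idxmax_rows is hypothetical, math.nan stands in for dtypes.NA, and Python int/float stand in for array dtypes; this is not the PR's implementation.

```python
import math


def idxmax_rows(rows, coord, fill_value=math.nan):
    """Return the coord label of each row's maximum, or fill_value for all-NaN rows."""
    picked = []
    for row in rows:
        # Keep (value, position) pairs for the non-NaN entries only.
        valid = [(v, i) for i, v in enumerate(row) if not math.isnan(v)]
        if valid:
            _, pos = max(valid, key=lambda pair: pair[0])
            picked.append(coord[pos])
        else:
            picked.append(fill_value)
    # Emulate dtype promotion: one float entry makes the whole result float.
    if any(isinstance(v, float) for v in picked):
        return [float(v) for v in picked]
    return picked


rows = [[1.0, 3.0, 2.0], [math.nan, math.nan, math.nan]]
coord = [10, 20, 30]  # integer coordinate

promoted = idxmax_rows(rows, coord)             # NaN fill: result becomes float
kept = idxmax_rows(rows, coord, fill_value=0)   # integer fill: result stays int
```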

toddrjen · 2020-03-26T04:17:55Z

Please see the newest version with the promote argument changed to fill_value.

toddrjen · 2020-03-26T04:46:07Z

I am not sure why the tests are suddenly failing. The tests were all working, then I rebased on the latest master and they are failing and I can't figure out why.

toddrjen · 2020-03-26T14:28:04Z

I figured out what is going wrong. I will make a commit with a fix and include it in this pull request later today.

xarray/core/dataarray.py

toddrjen · 2020-03-27T03:41:56Z

I think I have implemented all the requested changes and all tests are passing. Please take a look.

xarray/core/dataarray.py

xarray/core/duck_array_ops.py

xarray/core/dataarray.py

xarray/core/computation.py

xarray/core/dataarray.py

xarray/core/computation.py

toddrjen · 2020-03-28T01:32:00Z

Here is a new commit with the discussed changes.

max-sixty · 2020-03-28T03:18:53Z

Thank you again @toddrjen, both for the content and the iteration.

@shoyer any final thoughts? Otherwise I'll merge tomorrow?

shoyer

Looks good to me, thanks!

dcherian

just some minor doc suggestions.

doc/whats-new.rst

xarray/core/dataarray.py

xarray/core/dataset.py

max-sixty · 2020-03-29T00:55:43Z

Amazing — let's merge on green!

toddrjen · 2020-03-29T01:18:37Z

I have gone over it one more time and made a few documentation fixes. Please take one more look before merging.

dcherian · 2020-03-29T01:58:24Z

Thanks @toddrjen that's a great contribution!

max-sixty · 2020-03-29T06:02:19Z

@toddrjen thank you again! This ended up being quite an adventure, really appreciate you pushing all the way.

* upstream/master: (75 commits)
  Implement idxmax and idxmin functions (pydata#3871)
  Update pre-commit-config.yaml (pydata#3911)
  Revert "Use `fixes` in PR template (pydata#3886)" (pydata#3912)
  update the docstring of diff (pydata#3909)
  Un-xfail test_dayofyear_after_cftime_range (pydata#3907)
  Limit repr of arrays containing long strings (pydata#3900)
  expose a few zarr backend functions as semi-public api (pydata#3897)
  Use drawstyle instead of linestyle in plot.step. (pydata#3274)
  Implementation of polyfit and polyval (pydata#3733)
  misplaced quote in whatsnew (pydata#3889)
  Rename ordered_dict_intersection -> compat_dict_intersection (pydata#3887)
  Control attrs of result in `merge()`, `concat()`, `combine_by_coords()` and `combine_nested()` (pydata#3877)
  xfail test_uamiv_format_write (pydata#3885)
  Use `fixes` in PR template (pydata#3886)
  Tweaks to "how_to_release" (pydata#3882)
  whatsnew section for 0.16.0
  Release v0.15.1
  whatsnew for 0.15.1 (pydata#3879)
  update panel documentation (pydata#3880)
  reword the whats-new entry for unit support (pydata#3878)
  ...

kmuehlbauer · 2020-03-31T11:02:48Z

xarray/core/computation.py

+
+    # Handle dask arrays.
+    if isinstance(array, dask_array_type):
+        res = dask_array.map_blocks(coordarray, indx, dtype=indx.dtype)


@toddrjen @dcherian

Sorry, I might be wrong, but it seems that the func argument is missing from the call to map_blocks. I tried with lambda a, b: a[b], which seems to work.

Two more things:

The isinstance check needs to use array.data.

res needs to be computed, otherwise subsequent actions with res will fail.

shoyer · 2020-03-31

You’re right, this is definitely broken. Anyone up for putting together a fix in a follow up PR?

kmuehlbauer · 2020-03-31

@shoyer See #3922
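
The first problem reported in this thread can be illustrated without dask. The map_blocks below is a toy stand-in written for this sketch (it is not dask's function); it only mimics the convention that the first positional argument is the function applied to each block. The take helper and the sample data are likewise hypothetical.

```python
def map_blocks(func, *blocks):
    """Toy stand-in: call func once per group of aligned blocks."""
    if not callable(func):
        raise TypeError("first argument must be callable")
    return [func(*group) for group in zip(*blocks)]


coord_blocks = [[10, 20, 30], [10, 20, 30]]  # coordinate values, one copy per block
index_blocks = [[2, 0], [1, 1]]              # argmax positions, split into two blocks


def take(coord, idx):
    # Same idea as the suggested `lambda a, b: a[b]`, written for plain lists.
    return [coord[i] for i in idx]


try:
    map_blocks(coord_blocks, index_blocks)  # data passed where the function belongs
    failed = False
except TypeError:
    failed = True

result = map_blocks(take, coord_blocks, index_blocks)
```

With the function supplied as the first argument, each block of positions is translated into coordinate labels.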

toddrjen force-pushed the idxmax branch from e811c69 to 925b5c8 on March 20, 2020 16:40

keewis reviewed Mar 20, 2020

xarray/core/dataarray.py (outdated)

xarray/core/dataset.py (outdated)

max-sixty reviewed Mar 20, 2020

xarray/core/dataarray.py (outdated)

xarray/core/dataarray.py (outdated)

xarray/core/dataarray.py

xarray/core/dataset.py (outdated)

max-sixty reviewed Mar 21, 2020

xarray/core/dataarray.py (outdated)

max-sixty approved these changes Mar 21, 2020

xarray/core/dataset.py (outdated)

xarray/core/dataset.py (outdated)

xarray/core/dataset.py (outdated)

xarray/core/dataarray.py (outdated)

dcherian mentioned this pull request Mar 21, 2020

0.15.1 release #3869

Closed

13 tasks

shoyer reviewed Mar 22, 2020

xarray/core/computation.py

xarray/core/computation.py (outdated)

shoyer reviewed Mar 22, 2020

xarray/core/computation.py (outdated)

toddrjen force-pushed the idxmax branch from b891f94 to 6e884aa on March 24, 2020 03:06

shoyer mentioned this pull request Mar 26, 2020

Allow for All-NaN in argmax, argmin #3884

Open

toddrjen force-pushed the idxmax branch 2 times, most recently from 37173ec to 272a690 on March 26, 2020 04:17

max-sixty reviewed Mar 26, 2020

xarray/core/dataarray.py (outdated)

shoyer reviewed Mar 26, 2020

xarray/core/dataarray.py (outdated)

toddrjen force-pushed the idxmax branch 2 times, most recently from 51e4926 to b7b8f6b on March 27, 2020 03:04

toddrjen mentioned this pull request Mar 27, 2020

_indexes of DataArray are not deep copied #3899

Closed

max-sixty reviewed Mar 27, 2020

xarray/core/dataarray.py

shoyer reviewed Mar 27, 2020

xarray/core/duck_array_ops.py

xarray/core/dataarray.py

xarray/core/computation.py (outdated)

xarray/core/dataarray.py (outdated)

xarray/core/computation.py

shoyer approved these changes Mar 28, 2020

dcherian reviewed Mar 28, 2020

toddrjen added 2 commits March 28, 2020 20:38

drop numpy 1.12 compat code that can hide other errors

f65582a

deep copy _indexes (pydata#3899)

70e628d

toddrjen force-pushed the idxmax branch from fbddd3e to 7d65502 on March 29, 2020 00:42

implement idxmax and idxmin

c296801

toddrjen force-pushed the idxmax branch from 7d65502 to c296801 on March 29, 2020 01:18

max-sixty merged commit 1416d5a into pydata:master Mar 29, 2020

toddrjen deleted the idxmax branch March 29, 2020 02:00

kmuehlbauer reviewed Mar 31, 2020

This was referenced Apr 4, 2020

Argmin indexes #1469

Closed

Support multiple dimensions in DataArray.argmin() and DataArray.argmax() methods #3936

Merged

kmuehlbauer mentioned this pull request Apr 6, 2020

FIX: correct dask array handling in _calc_idxminmax #3922

Merged

4 tasks
