Argmin indexes #1469

fujiisoup · 2017-07-01T01:23:31Z

Closes argmin / argmax behavior doesn't match documentation #1388
Tests added / passed
Passes git diff master | flake8 --diff
Fully documented, including whats-new.rst for all changes and api.rst for new API

With this PR, argmin() raises a ValueError when it is called on a multi-dimensional array.

This PR also adds an argmin_indexes() method to xr.DataArray.
In the current API design, argmin_indexes() returns
the argmin indexes as an OrderedDict of DataArrays.

Example:

In [1]: import xarray as xr
   ...: da = xr.DataArray([[1, 2], [-1, 40], [5, 6]],
   ...:                [('x', ['c', 'b', 'a']), ('y', [1, 0])])
   ...: 
   ...: da.argmin_indexes()
   ...: 
Out[1]: 
OrderedDict([('x', <xarray.DataArray 'x' ()>
              array(1)), ('y', <xarray.DataArray 'y' ()>
              array(0))])

In [2]: da.argmin_indexes(dims='y')
Out[2]: 
OrderedDict([('y', <xarray.DataArray 'y' (x: 3)>
              array([0, 0, 0])
              Coordinates:
                * x        (x) <U1 'c' 'b' 'a')])

(Because the returned object is an OrderedDict, it does not print nicely. The return type could be an xr.Dataset instead, if we prefer.)
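For readers unfamiliar with what the two calls above compute, here is a minimal NumPy sketch of the underlying index arithmetic. It is not the PR's implementation: the `dims` tuple and the plain `dict` result are stand-ins for the named dimensions and the OrderedDict of DataArrays.

```python
import numpy as np

# Same values as the example above; `dims` is an illustrative stand-in
# for the DataArray's dimension names.
values = np.array([[1, 2], [-1, 40], [5, 6]])
dims = ("x", "y")

# Minimum over all dimensions: argmin() on the flattened array gives one
# flat position, which unravel_index converts to one index per dimension.
flat_position = values.argmin()
full = {
    name: int(index)
    for name, index in zip(dims, np.unravel_index(flat_position, values.shape))
}

# Minimum along 'y' only: one 'y' index for each position along 'x'.
along_y = values.argmin(axis=dims.index("y"))

print(full)     # {'x': 1, 'y': 0}
print(along_y)  # [0 0 0]
```

The first result corresponds to `da.argmin_indexes()` and the second to `da.argmin_indexes(dims='y')`.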

Although argmin_indexes() was originally suggested in #1388 so that its result could be passed to isel_points,

da.isel_points(**da.argmin_indexes())

the current implementation of isel_points does NOT work for this case.

This is mainly because:

1. isel_points currently does not work for 0-dimensional or multi-dimensional input.
2. Even for 1-dimensional input (the second of the examples above), we also need to pass x as an indexer, rather than as a coordinate of the y indexer.

For 1, I have prepared a modification of isel_points that accepts multi-dimensional arrays, but I think it belongs in a separate PR once the API is decided.
(It is related to #475 and #974.)

For 2, we should either

change the API of argmin_indexes to return all the dimensions, not only the requested one, like

In [2]: da.argmin_indexes(dims='y')
Out[2]: 
OrderedDict([('y', array([0, 0, 0])),
             ('x', array(['c', 'b', 'a']))])

or

change the API of isel_points so that it takes the indexer's coordinates into account when xr.DataArrays are passed as indexers.

I originally implemented the second option in my modification of isel_points,
but it breaks backward compatibility and is somewhat magical.

Another alternative is to

change the API of argmin_indexes to return an xr.Dataset rather than an OrderedDict, and also change the API of isel_points to accept an xr.Dataset.
This keeps backward compatibility.

Any comments are welcome.

shoyer · 2017-07-03T22:49:39Z

A few quick thoughts on API design:

The most similar pandas method is called idxmin. We may not want to use the exact same name here, but it's something to keep in mind.
We might want two separate methods, one like this that returns an OrderedDict/Dataset and another that returns just one DataArray (for use when reducing over only one axis). I might pick idxmin and indexes_min.
A keep_dims=True argument like numpy is a nice way to preserve dimensions if desired.
I'm a little surprised that it doesn't work to unpack a Dataset with ** in isel_points -- in theory I think it should.

Added idxmin

fujiisoup · 2017-07-04T15:46:20Z

@shoyer
Thanks for the comments.

APIs
These sound like reasonable suggestions.
I renamed argmin_indexes -> indexes_min.
I also added idxmin, which works similarly to pandas's idxmin.
The APIs are

def indexes_min(self, dims=None, skipna=True):

and

def idxmin(self, dim=None, skipna=True, keep_dims=False):

(I found that keep_dims causes additional confusion for indexes_min, so I omitted it from that method.)
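To make the `skipna` and `keep_dims` parameters concrete, here is a rough NumPy sketch of the semantics they suggest. `idxmin_sketch` is a hypothetical helper written for illustration only; it is not the method added in this PR, and it works on a bare array plus a label array rather than on a DataArray.

```python
import numpy as np


def idxmin_sketch(values, labels, axis, skipna=True, keep_dims=False):
    """Return the label of the minimum along `axis` (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    # np.nanargmin ignores NaN entries; np.argmin does not skip them.
    find_position = np.nanargmin if skipna else np.argmin
    positions = find_position(values, axis=axis)
    # Translate integer positions into coordinate labels.
    result = labels[positions]
    if keep_dims:
        # Re-insert the reduced axis with length 1, as NumPy's keepdims does.
        result = np.expand_dims(result, axis=axis)
    return result


values = [[1.0, np.nan], [51.0, 40.0], [5.0, 6.0]]
y_labels = [1.4, 4.3]

reduced = idxmin_sketch(values, y_labels, axis=1)
kept = idxmin_sketch(values, y_labels, axis=1, keep_dims=True)

print(reduced.shape)  # (3,)
print(kept.shape)     # (3, 1)
```

Note that `np.nanargmin` raises a `ValueError` for an all-NaN slice, so a real implementation would need to handle that case separately.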

isel_points's issue
Currently, indexes_min works as

In [1]: import xarray as xr
   ...: da = xr.DataArray([[1, 2], [51, 40], [5, 6]],
   ...:                   [('x', ['c', 'b', 'a']), ('y', [1.4, 4.3])])
   ...: da
   ...: 
Out[1]: 
<xarray.DataArray (x: 3, y: 2)>
array([[ 1,  2],
       [51, 40],
       [ 5,  6]])
Coordinates:
  * x        (x) <U1 'c' 'b' 'a'
  * y        (y) float64 1.4 4.3

In [2]: da.indexes_min(dims='y')
Out[2]: 
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) <U1 'c' 'b' 'a'
Data variables:
    y        (x) float64 0 1 0

However, arguments of isel_points should be like

<xarray.Dataset>
Dimensions:  (points: 3)
Data variables:
    x        (points) int64 0 1 2
    y        (points) int64 0 1 0

Would this behavior be more intuitive?
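The difference between the two layouts is the difference between orthogonal and pointwise indexing. The NumPy sketch below illustrates it on the same values, under the assumption that isel_points behaves like NumPy's pointwise (paired) integer indexing: the `y` positions alone are not enough, and a matching `x` position is needed for every point.

```python
import numpy as np

values = np.array([[1, 2], [51, 40], [5, 6]])

# Position of the minimum along 'y' for each 'x'.
y_idx = values.argmin(axis=1)
# Pointwise indexing also needs the matching 'x' position of each point.
x_idx = np.arange(values.shape[0])

# Pointwise: pairs (x_idx[i], y_idx[i]) select one value per point.
points = values[x_idx, y_idx]

# Orthogonal: every x is combined with every y, giving a 3x3 block
# instead of the three minima.
outer = values[np.ix_(x_idx, y_idx)]

print(y_idx)        # [0 1 0]
print(points)       # [ 1 40  5]
print(outer.shape)  # (3, 3)
```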

shoyer · 2017-07-05T05:21:45Z

OK, I think I finally understand the nuance of the return value -- thanks for describing that fully for me.

In theory (after #974 is implemented), the current return value from indexes_min should work for indexing, e.g.,

>>> indexes = da.indexes_min(dims='y')
>>> indexes
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) <U1 'c' 'b' 'a'
Data variables:
    y        (x) int64 0 1 0

>>> da.sel(x=indexes.x, y=indexes.y)  # or ds.sel(**indexes)
<xarray.DataArray (x: 3)>
array([ 1, 40, 5])
Coordinates:
  * x        (x) <U1 'c' 'b' 'a'

So maybe that is the right choice, though I'm not entirely certain yet.

Side note: I'm still not super happy with the names idxmin and indexes_min. They look too different for methods that are only a small variation on each other. Maybe idxmin_dataset or idxmin_dict?

fujiisoup · 2017-07-05T10:13:18Z

@shoyer
Sorry. My example was still not good (I modified the previous example slightly).

I think my current implementation does not work even for (the future version of) .sel, because the resultant xr.Dataset has the index for 'y' and labels for 'x'.

As you suggested, it looks more xarray-like if it returns coordinates for both 'x' and 'y',

<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) <U1 'c' 'b' 'a'
Data variables:
    y        (x) float64 1.4 4.3 1.4

so that it can be passed to .sel.
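As a rough illustration of the label-based result proposed here, the NumPy sketch below converts the integer positions into `y` labels and then selects by label pairs. The dictionary lookups are only a stand-in for what `.sel` would do after #974; none of this is xarray's actual implementation.

```python
import numpy as np

values = np.array([[1, 2], [51, 40], [5, 6]])
x_labels = np.array(["c", "b", "a"])
y_labels = np.array([1.4, 4.3])

# Integer positions of the minimum along 'y', then the matching labels.
y_positions = values.argmin(axis=1)
y_min_labels = y_labels[y_positions]

# Stand-in for label-based selection: map each label back to its position.
x_lookup = {label: i for i, label in enumerate(x_labels.tolist())}
y_lookup = {label: i for i, label in enumerate(y_labels.tolist())}

selected = np.array([
    values[x_lookup[x], y_lookup[y]]
    for x, y in zip(x_labels.tolist(), y_min_labels.tolist())
])

print(y_min_labels)  # [1.4 4.3 1.4]
print(selected)      # [ 1 40  5]
```

With labels on both dimensions, the result no longer mixes positions for 'y' with labels for 'x', which is the inconsistency described above.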

This API looks much more natural to me than the current one, although it has to wait for #974.
I will update this PR.

Regarding the function name:
based on your suggestion, how about labels_min or argmin_labels?
I think the names idxmin and (the current) indexes_min can reasonably differ,
because idxmin should return an index while indexes_min returns labels.

johnomotani · 2020-04-04T20:23:17Z

Any plans to finish/merge this PR? indexes_max and indexes_min would be very nice to have! (See also #3160.) Although I guess the idxmin/idxmax are superseded by #3871?

fujiisoup · 2020-04-04T23:24:20Z

Hi @johnomotani .
I probably won't have time to finish this, and the PR is already quite old.
It would be nice if someone could update it.

fujiisoup added 8 commits June 19, 2017 18:53

Raise ValueError in .argmin() and .argmax() for multidimensional data.

747d188

argmin_indices impemented for Variable

10e944f

DataArray.argmin_indexes implemented.

7affe87

Make flake8 pass

eed67f2

Merge branch 'master' into argmin_indices

43afb85

Test for dataarray.argmin_indices

4f05640

Added what's new.

4a1a74d

Remove test for argmin_indexes_isel.

066a778

fujiisoup added 2 commits July 4, 2017 22:54

Renamed argmin_indexes -> indexes_min

89ca29f

Added idxmin

Added skipna option.

4507279

UPdated what's new and api.rst

81c61b7

jhamman added the topic-indexing label Jul 13, 2017

fujiisoup mentioned this pull request Oct 21, 2017

argmin / argmax behavior doesn't match documentation #1388

Closed

johnomotani mentioned this pull request Apr 5, 2020

Support multiple dimensions in DataArray.argmin() and DataArray.argmax() methods #3936

Merged

4 tasks

dcherian closed this in #3936 Jun 29, 2020
