Implement idxmax and idxmin functions #3871
This looks extremely good and thorough! Thanks @toddrjen! I'll have a proper look through later. I see a couple of minor questions I'll add in too. Others, feel free to get ahead of me!
I agree with @max-sixty: this looks really good.
I will also have a proper look later, but for now I do think you don't have to change map to accept method names: we can pass callables that do the lookup for us or use unbound methods (something like cls.method instead of obj.method). I'd prefer unbound methods.
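A minimal pure-Python sketch of the alternatives suggested here. `ToyArray` and `map_values` are hypothetical stand-ins for `DataArray` and `Dataset.map`, not xarray code. The point is that a `map` that simply calls `func(variable)` already accepts an unbound method, an `operator.methodcaller` (which does the name lookup for us), or a lambda, so it does not need to understand method names itself.

```python
import operator
from typing import Callable, Dict, List


class ToyArray:
    """Hypothetical stand-in for a DataArray: values plus coordinate labels."""

    def __init__(self, values: List[float], labels: List[str]) -> None:
        self.values = values
        self.labels = labels

    def idxmax(self) -> str:
        # Label at the first occurrence of the maximum value.
        best = max(range(len(self.values)), key=self.values.__getitem__)
        return self.labels[best]


def map_values(variables: Dict[str, ToyArray], func: Callable, *args, **kwargs) -> dict:
    # Hypothetical analogue of Dataset.map: call func on every variable.
    return {name: func(var, *args, **kwargs) for name, var in variables.items()}


data = {
    "a": ToyArray([1.0, 5.0, 3.0], ["x0", "x1", "x2"]),
    "b": ToyArray([9.0, 2.0, 4.0], ["x0", "x1", "x2"]),
}

# Three equivalent ways to call the same method on each variable:
by_unbound = map_values(data, ToyArray.idxmax)                 # unbound method (cls.method)
by_caller = map_values(data, operator.methodcaller("idxmax"))  # lookup by name at call time
by_lambda = map_values(data, lambda var: var.idxmax())         # explicit lambda
```

All three produce `{"a": "x1", "b": "x0"}`. The unbound form binds to one class, so it assumes every variable is an instance of that class; `methodcaller` and the lambda resolve the method on each object.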
Added a couple of nits. Overall this does look very good, and thanks for adding lots of tests including those outside these functions.
To what extent should this support non-index coordinates?
Any other thoughts?
@keewis @max-sixty The map thing is purely a convenience function. I know there are other ways to do it, but since I thought this would be a useful feature for users in its own right, I did it that way. Of course, I can do it another way if you disagree. The one complication is that using
I will address the other comments inline.
@keewis @max-sixty The new commit with the requested changes has been pushed to this branch (except for the
I am not familiar with non-index coordinates; what are those? Do you mean non-dimension coordinates? Does that even make sense in a general way? If they are 1D and tied to just one dimension coordinate, that could be done, but if they are not tied to any dimension, are tied to multiple dimensions, or are otherwise not 1D, I am not sure what it would mean to take the idxmin/idxmax of them.
I hear you and share the impulse that baking this in seems not ideal. But I think it's a reasonable compromise to make, and there are no plans to deviate from it. (Ideally maybe we have a
I think having a lambda is fine too.
Yes, thanks for clarifying.
Yes, it wouldn't work in all cases, fair point. There are some cases in which it would work, though; I'm unsure whether it would be too complicated an interface to return them depending on whether it would work (and it's completely fine to contemplate these later).
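A minimal sketch of the rule discussed above: labels may come from the dimension coordinate or from any non-dimension coordinate that is 1-D along the reduced dimension, and every other case is rejected. The `coords` table (name mapped to `(dims, labels)`) and the `idxmax_along` function are hypothetical, not xarray API.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical coordinate table: name -> (dims, labels).
Coords = Dict[Hashable, Tuple[Tuple[Hashable, ...], Sequence]]


def idxmax_along(values: List[float], dim: Hashable, coords: Coords, coord: Hashable):
    """Return the label of the maximum of 1-D `values` along `dim`, read from
    `coord`, which must be 1-D along `dim` (dimension coordinate or not)."""
    dims, labels = coords[coord]
    if dims != (dim,):
        # Scalar, multi-dimensional, or differently aligned coordinates have
        # no single label per position along `dim`.
        raise ValueError(f"coordinate {coord!r} is not 1-D along {dim!r}")
    if len(labels) != len(values):
        raise ValueError("coordinate length does not match data length")
    position = max(range(len(values)), key=values.__getitem__)
    return labels[position]


coords: Coords = {
    "time": (("time",), [0, 1, 2, 3]),                          # dimension coordinate
    "month": (("time",), ["Jan", "Feb", "Mar", "Apr"]),         # 1-D non-dimension coordinate
    "grid": (("time", "x"), [[0, 1], [2, 3], [4, 5], [6, 7]]),  # 2-D: rejected
}
temps = [3.0, 7.5, 6.1, 2.2]
```

Here `idxmax_along(temps, "time", coords, "month")` returns `"Feb"`, while asking for `"grid"` raises `ValueError`.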
🚀
I fixed the extra space in the docstring and moved the business logic to
LGTM! Any other thoughts before we merge?
In general this is really nicely put together. My main asks:
For details see the comments above. If we want to support (2), then it might make sense to use a string for selecting values of
That could work.
The corner case we would need to decide on is again promotion. What happens if the fill value is a "higher" type in the numeric tower than the original type? What if it is lower?
1. We could try to always convert to the fill dtype (or, more often, the dtype equivalent to the Python native type), and raise an exception if it doesn't work.
2. We could promote the fill value or the original data, whichever is "lower".
What if someone tries to use a string type for numeric data or vice versa? If we do option 1, that is easy. Otherwise, we probably need to use numpy casting rules?
What about an object dtype fill value?
What about a date/time-related dtype?
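A pure-Python sketch of the two promotion options above. It assumes Python's built-in numeric types as a stand-in for numpy dtypes (real code would use numpy's casting rules), and `TOWER`, `promote_to_fill`, and `promote_to_higher` are hypothetical names.

```python
from typing import Tuple

# Simplified numeric tower, lowest to highest. Python's built-in types stand
# in for numpy dtypes here; real code would use numpy's casting rules.
TOWER = [bool, int, float, complex]


def _same(a, b) -> bool:
    # NaN never equals itself, so treat two NaNs as the same value.
    return a == b or (a != a and b != b)


def promote_to_fill(values: list, fill_value) -> list:
    """Option 1: convert every value to the fill value's type and raise
    if a conversion fails or does not reproduce the original value."""
    target = type(fill_value)
    converted = []
    for v in values:
        try:
            new = target(v)
        except (TypeError, ValueError, OverflowError) as err:
            raise TypeError(f"cannot convert {v!r} to {target.__name__}") from err
        if not _same(new, v):
            raise TypeError(f"{v!r} does not convert exactly to {target.__name__}")
        converted.append(new)
    return converted


def promote_to_higher(values: list, fill_value) -> Tuple[list, object]:
    """Option 2: promote the fill value or the data, whichever is lower."""
    types = {type(v) for v in values} | {type(fill_value)}
    if not types <= set(TOWER):
        # Strings, objects, datetimes, ... have no place in this tower.
        names = sorted(t.__name__ for t in types)
        raise TypeError(f"no promotion rule for types {names}")
    target = max(types, key=TOWER.index)
    return [target(v) for v in values], target(fill_value)
```

In this sketch option 1 rejects a string fill value for numeric data with no extra rules (`"1" != 1`), while option 2 has to refuse anything outside its table, which is where a real implementation would need numpy's casting rules.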
On Mon, Mar 23, 2020, 23:49 Stephan Hoyer wrote:
In xarray/core/dataset.py:
    @@ -5914,5 +5921,169 @@ def pad(
             return self._replace_vars_and_dims(variables)
    +    def idxmin(
    +        self,
    +        dim: Hashable = None,
    +        axis: int = None,
    +        skipna: bool = None,
    +        promote: bool = None,
Just to throw out another API option: what about having a fill_value argument instead of promote? The default (fill_value=dtypes.NA) would do type promotion for integer dtypes and always fill with NA. Other values (e.g., fill_value=0) could be used to avoid type promotion with an integer coordinate.
Advantages:
- No special cases to keep track of.
- Consistent with other xarray methods that take a fill_value argument.
Disadvantages:
- No built-in way to raise an error instead of promotion (but users could do this themselves pretty easily).
- No built-in way to "only promote if necessary" (but this is a weird, non-type-stable API that doesn't work great with Dask anyway).
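A minimal pure-Python sketch of the proposed `fill_value` semantics. The `NA` sentinel (standing in for `dtypes.NA`) and the `idxmax_rows` helper are hypothetical, not xarray's implementation. With the default, integer labels are promoted to float so that NaN can mark all-NaN rows; an explicit `fill_value` keeps the labels' original type.

```python
import math
from typing import List, Sequence

NA = object()  # hypothetical sentinel standing in for xarray's dtypes.NA


def idxmax_rows(rows: List[List[float]], labels: Sequence[int], fill_value=NA) -> list:
    """Label of the maximum of each row, skipping NaN; all-NaN rows get fill_value."""
    if fill_value is NA:
        # Default: always promote integer labels to float so NaN can mean
        # "no valid value", whether or not any row needs it (type-stable).
        labels = [float(label) for label in labels]
        fill_value = math.nan
    result = []
    for row in rows:
        valid = [i for i, v in enumerate(row) if not math.isnan(v)]
        if not valid:
            result.append(fill_value)
        else:
            result.append(labels[max(valid, key=row.__getitem__)])
    return result


rows = [[1.0, 4.0, 2.0], [math.nan, math.nan, math.nan]]
x = [10, 20, 30]
```

`idxmax_rows(rows, x)` gives `[20.0, nan]`, while `idxmax_rows(rows, x, fill_value=0)` gives `[20, 0]` with integer labels. Because the default promotes unconditionally, the result type does not depend on the data, which matters for lazy backends where it is not known in advance whether any row is all-NaN.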
Branch updated from 37173ec to 272a690.
Please see the newest version with the
I am not sure why the tests are suddenly failing. The tests were all working, then I rebased on the latest master and they are failing and I can't figure out why.
I figured out what is going wrong. I will make a commit with a fix and include it in this pull request later today.
Branch updated from 51e4926 to b7b8f6b.
I think I have implemented all the requested changes and all tests are passing. Please take a look.
Here is a new commit with the discussed changes.
Looks good to me, thanks!
Just some minor doc suggestions.
Amazing! Let's merge on green!
I have gone over it one more time and made a few documentation fixes. Please take one more look before merging.
Thanks @toddrjen, that's a great contribution!
@toddrjen thank you again! This ended up being quite an adventure; I really appreciate you pushing all the way.
* upstream/master: (75 commits) Implement idxmax and idxmin functions (pydata#3871) Update pre-commit-config.yaml (pydata#3911) Revert "Use `fixes` in PR template (pydata#3886)" (pydata#3912) update the docstring of diff (pydata#3909) Un-xfail test_dayofyear_after_cftime_range (pydata#3907) Limit repr of arrays containing long strings (pydata#3900) expose a few zarr backend functions as semi-public api (pydata#3897) Use drawstyle instead of linestyle in plot.step. (pydata#3274) Implementation of polyfit and polyval (pydata#3733) misplaced quote in whatsnew (pydata#3889) Rename ordered_dict_intersection -> compat_dict_intersection (pydata#3887) Control attrs of result in `merge()`, `concat()`, `combine_by_coords()` and `combine_nested()` (pydata#3877) xfail test_uamiv_format_write (pydata#3885) Use `fixes` in PR template (pydata#3886) Tweaks to "how_to_release" (pydata#3882) whatsnew section for 0.16.0 Release v0.15.1 whatsnew for 0.15.1 (pydata#3879) update panel documentation (pydata#3880) reword the whats-new entry for unit support (pydata#3878) ...
    # Handle dask arrays.
    if isinstance(array, dask_array_type):
        res = dask_array.map_blocks(coordarray, indx, dtype=indx.dtype)
Two more things:
- The isinstance check needs to use array.data.
- res needs to be computed, otherwise subsequent actions with res will fail.
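The two review points can be illustrated with a pure-Python sketch. `LazyArray` and `Wrapper` are hypothetical stand-ins for a dask array and a `DataArray` (which keeps its values in `.data`); they are assumptions, not dask or xarray classes, and `labels_at` is a hypothetical helper.

```python
from typing import Callable, List, Sequence


class LazyArray:
    """Hypothetical stand-in for a dask array: work is deferred until compute()."""

    def __init__(self, thunk: Callable[[], List]) -> None:
        self._thunk = thunk

    def take(self, indices: Sequence[int]) -> "LazyArray":
        parent = self._thunk

        def run() -> List:
            values = parent()
            return [values[i] for i in indices]

        # Only records the indexing; nothing is evaluated yet.
        return LazyArray(run)

    def compute(self) -> List:
        return self._thunk()


class Wrapper:
    """Hypothetical stand-in for a DataArray: the payload lives in .data."""

    def __init__(self, data) -> None:
        self.data = data


def labels_at(coord: Wrapper, indices: Sequence[int]) -> List:
    # Point 1: test the wrapped payload. isinstance(coord, LazyArray) is
    # always False, so the lazy branch would never be taken.
    if isinstance(coord.data, LazyArray):
        # Point 2: take() returns another lazy object; compute() turns it
        # into concrete labels that later code can use directly.
        return coord.data.take(indices).compute()
    return [coord.data[i] for i in indices]
```

With the unfixed check, the eager branch would run on the lazy payload and fail with a `TypeError` when indexing it.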
You’re right, this is definitely broken. Anyone up for putting together a fix in a follow up PR?
This implements idxmax and idxmin functions similar to their pandas equivalents. This is my first time contributing to the project, so I am not certain the structure or approach is the best. Please let me know if there is a better way to implement this.
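A minimal pure-Python sketch of the idxmax/idxmin semantics, assuming 1-D data. Like pandas `Series.idxmax`, the result is a coordinate label rather than an integer position, and NaN is skipped by default. `idx_extreme` is a hypothetical helper, not the implementation in this PR.

```python
import math
from typing import Sequence


def idx_extreme(values: Sequence[float], labels: Sequence, kind: str = "max", skipna: bool = True):
    """Return the label at the position of the max (or min) of values."""
    if kind not in ("max", "min"):
        raise ValueError("kind must be 'max' or 'min'")
    if not skipna and any(math.isnan(v) for v in values):
        return math.nan  # in this sketch, any NaN makes the result NaN
    candidates = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not candidates:
        return math.nan  # all-NaN: no valid label
    pick = max if kind == "max" else min
    # max/min return the first extreme element, so ties resolve to the
    # first occurrence, as with argmax/argmin.
    position = pick(candidates, key=lambda i: values[i])
    return labels[position]
```

For `values = [2.0, nan, 7.0, 7.0, 1.0]` with labels `"a"` to `"e"`, the sketch returns `"c"` for the maximum (first of the tie) and `"e"` for the minimum.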
This also includes two other changes.
First, it drops some code for backwards compatibility with numpy 1.12, which isn't supported. This code was hiding an error I needed access to in order to get the function working.
Second, it adds an option to Dataset.map to let you map DataArray methods by name. I used this to implement the Dataset versions of idxmax and idxmin.
- isort -rc . && black . && mypy . && flake8
- whats-new.rst for all changes and api.rst for new API
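A pure-Python sketch of the by-name option described in the PR, assuming a hypothetical `map_variables` helper and `Var` class (not the code in this PR): when `func` is a string, each variable's method of that name is looked up with `getattr` and called; otherwise `func` is called directly.

```python
from typing import Callable, Dict, Union


class Var:
    """Hypothetical stand-in for a DataArray with an idxmin method."""

    def __init__(self, values, labels) -> None:
        self.values = list(values)
        self.labels = list(labels)

    def idxmin(self):
        position = min(range(len(self.values)), key=self.values.__getitem__)
        return self.labels[position]


def map_variables(variables: Dict[str, Var], func: Union[str, Callable], *args, **kwargs) -> dict:
    result = {}
    for name, var in variables.items():
        if isinstance(func, str):
            # Resolve the method on each object, so subclasses and
            # per-object overrides are respected.
            result[name] = getattr(var, func)(*args, **kwargs)
        else:
            result[name] = func(var, *args, **kwargs)
    return result


ds = {"t": Var([5, 3, 4], ["a", "b", "c"]), "p": Var([1, 9, 0], ["a", "b", "c"])}
```

Here `map_variables(ds, "idxmin")` returns `{"t": "b", "p": "c"}`, the same as passing the unbound method `Var.idxmin`.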