ENH: [Draft] Fix issue #35131 Identify zero-dimensional duck arrays as non-iterable #44626
Conversation
Now `pd.core.dtypes.inference.is_list_like` correctly identifies numpy-like scalars as non-iterable.
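As an illustration of the intended behaviour (a simplified, pure-Python sketch; the real implementation lives in Cython in `pandas._libs.lib`, and the helper name here is made up):

```python
import numpy as np

def is_list_like_sketch(obj) -> bool:
    # Sketch of the check this PR aims for: an object counts as
    # list-like when it is iterable, is not a string/bytes, is not a
    # class, and does not report itself as zero-dimensional via an
    # `ndim` attribute (which covers duck arrays, not just ndarrays).
    return (
        hasattr(obj, "__iter__")
        and not isinstance(obj, type)
        and not isinstance(obj, (str, bytes))
        and getattr(obj, "ndim", 1) != 0
    )

print(is_list_like_sketch([1, 2, 3]))     # True
print(is_list_like_sketch("abc"))         # False: strings are not list-like
print(is_list_like_sketch(np.array(5)))   # False: 0-d array, despite __iter__
print(is_list_like_sketch(np.array([5]))) # True
```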
Co-authored-by: keewis <[email protected]>
I'm not completely sure why, but reverting here for simplicity
Also avoid np.iterable
…ar (neither are 0-dimensional numpy arrays)
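For context on why `np.iterable` is avoided (a quick illustration, not from the PR): it answers the question by actually calling `iter()` on the object and catching `TypeError`, whereas a plain attribute check is cheaper and has no side effects.

```python
import numpy as np

# np.iterable tries iter(obj) and catches TypeError, so it does
# classify 0-d arrays correctly -- but at the cost of constructing
# an iterator (or raising an exception) on every call.
print(np.iterable(np.array(0)))    # False: iter() raises on 0-d arrays
print(np.iterable(np.array([0])))  # True

# A cheap alternative: inspect the reported dimensionality instead.
def reports_zero_dim(obj) -> bool:
    return getattr(obj, "ndim", None) == 0

print(reports_zero_dim(np.array(0)))  # True
print(reports_zero_dim([1, 2, 3]))    # False: lists have no `ndim`
```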
A concern on the initial PR was potential performance regressions. The micro-benchmarks for
I am not convinced that should be a blocker to this PR, though. If it is, I propose to optimise the current implementation using a shortcut for real Python lists and potentially numpy arrays, given that those will be the common cases.
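The shortcut mentioned above could look roughly like this (a hypothetical sketch; the names and the fallback predicate are made up for illustration, and the real function is implemented in Cython):

```python
import numpy as np

def _is_list_like_general(obj) -> bool:
    # Illustrative stand-in for the general duck-typed path.
    return (
        hasattr(obj, "__iter__")
        and not isinstance(obj, type)
        and not isinstance(obj, (str, bytes))
        and getattr(obj, "ndim", 1) != 0
    )

def is_list_like_with_fast_path(obj) -> bool:
    # Fast path: an exact-type check for plain lists and an isinstance
    # check for ndarrays, the overwhelmingly common inputs, before
    # falling back to the slower general logic.
    if type(obj) is list:
        return True
    if isinstance(obj, np.ndarray):
        return obj.ndim != 0
    return _is_list_like_general(obj)

print(is_list_like_with_fast_path([1, 2]))       # True via the list fast path
print(is_list_like_with_fast_path(np.array(3)))  # False via the ndarray branch
print(is_list_like_with_fast_path((1, 2)))       # True via the general path
```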
For comparison, here is the asv output for
Obviously, sets, tuples and dicts still have the regression. One could shortcut those too, but that would simply be gaming the benchmark; we'd still be regressing on the "other cases" not covered by the benchmark. Micro-benchmarks are a dangerous thing. I'll now restart the full asv benchmarks, but I won't be able to limit system load for that long a time - I still need to work in the meantime :-).
did you push this version? (and agree it would be fine to special case list/np.ndarray)
@jreback: Not yet; I was hoping to get a full asv benchmark out of GitHub Actions before that. I guess, however, that it is too time-consuming to be run on every PR commit. I'll push right now.
yeah, you can run some set of benchmarks locally to verify (sure, a whole benchmark run is useful too).
for sure. that's why we want to see if a more macro benchmark exercises this in a more meaningful way and then see how that behaves with your change. |
Do we already have a suitable macro benchmark available to that end? Otherwise I'll try to run the whole asv suite overnight. I had previously tried to run just
right, i think this affects a lot of benchmarks in a small (constant) way, but we may have some where it's more significant. would be great to figure out which ones are sensitive (and then add a comment about those in the function for future reference). so we don't really need to run all benchmarks, just a subset
but how do I identify the sensitive ones, apart from running the full benchmark suite and seeing where things change significantly?
right, yeah, run a smattering of benchmarks that your eye tells you might be affected :-< (or construct a new one that specifically hits this)
Indexing might be a good start |
what if
@jbrockmendel: Indeed, I would prefer if
Here is the result of the overnight benchmark. There are quite a few performance improvements (mainly in
looks fine to merge. pls add a whatsnew note to the 1.4 "other enhancements" section. merge master and ping on green.
These are 100% false positives. By construction, nothing in tslibs depends on anything in _libs.lib. index_cached_properties are also likely false positives, but one would need to look to be certain.
@jreback: Sorry for letting this slip a little. I merged with master again and fixed the remaining changes requested during review. The functionality was not touched, only code-style (
OK, I tried to understand better what is going on here, but the test suite fails more than 50 tests locally, even on current master. I was caught again by the US-isms of #44625 and #44715, and I further fail at
None of the CI failures here are related to this PR. You don't need to do anything at the moment.
This happens frequently in the Azure pipelines, and a lot of effort is going into figuring out why. It is very annoying.
If the check should be made more specific, a patch would be welcome.
Why is it that I get this experience so often with Microsoft products? ("Something failed, but I won't tell you what - go annoy your administrator with that! You are the administrator? Not my problem.")
I'm not sure - I do not know what the expected behaviour was before 1.20, or whether I am seeing that behaviour or a third, unrelated one.
lgtm. @jbrockmendel ok here?
no complaints here
thanks @burnpanck, very nice!
This PR picks up the work of @znicholls started in #35127. Compared to that PR, it does not attempt to address anything except `is_list_like`: `assert_almost_equal` is left as-is, and `is_scalar` is not touched either. Note that one of the driving use-cases for #35131 is pint Quantities within pint-pandas, which may wrap either a scalar or an array. In that case, one may want to identify Quantities wrapping a scalar as a scalar. However, the current definition of `is_scalar` is very strict, in that it does not accept zero-dimensional numpy arrays as scalars, so we'd first have to come up with a clear definition of what exactly makes a "scalar". I would therefore consider this a separate issue. Either way, having `is_list_like` correctly treat Quantities will already help pint-pandas a lot.
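To make the pint use-case concrete, here is a minimal stand-in for a Quantity (a hypothetical class, not the real pint API) showing why an `ndim`-based check distinguishes scalar-wrapping from array-wrapping instances even though both define `__iter__`:

```python
import numpy as np

class FakeQuantity:
    # Minimal stand-in for a pint Quantity: wraps either a scalar or
    # an array, forwarding ndim and iteration to the wrapped magnitude.
    def __init__(self, magnitude):
        self.magnitude = np.asarray(magnitude)

    @property
    def ndim(self):
        return self.magnitude.ndim

    def __iter__(self):
        return iter(self.magnitude)

def is_list_like_sketch(obj) -> bool:
    # Simplified ndim-aware predicate in the spirit of this PR.
    return (
        hasattr(obj, "__iter__")
        and not isinstance(obj, (str, bytes, type))
        and getattr(obj, "ndim", 1) != 0
    )

print(is_list_like_sketch(FakeQuantity(3.0)))       # False: wraps a scalar
print(is_list_like_sketch(FakeQuantity([1.0, 2])))  # True: wraps an array
```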