ENH: Identify numpy-like zero-dimensional arrays as non-iterable #35131
Comments
@znicholls Thanks for the clearly described issue! We are having a similar problem in GeoPandas, where the scalars (e.g. MultiPolygon) can potentially be iterable as well / can have a length (although here they are actually iterable, so no error is raised, which makes this even harder to detect). xref #26333, #27911. So in those referenced issues, it is the length check that is problematic. However in the case of |
Correct. To make this work, I think we'd have to add an |
Note: I wasn't sure whether to label this as an enhancement or bug. I've put enhancement as I don't think the original implementation was actually incorrect, rather just limited in scope. I'm happy to change to bug if that's more appropriate.
Is your feature request related to a problem?
I'm one of the developers of Pint-Pandas, which provides an extension type for storing numeric data with units (and, once mature, may solve #10349). The problem we're having is that we can't pass all the extension type tests. In particular, we fail many tests due to pandas' current implementation of `is_list_like`.

As @hgrecco shows here, pandas determines whether something `is_list_like` here (code copied below).

At the moment, all Pint `Quantity` instances have an `__iter__` method. It is designed to raise a `TypeError` if the quantity isn't actually iterable (code here).

The problem is that the `__iter__` method is defined regardless of whether the `Quantity` instance is actually iterable or not. This means that, in all cases, `isinstance(quantity_instance, abc.Iterable)` returns `True` (even if `quantity_instance` isn't actually iterable and would raise a `TypeError` as soon as one tried to iterate over it).

A numpy array suffers from the same issue, which is where the `not (util.is_array(obj) and obj.ndim == 0)` in pandas' `is_list_like` is important, because it excludes scalars. The problem is that `util.is_array(obj)` returns `False` if `obj` is anything other than `np.ndarray`. As a result, we can't just add an `ndim` attribute to `Quantity`, because the `obj.ndim == 0` check will never be evaluated anyway.

Describe the solution you'd like
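Before getting to the change itself, here is a minimal reproduction of the behaviour described above. It is only a sketch: `is_list_like_current` is a simplified pure-Python approximation of the check (not the actual pandas source), `_is_ndarray` is an assumed stand-in for `util.is_array`, and `ScalarQuantity` is a hypothetical stand-in for a zero-dimensional Pint `Quantity`.

```python
from collections import abc


def _is_ndarray(obj):
    # Assumed stand-in for pandas' internal util.is_array:
    # true only for genuine numpy.ndarray instances.
    try:
        import numpy as np
    except ImportError:
        return False
    return isinstance(obj, np.ndarray)


def is_list_like_current(obj):
    # Simplified approximation of the check discussed in this issue,
    # not the real pandas implementation.
    return (
        isinstance(obj, abc.Iterable)
        and not isinstance(obj, (str, bytes))
        and not (_is_ndarray(obj) and obj.ndim == 0)
    )


class ScalarQuantity:
    # Hypothetical stand-in for a zero-dimensional pint Quantity.
    ndim = 0

    def __init__(self, magnitude):
        self.magnitude = magnitude

    def __iter__(self):
        # Defined unconditionally, but refuses to iterate for scalars.
        raise TypeError("scalar quantity is not iterable")


q = ScalarQuantity(1.5)

# The ABC check only looks for an __iter__ method on the class.
looks_iterable = isinstance(q, abc.Iterable)

# Actually asking for an iterator is what reveals the problem.
try:
    iter(q)
    actually_iterable = True
except TypeError:
    actually_iterable = False

# The ndim attribute is never consulted because q is not an ndarray.
flagged_list_like = is_list_like_current(q)

print(looks_iterable, actually_iterable, flagged_list_like)
```

With this stand-in, the object passes the `abc.Iterable` check and is flagged as list-like even though `iter(q)` raises, and adding `ndim = 0` makes no difference.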
Update `is_list_like` so that it correctly identifies implementations of scalars in numpy-like arrays (I think these are called 'duck numpy arrays'?). I've put a draft in #35127 and would be grateful for feedback on the proposed solution there. If there's a better solution, I'm also happy to implement that instead.

API breaking implications
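To show roughly what would change, here is a sketch of a duck-typed variant of the check proposed above. This is an assumption about the general shape of such a check, not the code from #35127; `is_list_like_duck` and `DuckArray` are hypothetical names used only for illustration.

```python
from collections import abc


def is_list_like_duck(obj):
    # Hypothetical duck-typed check; not the code from #35127.
    return (
        isinstance(obj, abc.Iterable)
        and not isinstance(obj, (str, bytes))
        # Treat anything that reports ndim == 0 as a scalar,
        # whether or not it is a real numpy.ndarray.
        and not (hasattr(obj, "ndim") and obj.ndim == 0)
    )


class DuckArray:
    # Hypothetical numpy-like container that always defines __iter__.
    def __init__(self, values, ndim):
        self.values = values
        self.ndim = ndim

    def __iter__(self):
        if self.ndim == 0:
            raise TypeError("zero-dimensional object is not iterable")
        return iter(self.values)


scalar_result = is_list_like_duck(DuckArray(1.5, ndim=0))
array_result = is_list_like_duck(DuckArray([1.5, 2.5], ndim=1))
list_result = is_list_like_duck([1, 2, 3])
str_result = is_list_like_duck("abc")

print(scalar_result, array_result, list_result, str_result)
```

Under this sketch only the zero-dimensional duck array changes classification; ordinary lists, strings, and one-dimensional duck arrays are treated as before.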
`is_list_like` would now return `False` for scalars of numpy-like array implementations. If we used the implementation in #35127, then any object that has an `ndim` attribute with `obj.ndim == 0` would no longer be identified as list-like.

Describe alternatives you've considered
We've discussed how we could get the behaviour we want by changing Pint, rather than changing Pandas, but it seems to be impossible if we want to maintain the numpy-like implementation without separate scalar and array classes (see discussion here).
Additional context