ENH: [Draft] Fix issue #35131 Identify zero-dimensional duck arrays as non-iterable #44626
Conversation
Now `pd.core.dtypes.inference.is_list_like` correctly identifies numpy-like scalars as non-iterable.
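As an illustration of the intended behaviour (a simplified, pure-Python sketch; the real implementation lives in Cython in `pandas._libs.lib`, and the helper name here is made up):

```python
import numpy as np

def is_list_like_sketch(obj) -> bool:
    # Sketch of the check this PR aims for: an object counts as
    # list-like when it is iterable, is not a string/bytes, is not a
    # class, and does not report itself as zero-dimensional via an
    # `ndim` attribute (which covers duck arrays, not just ndarrays).
    return (
        hasattr(obj, "__iter__")
        and not isinstance(obj, type)
        and not isinstance(obj, (str, bytes))
        and getattr(obj, "ndim", 1) != 0
    )

print(is_list_like_sketch([1, 2, 3]))     # True
print(is_list_like_sketch("abc"))         # False: strings are not list-like
print(is_list_like_sketch(np.array(5)))   # False: 0-d array, despite __iter__
print(is_list_like_sketch(np.array([5]))) # True
```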
Co-authored-by: keewis <[email protected]>
I'm not completely sure why, but reverting here for simplicity
Also avoid np.iterable
…ar (neither are 0-dimensional numpy arrays)
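For context on why `np.iterable` is avoided (a quick illustration, not from the PR): it answers the question by actually calling `iter()` on the object and catching `TypeError`, whereas a plain attribute check is cheaper and has no side effects.

```python
import numpy as np

# np.iterable tries iter(obj) and catches TypeError, so it does
# classify 0-d arrays correctly -- but at the cost of constructing
# an iterator (or raising an exception) on every call.
print(np.iterable(np.array(0)))    # False: iter() raises on 0-d arrays
print(np.iterable(np.array([0])))  # True

# A cheap alternative: inspect the reported dimensionality instead.
def reports_zero_dim(obj) -> bool:
    return getattr(obj, "ndim", None) == 0

print(reports_zero_dim(np.array(0)))  # True
print(reports_zero_dim([1, 2, 3]))    # False: lists have no `ndim`
```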
A concern on the initial PR was potential performance regressions. The micro-benchmarks for
I am not convinced that should be a blocker to this PR, though. If it is, I propose to optimise the current implementation using a shortcut for real Python lists and potentially numpy arrays, given that those will be the common cases.
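The shortcut mentioned above could look roughly like this (a hypothetical sketch; the names and the fallback predicate are made up for illustration, and the real function is implemented in Cython):

```python
import numpy as np

def _is_list_like_general(obj) -> bool:
    # Illustrative stand-in for the general duck-typed path.
    return (
        hasattr(obj, "__iter__")
        and not isinstance(obj, type)
        and not isinstance(obj, (str, bytes))
        and getattr(obj, "ndim", 1) != 0
    )

def is_list_like_with_fast_path(obj) -> bool:
    # Fast path: an exact-type check for plain lists and an isinstance
    # check for ndarrays, the overwhelmingly common inputs, before
    # falling back to the slower general logic.
    if type(obj) is list:
        return True
    if isinstance(obj, np.ndarray):
        return obj.ndim != 0
    return _is_list_like_general(obj)

print(is_list_like_with_fast_path([1, 2]))       # True via the list fast path
print(is_list_like_with_fast_path(np.array(3)))  # False via the ndarray branch
print(is_list_like_with_fast_path((1, 2)))       # True via the general path
```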
For comparison, here is the asv output for
Obviously, sets, tuples and dicts still have the regression. One could shortcut those too, but that would simply be gaming the benchmark; we'd still be regressing on the "other cases" not covered by the benchmark. Micro-benchmarks are a dangerous thing. I'll now restart the full asv benchmarks, but I won't be able to limit system load for that long a time - I still need to work in the meantime :-).
did you push this version? (and agree it would be fine to special case list/np.ndarray)
@jreback: Not yet; I was hoping to get a full asv benchmark out of GitHub Actions before that. I guess, however, that it is too time-consuming to be run on every PR commit. I'll push right now.
yeah, you can run some set of benchmarks locally to verify (sure, a whole benchmark run is useful too).
for sure. that's why we want to see if a more macro benchmark exercises this in a more meaningful way and then see how that behaves with your change. |
Do we already have a suitable macro benchmark available to that end? Otherwise I'll try to run the whole asv suite overnight. I had previously tried to run just
right, i think this affects a lot of benchmarks in a small (constant) way, but we may have some where it's more significant. would be great to figure out which ones are sensitive (and then add a comment about those in the function for future reference). so we don't really need to run all benchmarks, just a subset
but how do I identify the sensitive ones, apart from running the full benchmark suite and seeing where things change significantly?
right, yeah, run a smattering of benchmarks that your eye tells you might be affected :-< (or construct a new one that specifically hits this)
Indexing might be a good start |
what if
@jbrockmendel: Indeed, I would prefer if
Here is the result of the overnight benchmark. There are quite a few performance improvements (mainly in
looks fine to merge. pls add a whatsnew note to the 1.4 "other enhancements" section. merge master and ping on green.
These are 100% false positives. By construction, nothing in tslibs depends on anything in _libs.lib. index_cached_properties are also likely false positives, but one would need to look to be certain.
@jreback: Sorry for letting this slip a little. I merged with master again and fixed the remaining changes requested during review. The functionality was not touched, only code-style (
OK, I tried to understand better what is going on here, but the test suite fails more than 50 tests locally, even on current master. I was caught again by the US-isms of #44625 and #44715, and I further fail at
None of the CI failures here are related to this PR. You don't need to do anything at the moment.
This happens frequently in the Azure pipelines, and a lot of effort is going into figuring out why. It is very annoying.
If the check should be made more specific, a patch would be welcome.
Why is it that I get this experience so often with Microsoft products? ("Something failed, but I won't tell you what - go annoy your administrator with that! You are the administrator? Not my problem.")
I'm not sure - I do not know what the expected behaviour was before 1.20, or whether I am seeing that behaviour or a third, unrelated one.
lgtm. @jbrockmendel ok here?
no complaints here
thanks @burnpanck, very nice!
This PR picks up the work of @znicholls started in #35127. Compared to that PR, it does not attempt to address anything except `is_list_like`: `assert_almost_equal` is left as-is, and `is_scalar` is not touched either. Note that one of the driving use-cases for #35131 is pint Quantities within pint-pandas, which may wrap either a scalar or an array. In that case, one may want to identify Quantities wrapping a scalar as a scalar. However, the current definition of `is_scalar` is very strict, in that it does not accept zero-dimensional numpy arrays as scalars, so we'd first have to come up with a clear definition of what exactly makes a "scalar". I would therefore consider this a separate issue. Either way, having `is_list_like` correctly treat Quantities will already help pint-pandas a lot.
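To make the pint use-case concrete, here is a minimal stand-in for a Quantity (a hypothetical class, not the real pint API) showing why an `ndim`-based check distinguishes scalar-wrapping from array-wrapping instances even though both define `__iter__`:

```python
import numpy as np

class FakeQuantity:
    # Minimal stand-in for a pint Quantity: wraps either a scalar or
    # an array, forwarding ndim and iteration to the wrapped magnitude.
    def __init__(self, magnitude):
        self.magnitude = np.asarray(magnitude)

    @property
    def ndim(self):
        return self.magnitude.ndim

    def __iter__(self):
        return iter(self.magnitude)

def is_list_like_sketch(obj) -> bool:
    # Simplified ndim-aware predicate in the spirit of this PR.
    return (
        hasattr(obj, "__iter__")
        and not isinstance(obj, (str, bytes, type))
        and getattr(obj, "ndim", 1) != 0
    )

print(is_list_like_sketch(FakeQuantity(3.0)))       # False: wraps a scalar
print(is_list_like_sketch(FakeQuantity([1.0, 2])))  # True: wraps an array
```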