ENH: allow storing ExtensionArrays in Index #43930

Merged Dec 31, 2021 (110 commits)

Commits
df9c228
ENH/WIP/POC: EA-backed Index
jbrockmendel Oct 8, 2021
3952027
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 8, 2021
95e0129
BUG: NumericIndex.insert
jbrockmendel Oct 8, 2021
c52d459
Merge branch 'bug-insert' into enh-nullable-index
jbrockmendel Oct 8, 2021
cf0c171
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 9, 2021
0a3b7d7
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 9, 2021
d53377d
fix a few more tests; ignoring linting for now
jbrockmendel Oct 10, 2021
69fb0bd
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 10, 2021
1952cd7
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 11, 2021
1ed588a
fix test
jbrockmendel Oct 11, 2021
34d5dde
down to 38 tests failing
jbrockmendel Oct 11, 2021
42be4e6
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 11, 2021
91b3716
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 12, 2021
544d9fe
down to 15 tests failing
jbrockmendel Oct 13, 2021
e14d6f1
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 13, 2021
22a0939
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 13, 2021
900978c
fix value_counts
jbrockmendel Oct 16, 2021
f9c8791
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 16, 2021
c0ae18c
fix map test
jbrockmendel Oct 18, 2021
4ab7f0d
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 20, 2021
a9ef37e
fix some tests
jbrockmendel Oct 21, 2021
a2a8de9
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 21, 2021
41acf3f
ENH: ExtensionArray.insert
jbrockmendel Oct 22, 2021
37d36ad
Fix usage
jbrockmendel Oct 22, 2021
bafb23f
Fix TimedeltaIndex.insert test
jbrockmendel Oct 22, 2021
bf76950
Merge branch 'enh-ea-insert' into enh-nullable-index
jbrockmendel Oct 22, 2021
a3a349d
pass a few more tests
jbrockmendel Oct 23, 2021
ebbb7a4
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 23, 2021
4229dbf
tests
jbrockmendel Oct 23, 2021
17dedee
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 23, 2021
45fdffb
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 24, 2021
2fc4798
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 24, 2021
2e1843a
REF: share ExtensionIndex.insert-> Index.insert
jbrockmendel Oct 24, 2021
1516f66
Merge branch 'bug-ei-inserts' into enh-nullable-index
jbrockmendel Oct 24, 2021
c92bea5
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 25, 2021
6bede7a
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 25, 2021
2bb1dea
handle a few more tests
jbrockmendel Oct 26, 2021
e692899
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 26, 2021
36b6629
update test
jbrockmendel Oct 27, 2021
1ab2df1
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 27, 2021
94376ba
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 28, 2021
1881599
Fix remaining tests
jbrockmendel Oct 29, 2021
bb90bfd
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 29, 2021
1f97325
no-pyarrow-compat
jbrockmendel Oct 29, 2021
076cada
mypy fixups
jbrockmendel Oct 29, 2021
1e8a31f
remove assertion
jbrockmendel Oct 30, 2021
1d076fe
Merge branch 'master' into enh-nullable-index
jbrockmendel Oct 31, 2021
2d5fa6d
restor astype
jbrockmendel Oct 31, 2021
adf3ddb
older numpy compat
jbrockmendel Nov 1, 2021
95be963
Merge branch 'master' into enh-nullable-index
jbrockmendel Nov 1, 2021
fd6880e
xfail
jbrockmendel Nov 1, 2021
3d9b9af
mypy fixup
jbrockmendel Nov 1, 2021
31d547c
Merge branch 'master' into enh-nullable-index
jbrockmendel Nov 2, 2021
2d75377
lint fixups
jbrockmendel Nov 2, 2021
37b9370
avoid warnings
jbrockmendel Nov 3, 2021
0e56218
avoid FutureWarnings
jbrockmendel Nov 5, 2021
e8987cd
catch RuntimeWarning
jbrockmendel Nov 5, 2021
fef88a7
remove unreachable
jbrockmendel Nov 5, 2021
d33e306
Merge branch 'master' into enh-nullable-index
jbrockmendel Nov 11, 2021
41c040d
Merge branch 'master' into enh-nullable-index
jbrockmendel Nov 16, 2021
e307963
Merge branch 'master' into enh-nullable-index
jbrockmendel Nov 19, 2021
63f26ba
revert no-longer-necessary
jbrockmendel Nov 19, 2021
11d3564
Share ExtensionEngine/NullableEngine methods
jbrockmendel Nov 20, 2021
ca747fc
Merge branch 'master' into enh-nullable-index
jbrockmendel Nov 20, 2021
cdb08ca
Merge branch 'master' into enh-nullable-index
jbrockmendel Nov 21, 2021
09d8bf1
lint fixup
jbrockmendel Nov 21, 2021
7d783b1
revert no-longer-necessary
jbrockmendel Nov 21, 2021
2553db4
Merge branch 'master' into enh-nullable-index
jbrockmendel Nov 22, 2021
7a44a79
Merge branch 'master' into enh-nullable-index
jbrockmendel Nov 23, 2021
de64249
remove unnecessary from test_setops
jbrockmendel Nov 23, 2021
1bb2901
suggested edits
jbrockmendel Nov 23, 2021
5a76d5d
Merge branch 'master' into enh-nullable-index
jbrockmendel Nov 24, 2021
a779c75
Merge branch 'master' into enh-nullable-index
jbrockmendel Nov 25, 2021
23bb325
actually run the new base extension tests for all EAs
jorisvandenbossche Nov 26, 2021
a3fba50
Merge branch 'master' into enh-nullable-index
jbrockmendel Nov 30, 2021
4abd60e
update tests
jbrockmendel Nov 30, 2021
a9f7ea9
Merge branch 'master' into enh-nullable-index
jbrockmendel Nov 30, 2021
7f9741e
older np compat
jbrockmendel Nov 30, 2021
c22d6e9
Merge branch 'master' into enh-nullable-index
jbrockmendel Dec 1, 2021
40e861b
32bit compat
jbrockmendel Dec 1, 2021
6e79350
simplify, docstring
jbrockmendel Dec 1, 2021
ae6bb0c
Merge branch 'master' into enh-nullable-index
jbrockmendel Dec 1, 2021
90366e9
32bit compat
jbrockmendel Dec 1, 2021
c8b1d7d
Merge branch 'master' into enh-nullable-index
jbrockmendel Dec 1, 2021
70debb2
Address comments
jbrockmendel Dec 1, 2021
b6a3a47
Merge branch 'master' into enh-nullable-index
jbrockmendel Dec 3, 2021
c74a7a7
Merge branch 'master' into enh-nullable-index
jbrockmendel Dec 5, 2021
0339e69
simplify
jbrockmendel Dec 5, 2021
e737850
Merge branch 'master' into enh-nullable-index
jbrockmendel Dec 5, 2021
96b25aa
dont catch np.float16 too early
jbrockmendel Dec 5, 2021
788eda1
Merge branch 'master' into enh-nullable-index
jbrockmendel Dec 7, 2021
812da93
de-xfail
jbrockmendel Dec 7, 2021
e394394
remove edits made extraneous by other PRs
jbrockmendel Dec 7, 2021
7718bb1
Merge branch 'master' into enh-nullable-index
jbrockmendel Dec 13, 2021
3e1ec00
suggested edits
jbrockmendel Dec 13, 2021
3596bcf
Merge branch 'master' into enh-nullable-index
jbrockmendel Dec 15, 2021
267b1b3
Remove NullableEngine, ExtensionEngine
jbrockmendel Dec 15, 2021
70cad91
Merge branch 'master' into enh-nullable-index
jbrockmendel Dec 21, 2021
c8072c5
revert
jbrockmendel Dec 21, 2021
7e0ac18
remove no-longer-necessary
jbrockmendel Dec 21, 2021
80453b4
whatsnew
jbrockmendel Dec 21, 2021
7231a9e
deprecation for SparseArray
jbrockmendel Dec 22, 2021
f78aa0f
share _na_value method
jbrockmendel Dec 22, 2021
453d6ae
mypy fixup, npdev catch warnings
jbrockmendel Dec 22, 2021
8750248
mypy fixup
jbrockmendel Dec 22, 2021
0b01bf9
Merge branch 'master' into enh-nullable-index
jbrockmendel Dec 23, 2021
d2e0266
compat for older numpy
jbrockmendel Dec 23, 2021
8daa2dc
Merge branch 'master' into enh-nullable-index
jbrockmendel Dec 23, 2021
cf95f32
Merge branch 'master' into enh-nullable-index
jbrockmendel Dec 25, 2021
7fba6a2
Merge branch 'master' into enh-nullable-index
jbrockmendel Dec 29, 2021
37 changes: 37 additions & 0 deletions doc/source/whatsnew/v1.4.0.rst
@@ -90,6 +90,43 @@ be removed in the future, see :ref:`here <whatsnew_140.deprecations.int64_uint64

See :ref:`here <advanced.numericindex>` for more about :class:`NumericIndex`.


.. _whatsnew_140.enhancements.ExtensionIndex:

Index can hold arbitrary ExtensionArrays
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Until now, passing a custom :class:`ExtensionArray` to ``pd.Index`` would cast the
array to ``object`` dtype. Now :class:`Index` can directly hold arbitrary ExtensionArrays (:issue:`43930`).

*Previous behavior*:

.. ipython:: python

    arr = pd.array([1, 2, pd.NA])
    idx = pd.Index(arr)

In the old behavior, ``idx`` would be object-dtype:

*Previous behavior*:

.. code-block:: ipython

    In [1]: idx
    Out[1]: Index([1, 2, <NA>], dtype='object')

With the new behavior, we keep the original dtype:

*New behavior*:

.. ipython:: python

    idx

One exception to this is ``SparseArray``, which will continue to cast to numpy
dtype until pandas 2.0. At that point it will retain its dtype like other
ExtensionArrays.

.. _whatsnew_140.enhancements.styler:

Styler
4 changes: 3 additions & 1 deletion pandas/_libs/index.pyx
@@ -33,7 +33,9 @@ from pandas._libs import (
hashtable as _hash,
)

from pandas._libs.lib cimport eq_NA_compat
from pandas._libs.missing cimport (
C_NA as NA,
checknull,
is_matching_na,
)
@@ -62,7 +64,7 @@ cdef ndarray _get_bool_indexer(ndarray values, object val):
if values.descr.type_num == cnp.NPY_OBJECT:
# i.e. values.dtype == object
if not checknull(val):
indexer = values == val
indexer = eq_NA_compat(values, val)

else:
# We need to check for _matching_ NA values
5 changes: 5 additions & 0 deletions pandas/_libs/lib.pxd
@@ -1 +1,6 @@
from numpy cimport ndarray


cdef bint c_is_list_like(object, bint) except -1

cpdef ndarray eq_NA_compat(ndarray[object] arr, object key)
21 changes: 21 additions & 0 deletions pandas/_libs/lib.pyx
@@ -3050,6 +3050,27 @@ def is_bool_list(obj: list) -> bool:
return True


cpdef ndarray eq_NA_compat(ndarray[object] arr, object key):
    """
    Check for `arr == key`, treating all values as not-equal to pd.NA.

    key is assumed to have `not isna(key)`
    """
    cdef:
        ndarray[uint8_t, cast=True] result = np.empty(len(arr), dtype=bool)
        Py_ssize_t i
        object item

    for i in range(len(arr)):
        item = arr[i]
        if item is C_NA:
            result[i] = False
        else:
            result[i] = item == key

    return result


def dtypes_all_equal(list types not None) -> bool:
"""
Faster version for:
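
Reviewer note: the semantics of `eq_NA_compat` can be illustrated in pure Python. This is only a sketch, not the Cython implementation: `_NAType` and `NA` are hypothetical stand-ins for `pd.NA` (assumed here to propagate through `==` and to refuse conversion to `bool`), and `eq_na_compat` is an illustrative name that works on plain lists rather than object ndarrays.

```python
class _NAType:
    """Hypothetical stand-in for pd.NA: `==` propagates NA, bool() fails."""

    def __eq__(self, other):
        return self

    def __bool__(self):
        raise TypeError("boolean value of NA is ambiguous")

    def __repr__(self):
        return "<NA>"


NA = _NAType()


def eq_na_compat(arr, key):
    """Element-wise `arr == key`, treating NA entries as not equal.

    `key` is assumed not to be NA, mirroring the docstring in the diff.
    """
    result = []
    for item in arr:
        if item is NA:
            # An NA entry never matches a non-NA key.
            result.append(False)
        else:
            result.append(bool(item == key))
    return result


values = [1, NA, 2, 1]
naive = [v == 1 for v in values]   # the second entry is NA, not a bool
mask = eq_na_compat(values, 1)
print(mask)  # [True, False, False, True]
```

The naive comparison leaves an NA in the result, which cannot be used as a boolean mask entry; the helper always yields plain booleans.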
6 changes: 3 additions & 3 deletions pandas/_testing/asserters.py
@@ -404,9 +404,9 @@ def _get_ilevel_values(index, level):
# skip exact index checking when `check_categorical` is False
if check_exact and check_categorical:
if not left.equals(right):
diff = (
np.sum((left._values != right._values).astype(int)) * 100.0 / len(left)
)
mismatch = left._values != right._values

diff = np.sum(mismatch.astype(int)) * 100.0 / len(left)
msg = f"{obj} values are different ({np.round(diff, 5)} %)"
raise_assert_detail(obj, msg, left, right)
else:
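
Reviewer note: the mismatch percentage reported in the assertion message can be reproduced with plain Python sequences. This is a rough sketch under stated assumptions: `mismatch_percentage` is a hypothetical helper, lists stand in for `left._values` and `right._values`, and the empty-input guard is an addition that is not in the diff.

```python
def mismatch_percentage(left, right):
    """Percentage of positions at which two equal-length sequences differ."""
    if len(left) != len(right):
        raise ValueError("sequences must have the same length")
    if not left:
        # Assumption: treat two empty sequences as fully matching.
        return 0.0
    # Compute the element-wise mismatch first, then reduce it,
    # following the two-step shape used in the updated asserter.
    mismatch = [l != r for l, r in zip(left, right)]
    return sum(mismatch) * 100.0 / len(left)


diff = mismatch_percentage([1, 2, 3, 4], [1, 2, 0, 4])
print(f"values are different ({round(diff, 5)} %)")  # values are different (25.0 %)
```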
16 changes: 16 additions & 0 deletions pandas/conftest.py
@@ -67,6 +67,14 @@
MultiIndex,
)

try:
    import pyarrow as pa
except ImportError:
    has_pyarrow = False
else:
    del pa
    has_pyarrow = True

# Until https://github.com/numpy/numpy/issues/19078 is sorted out, just suppress
suppress_npdev_promotion_warning = pytest.mark.filterwarnings(
"ignore:Promotion of numbers and bools:FutureWarning"
@@ -549,7 +557,15 @@ def _create_mi_with_dt64tz_level():
"mi-with-dt64tz-level": _create_mi_with_dt64tz_level(),
"multi": _create_multiindex(),
"repeats": Index([0, 0, 1, 1, 2, 2]),
"nullable_int": Index(np.arange(100), dtype="Int64"),
"nullable_uint": Index(np.arange(100), dtype="UInt16"),
"nullable_float": Index(np.arange(100), dtype="Float32"),
"nullable_bool": Index(np.arange(100).astype(bool), dtype="boolean"),
"string-python": Index(pd.array(tm.makeStringIndex(100), dtype="string[python]")),
}
if has_pyarrow:
idx = Index(pd.array(tm.makeStringIndex(100), dtype="string[pyarrow]"))
indices_dict["string-pyarrow"] = idx


@pytest.fixture(params=indices_dict.keys())
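
Reviewer note: the optional-dependency pattern used for the `string-pyarrow` fixture entry can be sketched generically. The names `has_module` and `indices` are hypothetical, and the list values are placeholders for the real `Index` objects; only the try/except shape follows the diff.

```python
import importlib


def has_module(name):
    """Return True if the top-level module `name` can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Hypothetical fixture table; the real conftest stores Index objects.
indices = {"string-python": ["a", "b"]}
if has_module("pyarrow"):
    # Only register the pyarrow-backed entry when pyarrow is importable.
    indices["string-pyarrow"] = ["a", "b"]

print(sorted(indices))
```

Registering the entry conditionally keeps the parametrized fixture usable in environments without the optional package.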
12 changes: 4 additions & 8 deletions pandas/core/arrays/masked.py
@@ -714,10 +714,7 @@ def value_counts(self, dropna: bool = True) -> Series:
data = self._data[~self._mask]
value_counts = Index(data).value_counts()

# TODO(ExtensionIndex)
# if we have allow Index to hold an ExtensionArray
# this is easier
index = value_counts.index._values.astype(object)
index = value_counts.index

# if we want nans, count the mask
if dropna:
@@ -727,10 +724,9 @@ def value_counts(self, dropna: bool = True) -> Series:
counts[:-1] = value_counts
counts[-1] = self._mask.sum()

index = Index(
np.concatenate([index, np.array([self.dtype.na_value], dtype=object)]),
dtype=object,
)
index = index.insert(len(index), self.dtype.na_value)

index = index.astype(self.dtype)

mask = np.zeros(len(counts), dtype="bool")
counts = IntegerArray(counts, mask)
4 changes: 3 additions & 1 deletion pandas/core/arrays/string_.py
@@ -470,7 +470,9 @@ def max(self, axis=None, skipna: bool = True, **kwargs) -> Scalar:
def value_counts(self, dropna: bool = True):
from pandas import value_counts

return value_counts(self._ndarray, dropna=dropna).astype("Int64")
result = value_counts(self._ndarray, dropna=dropna).astype("Int64")
result.index = result.index.astype(self.dtype)
return result

def memory_usage(self, deep: bool = False) -> int:
result = self._ndarray.nbytes
10 changes: 8 additions & 2 deletions pandas/core/arrays/string_arrow.py
@@ -313,6 +313,13 @@ def __getitem__(
elif isinstance(item, tuple):
item = unpack_tuple_and_ellipses(item)

# error: Non-overlapping identity check (left operand type:
# "Union[Union[int, integer[Any]], Union[slice, List[int],
# ndarray[Any, Any]]]", right operand type: "ellipsis")
if item is Ellipsis: # type: ignore[comparison-overlap]
# TODO: should be handled by pyarrow?
item = slice(None)

if is_scalar(item) and not is_integer(item):
# e.g. "foo" or 2.5
# exception message copied from numpy
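
Reviewer note: the `Ellipsis` branch added to `__getitem__` amounts to normalizing the key before the usual dispatch. A minimal sketch of that idea follows, assuming a hypothetical `normalize_key` helper and a plain list in place of the Arrow-backed array.

```python
def normalize_key(item):
    """Map a bare Ellipsis to a full slice; leave other keys unchanged."""
    if item is Ellipsis:
        return slice(None)
    return item


data = ["a", "b", "c"]
print(data[normalize_key(...)])  # ['a', 'b', 'c']
print(data[normalize_key(1)])    # b
```

With this normalization, `obj[...]` behaves like `obj[:]` and selects every element.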
@@ -615,8 +622,7 @@ def value_counts(self, dropna: bool = True) -> Series:
# No missing values so we can adhere to the interface and return a numpy array.
counts = np.array(counts)

# Index cannot hold ExtensionArrays yet
index = Index(type(self)(values)).astype(object)
index = Index(type(self)(values))

return Series(counts, index=index).astype("Int64")

8 changes: 2 additions & 6 deletions pandas/core/dtypes/common.py
@@ -1325,12 +1325,8 @@ def is_bool_dtype(arr_or_dtype) -> bool:
# now we use the special definition for Index

if isinstance(arr_or_dtype, ABCIndex):

# TODO(jreback)
# we don't have a boolean Index class
# so its object, we need to infer to
# guess this
return arr_or_dtype.is_object() and arr_or_dtype.inferred_type == "boolean"
# Allow Index[object] that is all-bools or Index["boolean"]
return arr_or_dtype.inferred_type == "boolean"
elif isinstance(dtype, ExtensionDtype):
return getattr(dtype, "_is_boolean", False)
