-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST/API: test column indexing copy/view semantics #35906
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -706,6 +706,15 @@ def test_iloc_setitem_categorical_updates_inplace(self): | |
expected = pd.Categorical(["C", "B", "A"]) | ||
tm.assert_categorical_equal(cat, expected) | ||
|
||
# __setitem__ under the other hand does not work in-place | ||
cat = pd.Categorical(["A", "B", "C"]) | ||
df = pd.DataFrame({1: cat, 2: [1, 2, 3]}) | ||
|
||
df[1] = cat[::-1] | ||
|
||
expected = pd.Categorical(["A", "B", "C"]) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you also check whether the array that ends up in |
||
tm.assert_categorical_equal(cat, expected) | ||
|
||
def test_iloc_with_boolean_operation(self): | ||
# GH 20627 | ||
result = DataFrame([[0, 1], [2, 3], [4, 5], [6, np.nan]]) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1110,3 +1110,77 @@ def test_setitem_categorical(): | |
{"h": pd.Categorical(["m", "n"]).reorder_categories(["n", "m"])} | ||
) | ||
tm.assert_frame_equal(df, expected) | ||
|
||
|
||
def test_setitem_EA_column_update(): | ||
# https://github.com/pandas-dev/pandas/issues/33457 | ||
|
||
df = pd.DataFrame( | ||
{ | ||
"int": [1, 2, 3], | ||
"int2": [3, 4, 5], | ||
"float": [0.1, 0.2, 0.3], | ||
"EA": pd.array([1, 2, None], dtype="Int64"), | ||
} | ||
) | ||
original_arr = df.EA.array | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you use bracket notation instead of dot? |
||
|
||
# overwrite column with new array | ||
df["EA"] = pd.array([1, 2, 3], dtype="Int64") | ||
# ensure original array was not modified | ||
assert original_arr is not df.EA.array | ||
expected = pd.array([1, 2, None], dtype="Int64") | ||
tm.assert_extension_array_equal(original_arr, expected) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you also check whether df["EA"] views the same data on the RHS of L1129 vs a copy |
||
|
||
|
||
def test_getitem_EA_no_copy(): | ||
# ensure we don't copy the EA when taking a subset | ||
|
||
df = pd.DataFrame( | ||
{ | ||
"int": [1, 2, 3], | ||
"int2": [3, 4, 5], | ||
"float": [0.1, 0.2, 0.3], | ||
"EA": pd.array([1, 2, None], dtype="Int64"), | ||
} | ||
) | ||
original_arr = df.EA.array | ||
subset = df[["int", "EA"]] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe parametrizing over several equivalent forms of doing this selection? |
||
assert subset.EA.array is original_arr | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On pandas 1.0 and older, this was apparently taking a copy of the EA (we noticed this in a geopandas test where we were testing against multiple pandas versions) |
||
# check that they view the same data by modifying | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we should make a helper for this |
||
df["EA"].array[0] = 10 | ||
expected = pd.array([10, 2, None], dtype="Int64") | ||
tm.assert_extension_array_equal(subset["EA"].array, expected) | ||
|
||
# TODO this way it doesn't modify subset - is this expected? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm unsure here. The (or an) issue is that we allow setting to change the dtype, which for EAs means changing the type. So there's no way for s = pd.Series(pd.array([1, 2]))
s.iloc[0] = 0.5 to mutate the array inplace, and so I wouldn't expect your There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That example currently raises an error, because mutating inplace for EAs doesn't allow changing the dtype (or at least for the nullable EAs, and which is a difference with normal dtypes that we should actually decide if this is desired (I think so) and if so, test and document as a difference) |
||
# df.iloc[0, 3] = 10 | ||
# expected = pd.array([10, 2, None], dtype="Int64") | ||
# tm.assert_extension_array_equal(subset['EA'].array, expected) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can u make this a separate test and xfail (agree we should prob allow this) |
||
|
||
|
||
def test_getitem_column_view(): | ||
# test that getting a single column is a view on the data | ||
|
||
df = pd.DataFrame( | ||
{ | ||
"int": [1, 2, 3], | ||
"int2": [3, 4, 5], | ||
"float": [0.1, 0.2, 0.3], | ||
"EA": pd.array([1, 2, None], dtype="Int64"), | ||
} | ||
) | ||
|
||
# getitem with ExtensionArray | ||
original_arr = df._mgr.blocks[2].values | ||
col = df["EA"] | ||
assert col.array is original_arr | ||
col[0] = 10 | ||
expected = pd.array([10, 2, None], dtype="Int64") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this mutates the original? can u assert that as well |
||
tm.assert_extension_array_equal(df["EA"].array, expected) | ||
|
||
# getitem from consolidated block | ||
col = df["int"] | ||
with pd.option_context("chained_assignment", "warn"): | ||
with tm.assert_produces_warning(pd.core.common.SettingWithCopyWarning): | ||
col[0] = 10 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This highlights a difference between a column backed by ExtensionBlock or consolidated block -> with EA, we don't raise a SettingWithCopyWarning for this case. (I am personally actually fine with how it behaves for EA) |
||
tm.assert_equal(df["int"], pd.Series([10, 2, 3], name="int")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe make this another test?