-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST/API: test column indexing copy/view semantics #35906
TST/API: test column indexing copy/view semantics #35906
Conversation
) | ||
original_arr = df.EA.array | ||
subset = df[["int", "EA"]] | ||
assert subset.EA.array is original_arr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On pandas 1.0 and older, this was apparently taking a copy of the EA (we noticed this in a geopandas test where we were testing against multiple pandas versions)
col = df["int"] | ||
with pd.option_context("chained_assignment", "warn"): | ||
with tm.assert_produces_warning(pd.core.common.SettingWithCopyWarning): | ||
col[0] = 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This highlights a difference between a column backed by ExtensionBlock or consolidated block -> with EA, we don't raise a SettingWithCopyWarning for this case.
(I am personally actually fine with how it behaves for EA)
I'm generally positive on this PR, lean towards only testing explicitly-desired behavior |
And do you find the currently tested behaviours desired? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stylistic nit but I think the tests are in the right direction
"EA": pd.array([1, 2, None], dtype="Int64"), | ||
} | ||
) | ||
original_arr = df.EA.array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use bracket notation instead of dot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glancing through the tests I think I agree with the expected behavior asserted by them.
expected = pd.array([10, 2, None], dtype="Int64") | ||
tm.assert_extension_array_equal(subset["EA"].array, expected) | ||
|
||
# TODO this way it doesn't modify subset - is this expected? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure here. The (or an) issue is that we allow setting to change the dtype, which for EAs means changing the type. So there's no way for
s = pd.Series(pd.array([1, 2]))
s.iloc[0] = 0.5
to mutate the array inplace, and so I wouldn't expect your subset
to be mutated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That example currently raises an error, because mutating inplace for EAs doesn't allow changing the dtype (or at least for the nullable EAs, and which is a difference with normal dtypes that we should actually decide if this is desired (I think so) and if so, test and document as a difference)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@@ -706,6 +706,15 @@ def test_iloc_setitem_categorical_updates_inplace(self): | |||
expected = pd.Categorical(["C", "B", "A"]) | |||
tm.assert_categorical_equal(cat, expected) | |||
|
|||
# __setitem__ under the other hand does not work in-place |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe make this another test?
# TODO this way it doesn't modify subset - is this expected? | ||
# df.iloc[0, 3] = 10 | ||
# expected = pd.array([10, 2, None], dtype="Int64") | ||
# tm.assert_extension_array_equal(subset['EA'].array, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u make this a separate test and xfail (agree we should prob allow this)
col = df["EA"] | ||
assert col.array is original_arr | ||
col[0] = 10 | ||
expected = pd.array([10, 2, None], dtype="Int64") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this mutates the original? can u assert that as well
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
|
||
df[1] = cat[::-1] | ||
|
||
expected = pd.Categorical(["A", "B", "C"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also check whether the array that ends up in df[1]
is the same object on the RHS of L713 (Block.setitem is weird)
# ensure original array was not modified | ||
assert original_arr is not df.EA.array | ||
expected = pd.array([1, 2, None], dtype="Int64") | ||
tm.assert_extension_array_equal(original_arr, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also check whether df["EA"] views the same data on the RHS of L1129 vs a copy
} | ||
) | ||
original_arr = df.EA.array | ||
subset = df[["int", "EA"]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe parametrizing over several equivalent forms of doing this selection?
original_arr = df.EA.array | ||
subset = df[["int", "EA"]] | ||
assert subset.EA.array is original_arr | ||
# check that they view the same data by modifying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should make a helper for this
@jorisvandenbossche can you merge master and address comments |
@jorisvandenbossche closing as stale. reopen when ready. |
This is adding some tests for current behaviour related to the discussions in #33457 and #35417 (and also adds the tests from #35266 which got never merged). I think it is good to have some more explicit tests of current behaviour, while we are considering changing things.
Note: some semantics I am testing are certainly debatable, but I was experimenting a bit with what is happening on current master.