ENH: Add copy-on-write to DataFrame.drop
#49689
Rows generally can't be dropped without making a copy, so it's OK to focus here on dropping columns.
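To illustrate the row/column distinction, here is a minimal sketch that uses only public API. The helper `shares_column_memory` is a hypothetical name introduced for this example (it is not the test suite's `get_array`), and the four-row frame is an assumption chosen so that the surviving rows cannot be expressed as a slice. Whether the column case reports shared memory depends on the pandas version and on whether copy-on-write is enabled, so the sketch prints that result rather than asserting it.

```python
import numpy as np
import pandas as pd


def shares_column_memory(left, right, col):
    """Return True if column `col` of both frames is backed by overlapping memory.

    Hypothetical helper for this sketch; it relies on the public `to_numpy()`.
    """
    return np.shares_memory(left[col].to_numpy(), right[col].to_numpy())


# Assumed example data (four rows, so dropping row 1 leaves rows 0, 2, 3).
df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [0.1, 0.2, 0.3, 0.4]}
)

# Dropping a row has to build new, shorter arrays for every column.
dropped_row = df.drop(index=1)
row_sharing = {
    col: shares_column_memory(dropped_row, df, col) for col in df.columns
}

# Dropping a column can reuse the buffers of the remaining columns; whether
# it does depends on the pandas version and the copy-on-write setting.
dropped_col = df.drop(columns="a")
col_sharing = {
    col: shares_column_memory(dropped_col, df, col)
    for col in dropped_col.columns
}

print("row drop shares memory:", row_sharing)
print("column drop shares memory:", col_sharing)
```

The row case should report no sharing for any column, which is why a copy-on-write test for `drop` only has something to check on the column path.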
Still confused about what is happening internally. For example:

def test_reindex_columns(using_copy_on_write):
    # Case: reindexing the column returns a new dataframe
    # + afterwards modifying the result
    df = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [0.1, 0.2, 0.3]})
    df_orig = df.copy()
    df2 = df.reindex(columns=["a", "c"], copy=True)  # <- This should return a copy even if using CoW?

    if using_copy_on_write:
        # still shares memory (df2 is a shallow copy)
        assert np.shares_memory(get_array(df2, "a"), get_array(df, "a"))
    else:
        assert not np.shares_memory(get_array(df2, "a"), get_array(df, "a"))
    # mutating df2 triggers a copy-on-write for that column
    df2.iloc[0, 0] = 0
    assert not np.shares_memory(get_array(df2, "a"), get_array(df, "a"))
    if using_copy_on_write:
        assert np.shares_memory(get_array(df2, "c"), get_array(df, "c"))
    tm.assert_frame_equal(df, df_orig)

The copy parameter is passed down to
Indeed, looking a bit in more detail at the current pandas/pandas/core/internals/managers.py Lines 730 to 737 in 12ff4f4
For the rest of the So to summarize: in hindsight this already worked the way we want it (so that your test was already passing is expected), but it is of course still good to explicitly test this!
That's a good question, and something we have to decide in general what we want to do with those methods that already have a If we want to support that here, this would require some changes in (and to be clear, this is relevant for the UPDATE: opened a dedicated issue for this: #50535
Looks good!
Since the test is related to the reindex one, could you move it to just below test_reindex_columns?
@@ -52,6 +52,24 @@ def test_copy_shallow(using_copy_on_write):
# DataFrame methods returning new DataFrame using shallow copy


def test_drop_on_column(using_copy_on_write):
    df = DataFrame(
        {"a": [1, 2, 3], "b": [4, 5, 6], "c": [0.1, 0.2, 0.3]}, index=[10, 11, 12]
-        {"a": [1, 2, 3], "b": [4, 5, 6], "c": [0.1, 0.2, 0.3]}, index=[10, 11, 12]
+        {"a": [1, 2, 3], "b": [4, 5, 6], "c": [0.1, 0.2, 0.3]}
(I used a custom index for the reset_index test, but for the drop test, a custom index shouldn't matter)
I've moved the test and deleted the index. Also added an explicit check that an eager copy is made when not using CoW. Thank you for taking the time to give an in-depth reply and for your guidance in general. The internals are a bit clearer to me now. Will probably try implementing CoW on a few other methods and then maybe diving into
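For readers following along, the check described above could look roughly like the sketch below. It is not the test that was merged. It avoids the `using_copy_on_write` fixture and the internal `get_array` helper by using a hypothetical `column_values` stand-in built on the public `to_numpy()`. It also observes whether `drop` returned a lazy copy instead of assuming a mode. The property it asserts holds in both modes: writing to the result must never change the original.

```python
import numpy as np
import pandas as pd


def column_values(frame, col):
    # Hypothetical stand-in for the test suite's get_array helper.
    return frame[col].to_numpy()


def check_drop_on_column():
    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [0.1, 0.2, 0.3]})
    df_orig = df.copy()
    df2 = df.drop(columns="a")

    # True if drop returned a lazy (shallow) copy, False if it copied eagerly.
    shared_before = np.shares_memory(
        column_values(df2, "c"), column_values(df, "c")
    )

    # Writing to df2 modifies column "b". Afterwards that column must not
    # share memory with df, either because the copy was eager or because
    # the write triggered a copy-on-write.
    df2.iloc[0, 0] = 0
    assert not np.shares_memory(column_values(df2, "b"), column_values(df, "b"))

    # The original frame is unchanged in both modes.
    pd.testing.assert_frame_equal(df, df_orig)
    return shared_before


shared_before = check_drop_on_column()
print("drop returned a lazy copy:", shared_before)
```

In the actual test suite the two branches are asserted separately via the fixture, which gives a stricter check than this mode-agnostic version.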
Thanks for the update!
Progress towards #49473

Add copy-on-write support to df.drop.

Not sure if the test is checking the correct behavior. It was passing even before I set copy=None, but BlockManager.reindex_indexer() has copy=True by default. So my assumption is that there's an error in the test itself.

Also test only tries dropping a column. Wasn't sure how to write a test for dropping a row since there's no get_array equivalent for that.

@jorisvandenbossche Please advise when you have a moment.