-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CoW: add warning mode for cases that will change behaviour #55428
Conversation
There are still a whole bunch of failures (with the warning mode enabled, for which I still need to add a CI build), but quite some of them are due to inplace operators (eg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first part of the review (wifi connection on the train isn't stable)
pandas/core/internals/managers.py
Outdated
if cow_context == "chained-assignment": | ||
warnings.warn( | ||
"ChainedAssignmentError: behaviour will change in pandas 3.0 " | ||
"with Copy-on-Write ...", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a doc page that we can link to that explains this more in-depth
pandas/core/series.py
Outdated
warnings.warn( | ||
_chained_assignment_msg, ChainedAssignmentError, stacklevel=2 | ||
) | ||
elif warn_copy_on_write() and sys.getrefcount(self) <= 3: | ||
cow_context = "chained-assignment" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to warn here instead of deferring. This might now warn, if the first op in chained assignment created a copy, e.g.
pd.options.mode.copy_on_write = "warn"
df = pd.DataFrame({"a": [1, 2, 3,], "b": 1})
indexer = df.a > 1
df = df.a
df[indexer][0] = 5
I think you could make a case for not warning here since the current behaviour is a no-op as well, but
- the DataFrame case warns
- I would prefer a warning since cow will also warn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the reason that I didn't yet warn here and deferred it to Manager.setitem_inplace
, is because I wanted the warning to be more specific, so we can distinguish the case of:
df = ..
ser = df["col"]
ser[0] = 0
-> here we want to warn that you are setting into a series that is a view of another, and that this change will no longer propagate (in this case no longer update the dataframe)
df = ...
df["col"][0] = 0
-> here we want to warn that this simply won't work anymore in the future, as this is chained assignment (and the case of it that currently does work and is used widely I think, so I want to have a distinct warning for this one)
Now, while writing this I am not sure anymore why this alone warrants deferring it. I have to check in the tests for the case that made me move this to the lower level (I originally was just warning here, as you suggest, but then changed it)
warn = SettingWithCopyWarning if single_block else None | ||
# TODO | ||
warn = ( | ||
SettingWithCopyWarning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused, shouldn't this be silent when CoW is enabled as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and that using_copy_on_Write
case is already handled in the if
block above, so in this else
I still need to handle the warn_copy_on_write
case.
I could actually do a elif warn_copy_on_write
separate block, that might be easier to read and which is also the pattern I started to use in many other places.
d3f53c0
to
e16abb4
Compare
@phofl I removed the last commit about the chained assignments again, and moved that to a separate PR (#55522). So this PR then just focuses on adding the infrastructure, and the warnings for (non-chained) series setitem (the things going through |
COW_WARNING_SETITEM_MSG = """\ | ||
Setting a value on a view: behaviour will change in pandas 3.0. | ||
Currently, the mutation will also have effect on the object that shares data | ||
with this object. For example, when setting a value in a Series that was | ||
extracted from a column of a DataFrame, that DataFrame will also be updated: | ||
|
||
ser = df["col"] | ||
ser[0] = 0 <--- in pandas 2, this also updates `df` | ||
|
||
In pandas 3.0 (with Copy-on-Write), updating one Series/DataFrame will never | ||
modify another, and thus in the example above, `df` will not be changed. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question is: how elaborate do we want to make this warning message?
One the one hand, I think one can argue that it can be short and point to a doc page where it is explained in more detail (which I actually should add anyway).
On the other hand, I also want it to be understandable for the many people that won't read the docs about this in detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would anticipate users also wanting to remove/address this warning, so in light of that, I would opt to move the explanation to a doc page if we can include potential actions users can take to remove this warning (e.g. set the copy-on-write mode to True)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it OK to leave this as is, for now, and do a separate PR to write the documentation and update the warnings (add links to the docs, potentially shorten it)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sure thing 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with the doc page
if warn_copy_on_write: | ||
with tm.assert_produces_warning(FutureWarning): | ||
s[0] = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How important do we find it that we ensure the exact message? (given that this is only temporary, maybe not super important?)
(although I suppose it is not difficult to just add match="Setting a value on a view"
to all cases I am adding in this PR.
I don't know how useful it will be to add a custom tm.assert_produces_cow_warning
, but it would make the tm.assert_produces_warning(FutureWarning, match="Setting a value on a view"):
a bit shorter.
But when adding warnings for more cases, the exact warning message might also differ.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay being lax with matching the warning message during testing (but noting a comment that it's CoW related if it's not obvious from the test)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but noting a comment that it's CoW related if it's not obvious from the test
That might be a reason to actually make something like a tm.assert_produces_cow_warning
, then it's also clear without adding a comment everywhere, and will be easier to remove all of those.
I added a @phofl @mroeschke if possible, I would like to get this merged shortly if possible, as there is a lot more work to do here, but that will be easier done in parallel once this base PR is in. |
I’ll check later today or tomorrow |
xref #56019, #48998
An initial PR working towards one of the remaining TODO items:
The current PR adds an option
pd.options.mode.copy_on_write = "warn"
, and then everything behaves as the current default (without CoW enabled) but will raise a warning in case the behaviour will change if CoW would be enabled.As a first step, this PR adds the option, and adds the warning in one specific code path that can trigger CoW, i.e. updating a Series (
Manager.setitem_inplace
). So this covers a typical case like:no longer updating
df
.In addition, I also turned off SettingWithCopyWarning when the CoW warning mode is enabled (because in that case, those warnings are no longer correct for the future behaviour, and it seemed better to not raise both kinds of warnings at the same time)