DISCUSS/API: setitem-like operations should only update inplace and never fallback with upcast (i.e never change the dtype) #39584

jorisvandenbossche · 2021-02-03T21:49:03Z

Currently, setitem-like operations (i.e. operations that change values in an existing series or dataframe such as __setitem__ and .loc/.iloc setitem, or filling methods like fillna) first try to update in place, but if there is a dtype mismatch, pandas will upcast to a common dtype (typically object dtype).

For example, setting a string into an integer Series upcasts to object:

>>> s = pd.Series([1, 2, 3])
>>> s.loc[1] = "B"
>>> s
0    1
1    B
2    3
dtype: object

or doing a fillna with an invalid fill value also upcasts instead of raising an error:

>>> s = pd.Series(["2020-01-01", "NaT"], dtype="datetime64[ns]")
>>> s
0   2020-01-01
1          NaT
dtype: datetime64[ns]
>>> s.fillna(1)
0    2020-01-01 00:00:00
1                      1
dtype: object

My general proposal would be that in some future (eg pandas 2.0 + after a deprecation), such inherently inplace operation should have the guarantee to either happen in place or either error, and thus never change the dtype of the original Series/DataFrame.

This is similar to eg numpy's behaviour where setitem never changes the dtype. Showing the first example from above in equivalent numpy code:

>>> arr = np.array([1, 2, 3])
>>> arr[1] = "B"
...
ValueError: invalid literal for int() with base 10: 'B'

Apart from that, I also think this is the cleaner behaviour with less surprises. If a user specifically wants to allow mixed types in a column, they can manually cast to object dtype first.

On the other hand, this is quite a big change in how we generally are permissive right now and easily upcast, and such a change will certainly impact quite some user code (but, it's perfectly possible to do this with proper deprecation warnings in advance warning for the specific cases where it will error in the future AFAIK).

There are certainly some more details that need to discussed as well if we want this (which exact values are regarded as compatible with the dtype, eg setting a float in an integer column, should that error or silently round the float?). But what are people's thoughts on the general idea?

cc @pandas-dev/pandas-core

The text was updated successfully, but these errors were encountered:

jorisvandenbossche · 2021-02-03T21:52:35Z

Sidenote: the extension arrays are actually already more strict on this (which is also needed, otherwise setitem could change the class of the object). But the upcasting logic lives a level higher on the Series/DataFrame, where the underlying array gets swapped when an upcast happens. So in other words, the proposal is to propagate that stricter behaviour of the arrays also to Series/DataFrame.

toobaz · 2021-02-03T22:52:00Z

I tend to agree this is a step to take sooner or later. I don't think I ever met a case in which implicitly upcasting via a setitem on individual elements was a desired feature and not a bug.

Clearly (?), this should not apply to replacing an entire column of a DataFrame (or multiple columns) with new ones. Or if we want to state it more generally: dtype should not change unless the smallest dtype-bearing block (which is the column) is entirely replaced. And just for completeness: it would apply to df.loc[:, 'col'] = s but not to df['col'] = s (notice that the former currently replaces the dtype e.g. if col had previously int dtype and s has Timestamp, something that I suspect should not happen).

jbrockmendel · 2021-02-03T23:31:41Z

I am generally positively disposed towards this idea. Some things that are not obvious: 1) Does this apply to setitem-like ops on Index? In particular, if I add a new column to a DataFrame and doing so would require casting the existing columns, does that raise? 2) Can we get a complete-ish list of what constitutes setitem-like? (e.g. above I'm assuming Index.append counts, but that _cant_ be inplace, so it might reasonably be excluded) 3) Some of the casting is a result of fallback-on-failure, but other pieces are due to downcasting-on-success. (see Block._maybe_downcast and Block.convert; affected methods include where, fillna, interpolate, replace). Is the proposal to change these behaviors too? 4) Because the fallback-on-failure only occurs after a failure, it would be fairly cheap to do a pd.get_option lookup (or an obj.flags lookup) to decide on cast-vs-raise. The major downside would be having to test/support two variants, the upside is that the It Just Works behavior is often really convenient.

…

On Wed, Feb 3, 2021 at 2:52 PM Pietro Battiston ***@***.***> wrote: I tend to agree this is a step to take sooner or later. I don't think I ever met a case in which implicitly upcasting via a setitem on individual elements was a desired feature and not a bug. Clearly (?), this should *not* apply to replacing an entire column of a DataFrame (or multiple columns) with new ones. Or if we want to state it more generally: dtype should not change *unless* the smallest dtype-bearing block (which is the column) is entirely replaced. And just for completeness: it would apply to df.loc[:, 'col'] = s but not to df['col'] = s (notice that the former currently replaces the dtype e.g. if col had previously int dtype and s has Timestamp, something that I suspect should not happen). — You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub <#39584 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB5UM6DYFKNWBUCZHFEYWX3S5HHSDANCNFSM4XBUHQGA> .

jorisvandenbossche · 2021-02-04T09:06:45Z

Clearly (?), this should not apply to replacing an entire column of a DataFrame (or multiple columns) with new ones.

@toobaz Yes, thanks for explicitly stating that, as I forgot to mention it. Indeed, the proposal is about the cases where (a subset of) the values are changed in place, and not where we are replacing a full column.
So df["existing_col"] = new_values will still follow the dtype of the new values and be able to change the dtype of the column.

And just for completeness: it would apply to df.loc[:, 'col'] = s but not to df['col'] = s (notice that the former currently replaces the dtype e.g. if col had previously int dtype and s has Timestamp, something that I suspect should not happen).

Indeed, that's probably the distinction we want to make. There is some discussion about this in #38896 (comment)

[@jbrockmendel] 1) Does this apply to setitem-like ops on Index? In particular, if I add a
new column to a DataFrame and doing so would require casting the existing
columns, does that raise?

Good point to explicitly call out. I would say: yes, Index and Series should generally be consistent. But note that we don't allow direct (by the user) __setitem__ operations for Index anyway. So then it's mostly about methods like fillna, and there I think it can follow the behaviour we decide for Series (and see comment below for more detailed list of methods that are considered).

To be explicit: this proposal does not cover concat/append methods or set operations (union, intersection, etc), as those class of functions inherently create new objects (with a potentially different shape or order) and follow the upcasting rules (_get_common_dtype / find_common_type based). So operations like reset_index() where a value gets added to the .columns Index object are not affected (as that would indeed be annoying if that would raise).

The case you mention of adding a new column to a DataFrame (df[col] = ...) with a label that isn't compatible with the columns Index.dtype, is handled under the hood as an Index.insert. I would also leave that out from the discussion here, as it's not a "pure" setitem (you can't express this as a setitem operation on the underlying array), and I would rather put it in the bucket of set operations following the upcasting rules.

3) Some of the casting is a result of fallback-on-failure, but other pieces are due to downcasting-on-success. Is the proposal to change these behaviors too?

Let's leave that for a separate issue to discuss, as in theory that's orthogonal in implementation (although certainly related). AFAIK we never do that for actual setitem, but only in a few specific methods in the internals (fillna, interpolate, where, replace).
(I will open another issue for it)

toobaz · 2021-02-04T09:25:08Z

But note that we don't allow direct (by the user) __setitem__ operations for Index anyway. So then it's mostly about methods like fillna, and there I think it can follow the behaviour we decide for Series.

Not sure I follow: Index.fillna does not resemble __setitem__ to me: it does not modify data, rather it creates and returns a copy. I see it more as a sort of arithmetic operation...

jorisvandenbossche · 2021-02-04T09:28:32Z

Can we get a complete-ish list of what constitutes setitem-like? (e.g. above I'm assuming Index.append counts, but that cant be inplace, so it might reasonably be excluded)

The append method is indeed excluded, as I would label that as a concatenation and not a setitem (it also can't be expressed in setitem operations on the array level)

But indeed good to think about a more complete list. After going through the namespace of Series/DataFrame, I think there are basically two groups:

Actual __setitem__ operations (directly called by the user), which includes plain __setitem__ (obj[..] = ..) and loc/iloc/at/iat's setitem
Methods that in theory can happen in place (in practice we mostly still return a new object by default, though, unless you specify explicitly inplace=True). These are methods that can be expressed as a setitem operation (eg fillna() can be expressed as arr[arr.isna()] = fill_value).
I think the more or less full list is:
- fillna and interpolate
- replace
- update
- where, mask
- clip
- (for Index we also have putmask, but this already seems a bit inconsistent in its upcasting behaviour)

So any other method that potentially changes the shape is not included (eg append). Those can also typically not be expressed as setitem equivalent (setitem with arrays cannot expand).
One remark here is that our setitem on DataFrame and Series can expand. Potentially we could think about whether we want to deviate from the rule for such expanding setitem, if desired (but I am not directly arguing for it, you can start with object dtype if you want to expand with arbitrary objects).

For the above list, I would say that direct __setitem__ is all included in the proposal, and for the methods of the second group, we might need to decide on a case by case basis.
For example, my feeling says that for fillna it is logical to preserve the dtype, while for replace this might be less convenient

jorisvandenbossche · 2021-02-04T09:36:55Z

@toobaz does my last comment clarify that? (we could also keep the two groups (actual setitem vs methods) as two separate discussions, if that helps)

toobaz · 2021-02-04T10:59:11Z

@toobaz does my last comment clarify that? (we could also keep the two groups (actual setitem vs methods) as two separate discussions, if that helps)

@jorisvandenbossche it does clarify, although as you pointed out, members of your second group of methods are more heterogeneous in their behavior and hence a general rule of (not) upcasting might be hard to enforce (harder than just saying "please be aware that every time you use inplace=True, pandas might upcast, in the sense that it will behave precisely as inplace=False except that it changes the object you're referencing rather than returning a new reference").

To the extent that we (at least in a first stage) focus on the first group, then, Indexes are excluded from the discussion.

(In general, as long as inplace=True will exist in pandas, I think we'll want to stress it's mostly about syntax, and possibly, but not necessarily, about implementation... which means the last thing we want is the implementation of inplace=False to depend on the possibility of inplace=True)

Dr-Irv · 2021-02-04T14:28:40Z

(In general, as long as inplace=True will exist in pandas, I think we'll want to stress it's mostly about syntax, and possibly, but not necessarily, about implementation... which means the last thing we want is the implementation of inplace=False to depend on the possibility of inplace=True)

This might be heresy, but maybe this proposal should be considered in conjunction with an idea of getting rid of inplace=True as an option (Personally, I always use the default of inplace=False). Discussion in #16529 . And if we were to consider getting rid of inplace=True, then does that change the nature of the list that @jorisvandenbossche provided above?

toobaz · 2021-02-04T15:36:05Z

does that change the nature of the list that @jorisvandenbossche provided above?

I stated my opinion above: if we keep inplace=True, then it should just mimick the inplace=False and hence not infuence our decision here; if we ever drop inplace, then even more so.

jorisvandenbossche · 2021-02-04T22:58:53Z

I also don't think that the discussion on keeping the inplace keyword or not would influence this discussion much (and if we agree on that, probably best to leave it as a separate discussion to keep this one manageable. But I am fully supportive of rethinking the inplace keyword, in #16529 ).

A method like fillna is one of the few methods where inplace=True actually can work. But right now even for those cases you are not sure that inplace=True actually did it inplace without making a copy, because of the upcasting behaviour. So with this proposal (if we include fillna in it), the situation improves a bit since inplace=True can now actually be guaranteed to be inplace (and the same will be true for any alternative for inplace we might want, like copy=False). That's a nice side effect, but for me personally not the main driver to argue for it. I mainly would prefer that fillna has predictable behaviour with a dtype that gets preserved (also for the default case of returning a new object).

jbrockmendel · 2021-02-12T21:17:46Z

This was discussed on the call on Wednesday. @jorisvandenbossche would you like to summarize the conclusion and we'll make sure we're all on the same page (again we forgot to write it down in real-time)

jbrockmendel · 2021-02-12T23:15:56Z

An attempt at enumerating the various setitem-like methods, with some granularity into degree of setitemlikeness

* For this discussion, "in-place" refers to writing to an underlying array, which is not equivalent to the inplace keyword

Fully setitem-like methods are those that currently may operate in-inplace:

(DataFrame|Series).__setitem__
(loc|iloc|at|iat).__setitem__
(DataFrame|Series).putmask
(DataFrame|Series).mask

Methods that take a fill_value keyword (or equivalent) and use logic (ideally) identical to setitem to determine casting behavior

(DataFrame|Series|Index).where
(DataFrame|Series|Index).fillna
(DataFrame|Series).shift
(DataFrame|Series).reindex
Index.putmask
Index.insert
DataFrame.unstack
pivot_table <- has a fill_value keyword but i havent actually checked if it is used in the same way

jorisvandenbossche · 2021-02-16T20:23:35Z

(We actually did take some notes about it this time (although a bit terse ;)), and it was on my TODO list to report back here. Thanks for the ping!)

So while briefly discussing this at the meeting last week, there was the useful feedback that also for other (not purely setitem related) operations, pandas will often liberally upcast dtypes, and also for those operations it might be useful to be (or have the option to be) more strict about dtypes.

In addition to the setitem-like cases discussed here (the two bullet points at #39584 (comment) for an overview), two other groups of cases mentioned:

(Reindexing) operations with a fill_value (align, reindex, take (the Index/internal version), shift, ... also eg unstack indirectly), as mentioned in the comment just above by @jbrockmendel as well
- Currently when fill_value doesn't match the existing dtype, we upcast to the common (often object) dtype
- This is of course quite related to fillna, which also has a fill_value. But I think fillna is the only one that is a pure setitem, while the others here are always a combination of take + setitem. Of course, that doesn't mean we should discuss them separately (but that's the reason I didn't include them above)
- Also the flex operator methods (DataFrame.add(..) etc, and DataFrame.combine) have a fill_value, although in this case you can use it both for filling original missing values as missing values introduced from alignment.
- I suppose that for those operations with a fill_value, we probably want to follow the same behaviour as for setitem / fillna.
Concat operations that combine dtypes
- Currently we never raise an error and will combine columns with different dtypes silently upcasting to the common (often object) dtype.
  But it could be useful to have a "stricter" concat, where columns that get combined are required to have a "compatible" data type (to not get silent upcasts to object dtype).
- It might be useful to have different "levels" of compatility between data types (eg casting int32/int64 to int64 might be fine while casting int64/datetime64 to object not, for a certain use case), i.e. a kind of casting levels.

The "concat" case is quite different and certainly deserves a separate discussion (it's on my TODO list to open one, but if someone is interested in this, feel free to do it before me).
Should we keep the fill_value cases also for a separate issue on focus here on the actual indexing __setitem__ cases?

jorisvandenbossche · 2021-02-16T20:29:16Z

@jbrockmendel few specific things about your list above (#39584 (comment)): where doesn't take a fill_value (so it's like mask in the first list); fillna also fits in the "may operate in-inplace" category of the first list (but also has a fill_value like the other items of the second list of course, so it can be in both); I don't think a (DataFrame|Series).putmask exists (I only find it for Index?). The Index.insert I would leave out of this discussion, see my comment about it above #39584 (comment) (it also doesn't have a fill_value keyword, although you insert a value of course).

jbrockmendel · 2021-02-17T17:57:43Z

where doesn't take a fill_value (so it's like mask in the first list)

It doesn't take fill_value, but the other it does take can be a scalar and behaves the same way.

I don't think a (DataFrame|Series).putmask exist

There is a Series.mask, but you're right about putmask in particular. (Series|DataFrame)__setitem__ with a mask key ends up going through Block.putmask

The Index.insert I would leave out of this discussion

ill defer to you on how to keep the discussion narrow. my focus ATM is on trying to go from many implementations of casting logic to just one.

jorisvandenbossche · 2021-02-17T22:04:38Z

ill defer to you on how to keep the discussion narrow. my focus ATM is on trying to go from many implementations of casting logic to just one.

I don't think we can/want have only one. For example, at least the casting logic for setitem and for concat could be different.
And in that context, I think insert rather belongs in the "concat" bucket, as it expands the index object, similar to an append.

where doesn't take a fill_value (so it's like mask in the first list)

It doesn't take fill_value, but the other it does take can be a scalar and behaves the same way.

The same is true for mask, though, so just pointing out that where and mask belong together I think.
(and in that sense, also __setitem__ uses a fill_value, need to draw the line somewhere ;) or not draw the line and just treat them all as the same group).

We can have a lot of discussions about which method to include in which group, but so I would propose to start with the actual __setitem__ ones (so the user doing assignment with an indexing operation): what exact casting rules do we want for setitem indexing?
And from there, we can afterwards expand it to see which other methods can/should follow those setitem casting logic.

jbrockmendel · 2021-10-07T20:50:16Z

@jorisvandenbossche this was on the agenda for the Feb 2021 call but i don't see any notes about the discussion. I have a vague recollection that we discussed a less-strict (not mutually-exclusive) restriction that would prevent silent casting to object, but allow e.g. int->float. Do you have any memory or opinion on this?

jorisvandenbossche · 2021-10-12T06:57:59Z

The notes are basically above (#39584 (comment), and your and my comment below that). Now, I don't really remember something about a less-strict casting (except that we basically have this right now for concat (through "common_dtype"), although there you also still get a silent cast to object dtype at the moment if there is no common dtype). Do you remember for what kind of operation we might have talked about this?

jbrockmendel · 2021-10-12T17:07:56Z

Do you remember for what kind of operation we might have talked about this?

I think there was discussion about __setitem__-like operations and fill_value keywords that would go through maybe_promote. Can focus on the __setitem__s to keep the conversation focused as you've suggested above.

For the most part __setitem__s that do casting are either 1) numeric upcasting e.g. setting floats into an integer series/column or 2) upcasting to object. In principal there could be others (e.g. IntervalDtype[int64] -> IntervalDtype[float64]), but de facto these two cover the vast majority of cases.

The OP suggestion is to deprecate allowing both 1) and 2). The less-strict suggestion is to deprecate only 2), without making a decision about deprecating 1). The benefits as I see them are

a) object-casting cases are the most likely to be accidental, e.g. with dt64 a typo "2016-01-01" -> "2p16-01-01".
b) makes the non-EA behavior almost match the EA behavior.

bashtage · 2021-10-12T17:18:27Z

I find the rules to be a bit inconsistent. For example

import pandas as pd
a = pd.Series([1,2,3],dtype="i4")
a.iloc[0] = 2**33 + 1.1  # OK
b = pd.Series([1,2,3],dtype="i4")
b.iloc[0] = 2**33 + 1.0  # raises because it is cast to int before setting

Similarly

c = pd.Series([1,2,3],dtype="i4")
c.iloc[0] = 2**62   # raises rather than upcasts to int64

jorisvandenbossche · 2021-10-12T21:59:19Z

@bashtage what version of pandas are you using? For me, all those example silently pass (and in the case of b it seems to silently wraparound the while converting 2**33 + 1.0 to an int32):

In [8]: b = pd.Series([1,2,3], dtype="i4")

In [9]: b
Out[9]: 
0    1
1    2
2    3
dtype: int32

In [10]: b.iloc[0] = 2**33 + 1.0

In [11]: b
Out[11]: 
0    1
1    2
2    3
dtype: int32

In [12]: b.iloc[0] = 2**33 + 4.0

In [13]: b
Out[13]: 
0    4
1    2
2    3
dtype: int32

In [14]: pd.__version__
Out[14]: '1.3.3'

(which seems like a bug)

bashtage · 2021-10-12T22:14:07Z

@jorisvandenbossche This was run against master on Windows. It is probably unrelated but I wanted to use master since I had a number of problems with future changes like NumericIndex, especially on Windows where the default dtype is often int32

jbrockmendel · 2021-12-22T18:46:53Z

Marking as blocker for 1.4rc; we should make a decision about deprecation before then.

MarcoGorelli · 2022-12-22T11:20:18Z

@jorisvandenbossche for fillna, should this (for now) only be done when inplace=True?

If not, then I think the scope would also need expanding to where, as that would also end up in

pandas/pandas/core/internals/blocks.py

Lines 1429 to 1450 in e38daf0

    
           try: 
        
               res_values = arr._where(cond, other).T 
        
           except (ValueError, TypeError) as err: 
        
               _catch_deprecated_value_error(err) 
        
               if self.ndim == 1 or self.shape[0] == 1: 
        
                   if is_interval_dtype(self.dtype): 
        
                       # TestSetitemFloatIntervalWithIntIntervalValues 
        
                       blk = self.coerce_to_target_dtype(orig_other) 
        
                       nbs = blk.where(orig_other, orig_cond) 
        
                       return self._maybe_downcast(nbs, downcast=_downcast) 
        
                   elif isinstance(self, NDArrayBackedExtensionBlock): 
        
                       # NB: not (yet) the same as 
        
                       #  isinstance(values, NDArrayBackedExtensionArray) 
        
                       blk = self.coerce_to_target_dtype(orig_other) 
        
                       nbs = blk.where(orig_other, orig_cond) 
        
                       return self._maybe_downcast(nbs, downcast=_downcast) 
        
                   else: 
        
                       raise

e.g.:

ser = pd.Series(period_array(["2000", "2001", "NaT"], freq="D"))
cond = np.array([True, False, True])
other = pd.Period("2000", freq="H")
print(ser.where(cond, other))

This would be a quite noticeably change, and from the last dev call, there wasn't a consensus on whether this should be done everywhere or whether upcasting should be allowed in certain cases, like int to float. So I do think this warrants a PDEP if we want a decision to be reached

UlrichKreidenweis · 2023-10-05T13:23:15Z

I recently upgraded to pandas 2.11 and am now trying to solve the hundreds of FutureWarnings in my code.

However, I find this new behavior of not changing automatically from integer to float pretty hard to internalize and not very consistent.

In general, when dealing with all-integer DataFrames I find it confusing that changing the column will work, while changing the row will not. So far, I have seen it as a main advantage of pandas (also compared to data.frame in R) that I usually don't have to worry about datatypes too much.

import pandas as pd
df = pd.DataFrame({"X": [1, 2], "Y":[3, 4]}, index=["A", "B"])

df.loc[:,"Y"] += 0.1  # allowed
df.loc["A",:] += 0.1  # raises FutureWarning

What I find especially counterintuitive is that now the order of execution matters more often.
For instance, if I change all entries of a column first, it is ok:

df.loc[:,"Y"] += 0.1  # allowed, and changes "Y" to float64
df.loc["A","Y"] += 0.5  # allowed

If I do it the other way round, if will fail:

df.loc["A","Y"] += 0.5  # raises FutureWarning
df.loc[:,"Y"] += 0.1

I guess the discussion on this topic has ended, but personally I would very much like to get the old behavior back, where integer columns automatically change to float when necessary.

MarcoGorelli · 2023-10-05T13:37:49Z

thanks @UlrichKreidenweis for the report

The general rule is that the deprecation doesn't apply if you're replacing the entire column

In the case of df.loc[:,"Y"], you're replacing the entire column, so that's allowed

In the case of df.loc['A', :], that returns a int64 Series which (I think) is a view of the original dataframe - so, updating that requires updating some elements of the original dataframe in place (changing columns' dtypes, and its these setitem-like in place operations which PDEP6 affects)

So, to me this looks like it's working as expected

As for avoiding the warning, you could not do the operation in place, e.g.:

In [56]: pd.concat([df[col].where(df.index!='A', df[col]+.1) for col in df.columns], axis=1)
Out[56]:
     X    Y
A  1.1  3.1
B  2.0  4.0

Or just cast everything to float to begin with

UlrichKreidenweis · 2023-10-05T14:23:35Z

@MarcoGorelli many thanks for the quick reply.

I think I understand the current behavior and agree that it's probably working as intended. All I was trying to say was that this change has not improved my personal user experience.
In the future I will probably more often create DataFrames with dtype float, even if integers would have been more appropriate, just to avoid having to solve this issue.

The only slight inconsistency that I have found so far is that when changing the whole column, but by naming all indexes this also raises a warning:

import pandas as pd
df = pd.DataFrame({"X": [1, 2], "Y":[3, 4]}, index=["A", "B"])
df.loc[["A", "B"],"Y"] += 0.1
# or like this
df.loc[df.index, "Y"] += 0.1

MarcoGorelli · 2023-10-06T09:01:49Z

From some discussion with @phofl and @jbrockmendel , it seems that

df.loc[:,"Y"] += 0.1

should indeed be warning as well, as that it also inplace

will take a look, and thanks again for the report

MarcoGorelli · 2023-10-06T09:05:41Z

@phofl looks like we get here:

pandas/pandas/core/frame.py

Lines 4165 to 4166 in 6c58a21

    
           arraylike, refs = self._sanitize_column(value) 
        
           self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)

which has inplace=False specifically there - is this right? If that's the case, then it looks correct that it's not warning?

MarcoGorelli · 2023-10-16T17:18:17Z

Discussion today with @phofl: in theory, df.loc[:, 'a'] += .1 shouldn't modify df once copy-on-write is enabled, so it's OK for it to not warn here

But, even enabling copy-on-write at the moment, df is in fact still modified:

(.venv) marcogorelli@DESKTOP-U8OKFP3:~/tmp$ cat t.py
import pandas as pd
pd.options.mode.copy_on_write = True

df = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]})

df.loc[:, 'a'] += .1

print(df)

(.venv) marcogorelli@DESKTOP-U8OKFP3:~/tmp$ python t.py
     a  b
0  1.1  4
1  2.1  5
2  3.1  6

so this needs further investigation in any case

rhshadrach · 2023-10-26T21:09:19Z

It seems quite odd to me that df.loc[:, 'a'] += .1 should not modify df but df.loc[:, 'a'] = .1 should. If this "bug" is isolated (perhaps only impacts the inplace-dunders?) and doesn't have other downsides, I think we should consider keeping it as a "feature".

In any case, is most of this issue handled by PDEP-6? I would suggest creating new issues for implementation issues if that is the case.

MarcoGorelli · 2023-11-10T12:16:54Z

yeah I don't think the comment that it should be a no-op is correct

anyway, agree - let's close and discuss the issue in #55791, which seems to be the same thing, thanks all!

jorisvandenbossche added API Design Needs Discussion Requires discussion from core team before further action labels Feb 3, 2021

jorisvandenbossche mentioned this issue Feb 4, 2021

WIP [ArrayManager] API: setitem to set new columns / loc+iloc to update inplace #39578

Closed

jorisvandenbossche mentioned this issue Feb 8, 2021

[ArrayManager] BUG: fix setitem with non-aligned boolean dataframe #39539

Closed

jorisvandenbossche mentioned this issue Aug 10, 2021

API: astype mechanism for extension arrays #22384

Open

mroeschke added Indexing Related to indexing on series/frames, not to indexes themselves Dtype Conversions Unexpected or buggy dtype conversions labels Aug 15, 2021

jorisvandenbossche mentioned this issue Oct 12, 2021

[0.24.1] New nullable integer fillna with non-int doesn't coerce to object #25288

Open

phofl mentioned this issue Nov 7, 2021

BUG: pd.NA when it replaces a value in a column, changes its type to "object" #44199

Closed

3 tasks

mzeitlin11 mentioned this issue Nov 10, 2021

Require the dtype of SparseArray.fill_value and sp_values.dtype to match #23124

Closed

This was referenced Dec 20, 2021

RLS: 1.4 #41957

Closed

Allow dtype promotion in Series[ExtensionArray].__setitem__? #24020

Closed

jbrockmendel added the Blocker for rc Blocking issue or pull request for release candidate label Dec 22, 2021

LucasG0 mentioned this issue Dec 25, 2021

BUG: Series.__setitem__ fails with non range index while upcasting dtype #45070

Closed

3 tasks

jbrockmendel removed the Blocker for rc Blocking issue or pull request for release candidate label Dec 30, 2021

jbrockmendel mentioned this issue Jan 21, 2022

API: Series[EA].fillna fallback behavior with incompatible value #45153

Open

jbrockmendel mentioned this issue Jan 31, 2022

BUG: TypeError when attempting to replace Nullable integer data type with a float value #45729

Closed

3 tasks

roib20 mentioned this issue Feb 1, 2022

BUG: Replacing pd.NA by None has no effect #45601

Closed

3 tasks

TomAugspurger mentioned this issue Jul 5, 2022

Ensure integer dtypes are aligned before shuffle-based merge dask/dask#9236

Open

3 tasks

jorisvandenbossche mentioned this issue Jul 12, 2022

BUG: Setting incompatible values into ea column raises instead of casting to object #47577

Closed

3 tasks

MarcoGorelli mentioned this issue Dec 24, 2022

PDEP-6: Ban upcasting in setitem-like operations #50424

Merged

MichaelTiemannOSC mentioned this issue Aug 7, 2023

ENH: enable setitem dim2 test to work for EA with complex128 dtype #54445

Open

3 tasks

MarcoGorelli mentioned this issue Oct 16, 2023

DEPR raise PDEP-6 warning in iadd operation with incompatible dtype #55543

Closed

MarcoGorelli closed this as completed Nov 10, 2023

MarcoGorelli mentioned this issue Dec 4, 2023

BUG raise pdep6 warning for loc full setter #56146

Merged

5 tasks

mroeschke mentioned this issue Aug 27, 2024

BUG(?): Allow setitem to category/sparse of the same underlying dtype? #59627

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DISCUSS/API: setitem-like operations should only update inplace and never fallback with upcast (i.e never change the dtype) #39584

DISCUSS/API: setitem-like operations should only update inplace and never fallback with upcast (i.e never change the dtype) #39584

jorisvandenbossche commented Feb 3, 2021

jorisvandenbossche commented Feb 3, 2021

toobaz commented Feb 3, 2021

jbrockmendel commented Feb 3, 2021 via email

jorisvandenbossche commented Feb 4, 2021 •

edited

Loading

toobaz commented Feb 4, 2021

jorisvandenbossche commented Feb 4, 2021 •

edited

Loading

jorisvandenbossche commented Feb 4, 2021

toobaz commented Feb 4, 2021 •

edited

Loading

Dr-Irv commented Feb 4, 2021

toobaz commented Feb 4, 2021

jorisvandenbossche commented Feb 4, 2021

jbrockmendel commented Feb 12, 2021

jbrockmendel commented Feb 12, 2021

jorisvandenbossche commented Feb 16, 2021

jorisvandenbossche commented Feb 16, 2021

jbrockmendel commented Feb 17, 2021

jorisvandenbossche commented Feb 17, 2021

jbrockmendel commented Oct 7, 2021

jorisvandenbossche commented Oct 12, 2021

jbrockmendel commented Oct 12, 2021

bashtage commented Oct 12, 2021

jorisvandenbossche commented Oct 12, 2021

bashtage commented Oct 12, 2021 •

edited

Loading

jbrockmendel commented Dec 22, 2021

MarcoGorelli commented Dec 22, 2022

UlrichKreidenweis commented Oct 5, 2023

MarcoGorelli commented Oct 5, 2023 •

edited

Loading

UlrichKreidenweis commented Oct 5, 2023

MarcoGorelli commented Oct 6, 2023

MarcoGorelli commented Oct 6, 2023

MarcoGorelli commented Oct 16, 2023

rhshadrach commented Oct 26, 2023

MarcoGorelli commented Nov 10, 2023

DISCUSS/API: setitem-like operations should only update inplace and never fallback with upcast (i.e never change the dtype) #39584

DISCUSS/API: setitem-like operations should only update inplace and never fallback with upcast (i.e never change the dtype) #39584

Comments

jorisvandenbossche commented Feb 3, 2021

jorisvandenbossche commented Feb 3, 2021

toobaz commented Feb 3, 2021

jbrockmendel commented Feb 3, 2021 via email

jorisvandenbossche commented Feb 4, 2021 • edited Loading

toobaz commented Feb 4, 2021

jorisvandenbossche commented Feb 4, 2021 • edited Loading

jorisvandenbossche commented Feb 4, 2021

toobaz commented Feb 4, 2021 • edited Loading

Dr-Irv commented Feb 4, 2021

toobaz commented Feb 4, 2021

jorisvandenbossche commented Feb 4, 2021

jbrockmendel commented Feb 12, 2021

jbrockmendel commented Feb 12, 2021

jorisvandenbossche commented Feb 16, 2021

jorisvandenbossche commented Feb 16, 2021

jbrockmendel commented Feb 17, 2021

jorisvandenbossche commented Feb 17, 2021

jbrockmendel commented Oct 7, 2021

jorisvandenbossche commented Oct 12, 2021

jbrockmendel commented Oct 12, 2021

bashtage commented Oct 12, 2021

jorisvandenbossche commented Oct 12, 2021

bashtage commented Oct 12, 2021 • edited Loading

jbrockmendel commented Dec 22, 2021

MarcoGorelli commented Dec 22, 2022

UlrichKreidenweis commented Oct 5, 2023

MarcoGorelli commented Oct 5, 2023 • edited Loading

UlrichKreidenweis commented Oct 5, 2023

MarcoGorelli commented Oct 6, 2023

MarcoGorelli commented Oct 6, 2023

MarcoGorelli commented Oct 16, 2023

rhshadrach commented Oct 26, 2023

MarcoGorelli commented Nov 10, 2023

jorisvandenbossche commented Feb 4, 2021 •

edited

Loading

jorisvandenbossche commented Feb 4, 2021 •

edited

Loading

toobaz commented Feb 4, 2021 •

edited

Loading

bashtage commented Oct 12, 2021 •

edited

Loading

MarcoGorelli commented Oct 5, 2023 •

edited

Loading