Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PDEP-6: Ban upcasting in setitem-like operations #50424

Merged
merged 33 commits into from
Apr 21, 2023

Conversation

MarcoGorelli
Copy link
Member

@MarcoGorelli MarcoGorelli commented Dec 23, 2022

Original discussion: #39584

Turning into a PDEP as I think the int -> float warrants some care

@asishm
Copy link
Contributor

asishm commented Dec 23, 2022

I like it!

2 questions

  1. You have mentioned int vs float upcast/downcast as being an open item. What about upcasts within ints - for example if an out of bounds integer was set in an int8 series. (Currrently this upcasts to int16 / int32 etc. as appropriate)

  2. Similarly - What about setting pd.NA or np.nan to an int series (non-nullable). Currently this upcasts to float

@MarcoGorelli MarcoGorelli changed the title WIP PDEP6: ban upcasting in setitem-like operations PDEP6: ban upcasting in setitem-like operations Dec 24, 2022
@MarcoGorelli
Copy link
Member Author

Thanks @asishm for taking a look! Setting an out-of-bounds value in an int8 Series, or np.nan in any 'int' (non-nullable) Series, would also raise

@MarcoGorelli MarcoGorelli marked this pull request as ready for review December 24, 2022 07:21
@MarcoGorelli MarcoGorelli added the PDEP pandas enhancement proposal label Dec 24, 2022
@MarcoGorelli
Copy link
Member Author

Thanks for your reviews!

Right, I think it's time for yet another @pandas-dev/pandas-core @pandas-dev/pandas-triage ping...would love to hear people's thoughts on this

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Dec 28, 2022

I think you should be more complete on all the operations that would be affected. s.loc[1] = 1.5 would also raise under this proposal. In addition, setting values of a column in a DataFrame would also raise.

Also, we have discussed the possibility of eliminating inplace = True. If we did that first, wouldn't then the only places where this PDEP would apply be in Series.__setitem__() and Series.loc.__setitem__() (and the corresponding DataFrame places as well)?

@WillAyd
Copy link
Member

WillAyd commented Dec 28, 2022

I like the ideas here and would be in favor of the strict option for the float / int case. I think its easy enough to work around if that's really what you want and would enforce more consistent behavior within our own library

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

web/pandas/pdeps/0006-ban-upcasting.md Show resolved Hide resolved
@MarcoGorelli MarcoGorelli changed the title PDEP6: ban upcasting in setitem-like operations PDEP6: ban upcasting in Series setitem-like operations Dec 30, 2022
@MarcoGorelli MarcoGorelli marked this pull request as draft December 30, 2022 17:06
@MarcoGorelli MarcoGorelli marked this pull request as ready for review December 30, 2022 17:16
@jreback
Copy link
Contributor

jreback commented Dec 30, 2022

is the intention to follow for other routines which can change dtype (either other pdep or as a follow up)

eg fillna, where?

Copy link
Member

@twoertwein twoertwein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to contribute to pandas (and pandas-stubs) and will voice my opinion when I have a strong one, but I wouldn't mind forfeiting my voting privileges.

Restricting the voting pool to more engaged people might make voting easier. If there are/will be PDEPs where you are short on votes, please ping me (again) if I haven't voted yet.

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@lukemanley
Copy link
Member

+1

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1Thanks for working on this @MarcoGorelli, very nice improvement.

Just two minor comments. When this is implemented, I'd provide in the error message how to change the type of the column (it's not present in the sample error message in the abstract).

About setting float in an int column, of the 3 listed options I'd go with 1 too, but I think another option to always raise should be available, and that would be my preference. Technically floats are approximations, and the concept of the decimal part being 0 is not accurate, which brings questions about which precision to use. No big deal, happy to move forward with option 1 too, and see this PDEP move forward which will be a great improvement. But something to consider during implementation and for the future.

Copy link
Contributor

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added one comment. There are some other comments I've made that are still unresolved.

web/pandas/pdeps/0006-ban-upcasting.md Outdated Show resolved Hide resolved
@jbrockmendel
Copy link
Member

+1

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Apr 11, 2023

+1

@MarcoGorelli
Copy link
Member Author

Right, that's 12 upvotes, and no downvotes. Lot's of great discussion here, thanks all for the attention to detail, much appreciated!

Merging this today then

Copy link
Contributor

@attack68 attack68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@MarcoGorelli MarcoGorelli added this to the 2.1 milestone Apr 21, 2023
@MarcoGorelli MarcoGorelli merged commit 910f159 into pandas-dev:main Apr 21, 2023
Rylie-W pushed a commit to Rylie-W/pandas that referenced this pull request May 19, 2023
PDEP-6: Ban upcasting in setitem-like operations (pandas-dev#50424)

---------

Co-authored-by: MarcoGorelli <>
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Irv Lustig <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PDEP pandas enhancement proposal
Projects
None yet
Development

Successfully merging this pull request may close these issues.