BUG: setitem with mixed-resolution dt64s #56419

Merged
merged 14 commits into from
Apr 23, 2024

jbrockmendel
Member

@mroeschke mroeschke added Indexing Related to indexing on series/frames, not to indexes themselves Non-Nano datetime64/timedelta64 with non-nanosecond resolution labels Dec 9, 2023
@MarcoGorelli
Member

what happens after this PR for the cases in #56410 ? would they emit a warning now, and raise in the future?

@jbrockmendel
Member Author

what happens after this PR for the cases in #56410 ? would they emit a warning now, and raise in the future?

The __setitem__ case issues a PDEP6 warning and will raise when that is enforced. The .where and .fillna cases cast to "M8[ms]".

Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Mar 17, 2024
@jbrockmendel
Member Author

@MarcoGorelli gentle ping

@jbrockmendel
Member Author

if there are no further comments, i plan to merge this next week

@MarcoGorelli
Member

sorry for the delay, will try to take a look tomorrow

@MarcoGorelli
Member

Sorry this has taken a while, just not sure about something

Isn't this going to risk causing overflows? E.g.

import numpy as np
import pandas as pd

ser = pd.Series(np.array(['NaT', '3000-01-01', '3000-01-02'], dtype='datetime64[s]'))

item = pd.Timestamp('2020-01-01T00:00:00.123456789')
print(ser.fillna(item))

This raises

pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 3000-01-01 00:00:00
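
For context, a back-of-the-envelope sketch of why that cast goes out of bounds: nanosecond-resolution datetime64 stores a signed 64-bit count of nanoseconds since the epoch, so the representable range ends in the year 2262. The helper name `fits_in_int64_ns` is made up for illustration and is not a pandas function; the check uses only the standard library.

```python
from datetime import datetime, timezone

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)
NS_PER_SECOND = 1_000_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def fits_in_int64_ns(dt: datetime) -> bool:
    """Hypothetical helper: can `dt` be stored as int64 nanoseconds since the epoch?"""
    delta = dt - EPOCH
    # Integer arithmetic only, so no float rounding sneaks in.
    seconds = delta.days * 86_400 + delta.seconds
    nanos = seconds * NS_PER_SECOND  # Python ints never overflow
    # The minimum int64 value is reserved for NaT, hence the strict lower bound.
    return INT64_MIN < nanos <= INT64_MAX


in_range = fits_in_int64_ns(datetime(2020, 1, 1, tzinfo=timezone.utc))
out_of_range = fits_in_int64_ns(datetime(3000, 1, 1, tzinfo=timezone.utc))
print(in_range, out_of_range)  # True False
```

So upcasting the second-resolution series to nanoseconds to accommodate the fill value cannot represent the year-3000 entries, which is what the error reports.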

Checking the behaviour for nullable integers:

pd.Series([pd.NA, 2, 3], dtype='Int64').fillna(1.5)  # raises
pd.Series([pd.NA, 2, 3], dtype='int64[pyarrow]').fillna(1.5)  # silently truncates 1.5 to 1

@jbrockmendel
Member Author

Isn't this going to risk causing overflows? E.g.

Yes, and that is a good thing. In the status quo we silently round that non-nano fill value and give an incorrect result.
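
To make the status-quo problem concrete, here is the arithmetic in plain Python. This is only a sketch: integer counts stand in for the datetime64 values, no pandas is involved, and `to_seconds_truncating` is a made-up name. Casting the nanosecond fill value from the example above down to second resolution discards the fractional part.

```python
NS_PER_SECOND = 1_000_000_000

# 2020-01-01T00:00:00.123456789 as nanoseconds since the epoch
fill_value_ns = 1_577_836_800 * NS_PER_SECOND + 123_456_789


def to_seconds_truncating(value_ns: int) -> int:
    """Cast to second resolution the lossy way: floor-divide and drop the remainder."""
    return value_ns // NS_PER_SECOND


seconds = to_seconds_truncating(fill_value_ns)
lost_ns = fill_value_ns - seconds * NS_PER_SECOND
print(seconds, lost_ns)  # 1577836800 123456789
```

The 123456789 ns that disappear here are the part of the fill value that a silent cast would throw away.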

The pyarrow example looks pretty clearly buggy to me.

@MarcoGorelli
Member

to be honest I'm not totally sure about this - I'm not going to block it, but don't really feel comfortable approving, could you ask someone else please?

@jbrockmendel
Member Author

@mroeschke @phofl thoughts?

@mroeschke
Member

I do find the overflow behavior a better behavior than silently truncating.

But I also find the exception message in @MarcoGorelli 's #56419 (comment) kinda opaque compared to setting a float into a nullable int (TypeError: Invalid value '1.5' for dtype Int64)

@jbrockmendel
Member Author

It is a bit tough to tailor the exception message due to where the exception is raised. How about f"Incompatible (high-resolution) value for {self.dtype}. Explicitly cast before operating."
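
Roughly the shape of a check that would produce that message, as a standalone sketch. This is not the code in the PR: `cast_ns_lossless` is a hypothetical helper, the choice of `TypeError` is an assumption, and the dtype string is built only to format the message.

```python
NS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}


def cast_ns_lossless(value_ns: int, unit: str) -> int:
    """Hypothetical helper: convert a nanosecond count to `unit`, refusing to drop precision."""
    quotient, remainder = divmod(value_ns, NS_PER_UNIT[unit])
    if remainder:
        dtype = f"datetime64[{unit}]"
        # Exception type is an assumption; the message text is the one proposed above.
        raise TypeError(
            f"Incompatible (high-resolution) value for {dtype}. "
            "Explicitly cast before operating."
        )
    return quotient


exact = cast_ns_lossless(5_000_000_000, "s")  # exactly 5 whole seconds, so this is fine
try:
    cast_ns_lossless(5_000_000_001, "s")  # one extra nanosecond cannot be kept
    message = None
except TypeError as err:
    message = str(err)
print(exact, message)
```

The point of the design is that the caller has to choose the rounding explicitly instead of having it happen silently.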

@mroeschke
Member

How about f"Incompatible (high-resolution) value for {self.dtype}. Explicitly cast before operating."

Yeah I would be good with that

@jbrockmendel
Member Author

npdev failure looks unrelated

@mroeschke mroeschke removed the Stale label Apr 23, 2024
@mroeschke mroeschke added this to the 3.0 milestone Apr 23, 2024
@mroeschke mroeschke merged commit ff27271 into pandas-dev:main Apr 23, 2024
45 of 46 checks passed
@mroeschke
Member

Thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the bug-setitem-mixed-56410 branch April 23, 2024 17:10
pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
* BUG: setitem with mixed-resolution dt64s

* Move whatsnew to 3.0

* de-xfail

* improve exception message
Development

Successfully merging this pull request may close these issues.

BUG/API: setitem/where with mixed dt64 units
3 participants