Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Groupby median on timedelta column with NaT returns odd value (#… #57957

Merged
merged 6 commits into from
Mar 26, 2024

Conversation

JJLLWW
Copy link
Contributor

@JJLLWW JJLLWW commented Mar 21, 2024

…57926)

Handle NaT correctly in group_median_float64

@JJLLWW JJLLWW force-pushed the groupby-median-nat branch from a132083 to f9a6d3d Compare March 22, 2024 00:42
@JJLLWW JJLLWW force-pushed the groupby-median-nat branch from f9a6d3d to 44d56aa Compare March 22, 2024 00:52
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Looking good.

pandas/_libs/groupby.pyx Outdated Show resolved Hide resolved
pandas/tests/groupby/test_groupby.py Outdated Show resolved Hide resolved
pandas/tests/groupby/test_groupby.py Outdated Show resolved Hide resolved
doc/source/whatsnew/v3.0.0.rst Outdated Show resolved Hide resolved
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

cdef float64_t median_linear(float64_t* a, int n) noexcept nogil:
cdef float64_t median_linear(
float64_t* a,
int n,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI these should be Py_ssize_t not int. int is limited to 2 ** 31 - 1 on most platforms, so this would cause undefined behavior if ever called with an n that exceeded that (probably unlikely within groupby...but still possible)

Happy to have that in a follow up PR

@WillAyd WillAyd added this to the 2.2.2 milestone Mar 26, 2024
@WillAyd WillAyd merged commit 5b3617d into pandas-dev:main Mar 26, 2024
50 checks passed
@WillAyd
Copy link
Member

WillAyd commented Mar 26, 2024

Thanks @JJLLWW

@rhshadrach
Copy link
Member

@WillAyd - why the backport? I didn't think this was a regression.

@WillAyd
Copy link
Member

WillAyd commented Mar 26, 2024

Ah OK - I thought we were doing bugfixes on that branch now. Didn't realize it was limited to regressions. Can just keep this on 3.x then

@WillAyd WillAyd modified the milestones: 2.2.2, 3.0 Mar 26, 2024
@pandas-dev pandas-dev deleted a comment from lumberbot-app bot Mar 26, 2024
@rhshadrach
Copy link
Member

This might be just me, but I like to keep patch versions as small as possible since bugfixes can induce new bugs and would like patch upgrade to be as safe as possible for users. But in the past, I know others have been more inclined to backport more than I prefer - cc @phofl for any thoughts.

@phofl
Copy link
Member

phofl commented Mar 26, 2024

Agree with @rhshadrach

we backported fixes for new features that were introduced in the version before but generally not much else

@rhshadrach
Copy link
Member

we backported fixes for new features that were introduced in the version before but generally not much else

Ah, I think that's where I remember a disagreement - thanks. And yea, I've come around to doing bug fixes for new features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Groupby median on timedelta column with NaT returns odd value
5 participants