BUG: algorithms.factorize moves null values when sort=False #46601
I'm trying to follow this and keep getting lost here. Is any of this going to get simpler once deprecations are enforced?
Yes - but how much depends on what you mean by "this". There is a lot of playing around with arguments (what I think you're highlighting here) that will all go away. But what I regard as the main complication this PR introduces, noted in the TODO immediately below, will not go away just with the deprecation.
For that TODO, some changes to safe_sort are necessary. For the bulk of cases the changes are straightforward and very small. However, if you have multiple Python types in an object array (e.g. float and string) with np.nan and None, I have yet to find a good way to sort. I plan to revisit this in the next few days.
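To make the difficulty concrete, here is a minimal sketch in plain Python. It is not what safe_sort does; `sort_mixed` and `is_null` are hypothetical helpers, and grouping by type name with nulls placed last is only one assumed ordering among several possible ones.

```python
# np.nan is an ordinary float NaN, so float("nan") stands in for it here.
values = [2.5, "b", float("nan"), None, "a", 1.0]

# Directly sorting mixed floats and strings fails in Python 3.
try:
    sorted(values)
    direct_sort_failed = False
except TypeError:
    direct_sort_failed = True


def is_null(x):
    """Hypothetical null check: None, or a float NaN (NaN != NaN)."""
    return x is None or (isinstance(x, float) and x != x)


def sort_mixed(items):
    """Hypothetical ordering: group by type name, sort within each
    group, and append nulls at the end in their original order."""
    nulls = [x for x in items if is_null(x)]
    non_nulls = [x for x in items if not is_null(x)]
    non_nulls.sort(key=lambda x: (type(x).__name__, x))
    return non_nulls + nulls


result = sort_mixed(values)
```

The open question is which ordering should count as the right one when np.nan and None both appear; the sketch simply picks one so that the sort does not raise.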
I looked at safe_sort again, and @phofl already solved much of the issue in #47331. I opened a PR into this branch here to see what the execution of the TODO mentioned above would look like: rhshadrach#2
However, changing safe_sort in this way will induce another behavior change:
No other tests besides the one changed failed locally. While I do think this is a bugfix, I'd like to study more what impact it has on concat/reshaping/indexing and I think it may need some discussion. In particular, I don't think it should be done in this PR.
@jbrockmendel - friendly ping.
Agreed both that the "feature" behavior looks more correct and that it should be done separate from this PR.
Taking another look now.
(Coming in fairly cold) I think I'm getting lost here too. I don't fully understand why we check `na_sentinel == -1 or na_sentinel is None` and then use `use_na_sentinel=na_sentinel is not None`.
Yea - there are some gymnastics here no doubt. The idea behind this block is to avoid using catch_warnings whenever that is possible.
Old API: na_sentinel is either an integer or None
New API: use_na_sentinel is False or True
The correspondence is:
use_na_sentinel being False is equivalent to na_sentinel being None
use_na_sentinel being True is equivalent to na_sentinel being -1
Note that the new API has no equivalent for na_sentinel values other than -1 or None. So we can use the new argument precisely when (a) the function has said argument and (b) na_sentinel is either -1 or None. In such a case, the correspondence from na_sentinel to use_na_sentinel is given by `use_na_sentinel = na_sentinel is not None`.
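As a rough sketch of that dispatch (not the code in this PR), the check could look like the following. `pick_kwargs`, `old_style` and `new_style` are hypothetical names used only for illustration; the only real names are the na_sentinel and use_na_sentinel arguments.

```python
import inspect


def pick_kwargs(func, na_sentinel):
    """Hypothetical helper: choose which keyword to forward to ``func``.

    The new argument is used only when (a) ``func`` accepts
    ``use_na_sentinel`` and (b) ``na_sentinel`` is -1 or None.
    Otherwise the old argument is forwarded unchanged.
    """
    has_new_arg = "use_na_sentinel" in inspect.signature(func).parameters
    if has_new_arg and (na_sentinel is None or na_sentinel == -1):
        # None -> False, -1 -> True
        return {"use_na_sentinel": na_sentinel is not None}
    return {"na_sentinel": na_sentinel}


# Hypothetical stand-ins for a function before and during the deprecation.
def old_style(values, na_sentinel=-1):
    return values


def new_style(values, na_sentinel=-1, use_na_sentinel=True):
    return values


kwargs_default = pick_kwargs(new_style, -1)
kwargs_none = pick_kwargs(new_style, None)
kwargs_custom = pick_kwargs(new_style, -2)
kwargs_old = pick_kwargs(old_style, None)
```

Forwarding the new keyword whenever it is equivalent means the callee never sees the deprecated argument in those cases, so there is no warning to suppress.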
Perfect, just the explanation I needed. Thanks!
I'll add this correspondence as a comment to the top of this function; we can remove it when the deprecation is enforced.