Deprecate na_sentinel, add use_na_sentinel #46910
Comments
+1 |
The other option could also be to allow We actually added |
Thanks @jorisvandenbossche! When I was reasoning about it in #46601 (comment), I was focused on the code: pandas/pandas/core/algorithms.py Lines 715 to 718 in 60ac973
thinking that |
Are there known use cases where users pass na_sentinel other than -1 or None? e.g. I don't think I've ever seen |
A positive sentinel is likely to break things
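One plausible reading of the concern above, sketched under assumptions: the codes returned by pd.factorize double as positions into uniques, so a positive sentinel can be indistinguishable from a real code. The value `positive_sentinel = 1` below is a hypothetical choice made only for illustration.

```python
import numpy as np
import pandas as pd

values = np.array(["x", None, "y", "x"], dtype=object)

# Default behaviour: the missing entry is encoded as -1.
codes, uniques = pd.factorize(values)

# Hypothetical positive sentinel; it collides with the real code for "y".
positive_sentinel = 1
codes_positive = np.where(codes == -1, positive_sentinel, codes)

# Decoding by position now silently maps the missing entry to "y".
decoded = uniques[codes_positive]
print(decoded)
```

With -1 as the sentinel, the missing position can still be masked out before decoding; with a positive value that information is lost.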
I don't know of any use case for specifying na_sentinel in pd.factorize, and if there is one, a user can change the sentinel afterwards with a single line of code. +1 on deprecating the ability to specify the sentinel value. In that case, there are two options:
This is then very similar to pyarrow's null_encoding argument in dictionary_encode being "mask" or "encode"; perhaps those would be good names going forward? If that is a good direction, do we prefer "null_encoding" vs "na_encoding" (thinking particularly of pd.NA here). |
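The remark above that a user can change the sentinel afterwards with a single line can be sketched as follows. This is a minimal illustration, not a recommended pattern; `custom_sentinel` is a hypothetical name and value.

```python
import numpy as np
import pandas as pd

values = pd.Series(["x", None, "y", "x"], dtype=object)

# Factorize with the default sentinel (-1 marks missing values).
codes, uniques = pd.factorize(values)

# Hypothetical replacement sentinel, applied in one line after the fact.
custom_sentinel = -99
codes_custom = np.where(codes == -1, custom_sentinel, codes)
print(codes_custom)
```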
@jorisvandenbossche - any thoughts on the proposal to deprecate specifying the specific value of |
@jorisvandenbossche - friendly ping |
Assuming we go forward with my suggestion in #46910 (comment), with the EA interface currently marked as experimental, we have the option of removing the ability to set
Proposal 1 - Deprecate na_sentinel in pd.factorize; change EA to future state immediately:
Proposal 2 - Deprecate na_sentinel in pd.factorize; have EA match (with na_sentinel also deprecated):
Since pd.factorize doesn't match ExtensionArray.factorize today, that makes me think Proposal 1 would be a better route. Any preference @jbrockmendel? |
I suppose there are theoretical use cases for other
While we might still label it like that, there are lots of people using it, so I would prefer something with a smooth upgrade path for external EA implementors (so I prefer something more along the lines of Proposal 2). For naming, one problem with |
Thanks @jorisvandenbossche - I don't have any objection to calling it a "mask", but do see how this usage is a bit of a stretch from a typical use. I'm also happy with your suggestion of |
I agree with joris about a smooth deprecation path. No real preference otherwise. |
After trying a few ways, I find myself preferring |
Currently, specifying na_sentinel=None in pd.factorize will use the sentinel -1 and set (internally) dropna=False. EA's factorize currently doesn't allow na_sentinel being None, and #46601 added the dropna argument to EA's factorize.

It seems best to me to change pd.factorize to match EA's factorize. Tagging as 1.5 since if we are to deprecate, I'd like to get it in.
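The behaviour under discussion can be sketched as follows. The default call uses only long-standing pd.factorize behaviour. The second call assumes the use_na_sentinel keyword named in the issue title is present in the installed pandas (it is checked at runtime and skipped otherwise); where the missing value lands in uniques may differ between versions, so the sketch makes no claim about that.

```python
import inspect

import numpy as np
import pandas as pd

values = np.array(["a", None, "b", "a"], dtype=object)

# Default: missing values get code -1 and are excluded from uniques.
codes, uniques = pd.factorize(values)
print(codes, uniques)

# Assumption: use_na_sentinel exists in this pandas version; guard the call.
if "use_na_sentinel" in inspect.signature(pd.factorize).parameters:
    # Missing values are encoded as a regular code and appear in uniques.
    codes_na, uniques_na = pd.factorize(values, use_na_sentinel=False)
    print(codes_na, uniques_na)
else:
    codes_na, uniques_na = None, None
```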
cc @jbrockmendel @jorisvandenbossche @TomAugspurger