-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: make CategoricalIndex._concat consistent with pd.concat #41626
Comments
I vaguely recall some discussions around changing the default behavior of |
that looks similar but i think may be orthogonal. in all of the affected tests cases i think we're dealing with one Categorical and one non-Categorical |
IIUC we should strive to improve
is what would do. I think is a strict improvement. |
@jorisvandenbossche want to weigh in here (before i get started on a PR)? one of the options here is value-dependent behavior |
I think I would opt for preserving the strict behaviour of Series. Although it is certainly tempting to make an exception. But having the behavior depend on which numbers are present (eg in the last test example) really doesn't sound ideal. The user can always cast to the dtype of the first object for doing the concat. (the case of concatting with an empty other Series is something that could be addressed separately, IMO, eg by having a "null" dtype for empty Series) Other idea: if we find it onerous for the user to cast all arguments passed to |
Index._concat
(used byIndex.append
) is thin wrapper aroundconcat_compat
. It is overriden by CategoricalIndex so that CategoricalDtype is retained more often than it is inconcat_compat
. We should make these match.If we just rip
CategoricalIndex._concat
, we break 6 tests, all of which boil down to:If we go the other way and change
concat_compat
, we break 6 different tests, all of which involve all-empty arrays or arrays that can be losslessly cast to the Categorical's dtype, e.g (edited for legibility)Changing concat_compat results in much more convenient behavior, but it is textbook "values-dependent behavior" that in general we want to avoid (cc @jorisvandenbossche)
The text was updated successfully, but these errors were encountered: