Fix categorical conversion from chunked arrow arrays #15886

Merged on May 30, 2024 (2 commits)

@vyasr (Contributor) commented on May 30, 2024

Description

The current logic for converting Arrow dictionary arrays to cuDF does not deduplicate categories across the chunks of a chunked array, so the resulting categorical can end up with repeated categories. This PR implements the simplest fix: having Arrow combine the chunks when this case is encountered.

Resolves #6828
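
To illustrate the problem and the idea behind the fix, here is a minimal pure-Python sketch. It is not the cuDF or pyarrow code from this PR: the `Chunk` alias and `combine_dictionary_chunks` are hypothetical names, each chunk is modeled as a plain `(indices, dictionary)` pair, and nulls are ignored. It only shows why per-chunk dictionaries can repeat a category and how combining chunks yields a single unique dictionary.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical stand-in for one dictionary-encoded chunk: (indices, dictionary).
Chunk = Tuple[List[int], List[Hashable]]


def combine_dictionary_chunks(chunks: Sequence[Chunk]) -> Chunk:
    """Merge chunks into a single chunk whose dictionary entries are unique."""
    unified: List[Hashable] = []        # unique categories, in first-seen order
    position: Dict[Hashable, int] = {}  # category -> index in `unified`
    combined: List[int] = []            # codes that index into `unified`

    for indices, dictionary in chunks:
        # Map this chunk's local codes to codes in the unified dictionary.
        remap = []
        for value in dictionary:
            if value not in position:
                position[value] = len(unified)
                unified.append(value)
            remap.append(position[value])
        combined.extend(remap[i] for i in indices)

    return combined, unified


# Two chunks whose dictionaries overlap on "b".
chunks = [([0, 1, 0], ["a", "b"]), ([0, 1], ["b", "c"])]

# Naively concatenating the per-chunk dictionaries repeats "b",
# which is the kind of input a categorical with unique categories rejects.
naive_categories = [value for _, dictionary in chunks for value in dictionary]

codes, categories = combine_dictionary_chunks(chunks)
decoded = [categories[code] for code in codes]
```

In this sketch, `naive_categories` contains `"b"` twice, while `categories` holds each value once and `decoded` still reproduces the original sequence of values across both chunks.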

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vyasr added the bug and non-breaking labels on May 30, 2024
@vyasr self-assigned this on May 30, 2024
@vyasr requested a review from a team as a code owner on May 30, 2024 00:12
@vyasr requested reviews from wence- and bdice on May 30, 2024 00:12
@github-actions (bot) added the Python label on May 30, 2024
@galipremsagar (Contributor) commented

/merge

@rapids-bot (bot) merged commit bab0d80 into rapidsai:branch-24.08 on May 30, 2024
70 checks passed
@vyasr deleted the fix/from_arrow_chunked branch on May 30, 2024 17:47
Labels
bug (Something isn't working), non-breaking (Non-breaking change), Python (Affects Python cuDF API)
Projects
Status: Done
Development

Linked issue: [BUG] arrow data: Categorical categories must be unique
2 participants