GH-34449 [Python] array(pd.Categorical) raising for arrow-backed cat #34456

Closed
wants to merge 1 commit into from

phofl
Contributor

@phofl phofl commented Mar 4, 2023

Rationale for this change

This occurs when saving pandas arrow-backed categoricals to parquet. There might be a better way to do this than calling combine_chunks, but I couldn't find another way to convert a ChunkedArray to a DictionaryArray.

What changes are included in this PR?

Bugfix and tests only

Are these changes tested?

Yes

Are there any user-facing changes?

Yes, bug fix

@phofl phofl requested a review from AlenkaF as a code owner March 4, 2023 20:55
@github-actions github-actions bot added the awaiting review label Mar 4, 2023
@github-actions

github-actions bot commented Mar 4, 2023

⚠️ GitHub issue #34449 has been automatically assigned in GitHub to PR creator.

@AlenkaF
Member

AlenkaF commented Mar 6, 2023

Thank you @phofl for contributing!

As I mentioned in the issue comments, there is already a PR for a duplicate GitHub issue: #34289

As you can see in the comments on the duplicate PR (issue #33727 (comment)), the failure happens because pa.DictionaryArray.from_arrays doesn't accept chunked arrays. In this case the ChunkedArray will always have exactly one chunk, so a better way to solve the issue is to return a plain Array for single-chunk data when calling _handle_arrow_array_protocol (__arrow_array__).

@phofl
Contributor Author

phofl commented Mar 6, 2023

Sorry, I should have checked the arrow issue. I came here through debugging the pandas equivalent. Closing then, sorry for the overhead.

@phofl phofl closed this Mar 6, 2023
@AlenkaF
Member

AlenkaF commented Mar 6, 2023

No problem at all! Thank you for making an effort 🙏

Labels
awaiting review Awaiting review
Development

Successfully merging this pull request may close these issues.

[Python] to_parquet fails with a category field backed by pyarrow string
2 participants