BUG: DataFrame construction with dictionary ArrowDtype columns #53654

Merged (6 commits) on Jun 21, 2023

mroeschke added the Constructors (Series/DataFrame/Index/pd.array Constructors) and Arrow (pyarrow functionality) labels on Jun 13, 2023

import pyarrow as pa

if pa.types.is_dictionary(dtype.pyarrow_dtype):
    other = other.astype(ArrowDtype(dtype.pyarrow_dtype.value_type))

jbrockmendel (Member) commented on Jun 13, 2023:

This is going to have implications beyond the constructors.

Can you add test(s) that directly test the affected indexing methods?

mroeschke (Member, Author):

Added some tests for get_indexer and get_indexer_non_unique that seem related. LMK if there are other methods you also had in mind.

jbrockmendel (Member):

New tests look good, thanks.

I think we can make this a lot cheaper, though, since the places that call _unpack_nested_dtype only need the result's dtype. So with a little tinkering we can just return that and avoid doing a cast.

mroeschke (Member, Author):

Makes sense. Might be better as a followup since this change is being backported?

jbrockmendel (Member):

sure

mroeschke added this to the 2.0.3 milestone on Jun 14, 2023

mroeschke (Member, Author):

Looks to be greenish so merging, but can follow up if needed

mroeschke merged commit d36da2b into pandas-dev:main on Jun 21, 2023
mroeschke deleted the bug/arrow/categorical branch on June 21, 2023, 01:41

lumberbot-app (bot) commented on Jun 21, 2023:

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict; please backport manually. Here are approximate instructions:

  1. Check out the backport branch and update it:
         git checkout 2.0.x
         git pull
  2. Cherry-pick the first parent branch of this PR on top of the older branch:
         git cherry-pick -x -m1 d36da2b77c4e9b6a7e5064bde0f2775bcf989c69
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
         git commit -am 'Backport PR #53654: BUG: DataFrame construction with dictionary ArrowDtype columns'
  4. Push to a named branch:
         git push YOURFORK 2.0.x:auto-backport-of-pr-53654-on-2.0.x
  5. Create a PR against branch 2.0.x. I would have named this PR:

"Backport PR #53654 on branch 2.0.x (BUG: DataFrame construction with dictionary ArrowDtype columns)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

mroeschke added a commit to mroeschke/pandas that referenced this pull request Jun 21, 2023
mroeschke added a commit that referenced this pull request Jun 21, 2023
…Dtype columns (#53758)

* Backport PR #53654: BUG: DataFrame construction with dictionary ArrowDtype columns

* chage import
canthonyscott pushed a commit to canthonyscott/pandas-anthony that referenced this pull request Jun 23, 2023
…s-dev#53654)

* BUG: DataFrame construction with dictionary ArrowDtype columns

* Add tests for get_indexer

* Windows
Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023
…s-dev#53654)

* BUG: DataFrame construction with dictionary ArrowDtype columns

* Add tests for get_indexer

* Windows

Successfully merging this pull request may close these issues.

BUG: Cannot construct DataFrame with dict as data and numeric dictionary ArrowDtype series as column
2 participants