Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF: de-duplicate Block.__init__ #38134

Merged
merged 16 commits into from
Mar 9, 2021
Merged

Conversation

jbrockmendel
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@gfyoung gfyoung added Internals Related to non-user accessible pandas implementation Refactor Internal refactoring of code labels Nov 28, 2020
pandas/core/internals/blocks.py Show resolved Hide resolved
pandas/core/internals/blocks.py Outdated Show resolved Hide resolved
pandas/core/internals/blocks.py Outdated Show resolved Hide resolved
@jreback jreback added the Deprecate Functionality to remove in pandas label Nov 29, 2020
@jreback jreback added this to the 1.2 milestone Nov 29, 2020
@jreback
Copy link
Contributor

jreback commented Nov 29, 2020

ok for 1.2

@jreback
Copy link
Contributor

jreback commented Nov 29, 2020

@jbrockmendel if you can address these comments can get this in

@jreback jreback mentioned this pull request Nov 29, 2020
@jbrockmendel
Copy link
Member Author

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale for this change? Why is it needed to deprecate it? (pyarrow code currently works fine? Or is there a problem with it)

@jbrockmendel
Copy link
Member Author

What's the rationale for this change? Why is it needed to deprecate it? (pyarrow code currently works fine? Or is there a problem with it)

We recently made it so that all our internal calls explicitly pass ndim. Now we're telling other packages (that ideally wouldnt be accessing internals in the first place) they need to do the same.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Nov 30, 2020

they need to do the same.

But so is there an actual reason to require this, apart from "we are not using the keyword anymore internally" ?
(I didn't closely follow your prior work related to ndim, so I am honestly asking. Does it give risks on bugs for the external users, or is it slower, or is it hard to maintain that functionality for us, or ... when they don't explicitly pass the ndim?)

If we do the change, we could also keep this for after the 1.2 RC, so the deprecation is only for 1.3, which gives pyarrow time to fix the code and have a release before this is released by pandas.

@simonjayhawkins simonjayhawkins added the Blocker Blocking issue or pull request for an upcoming release label Nov 30, 2020
@jbrockmendel
Copy link
Member Author

But so is there an actual reason to require this, apart from "we are not using the keyword anymore internally" ?
(I didn't closely follow your prior work related to ndim, so I am honestly asking. Does it give risks on bugs for the external users, or is it slower, or is it hard to maintain that functionality for us, or ... when they don't explicitly pass the ndim?)

The motivation that led me to update all of our internal usages to always pass ndim explicitly was that on other branches I was getting errors related to incorrect ndim inference. (Keep in mind I have active branches for both ArrayManager and EA2D among others and I don't recall off the top of my head which one(s) were producing these errors)

or is it slower

ndim inference isn't going to be particularly slow, but we do call the Block constructors a lot, so it could add up.

or is it hard to maintain that functionality for us

In general I think being more explicit and more strict in internals code makes maintenance easier, yes.

@simonjayhawkins
Copy link
Member

If we do the change, we could also keep this for after the 1.2 RC, so the deprecation is only for 1.3, which gives pyarrow time to fix the code and have a release before this is released by pandas.

@jbrockmendel can we remove the blocker tag? and do this for 1.3?

@jbrockmendel
Copy link
Member Author

and do this for 1.3?

sounds good

@jbrockmendel jbrockmendel removed the Blocker Blocking issue or pull request for an upcoming release label Dec 1, 2020
@simonjayhawkins simonjayhawkins removed this from the 1.2 milestone Dec 1, 2020
@simonjayhawkins simonjayhawkins added this to the 1.3 milestone Dec 11, 2020
@jbrockmendel
Copy link
Member Author

Mothballing pending apache/arrow#8957

@jbrockmendel jbrockmendel added the Mothballed Temporarily-closed PR the author plans to return to label Dec 20, 2020
@jbrockmendel
Copy link
Member Author

re-opening; now that make_block is exposed specifically for downstream, we can do the validation there and be stricter in private-internals

@jbrockmendel jbrockmendel reopened this Mar 7, 2021
@jreback
Copy link
Contributor

jreback commented Mar 8, 2021

lgtm @jorisvandenbossche if any issues.

@jorisvandenbossche
Copy link
Member

To make it explicit: the PR now no longer includes a deprecation warning. But you still changed the behaviour of Block.__init__, or not?

@jbrockmendel
Copy link
Member Author

To make it explicit: the PR now no longer includes a deprecation warning. But you still changed the behaviour of Block.init, or not?

Block.__init__ is changed, but the relevant validation is now done in api.make_block, so downstream libraries using that will be unaffected.

@jorisvandenbossche
Copy link
Member

Yep, but thus downstream libraries using Block can be affected. But since we are not aware of such libraries, and the known ones use make_block / make_block_same_class, that's probably OK

@jreback
Copy link
Contributor

jreback commented Mar 9, 2021

Yep, but thus downstream libraries using Block can be affected. But since we are not aware of such libraries, and the known ones use make_block / make_block_same_class, that's probably OK

great

@jreback jreback merged commit 2d0dbf3 into pandas-dev:master Mar 9, 2021
@jreback
Copy link
Contributor

jreback commented Mar 9, 2021

thanks @jbrockmendel

@jbrockmendel
Copy link
Member Author

alright! between this, #40262, and a couple follow-ups to this, im hopeful to take another big chunk out of the difference between libreduction-vs-purepython paths (xref #40263)

@jbrockmendel jbrockmendel deleted the ref-blk-init branch March 9, 2021 22:16
jbrockmendel added a commit to jbrockmendel/pandas that referenced this pull request Mar 11, 2021
@jbrockmendel jbrockmendel removed the Mothballed Temporarily-closed PR the author plans to return to label Apr 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Internals Related to non-user accessible pandas implementation Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants