Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CoW: Ensure that iterrows does not allow mutating parent #51271

Merged
merged 4 commits into from
Feb 16, 2023

Conversation

phofl
Copy link
Member

@phofl phofl commented Feb 9, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

cc @jorisvandenbossche

Any idea on how to do this nicer?

Do you think we need a way to hand over a ref object when constructing a Series from an array at some point?

@jorisvandenbossche
Copy link
Member

Yeah, if this is a pattern that will be used more, then we should add a helper for it. In general I think we should still try to avoid this kind of pandas object -> values -> pandas object with views outside of the internals to avoid having to think about refs there.

Possible ideas to do this cleaner:

  • Rely on indexing (self.iloc[i] to iterate over rows), and then refs will already be taken care of. But I suppose that will give a big overhead (because you repeat the dtype calculation for each row). And that's probably the same for self._mgr.fast_xs(i) which is special-case method in internals to get a row.
  • Add a method to the BlockManager to have the core logic there (like a version of fast_xs but then not for a single row but all rows)

@phofl
Copy link
Member Author

phofl commented Feb 10, 2023

Yep I agree with all your points and would also
Like to avoid it as much as possible, but it’s tricky here without hurting Performance I think.

I‘ll explore adding something on the manager level, that sounds like a good compromise

@jorisvandenbossche
Copy link
Member

Shall we short-term go for this "ugly" solution? (so at least to have correct behaviour)

@phofl
Copy link
Member Author

phofl commented Feb 15, 2023

Probably a good idea, I think I can get something done before the actual 2.0 is out

@jorisvandenbossche jorisvandenbossche merged commit 81d4f0c into pandas-dev:main Feb 16, 2023
@phofl phofl deleted the cow_iterrows branch February 16, 2023 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants