#2786: applymap fails with dupe columns, ObjectBlock convert() method bombs #3102
if you delete then insert a column in the same block at a different position
I would only do this if it's on a non-unique index (e.g. leave the unique case as is), but I am not sure of the guarantees w.r.t. the ordering of the items in a block. I think blocks need some method to deal with the location mapping problem. Push to 0.12? @wesm any thoughts?
I can't believe these two are not equivalent:

```python
for i in range(len(self.items)):
    values = self.values[i]
```

and

```python
for i, c in enumerate(self.items):
    values = self.get(c)
```

That would mean the order of items returned via enumerate(
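A toy sketch of why the two loops above can diverge once labels repeat. `ToyBlock`, `rows_by_position` and `rows_by_label` are hypothetical names for illustration, not pandas internals, and the sketch assumes a label lookup that returns the first match (the real index may behave differently, for example by raising on duplicates).

```python
class ToyBlock:
    """Hypothetical stand-in for a block: labels plus one row of values per label."""

    def __init__(self, items, values):
        self.items = list(items)
        self.values = list(values)

    def get(self, label):
        # Label-based lookup (assumption: first match wins), so a duplicated
        # label can never reach its later rows.
        return self.values[self.items.index(label)]


def rows_by_position(block):
    # Positional access visits every row exactly once, duplicates or not.
    return [block.values[i] for i in range(len(block.items))]


def rows_by_label(block):
    # Label access goes through get(), which is ambiguous for duplicates.
    return [block.get(c) for c in block.items]


unique = ToyBlock(["a", "b", "c"], [[1], [2], [3]])
dupes = ToyBlock(["a", "b", "a"], [[1], [2], [3]])

print(rows_by_position(unique) == rows_by_label(unique))
print(rows_by_position(dupes), rows_by_label(dupes))
```

With unique labels both loops agree; with a repeated label the label-based loop returns the first row twice and never sees the last one.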
get uses get_loc on the items indexer
I'm looking at get_loc right now.
for monotonic it does a binary search
Ok, will limit to dupes.
in the dup case the indexer doesn't exist
which I guess is the reason why
This, btw, can be made more efficient by hash chaining, couldn't it? That's linear in the worst case, and O(nindex) rather than O(ndupeX)
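One possible reading of the hash-chaining suggestion, as a minimal sketch: build a table once that chains every position of a label into a list, so a later lookup touches only that label's duplicates instead of scanning the whole index. `build_position_table` and `linear_positions` are hypothetical helpers, not part of pandas.

```python
from collections import defaultdict


def build_position_table(items):
    """Map each label to the list of positions where it occurs (one O(n) pass)."""
    table = defaultdict(list)
    for pos, label in enumerate(items):
        table[label].append(pos)
    return dict(table)


def linear_positions(items, label):
    """Baseline: scan the whole index on every lookup."""
    return [pos for pos, item in enumerate(items) if item == label]


items = ["a", "b", "a", "c", "a"]
table = build_position_table(items)

# Both approaches agree; the table answers from the chained list directly.
print(table["a"], linear_positions(items, "a"))
```

The trade-off in this sketch is the extra memory for the table and the need to rebuild it whenever the items change.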
fyi
got it.
The test case in the issue puts you in ObjectBlock.convert() when self is an instance
I looked at your fix, looks fine to me.... I was just pointing out above that it possibly could be a problem when the block is mutated
the above comment is testing something else, I was passing
It's clear to me I don't have the full picture, holding off till I do.
I just realized something... you can't delete a single item from a frame by position (because the block doesn't allow it). I think Wes punted the whole issue and just made it so that you can't mutate when you have a non-unique index, so to do anything useful you have to reindex anyhow... your fix is correct (as it allows you to 'do' things with the frame)
for both cases or just dupes? I'm not comfortable merging this until
unique indices act like my example (you have to use get_loc because the order could be different than the data); for dups, the index throws an exception (see in _get_locs), so you can never use it, so order is positional by definition. I created an issue #3092 so we can explore this in 0.12.... push for now I guess
Where's this exception raised by
ref_locs is the indexer into ref_items of the BlockManager. This is only used in set_ref_items, and only when maybe_rename is True, which only occurs when there is a renaming operation.
This is the root of all evil; this should raise the same as above (but doesn't even if
I think you can throw this under the bus (I mean in 0.11); can fix the dups stuff later
I lost sight of you 10 miles ago, so I won't merge this until I get
If you're sure this is correct, go ahead and merge it for 0.11 yourself.
ok... this was failing in 0.10.1 as well.... pushing to 0.12... I think we can fix this properly (e.g. by actually fixing the indexer), but it's too complicated for now
agreed. it looked so innocent.
if we're pushing it back to 0.12, at least we should put in a check+warning.
it probably is...... but just making sure
I am going to reverse; I think this is ok. If you look at the interleave method in internals, this is the same way Wes handled dups
I'm way behind you on this. take it away as you see fit.
moving back to 0.12, I see another issue that's also broken like this, #1943
disabled applymap for frames with dupe columns until this gets fixed properly.
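For illustration only, a guard of the kind described here could look roughly like the sketch below. `check_unique_columns` is a hypothetical helper, not the code from this PR, and it checks all column labels at once, which is coarser than a per-block check.

```python
def check_unique_columns(columns, op_name="applymap"):
    """Raise if any column label repeats; hypothetical guard, not the pandas code."""
    seen = set()
    for label in columns:
        if label in seen:
            raise NotImplementedError(
                f"{op_name} is not supported with duplicate column label {label!r}"
            )
        seen.add(label)


check_unique_columns(["a", "b", "c"])  # passes silently

try:
    check_unique_columns(["a", "b", "a"])
except NotImplementedError as exc:
    print(exc)
```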
great...., though technically this is only a problem if they are non-unique in a single block, IIRC
I can live with that, probably not the worst thing in the world if it always
hopefully can do it in 0.12 w/o being too radical I think.... just a crown maybe
@jreback, does this look ok? travis is still running.
#2786