ENH: create BlockManager positional indexer (for easier dupe cols support) #3092

Closed · jreback opened this issue Mar 19, 2013 · 4 comments
Labels: Ideas (Long-Term Enhancement Discussions), Indexing (Related to indexing on series/frames, not to indexes themselves), Refactor (Internal refactoring of code)

jreback (Contributor) commented Mar 19, 2013

See the discussion in #3059 and #3095; also see #1943 and #3102.

This only applies to a non-unique column index.

Currently, if duplicate column names span more than one dtype (and therefore more than one block), there are issues in getting the correct block for a given column name.

I think it is possible, though non-trivial, to instead keep a positional map from the frame's columns to the BlockManager blocks; this would simplify BlockManager.iget.

The primary motivation is that to_csv currently cannot handle these kinds of lookups.

This should also eliminate the need for _find_block.
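A minimal sketch of the positional-map idea in plain Python. The names `Block`, `PositionalBlockManager`, `block_of`, `loc_in_block` and `rows` are hypothetical and are not pandas' internals. It assumes each block holds the columns of one dtype, inferred here from each column's first element, and that the manager records, for every frame column position, which block holds that column and its offset inside the block. Lookups then never consult labels, so duplicate names across dtypes are harmless; `rows()` stands in for the kind of positional walk a `to_csv`-style writer needs.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical block: one dtype, several same-length column lists."""
    dtype: str
    values: list = field(default_factory=list)  # one list per column


class PositionalBlockManager:
    """Hypothetical manager keyed by column *position*, not by label.

    block_of[i] is the block number holding frame column i and
    loc_in_block[i] is that column's offset inside the block.
    """

    def __init__(self, columns, arrays):
        self.columns = list(columns)  # labels may repeat freely
        self.blocks = []
        self.block_of = []
        self.loc_in_block = []
        by_dtype = {}  # dtype name -> block number
        for arr in arrays:
            # Assumption: columns are non-empty lists; dtype = type of first item.
            dtype = type(arr[0]).__name__
            if dtype not in by_dtype:
                by_dtype[dtype] = len(self.blocks)
                self.blocks.append(Block(dtype))
            blkno = by_dtype[dtype]
            block = self.blocks[blkno]
            self.block_of.append(blkno)
            self.loc_in_block.append(len(block.values))
            block.values.append(list(arr))

    def iget(self, i):
        """Return frame column i by position; labels are never consulted."""
        return self.blocks[self.block_of[i]].values[self.loc_in_block[i]]

    def rows(self):
        """Rebuild rows by walking columns positionally (to_csv-style)."""
        cols = [self.iget(i) for i in range(len(self.columns))]
        return [list(r) for r in zip(*cols)]


# Three columns named 'a', spread over int, float and str blocks.
mgr = PositionalBlockManager(["a", "a", "b", "a"],
                             [[1, 2], [1.5, 2.5], [3, 4], ["x", "y"]])
print(mgr.iget(1))  # [1.5, 2.5]
print(mgr.rows())   # [[1, 1.5, 3, 'x'], [2, 2.5, 4, 'y']]
```

Because the positions are the source of truth, something like `_find_block` (searching blocks by label) has nothing left to do in this model.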

In [12]: df = pd.DataFrame(np.random.randn(8,4))

In [13]: df._data.blocks[0].ref_locs
Out[13]: array([0, 1, 2, 3])

In [14]: df = pd.DataFrame(np.random.randn(8,4),columns=['a']*4)

In [15]: df._data.blocks[0].ref_locs
---------------------------------------------------------------------------

/mnt/home/jreback/pandas/pandas/core/internals.py in ref_locs(self)
     52     def ref_locs(self):
     53         if self._ref_locs is None:
---> 54             indexer = self.ref_items.get_indexer(self.items)
     55             indexer = com._ensure_platform_int(indexer)
     56             if (indexer == -1).any():

/mnt/home/jreback/pandas/pandas/core/index.pyc in get_indexer(self, target, method, limit)
    835 
    836         if not self.is_unique:
--> 837             raise Exception('Reindexing only valid with uniquely valued Index '
    838                             'objects')
    839 

Exception: Reindexing only valid with uniquely valued Index objects
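The failure is structural: `get_indexer` has to return exactly one integer per target label, and that is undefined once a label occurs more than once. A stdlib-only sketch of that contract follows. The function names are hypothetical, and `ValueError` stands in for the bare `Exception` in the traceback. The second function shows the non-unique-safe alternative of returning every matching position (current pandas exposes `Index.get_indexer_non_unique` for this case).

```python
def get_indexer(index, target):
    """Hypothetical: map each target label to its single position, -1 if absent.

    Mirrors the uniqueness check in the traceback: with duplicates one
    label has several positions, so "the" position does not exist.
    """
    if len(set(index)) != len(index):
        raise ValueError("Reindexing only valid with uniquely valued Index objects")
    pos = {label: i for i, label in enumerate(index)}
    return [pos.get(label, -1) for label in target]


def get_indexer_all(index, target):
    """Hypothetical non-unique-safe variant: all positions per target label."""
    positions = {}
    for i, label in enumerate(index):
        positions.setdefault(label, []).append(i)
    return [positions.get(label, []) for label in target]


print(get_indexer(["a", "b", "c", "d"], ["b", "d"]))  # [1, 3]
print(get_indexer_all(["a"] * 4, ["a"]))              # [[0, 1, 2, 3]]
```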

This is the root of all evil: this should raise the same exception as In [15], but it doesn't (even if I consolidate)......

In [16]: df = pd.DataFrame(np.random.randn(8,4))

In [17]: df.columns = ['a']*4

In [18]: df._data.blocks[0].ref_locs
Out[18]: array([0, 1, 2, 3])
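One plausible mechanism, stated as an assumption and not verified against the 2013 source: `ref_locs` is computed lazily and cached in `_ref_locs`, so if it was filled in while the columns were still unique, relabeling them to `['a']*4` keeps the cached positions and `get_indexer` never runs again. The toy model below (`ToyBlock` and `rename` are hypothetical names) reproduces that caching pattern.

```python
class ToyBlock:
    """Hypothetical block that caches its item positions lazily."""

    def __init__(self, items, ref_items):
        self.items = list(items)
        self.ref_items = list(ref_items)
        self._ref_locs = None  # cache, filled on first access

    @property
    def ref_locs(self):
        if self._ref_locs is None:
            # Same uniqueness requirement as Index.get_indexer in the traceback.
            if len(set(self.ref_items)) != len(self.ref_items):
                raise ValueError("Reindexing only valid with uniquely valued Index objects")
            pos = {label: i for i, label in enumerate(self.ref_items)}
            self._ref_locs = [pos[label] for label in self.items]
        return self._ref_locs

    def rename(self, new_ref_items):
        # Relabel through the cached positions. The cache is NOT cleared,
        # so later ref_locs reads skip the uniqueness check entirely.
        self.items = [new_ref_items[i] for i in self.ref_locs]
        self.ref_items = list(new_ref_items)


renamed = ToyBlock([0, 1, 2, 3], [0, 1, 2, 3])
renamed.rename(["a"] * 4)
print(renamed.ref_locs)  # [0, 1, 2, 3]; no error despite duplicate labels

fresh = ToyBlock(["a"] * 4, ["a"] * 4)
try:
    fresh.ref_locs
except ValueError as exc:
    print("fresh block raises:", exc)
```

Under this reading, the two paths disagree only because one goes through a label-derived cache. Storing positions directly, as proposed above, would make them agree.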

ghost commented Mar 20, 2013

Is there actually a reason to support duplicate labels at the block layer, rather than implementing it as a thin mapping layer on top of unique labels?

Other than rewriting half the lib, that is.
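The alternative raised in this comment can be sketched as follows: storage only ever sees unique internal keys (column positions here), and a thin, hypothetical `LabelMap` translates user-facing labels, which may repeat, into those keys. `Frame` and `keys_for` are likewise made-up names; this illustrates the idea in the comment, not what pandas adopted.

```python
class LabelMap:
    """Hypothetical thin layer: duplicate-friendly labels over unique keys."""

    def __init__(self, labels):
        self.labels = list(labels)
        self._positions = {}
        for key, label in enumerate(self.labels):  # key = unique position
            self._positions.setdefault(label, []).append(key)

    def keys_for(self, label):
        """All unique storage keys carrying `label` (KeyError if absent)."""
        return list(self._positions[label])


class Frame:
    """Hypothetical frame whose storage is keyed by unique ints only."""

    def __init__(self, labels, columns):
        self.index = LabelMap(labels)
        self._store = dict(enumerate(columns))  # no duplicate keys possible

    def __getitem__(self, label):
        cols = [self._store[k] for k in self.index.keys_for(label)]
        # A unique label yields one column; a duplicated label yields a list.
        return cols[0] if len(cols) == 1 else cols


f = Frame(["a", "b", "a"], [[1, 2], [3, 4], [5, 6]])
print(f["b"])  # [3, 4]
print(f["a"])  # [[1, 2], [5, 6]]
```

The cost is that every label-based operation has to be routed through the map, which is roughly the "rewriting half the lib" caveat.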

wesm (Member) commented Mar 20, 2013

The internals do need to all get retooled at some point =/

jreback (Contributor Author) commented Apr 30, 2013

This is covered in #3468.

jreback (Contributor Author) commented May 2, 2013

Closed by #3509.

jreback closed this as completed May 2, 2013