ENH: create BlockManager positional indexer (for easier dupe cols support) #3092

jreback · 2013-03-19T18:25:16Z

see discussion in #3059, #3095, also see #1943, #3102

This only applies with a non-unique column index

Currently, if a frame has duplicate column names spread across dtypes, there are issues in getting the correct block given only a column name.

I think it is possible, though non-trivial, to instead keep a positional map from the frame columns to the BlockManager blocks. This would simplify BlockManager.iget.

The primary motivation is that to_csv currently cannot handle these types of lookups.

This should also eliminate the need for _find_block.
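
A minimal sketch of what such a positional map could look like, assuming each block simply records which frame column positions it holds. The names `build_positional_map` and `iget`, and the list-of-lists blocks, are hypothetical and only illustrate the idea; this is not the pandas internals.

```python
def build_positional_map(block_placements):
    """Map each frame column position to (block number, offset in that block).

    block_placements[i] lists the frame column positions stored in block i.
    Labels are never consulted, so duplicate column names cause no ambiguity.
    """
    n_columns = sum(len(placement) for placement in block_placements)
    positional_map = [None] * n_columns
    for block_no, placement in enumerate(block_placements):
        for offset, column_pos in enumerate(placement):
            if positional_map[column_pos] is not None:
                raise ValueError("column position %d is held by two blocks" % column_pos)
            positional_map[column_pos] = (block_no, offset)
    return positional_map


def iget(blocks, positional_map, i):
    """Return the values of the i-th frame column (hypothetical helper)."""
    block_no, offset = positional_map[i]
    return blocks[block_no][offset]


# Four columns, three of them named 'a', spread across three dtypes.
columns = ["a", "a", "b", "a"]
blocks = [
    [[1.0, 2.0], [3.0, 4.0]],  # float block: frame columns 0 and 3
    [["x", "y"]],              # object block: frame column 1
    [[5, 6]],                  # int block: frame column 2
]
placements = [[0, 3], [1], [2]]

positional_map = build_positional_map(placements)
last_a = iget(blocks, positional_map, 3)
```

Because the lookup goes by position, the fourth column resolves to the second row of the float block even though its label 'a' also appears in the object block.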

In [6]: df = pd.DataFrame(np.random.randn(8,4))

In [12]: df = pd.DataFrame(np.random.randn(8,4))

In [13]: df._data.blocks[0].ref_locs
Out[13]: array([0, 1, 2, 3])

In [14]: df = pd.DataFrame(np.random.randn(8,4),columns=['a']*4)

In [15]: df._data.blocks[0].ref_locs
---------------------------------------------------------------------------

/mnt/home/jreback/pandas/pandas/core/internals.py in ref_locs(self)
     52     def ref_locs(self):
     53         if self._ref_locs is None:
---> 54             indexer = self.ref_items.get_indexer(self.items)
     55             indexer = com._ensure_platform_int(indexer)
     56             if (indexer == -1).any():

/mnt/home/jreback/pandas/pandas/core/index.pyc in get_indexer(self, target, method, limit)
    835 
    836         if not self.is_unique:
--> 837             raise Exception('Reindexing only valid with uniquely valued Index '
    838                             'objects')
    839 

Exception: Reindexing only valid with uniquely valued Index objects

This is the root of all evil: the following should raise the same exception as above, but it doesn't (even if I consolidate).

In [16]: df = pd.DataFrame(np.random.randn(8,4))

In [17]: df.columns = ['a']*4

In [18]: df._data.blocks[0].ref_locs
Out[18]: array([0, 1, 2, 3])

ghost · 2013-03-20T20:57:04Z

Is there actually a reason to support duplicate labels at the block layer, rather
than implementing it as a thin mapping layer on top of unique labels?

Other than rewriting half the lib, that is.
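
For illustration only, one way to read the "thin mapping layer" idea: the storage underneath only ever sees unique keys (here, plain positions), and a small wrapper translates possibly repeated user-facing labels into those keys. The class `DuplicateLabelView` and the dict-based store are hypothetical and do not correspond to anything in pandas.

```python
class DuplicateLabelView:
    """Hypothetical wrapper: user-facing labels may repeat, internal keys never do."""

    def __init__(self, labels):
        self.labels = list(labels)
        self._by_label = {}
        # Use each label's position as its internal key; positions are unique.
        for key, label in enumerate(self.labels):
            self._by_label.setdefault(label, []).append(key)

    def keys_for(self, label):
        """Return every internal key that carries the given label, in column order."""
        return list(self._by_label[label])


view = DuplicateLabelView(["a", "a", "b", "a"])

# The underlying store is keyed by unique internal keys only.
unique_store = {0: [1.0, 2.0], 1: ["x", "y"], 2: [5, 6], 3: [3.0, 4.0]}

# A label lookup fans out to all matching internal keys.
matches = [unique_store[key] for key in view.keys_for("a")]
```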

wesm · 2013-03-20T21:04:46Z

The internals do need to all get retooled at some point =/

jreback · 2013-04-30T14:57:07Z

this is covered in #3468

jreback · 2013-05-02T14:52:43Z

closed by #3509

jreback mentioned this issue Mar 19, 2013

ENH: improve performance of df.to_csv GH3054 #3059

Merged

ghost mentioned this issue Mar 19, 2013

Allow duplicate columns in df.to_csv #3095

Closed

jreback mentioned this issue Mar 20, 2013

#2786: applymap fails with dupe columns, ObjectBlock convert() method bombs #3102

Closed

ghost mentioned this issue Apr 1, 2013

Enable applymap for dataframes with duplicate columns #3230

Closed

This was referenced Apr 30, 2013

BUG: GH3468 Fix assigning a new index to a duplicate index in a DataFrame would fail #3483

Closed

BUG/CLN: Allow the BlockManager to have a non-unique items (axis 0) #3509

Merged

jreback closed this as completed May 2, 2013
