ENH: Disallow duplicate column names everywhere by default #53217

joelostblom · 2023-05-13T20:12:23Z

Feature Type

Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas

Problem Description

Duplicated column names can lead to confusing downstream behavior that is difficult to detect. For example, a couple of Altair users recently ran into this (vega/altair#2718).

Feature Description

The PR that introduced the flag to disallow duplicates suggested that this might be a suitable default in the future (#28394 (comment)), but I couldn't find a follow-up discussion. I'm opening this issue to suggest making it the default behavior, to protect users from doing things they might not intend, like selecting the same column twice.
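To make the problem concrete, here is a minimal sketch of how duplicates arise under the current default and how the existing opt-in flag reacts. It only uses `DataFrame.set_flags` and `pandas.errors.DuplicateLabelError` as they exist today (assuming pandas 1.2 or later); the frame and variable names are made up for illustration.

```python
import pandas as pd
from pandas.errors import DuplicateLabelError

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Selecting the same column twice silently produces duplicate column names.
picked = df[["a", "a"]]
picked_columns = list(picked.columns)

# With duplicates present, a single-label lookup returns a DataFrame
# instead of the Series most downstream code expects.
lookup_type = type(picked["a"]).__name__

# The existing opt-in flag refuses a frame that already has duplicate labels.
try:
    picked.set_flags(allows_duplicate_labels=False)
    rejected = False
except DuplicateLabelError:
    rejected = True
```

This proposal would amount to making the last step the default rather than something users have to opt into.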

Alternative Solutions

Keep the current default

Additional Context

No response

topper-123 · 2023-06-04T18:33:17Z

I'm not sure what my opinion is on this, but open to discussions.

Currently, we disallow duplicates by setting an attribute in flags (see here), which IMO is the wrong API. We should rather have a parameter in the Index constructor, like Index(..., allow_duplicates=False). Then it would be easier to discuss whether that parameter should default to False or True.
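For reference, a short sketch of the current flag-based API as I understand it: the setting is a single boolean on the object's `flags`, and it is checked against every axis of the frame. The example frames are hypothetical.

```python
import pandas as pd
from pandas.errors import DuplicateLabelError

df = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])

# Duplicate labels are allowed unless the flag is switched off.
default_flag = df.flags.allows_duplicate_labels

# set_flags returns a new object; the original keeps its own flag.
strict = df.set_flags(allows_duplicate_labels=False)
strict_flag = strict.flags.allows_duplicate_labels
original_flag = df.flags.allows_duplicate_labels

# The one flag covers row labels as well as column labels: this frame
# has unique columns but is still rejected for its duplicate index.
dup_rows = pd.DataFrame({"a": [1, 2]}, index=["x", "x"])
try:
    dup_rows.set_flags(allows_duplicate_labels=False)
    rows_rejected = False
except DuplicateLabelError:
    rows_rejected = True
```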

topper-123 · 2023-06-04T18:41:23Z

To add, the flag-based approach doesn't allow us to decide if we want label duplicates in the DataFrame constructor, which doesn't seem right. E.g. we'd want

>>> df = pd.DataFrame(data,
...     index=Index(..., allow_duplicates=True|False),
...     columns=Index(..., allow_duplicates=True|False),
... )

for precise control in the constructor. Also, a decision has to be made on whether disallowing duplicate labels also means disallowing duplicate label indexing, e.g. should df.loc[["a", "a"]] be rejected when duplicate labels are disallowed?
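The `allow_duplicates` parameter above does not exist in pandas. As a rough sketch of what per-axis control could look like today, a hypothetical helper (`make_frame` and its keyword names are assumptions for illustration only) can check each axis separately with the existing `Index.has_duplicates` property:

```python
import pandas as pd


def make_frame(data, index, columns, *,
               allow_duplicate_index=True,
               allow_duplicate_columns=True):
    """Hypothetical helper: build a DataFrame with per-axis duplicate checks."""
    index = pd.Index(index)
    columns = pd.Index(columns)

    # Check each axis independently, which the frame-wide flag cannot do.
    if not allow_duplicate_index and index.has_duplicates:
        dupes = index[index.duplicated()].unique().tolist()
        raise ValueError(f"duplicate row labels: {dupes}")
    if not allow_duplicate_columns and columns.has_duplicates:
        dupes = columns[columns.duplicated()].unique().tolist()
        raise ValueError(f"duplicate column labels: {dupes}")

    return pd.DataFrame(data, index=index, columns=columns)


# Duplicate row labels are tolerated while columns must be unique.
ok = make_frame([[1, 2], [3, 4]], index=["x", "x"], columns=["a", "b"],
                allow_duplicate_columns=False)
```

This only validates at construction time; it does not answer the open question of whether later indexing that would create duplicates should also be rejected.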

tomhoq · 2024-04-18T11:15:19Z

Is this still to be implemented?

joelostblom added the Enhancement and Needs Triage labels May 13, 2023

joelostblom mentioned this issue May 13, 2023

TypeError if DataFrame contains duplicated column name (in some cases) vega/altair#2718

Open

j-bennet mentioned this issue May 17, 2023

Support duplicated columns in a bunch of DataFrame methods dask/dask#10261

Merged

3 tasks

topper-123 added the Index label and removed the Needs Triage label Jun 4, 2023

mroeschke mentioned this issue May 21, 2024

Remove partial support for duplicate MultIindex names unless they are all None rapidsai/cudf#10500

Open
