concat with same column names #30776

krishnachouhan · 2020-01-07T11:49:26Z

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> df1 = pd.DataFrame()
>>> df2 = pd.DataFrame()
>>> df3 = pd.DataFrame()
>>> df1['1'] = range(0,10)
>>> df2['2'] = range(0,20,2)
>>> df3['2'] = range(0,30,3)
>>> df = pd.concat([df1, df2, df3], axis=1)
>>> df
   1   2   2
0  0   0   0
1  1   2   3
2  2   4   6
3  3   6   9
4  4   8  12
5  5  10  15
6  6  12  18
7  7  14  21
8  8  16  24
9  9  18  27
>>> df['2']
    2   2
0   0   0
1   2   3
2   4   6
3   6   9
4   8  12
5  10  15
6  12  18
7  14  21
8  16  24
9  18  27
>>> df['2'] = range(0,50,5)
>>> df
   1   2   2
0  0   0   0
1  1   5   5
2  2  10  10
3  3  15  15
4  4  20  20
5  5  25  25
6  6  30  30
7  7  35  35
8  8  40  40
9  9  45  45

Problem description

Concatenating DataFrames that share a column name produces multiple columns with the same name. (I expected the duplicates to be renamed to column_name_1 and column_name_2, similar to merge.) Any operation on that column name, as shown in the example above, is then applied to both columns.
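The renaming requested above can be approximated after the concat. The following is a minimal sketch, not pandas behavior: `suffix_duplicate_columns` is a hypothetical helper name, and the `_1`, `_2` numbering follows the naming proposed in this report.

```python
from collections import Counter

import pandas as pd


def suffix_duplicate_columns(df):
    """Hypothetical helper: append _1, _2, ... to column labels that repeat.

    Labels that occur only once are left unchanged.
    """
    totals = Counter(df.columns)   # how often each label occurs overall
    seen = Counter()               # how often each label has been seen so far
    new_labels = []
    for label in df.columns:
        if totals[label] > 1:
            seen[label] += 1
            new_labels.append(f"{label}_{seen[label]}")
        else:
            new_labels.append(label)
    out = df.copy()
    out.columns = new_labels
    return out


df1 = pd.DataFrame({"1": range(0, 10)})
df2 = pd.DataFrame({"2": range(0, 20, 2)})
df3 = pd.DataFrame({"2": range(0, 30, 3)})

df = suffix_duplicate_columns(pd.concat([df1, df2, df3], axis=1))

# Assigning to one renamed column no longer touches the other.
df["2_1"] = range(0, 50, 5)
```

With unique labels, the assignment on the last line affects only the column that came from df2.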

Version
3.6.8 (default, Apr 25 2019, 21:02:35) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]

Expected Output

Output of `pd.show_versions()`

commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-957.12.2.el7.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.3
numpy : 1.18.0
pytz : 2019.3
dateutil : 2.8.1
pip : 18.1
setuptools : 40.6.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

TomAugspurger · 2020-01-07T11:59:13Z

You can use ignore_index=True to discard the existing column labels. Otherwise, you can perform the renaming ahead of time. I don't think we want to make concat more complex than it already is by performing this automatic renaming.

You might be interested in following #28394, which would address this in a different way.
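For reference, the two workarounds mentioned in this comment (ignore_index=True and renaming ahead of time) could look roughly like the sketch below. The frames mirror the ones in the report; the `_1` and `_2` suffixes are an assumption taken from the naming the reporter asked for.

```python
import pandas as pd

df1 = pd.DataFrame({"1": range(0, 10)})
df2 = pd.DataFrame({"2": range(0, 20, 2)})
df3 = pd.DataFrame({"2": range(0, 30, 3)})

# Workaround 1: with axis=1, ignore_index=True drops the original column
# labels and relabels the result 0, 1, 2, ...
positional = pd.concat([df1, df2, df3], axis=1, ignore_index=True)

# Workaround 2: make the labels unique before concatenating.
# DataFrame.add_suffix appends the suffix to every column label.
renamed = pd.concat(
    [df1, df2.add_suffix("_1"), df3.add_suffix("_2")],
    axis=1,
)

# Assignment now targets exactly one column.
renamed["2_1"] = range(0, 50, 5)
```

The first option loses the original names entirely, so the second is usually the better fit when the labels carry meaning.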

jreback · 2020-01-07T12:07:01Z

You are describing what merge does; this was also the purpose of the now-removed join_axes.

I suppose we could raise/warn on the non-concat axis with an errors keyword.
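To illustrate both points, here is a rough sketch. The merge call uses the existing `suffixes` parameter; `concat_columns_strict` is a hypothetical wrapper written for this example and is not a pandas API or the `errors` keyword floated above.

```python
import pandas as pd

df2 = pd.DataFrame({"2": range(0, 20, 2)})
df3 = pd.DataFrame({"2": range(0, 30, 3)})

# merge disambiguates overlapping, non-key column names with suffixes.
merged = pd.merge(
    df2,
    df3,
    left_index=True,
    right_index=True,
    suffixes=("_1", "_2"),
)


def concat_columns_strict(frames):
    """Hypothetical helper: concat along columns, raising on duplicate labels."""
    result = pd.concat(frames, axis=1)
    duplicated = result.columns[result.columns.duplicated()].unique()
    if len(duplicated) > 0:
        raise ValueError(
            f"duplicate column labels after concat: {list(duplicated)}"
        )
    return result
```

Calling `concat_columns_strict([df2, df3])` raises ValueError because both frames carry the label "2", while the merge above yields the columns "2_1" and "2_2".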

charlesdong1991 · 2020-01-07T14:31:59Z

@krishnachouhan you might want to refer to this issue: #21791

But as discussed before, this won't be added for concat.

mroeschke · 2024-08-24T20:08:06Z

Thanks for the issue, but it appears this hasn't gotten traction in a while, so closing.

gfyoung added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Jan 7, 2020

mroeschke added the Enhancement label Jul 25, 2021

mroeschke closed this as completed Aug 24, 2024
