pd.concat() crashes if dataframe contains duplicate indices but not df.join() #36263
Labels
good first issue
Needs Tests
Unit test(s) needed to prevent regressions
Reshaping
Concat, Merge/Join, Stack/Unstack, Explode
I just found out that when concatenating two DataFrames horizontally, pd.concat() crashes if one of them has duplicate index values, whereas df.join() does not. Instead, df.join() repeats the matching values across all rows that share the same index value. Is this behavior by design? Thanks!
Ideally, pd.concat() would behave like df.join() when the DataFrames have duplicate index values; at the very least, it should NOT crash.
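For anyone who wants to reproduce this, here is a minimal sketch of the two calls side by side. The frames df1 and df2 and their values are made up for illustration. As far as I can tell, the exception type and message from pd.concat() differ between pandas versions, so the sketch catches it broadly instead of naming one.

```python
import pandas as pd

# df1 has a duplicated index value ("a"); df2 has a unique index.
df1 = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "a", "b"])
df2 = pd.DataFrame({"y": [10, 20]}, index=["a", "b"])

# df.join() aligns on the index and repeats df2's value for every
# row of df1 that shares the same index value.
joined = df1.join(df2)
print(joined)

# Horizontal pd.concat() on the same inputs. The exception type is not
# pinned here because it is assumed to vary across pandas versions.
try:
    pd.concat([df1, df2], axis=1)
    concat_error = None
except Exception as exc:
    concat_error = exc

print("pd.concat raised:", type(concat_error).__name__ if concat_error else "nothing")
```

With these inputs, joined keeps the three rows of df1, and both "a" rows receive y = 10.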
I suggest we introduce an additional argument to handle duplicate indices. For example, if the same index value has X (> 0) rows in df1 and Y (> 0) rows in df2, then with dup_index= set to:
- combinatorial: after merging, it will have X*Y rows, one for every possible combination.
- outer-top-align: after merging, it will have max(X, Y) rows, in which the rows align from the top.
- outer-bottom-align: after merging, it will have max(X, Y) rows, in which the rows align from the bottom.
- inner-top-align: after merging, it will have min(X, Y) rows, in which the rows align from the top.
- inner-bottom-align: after merging, it will have min(X, Y) rows, in which the rows align from the bottom.
- raise: raise an exception with the warning message.
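To make the proposed modes concrete, here is a rough pure-Python sketch of how the rows sharing one index value could be paired up under each option. Everything in it is hypothetical: align_duplicates is an illustrative helper, dup_index is not an existing pandas argument, and reading "align from top" as "pad the shorter side with None at the bottom" is my assumption about the intended semantics.

```python
from itertools import product


def align_duplicates(left, right, dup_index="combinatorial"):
    """Pair the rows of one index value (hypothetical helper, not pandas API).

    left / right hold the X and Y rows that share the index value.
    Returns a list of (left_row, right_row) tuples; None marks a gap.
    """
    left, right = list(left), list(right)
    x, y = len(left), len(right)

    if dup_index == "combinatorial":
        # Every left row paired with every right row: X * Y rows.
        return list(product(left, right))

    if dup_index == "raise":
        if x > 1 or y > 1:
            raise ValueError("duplicate index values found")
        return list(zip(left, right))

    if dup_index in ("outer-top-align", "outer-bottom-align"):
        n = max(x, y)
        if dup_index == "outer-top-align":
            # Assumed meaning: rows start together, gaps go at the bottom.
            left = left + [None] * (n - x)
            right = right + [None] * (n - y)
        else:
            # Assumed meaning: rows end together, gaps go at the top.
            left = [None] * (n - x) + left
            right = [None] * (n - y) + right
        return list(zip(left, right))

    if dup_index in ("inner-top-align", "inner-bottom-align"):
        n = min(x, y)
        if dup_index == "inner-top-align":
            # Keep only the first min(X, Y) rows of each side.
            return list(zip(left[:n], right[:n]))
        # Keep only the last min(X, Y) rows of each side.
        return list(zip(left[x - n:], right[y - n:]))

    raise ValueError(f"unknown dup_index: {dup_index!r}")


# Example with X = 3 rows on the left and Y = 2 rows on the right.
print(align_duplicates([1, 2, 3], ["a", "b"], "outer-bottom-align"))
```

The sketch only covers the pairing for a single index value; a real implementation would still need to apply it per group and rebuild the resulting index.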