Describe the bug
When there are duplicate columns after a join, SQL expressions that use table aliases fail and the query returns zero rows.
To Reproduce
# test for left join on multiple tables with rows
import daft
df1=daft.from_pydict({"idx":[1,2],"val":[10,20]})
df2=daft.from_pydict({"idx":[3],"score":[0.1]})
df3=daft.from_pydict({"idx":[1],"score":[0.1]})
df_sql=daft.sql("select * from df1 left join df2 on (df1.idx=df2.idx) left join df3 on (df1.idx=df3.idx) where df3.score > 0").show()
This produces:
If one of the duplicate columns is renamed so that no duplicates remain, the query works:
# test for left join on multiple tables with rows
import daft
df1=daft.from_pydict({"idx":[1,2],"val":[10,20]})
df2=daft.from_pydict({"idx":[3],"score1":[0.1]})
df3=daft.from_pydict({"idx":[1],"score":[0.1]})
df_sql=daft.sql("select * from df1 left join df2 on (df1.idx=df2.idx) left join df3 on (df1.idx=df3.idx) where df3.score > 0").show()
I assume this has to do with duplicate dataframe columns being renamed with a "right." prefix?
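To make that guess concrete, here is a toy model in plain Python. It is not Daft's implementation: the `join_schema` helper and its rules (keep the join key once, prefix clashing right-side names with "right.") are assumptions made only for illustration.

```python
def join_schema(left_cols, right_cols, key, prefix="right."):
    """Toy schema merge (hypothetical): keep the join key once and
    prefix any right-side column whose name already exists."""
    out = list(left_cols)
    for col in right_cols:
        if col == key:
            continue  # shared join key: keep only the left copy
        out.append(prefix + col if col in out else col)
    return out


# df1 left join df2: "score" does not clash with anything in df1
after_first = join_schema(["idx", "val"], ["idx", "score"], key="idx")
# ... left join df3: df3's "score" now clashes with df2's "score"
after_second = join_schema(after_first, ["idx", "score"], key="idx")

print(after_first)
print(after_second)
```

Under these assumed rules the bare name "score" ends up referring to df2's column, which is all null in the reproduction because df2 has no matching rows, while df3's column becomes "right.score". If df3.score were resolved to the bare "score", the filter would see only nulls and return zero rows, which would match the symptom. Renaming df2's column to score1 removes the clash, which would also explain why the second query works.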
Expected behavior
Table aliases work as expected.
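For reference, the same query can be run against SQLite through Python's standard sqlite3 module to show the result one would expect from table-qualified names. This is a sketch for comparison only; the table definitions below mirror the dataframes in the reproduction and are not part of Daft.

```python
import sqlite3

# In-memory database with the same three tables as the reproduction
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE df1 (idx INTEGER, val INTEGER);
    CREATE TABLE df2 (idx INTEGER, score REAL);
    CREATE TABLE df3 (idx INTEGER, score REAL);
    INSERT INTO df1 VALUES (1, 10), (2, 20);
    INSERT INTO df2 VALUES (3, 0.1);
    INSERT INTO df3 VALUES (1, 0.1);
    """
)

# Same statement as in the bug report
rows = conn.execute(
    """
    SELECT * FROM df1
    LEFT JOIN df2 ON (df1.idx = df2.idx)
    LEFT JOIN df3 ON (df1.idx = df3.idx)
    WHERE df3.score > 0
    """
).fetchall()

print(rows)  # [(1, 10, None, None, 1, 0.1)]
conn.close()
```

Here df3.score refers unambiguously to the column from df3, so the row with idx 1 passes the filter and the row with idx 2 is dropped because its df3.score is null.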
Component(s)
SQL
Additional context
Testing with nightly build 0.3.8+dev0019.e4c6f3fa that has additions for joins from #3066
universalmind303 changed the title from "SQL: duplicate columns cause join SQL WHERE expressions to fail" to "duplicate columns cause join WHERE expressions to fail" on Oct 18, 2024
I would add that one could argue the DataFrame API has different semantics, so that where clauses on duplicate columns need to use the "right." prefix. For SQL, however, this doesn't make sense, so it needs to be fixed even if the DataFrame API stays the same. Ideally the two would behave the same, though in SQL there are various ways to identify a column in an expression.