Allow arbitrary column expressions in JOIN ON clause #3935
Comments
closed via #4029
Should this really be closed, @universalmind303? The second case, "Allow arbitrary predicate functions for on", is a fairly standard use case: imagine we need to link a set of intervals with a set of timestamped events; that requires a double comparison like the one described. I think a cross join is needed to accomplish the same in polars now?
I agree, @Bonnevie. Not sure why this issue was closed, just enabling joins by
There are also other issues for non-equi joins / intervals / overlaps:
Hi, so can I do this?
tbl1.join(tbl2, how="inner", left_on=[pl.col("art_no"), pl.col("var_tu_key") // 1000], right_on=[pl.col("art_no"), pl.col("var_no")])
for
SELECT *
FROM lf1 AS il
INNER JOIN lf2 AS pdim
ON il.art_no = pdim.art_no
AND DIV(il.var_tu_key, 1000) = pdim.var_no
Is there any feature added now?
Arbitrary column expressions
Typically in SQL one can express JOINs on arbitrary expressions. In polars' case, the JOIN must be an equality:
is implicitly equivalent to:
There are two things (with varying degrees of difficulty to implement) which polars cannot currently do:
Allow arbitrary expressions on the LHS/RHS of the equality
If we have lazy frames with the following content:
and:
and we try to join:
it raises a PanicException: could not determine join column names. If I were to alias the second one the result is even worse (this should be considered a bug):
Here it seems like polars interpreted that expression as the field from the table! Uh oh. We expect z == ["c", "d"], not [null, "c"], along the column.
Allow arbitrary predicate functions for on
JOINs can get complicated:
I don't think these non-equality predicate expressions can currently be expressed by pl.LazyFrame.join. AFAICT this can't be captured by join_asof either. I would like the ability to say something like:
to mean the same thing as the above SQL expression. Let me know what you think.
PS: Sorry for the barrage of tickets! I think polars is awesome and I'm currently scoping it out to see whether it fits my use case :)