-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(planner): Add EquiJoinPredicateRule #4665
Conversation
829bfef
to
7fc35db
Compare
8afcc62
to
ba59a45
Compare
libflux/go/libflux/buildinfo.gen.go
Outdated
@@ -268,6 +268,7 @@ var sourceHashes = map[string]string{ | |||
"stdlib/internal/testutil/testutil.flux": "f2969ae7c30d030a0e94b803c1eb2306de3d13ac60109bb3f9a3d756643fa2dd", | |||
"stdlib/interpolate/interpolate.flux": "3d480c9058c584b65841db7889c459ad9bfed679f4ce24d17a84ec020cce768b", | |||
"stdlib/interpolate/interpolate_test.flux": "077562d9840b7e4e5ead9565413fb2d88cf5dfac15b55175062afdff5da67305", | |||
"stdlib/join/join.flux": "c431da575b754952dd4a4b2f71c0b6e56bd1b204520999b2192a83f5c6ad719f", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice.
"github.com/influxdata/flux/semantic" | ||
) | ||
|
||
const EquiJoinKind = "equijoin" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be camelcase?
const EquiJoinKind = "equijoin" | |
const EquiJoinKind = "equiJoin" |
join.join( | ||
left: left, | ||
right: right, | ||
on: (l, r) => r.a == l.b and r.d == l.b, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we explicitly test multiple comparison with an operator error for something like on: (l, r,) => r.a == l.b and !(r.d == l.b)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's probably a good idea. Adding a testcase for that now.
// - on: | ||
// - as: | ||
// - method: | ||
builtin join : ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎸
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great.
My only issue is that the error messages aren't what I would want users seeing if they began using the new join
package in anger. These can be hard to get right, especially when there are lots of cases that we can't yet support. We can work on that in this PR, or you could also file an issue to clean it up later.
interpreter/interpreter_test.go
Outdated
scope.Set("r", values.Null) | ||
|
||
intrp := interpreter.NewInterpreter(nil, nil) | ||
sideEffects, _ := intrp.Eval(context.Background(), &node, scope, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, and it also doesn't quite capture all of the places where we handle exists
.
We actually evaluate expressions in two places, in the interpreter as you're doing here, and also in the Compiler
which is a really confusing name. The Compiler
is the thing that evaluates, e.g., the lambda that's passed into map
. You could use an acceptance test to do this, if it doesn't produce an error.
Moreover, there's really two kinds of things that should get tested here:
- When we know the names and types of all the fields statically, e.g., what the input data to a function like
map
is produced witharray.from
- When we don't know all the fields statically, e.g., when the input data is generated via
csv.from
If I understand right #4626 would need to test both of these cases for both the interpreter and the compiler.
// to evaluate flux scripts in go unit tests | ||
if !r.finalized { | ||
panic("runtime is not finalized - consider importing package fluxinit or fluxinit/static") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
stdlib/join/equijoin.go
Outdated
fnBody := spec.On.Fn.Block.Body | ||
|
||
if len(fnBody) != 1 { | ||
return nil, false, errors.New(codes.Invalid, "predicate should only contain a single return statement") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users might be confused by this error, since a valid lambda to get passed in might look something like (l, r) => l.id == r.id and l.foo == r.bar
and doesn't need an explicit return
statement. (I believe that the semantic graph will have a Return
statement for that case, but it can be elided in the syntax)
Also, the message should note that there are some forms of lambda that are just not supported yet, so the user understands that they didn't do something wrong, it's just that we have more work to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see your point. I'm open to any suggestions you have for the wording. Maybe something like this?
some lambda expressions are not yet supported in join predicates - function body should only contain a single logical expression comparing equality of columns from the left and right tables
That seems kinda long winded, but I'm having trouble thinking of a more succinct way to communicate the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be easier if we could link them to some documentation explaining what is and isn't allowed in a predicate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe once #4631 is finished, we can circle back and update these errors to just link to the generated fluxdoc page for the new join package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I like your message, you could also explicitly refer to the name of the parameter:
some expressions are not yet supported in the "on" parameter - function body should only contain a single logical expression comparing equality of columns from the left and right tables
(I guess the error message will have prefixed with something like error in function join
or some such)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, but I don't think that this PR should close #4626, it should stay open until there are more tests.
Right. Completely forgot about that. I'll move that test to a separate branch and link the issue in a new PR. |
f575c49
to
f05d619
Compare
@noramullen1 What are your thoughts on this PR from a docs perspective? There's a new package, but no real docs. Just the bare minimum to get the fluxdoc CI check to pass. There's an open issue to document this package that we intend to get to later: #4631 I am operating under the assumption that we don't want to document things that are not ready for users yet, but I'm not sure how you and your team feel about this. What should we do here? |
@scbrickley I think it's fine to not include full docs at this point. Just make sure that the existing, bare bones docs include something like: "This package is in actively development and is not currently ready to be used". Same for the functions inside the package. Just in case someone pulls the docs assets from a Flux version where the |
Thanks @sanderson! Making the changes now. |
a23ab6b
to
cbddae6
Compare
cbddae6
to
db8a38f
Compare
This patch adds a few things:
EquiJoinPredicateRule
that checks if ajoin
predicate defines an equi-join (Closes Add a planner rule that can analyze a predicate function #4625)join
package, which will be fully implemented & documented later (See Add ajoin
package to stdlib #4629 and Documentjoin
package #4631)runtime.Eval
that panics if builtins are not finalized, with a clear error message describing how to fix the issue. This is the result of the difficulty I had trying to evaluate flux scripts in go unit tests withinstdlib
. The solution was to importfluxinit/static
I have one concern with the new planner rule that I could use some feedback on: Originally the intent was to have the
join
transformation check what kind of procedure spec it was passed, and to throw an error if it was not anEquiJoinProcedureSpec
, but due to the scope of #4625, it was easier to just have the planner rule itself return an error. This may increase the workload for adding other implementations of join later on, but not by much. Still, I wonder if it's better to adjust the rule to not throw an error if it doesn't find an equi-join, and instead just return the original procedure spec.