Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(planner): Add EquiJoinPredicateRule #4665

Merged
merged 2 commits into from
Apr 19, 2022
Merged

feat(planner): Add EquiJoinPredicateRule #4665

merged 2 commits into from
Apr 19, 2022

Conversation

scbrickley
Copy link
Contributor

@scbrickley scbrickley commented Apr 12, 2022

This patch adds a few things:

  1. A new planner rule called EquiJoinPredicateRule that checks if a join predicate defines an equi-join (Closes Add a planner rule that can analyze a predicate function #4625)
  2. Stubs for a new join package, which will be fully implemented & documented later (See Add a join package to stdlib #4629 and Document join package #4631)
  3. A new check in runtime.Eval that panics if builtins are not finalized, with a clear error message describing how to fix the issue. This is the result of the difficulty I had trying to evaluate flux scripts in go unit tests within stdlib. The solution was to import fluxinit/static

I have one concern with the new planner rule that I could use some feedback on: Originally the intent was to have the join transformation check what kind of procedure spec it was passed, and to throw an error if it was not an EquiJoinProcedureSpec, but due to the scope of #4625, it was easier to just have the planner rule itself return an error. This may increase the workload for adding other implementations of join later on, but not by much. Still, I wonder if it's better to adjust the rule to not throw an error if it doesn't find an equi-join, and instead just return the original procedure spec.

@scbrickley scbrickley force-pushed the join-package branch 2 times, most recently from 829bfef to 7fc35db Compare April 13, 2022 18:47
@scbrickley scbrickley changed the title feat: Add EquiJoinPredicateRule to planner feat(planner): Add EquiJoinPredicateRule Apr 13, 2022
@scbrickley scbrickley force-pushed the join-package branch 6 times, most recently from 8afcc62 to ba59a45 Compare April 14, 2022 20:05
@scbrickley scbrickley marked this pull request as ready for review April 14, 2022 20:41
@scbrickley scbrickley requested review from a team as code owners April 14, 2022 20:41
@scbrickley scbrickley requested review from jsternberg and noramullen1 and removed request for a team April 14, 2022 20:41
@@ -268,6 +268,7 @@ var sourceHashes = map[string]string{
"stdlib/internal/testutil/testutil.flux": "f2969ae7c30d030a0e94b803c1eb2306de3d13ac60109bb3f9a3d756643fa2dd",
"stdlib/interpolate/interpolate.flux": "3d480c9058c584b65841db7889c459ad9bfed679f4ce24d17a84ec020cce768b",
"stdlib/interpolate/interpolate_test.flux": "077562d9840b7e4e5ead9565413fb2d88cf5dfac15b55175062afdff5da67305",
"stdlib/join/join.flux": "c431da575b754952dd4a4b2f71c0b6e56bd1b204520999b2192a83f5c6ad719f",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice.

"github.com/influxdata/flux/semantic"
)

const EquiJoinKind = "equijoin"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be camelcase?

Suggested change
const EquiJoinKind = "equijoin"
const EquiJoinKind = "equiJoin"

stdlib/join/equijoin.go Show resolved Hide resolved
join.join(
left: left,
right: right,
on: (l, r) => r.a == l.b and r.d == l.b,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we explicitly test multiple comparison with an operator error for something like on: (l, r,) => r.a == l.b and !(r.d == l.b)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's probably a good idea. Adding a testcase for that now.

// - on:
// - as:
// - method:
builtin join : (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎸

@scbrickley scbrickley requested a review from wolffcm April 18, 2022 14:24
Copy link

@wolffcm wolffcm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great.

My only issue is that the error messages aren't what I would want users seeing if they began using the new join package in anger. These can be hard to get right, especially when there are lots of cases that we can't yet support. We can work on that in this PR, or you could also file an issue to clean it up later.

scope.Set("r", values.Null)

intrp := interpreter.NewInterpreter(nil, nil)
sideEffects, _ := intrp.Eval(context.Background(), &node, scope, nil)
Copy link

@wolffcm wolffcm Apr 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, and it also doesn't quite capture all of the places where we handle exists.

We actually evaluate expressions in two places, in the interpreter as you're doing here, and also in the Compiler which is a really confusing name. The Compiler is the thing that evaluates, e.g., the lambda that's passed into map. You could use an acceptance test to do this, if it doesn't produce an error.

Moreover, there's really two kinds of things that should get tested here:

  • When we know the names and types of all the fields statically, e.g., what the input data to a function like map is produced with array.from
  • When we don't know all the fields statically, e.g., when the input data is generated via csv.from

If I understand right #4626 would need to test both of these cases for both the interpreter and the compiler.

// to evaluate flux scripts in go unit tests
if !r.finalized {
panic("runtime is not finalized - consider importing package fluxinit or fluxinit/static")
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

fnBody := spec.On.Fn.Block.Body

if len(fnBody) != 1 {
return nil, false, errors.New(codes.Invalid, "predicate should only contain a single return statement")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users might be confused by this error, since a valid lambda to get passed in might look something like (l, r) => l.id == r.id and l.foo == r.bar and doesn't need an explicit return statement. (I believe that the semantic graph will have a Return statement for that case, but it can be elided in the syntax)

Also, the message should note that there are some forms of lambda that are just not supported yet, so the user understands that they didn't do something wrong, it's just that we have more work to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see your point. I'm open to any suggestions you have for the wording. Maybe something like this?

some lambda expressions are not yet supported in join predicates - function body should only contain a single logical expression comparing equality of columns from the left and right tables

That seems kinda long winded, but I'm having trouble thinking of a more succinct way to communicate the problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be easier if we could link them to some documentation explaining what is and isn't allowed in a predicate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe once #4631 is finished, we can circle back and update these errors to just link to the generated fluxdoc page for the new join package.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I like your message, you could also explicitly refer to the name of the parameter:

some expressions are not yet supported in the "on" parameter - function body should only contain a single logical expression comparing equality of columns from the left and right tables

(I guess the error message will have prefixed with something like error in function join or some such)

@scbrickley scbrickley requested a review from wolffcm April 18, 2022 19:19
Copy link

@wolffcm wolffcm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but I don't think that this PR should close #4626, it should stay open until there are more tests.

@scbrickley
Copy link
Contributor Author

Right. Completely forgot about that. I'll move that test to a separate branch and link the issue in a new PR.

@scbrickley
Copy link
Contributor Author

@noramullen1 What are your thoughts on this PR from a docs perspective? There's a new package, but no real docs. Just the bare minimum to get the fluxdoc CI check to pass. There's an open issue to document this package that we intend to get to later: #4631

I am operating under the assumption that we don't want to document things that are not ready for users yet, but I'm not sure how you and your team feel about this. What should we do here?

@sanderson
Copy link
Contributor

@scbrickley I think it's fine to not include full docs at this point. Just make sure that the existing, bare bones docs include something like: "This package is in actively development and is not currently ready to be used". Same for the functions inside the package.

Just in case someone pulls the docs assets from a Flux version where the join package isn't ready yet.

@sanderson sanderson requested review from sanderson and removed request for noramullen1 April 18, 2022 20:24
@scbrickley
Copy link
Contributor Author

Thanks @sanderson! Making the changes now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a planner rule that can analyze a predicate function
4 participants