gen4: Add hash join primitive and planning #9140

Merged 40 commits on Nov 22, 2021

@systay (Collaborator) commented Nov 4, 2021

Description

This PR adds hash joins to the alternatives that the gen4 planner can use when planning queries.

The current join algorithm is a nested loop join, also known as an Apply Join. It runs the RHS query once for every row on the LHS, so with n rows on the LHS and m rows on the RHS the complexity is O(n*m). A hash join runs each side's query only once, so the complexity is O(n+m).
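
To make the difference concrete, here is a minimal in-memory sketch of the two strategies. It is not the code from this PR: `Row`, `Pair`, `nestedLoopJoin`, and `hashJoin` are hypothetical names, and the real Apply Join issues a query per LHS row rather than scanning a slice. The O(n+m) figure also assumes that hash lookups are constant time and that the number of matching pairs stays small.

```go
package main

import "fmt"

// Row is a hypothetical two-column row: a join key and a payload.
type Row struct {
	Key int
	Val string
}

// Pair is one joined output row.
type Pair struct {
	Left, Right Row
}

// nestedLoopJoin visits the whole RHS once per LHS row: n*m comparisons.
func nestedLoopJoin(lhs, rhs []Row) []Pair {
	var out []Pair
	for _, l := range lhs {
		for _, r := range rhs {
			if l.Key == r.Key {
				out = append(out, Pair{l, r})
			}
		}
	}
	return out
}

// hashJoin reads each side once: it builds a hash table over the LHS,
// then probes that table with every RHS row.
func hashJoin(lhs, rhs []Row) []Pair {
	table := make(map[int][]Row, len(lhs))
	for _, l := range lhs {
		table[l.Key] = append(table[l.Key], l)
	}
	var out []Pair
	for _, r := range rhs {
		for _, l := range table[r.Key] {
			out = append(out, Pair{l, r})
		}
	}
	return out
}

func main() {
	lhs := []Row{{1, "a"}, {2, "b"}, {2, "c"}}
	rhs := []Row{{2, "x"}, {3, "y"}}
	// Both strategies find the same two matches on key 2.
	fmt.Println(len(nestedLoopJoin(lhs, rhs)), len(hashJoin(lhs, rhs)))
}
```

The trade-off in this sketch is memory: the hash table holds one whole side of the join, whereas the nested loop only needs the current row of each side.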

Related Issue(s)

#7280

Checklist

  • Tests were added or are not required
  • Documentation was added or is not required

@systay changed the title from "Hash Join" to "gen4: Add hash join primitive and planning" on Nov 9, 2021
@systay marked this pull request as ready for review on November 9, 2021 09:56
@systay added the labels "Component: Query Serving", "release notes", and "Type: Enhancement" (logical improvement, somewhere between a bug and feature) on Nov 9, 2021
Signed-off-by: Andres Taylor <[email protected]>
go/vt/vtgate/planbuilder/jointree.go (outdated, resolved)
go/vt/vtgate/planbuilder/hash_join.go (resolved)
go/vt/vtgate/planbuilder/jointree.go (outdated, resolved)
@systay marked this pull request as draft on November 16, 2021 18:28

@systay (Collaborator, Author) commented Nov 16, 2021

We should hide this behind a planner hint until we are comfortable with it being used

@frouioui (Member) commented

> We should hide this behind a planner hint until we are comfortable with it being used

We can now use the ALLOW_HASH_JOIN directive to allow hash joins for a query.
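
As an illustration of how such an opt-in might look from the client side, the sketch below assumes the directive is passed in a `/*vt+ ... */` query comment, the form Vitess uses for its other comment directives. The table names and the `hasDirective` helper are hypothetical; the helper is a plain string check for demonstration and is not the parser code touched by this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// hasDirective is a hypothetical helper (not the Vitess parser). It reports
// whether the first /*vt+ ... */ comment in the query lists the given directive.
func hasDirective(query, name string) bool {
	const prefix = "/*vt+"
	start := strings.Index(query, prefix)
	if start < 0 {
		return false
	}
	rest := query[start+len(prefix):]
	end := strings.Index(rest, "*/")
	if end < 0 {
		return false
	}
	for _, field := range strings.Fields(rest[:end]) {
		// A directive may carry a value, as in NAME=value; compare the name only.
		if strings.SplitN(field, "=", 2)[0] == name {
			return true
		}
	}
	return false
}

func main() {
	// Assumed syntax: the directive is placed right after the SELECT keyword.
	query := "select /*vt+ ALLOW_HASH_JOIN */ u.id from user u join music m on u.col = m.col"
	fmt.Println(hasDirective(query, "ALLOW_HASH_JOIN"))
	fmt.Println(hasDirective("select u.id from user u", "ALLOW_HASH_JOIN"))
}
```

Queries without the directive would keep using the existing join strategy, which is what makes the hint a safe way to roll the feature out gradually.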

@systay marked this pull request as ready for review on November 17, 2021 08:08
go/vt/vtgate/planbuilder/querytree_transformers.go (outdated, resolved)
go/vt/vtgate/semantics/semantic_state.go (resolved)
go/vt/sqlparser/comments.go (outdated, resolved)
go/vt/vtgate/planbuilder/testdata/filter_cases.txt (outdated, resolved)
go/vt/vtgate/engine/join.go (resolved)
go/vt/vtgate/planbuilder/querytree_transformers.go (outdated, resolved)
go/vt/vtgate/planbuilder/querytree_transformers.go (outdated, resolved)
go/vt/vtgate/planbuilder/querytree_transformers.go (outdated, resolved)
go/vt/vtgate/planbuilder/querytree_transformers.go (outdated, resolved)
@harshit-gangal (Member) left a comment

I would be interested to see 5-6 different kinds of queries in the end-to-end test.

go/vt/vtgate/engine/hash_join.go (outdated, resolved)
Comment on lines +224 to +228
	typ, err := CoerceTo(v1.Type(), v2.Type()) // TODO systay we should add a method where this decision is done at plantime
	if err != nil {
		return 0, err
	}
	v1cast, err := castTo(v1, typ)
Member

Based on the TODO, what we can at least do is pass the CoerceTo result directly to the NullsafeCompare method, so that it is calculated only once in the engine.
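
A rough sketch of that suggestion follows, using simplified stand-ins rather than the real evalengine types: `Type`, `Value`, `coerceTo`, and `compareAs` are all hypothetical. The comparison type is decided once, before the loop, and then handed to every comparison instead of being recomputed per pair of values.

```go
package main

import (
	"errors"
	"fmt"
)

// Type and Value are simplified, hypothetical stand-ins for the engine's types.
type Type int

const (
	Int64 Type = iota
	Float64
	Text
)

type Value struct {
	Typ Type
	I   int64
	F   float64
}

// coerceCalls counts how often the coercion decision is made.
var coerceCalls int

// coerceTo picks the type that two operand types should be compared as.
func coerceTo(a, b Type) (Type, error) {
	coerceCalls++
	switch {
	case a == Text || b == Text:
		return 0, errors.New("text comparison is not covered by this sketch")
	case a == b:
		return a, nil
	default:
		return Float64, nil // mixed Int64/Float64 compares as Float64
	}
}

func asFloat(v Value) float64 {
	if v.Typ == Int64 {
		return float64(v.I)
	}
	return v.F
}

// compareAs compares v1 and v2 using a type that the caller decided up front.
func compareAs(v1, v2 Value, typ Type) int {
	if typ == Int64 {
		switch {
		case v1.I < v2.I:
			return -1
		case v1.I > v2.I:
			return 1
		}
		return 0
	}
	f1, f2 := asFloat(v1), asFloat(v2)
	switch {
	case f1 < f2:
		return -1
	case f1 > f2:
		return 1
	}
	return 0
}

func main() {
	left := []Value{{Typ: Int64, I: 1}, {Typ: Int64, I: 3}}
	right := []Value{{Typ: Float64, F: 1.5}, {Typ: Float64, F: 3}}

	// Decide the comparison type once, from the column types.
	typ, err := coerceTo(Int64, Float64)
	if err != nil {
		panic(err)
	}
	for i := range left {
		fmt.Println(compareAs(left[i], right[i], typ))
	}
	fmt.Println("coercion decisions:", coerceCalls)
}
```

This only works when every row in a column has the same type, which is the assumption that would let the decision move to plan time as the TODO proposes.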

@@ -28,37 +28,37 @@ var _ Primitive = (*Distinct)(nil)

// Distinct Primitive is used to uniqueify results
type Distinct struct {
Source Primitive
Source Primitive
ColCollations []collations.ID
Contributor

We could use a map here, which would simplify access and also make it easy to prevent index-out-of-range errors.

Collaborator Author

When executing things at runtime, we want them to be as fast as possible. A map lookup is actually slower than accessing a position by offset in a slice, so I'll keep the collations as a slice here.
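
For readers weighing the two options, here is a small sketch of both access patterns. `CollationID`, `fromSlice`, and `fromMap` are hypothetical names (with `CollationID` standing in for `collations.ID`), and the ID values are arbitrary. The slice version indexes directly by column offset and uses an explicit bounds check to get the same safety a map lookup would give, without hashing the key.

```go
package main

import "fmt"

// CollationID is a hypothetical stand-in for collations.ID.
type CollationID uint16

// fromSlice returns the collation stored at a column offset.
// The explicit bounds check avoids an index-out-of-range panic.
func fromSlice(cols []CollationID, offset int) (CollationID, bool) {
	if offset < 0 || offset >= len(cols) {
		return 0, false
	}
	return cols[offset], true
}

// fromMap returns the collation for a column offset stored as a map key.
// Every lookup has to hash the key before it can find the entry.
func fromMap(cols map[int]CollationID, offset int) (CollationID, bool) {
	id, ok := cols[offset]
	return id, ok
}

func main() {
	asSlice := []CollationID{45, 33}
	asMap := map[int]CollationID{0: 45, 1: 33}

	fmt.Println(fromSlice(asSlice, 1))
	fmt.Println(fromMap(asMap, 1))

	// An offset past the end is reported rather than causing a panic.
	fmt.Println(fromSlice(asSlice, 5))
	fmt.Println(fromMap(asMap, 5))
}
```

Since column offsets are dense and start at zero, the slice also needs no extra bookkeeping to map a column to its collation.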

go/vt/vtgate/engine/distinct.go (resolved)
@systay merged commit ae58d49 into vitessio:main on Nov 22, 2021
@systay deleted the hash-join branch on November 22, 2021 08:23
Labels: "Component: Query Serving", "Type: Enhancement" (logical improvement, somewhere between a bug and feature)
6 participants