-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
exec: Add support for vectorized engine to use builtin functions. #38826
Conversation
f5edb3f
to
dc4aa8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @rohany, and @solongordon)
pkg/sql/exec/builtin_funcs.go, line 88 at r3 (raw file):
switch outputPhysType { case types.Bool: converted, _ := converter(datum)
can't you pull this out above the type switch?
pkg/sql/exec/types/conv/conv.go, line 181 at r1 (raw file):
// PhysicalTypeColElemToDatum converts an element in a colvec to a datum of semtype ct. func PhysicalTypeColElemToDatum(
Could you actually move this to a different package? This is just a minor complaint but sqlbase
is a massive dependency that we hadn't needed in the exec.types
or conv
packages before... and execgen
depends on those which will make the compilation process slower I think. Anyway this is not a huge deal but I'd prefer it if you wouldn't mind.
pkg/sql/exec/types/conv/conv.go, line 17 at r2 (raw file):
"reflect" "unsafe" <<<<<<< HEAD
Something has gone wrong here :D
pkg/sql/logictest/testdata/logic_test/vectorize, line 406 at r3 (raw file):
query I SELECT abs(y) FROM builtin_test
This is awesome! Can you add a check too for the extract
builtins? Search in the tpch queries.go file to see examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @rohany, and @solongordon)
pkg/sql/exec/types/conv/conv.go, line 181 at r1 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Could you actually move this to a different package? This is just a minor complaint but
sqlbase
is a massive dependency that we hadn't needed in theexec.types
orconv
packages before... andexecgen
depends on those which will make the compilation process slower I think. Anyway this is not a huge deal but I'd prefer it if you wouldn't mind.
yeah, i'm moving it from the PR under this out into exec (it caused a circular dependency build failure, so not really a minor complaint!)
pkg/sql/exec/types/conv/conv.go, line 17 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Something has gone wrong here :D
yeah, just look at the latest commit -- some rebases along the way got messed up...
dc4aa8f
to
422e8b7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @solongordon)
pkg/sql/exec/builtin_funcs.go, line 88 at r3 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
can't you pull this out above the type switch?
I could, but wouldn't it be more code to case on both DTrue and DFalse?
pkg/sql/logictest/testdata/logic_test/vectorize, line 406 at r3 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
This is awesome! Can you add a check too for the
extract
builtins? Search in the tpch queries.go file to see examples.
Done.
422e8b7
to
2e2c924
Compare
RFAL |
Closed other PR that this depends, I'm just going to include its commit in here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @rohany, and @solongordon)
pkg/sql/exec/builtin_funcs.go, line 88 at r3 (raw file):
Previously, rohany (Rohan Yadav) wrote…
I could, but wouldn't it be more code to case on both DTrue and DFalse?
Wait, what? I just mean you have converted, _ := converter(datum)
in every case - am I missing something that would prevent you from writing that line just once above the switch statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @solongordon)
pkg/sql/exec/builtin_funcs.go, line 88 at r3 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Wait, what? I just mean you have
converted, _ := converter(datum)
in every case - am I missing something that would prevent you from writing that line just once above the switch statement?
lol woops -- i thought u meant specifically the boolean case. I'll do that now.
2e2c924
to
6872ccd
Compare
Fixed those bugs + a merge conflict. RFAL |
6872ccd
to
75a4fae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I left some comments.
The only thing is that it seems to me that it is slightly undertested. One way to increase the test coverage is to add a test against a processor (like the ones in columnar_operators_test
). Or maybe once we flip the setting to "streaming" by default, the logic tests will run against these builtins. Anyway, I wouldn't say it is required to be a part of this PR but as a part of follow-up work.
Reviewed 1 of 5 files at r3, 6 of 6 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @rohany, and @solongordon)
pkg/sql/exec/builtin_funcs.go, line 46 at r4 (raw file):
} type defaultBuiltinOperator struct {
[nit]: why is it called default
? This implies to me that there should be others non-default ones.
Also, a comment would be helpful (something along the lines "builtinOperator wraps around a builtin function which computes it one row at a time which allows for executing such builtins via the vectorized engine").
pkg/sql/exec/builtin_funcs.go, line 86 at r4 (raw file):
return } converted, _ := converter(datum)
I don't think we should be ignoring the error - I'd panic with it.
pkg/sql/exec/builtin_funcs.go, line 158 at r4 (raw file):
funcExpr *tree.FuncExpr, outputIdx int, ) (Operator, error) {
[nit]: if it's not possible to return an error here, I'd return only Operator
.
pkg/sql/exec/builtin_funcs.go, line 160 at r4 (raw file):
) (Operator, error) { // For now, return the default builtin operator. Future work can specialize
Oh, I see the point of "default". You should mention this in the comment above the operator itself.
pkg/sql/exec/builtin_funcs_test.go, line 24 at r4 (raw file):
) // Typing context to for the typechecker.
[nit]: "to for" is probably not what you meant.
pkg/sql/exec/builtin_funcs_test.go, line 26 at r4 (raw file):
// Typing context to for the typechecker. type mockTypeContext struct { ty []types.T
[super nit]: I don't think I've ever seen "ty" for types - feels weird.
pkg/sql/exec/builtin_funcs_test.go, line 97 at r4 (raw file):
} return NewBuiltinFunctionOperator(tctx, tc.outputTypes, input[0], typedExpr.(*tree.FuncExpr), 1)
If this was the reason for returning an error when creating a builtin operator, you can still return only an operator and wrap it like as follows:
for _, tc := range testCases {
t.Run(tc.desc, func(t *testing.T) {
wrapped := func(input []Operator) (Operator, error) {
expr, err := parser.ParseExpr(tc.expr)
if err != nil {
t.Fatal(err)
}
p := &mockTypeContext{ty: tc.inputTypes}
typedExpr, err := tree.TypeCheck(expr, &tree.SemaContext{IVarContainer: p}, types.Any)
if err != nil {
t.Fatal(err)
}
return NewBuiltinFunctionOperator(tctx, tc.outputTypes, input[0], typedExpr.(*tree.FuncExpr), 1), nil
}
runTests(t, []tuples{tc.inputTuples}, tc.outputTuples, orderedVerifier, []int{1}, wrapped)
}
pkg/sql/exec/vec_elem_to_datum.go, line 54 at r4 (raw file):
case semtypes.StringFamily: b := col.Bytes()[rowIdx] if ct.Oid() == oid.T_name {
This is the code that used to be in materializer
:
if ct.Oid() == oid.T_name {
m.row[outIdx].Datum = m.da.NewDString(tree.DString(*(*string)(unsafe.Pointer(&b))))
} else {
m.row[outIdx].Datum = m.da.NewDName(tree.DString(*(*string)(unsafe.Pointer(&b))))
}
Now, DString
and DName
are switched. I'm not sure what the correct way is, but I don't think the change was intentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree the testing is not ideal. I think it would be easier to just let the logic tests run against the builtins. I'm not sure if there is an easy way to randomly generate test cases against the processors (I think it would involve having to generate type aware expression strings to be used as renders).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @solongordon, and @yuzefovich)
pkg/sql/exec/builtin_funcs.go, line 46 at r4 (raw file):
Previously, yuzefovich wrote…
[nit]: why is it called
default
? This implies to me that there should be others non-default ones.Also, a comment would be helpful (something along the lines "builtinOperator wraps around a builtin function which computes it one row at a time which allows for executing such builtins via the vectorized engine").
Done.
pkg/sql/exec/builtin_funcs.go, line 86 at r4 (raw file):
Previously, yuzefovich wrote…
I don't think we should be ignoring the error - I'd panic with it.
Done.
pkg/sql/exec/builtin_funcs.go, line 158 at r4 (raw file):
Previously, yuzefovich wrote…
[nit]: if it's not possible to return an error here, I'd return only
Operator
.
Done.
pkg/sql/exec/builtin_funcs.go, line 160 at r4 (raw file):
Previously, yuzefovich wrote…
Oh, I see the point of "default". You should mention this in the comment above the operator itself.
Done.
pkg/sql/exec/builtin_funcs_test.go, line 24 at r4 (raw file):
Previously, yuzefovich wrote…
[nit]: "to for" is probably not what you meant.
Done.
pkg/sql/exec/builtin_funcs_test.go, line 26 at r4 (raw file):
Previously, yuzefovich wrote…
[super nit]: I don't think I've ever seen "ty" for types - feels weird.
Done.
pkg/sql/exec/builtin_funcs_test.go, line 97 at r4 (raw file):
Previously, yuzefovich wrote…
If this was the reason for returning an error when creating a builtin operator, you can still return only an operator and wrap it like as follows:
for _, tc := range testCases { t.Run(tc.desc, func(t *testing.T) { wrapped := func(input []Operator) (Operator, error) { expr, err := parser.ParseExpr(tc.expr) if err != nil { t.Fatal(err) } p := &mockTypeContext{ty: tc.inputTypes} typedExpr, err := tree.TypeCheck(expr, &tree.SemaContext{IVarContainer: p}, types.Any) if err != nil { t.Fatal(err) } return NewBuiltinFunctionOperator(tctx, tc.outputTypes, input[0], typedExpr.(*tree.FuncExpr), 1), nil } runTests(t, []tuples{tc.inputTuples}, tc.outputTuples, orderedVerifier, []int{1}, wrapped) }
Done.
pkg/sql/exec/vec_elem_to_datum.go, line 54 at r4 (raw file):
Previously, yuzefovich wrote…
This is the code that used to be in
materializer
:if ct.Oid() == oid.T_name { m.row[outIdx].Datum = m.da.NewDString(tree.DString(*(*string)(unsafe.Pointer(&b)))) } else { m.row[outIdx].Datum = m.da.NewDName(tree.DString(*(*string)(unsafe.Pointer(&b)))) }
Now,
DString
andDName
are switched. I'm not sure what the correct way is, but I don't think the change was intentional.
This change was intentional -- I'm fairly sure that this was a bug in materializer. If the oid of the type was equal to T_name, its returning a DString rather than a DName. I fixed that here when moving over from materializer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sounds good to me.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @rohany, @solongordon, and @yuzefovich)
pkg/sql/exec/builtin_funcs.go, line 46 at r4 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Done.
Usually, my workflow is such that I make requested changes locally, amend the commit, force-push it to the remote branch, and then reply to the comments so that when reviewers check back, they can see the changes right away (and continue on complaining if necessary :D ).
pkg/sql/exec/vec_elem_to_datum.go, line 54 at r4 (raw file):
Previously, rohany (Rohan Yadav) wrote…
This change was intentional -- I'm fairly sure that this was a bug in materializer. If the oid of the type was equal to T_name, its returning a DString rather than a DName. I fixed that here when moving over from materializer.
I see. Can we test somehow that now the behavior is as desired?
75a4fae
to
ac9d836
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @solongordon, and @yuzefovich)
pkg/sql/exec/builtin_funcs.go, line 46 at r4 (raw file):
Previously, yuzefovich wrote…
Usually, my workflow is such that I make requested changes locally, amend the commit, force-push it to the remote branch, and then reply to the comments so that when reviewers check back, they can see the changes right away (and continue on complaining if necessary :D ).
Sorry, I was running the linter again before I pushed :D
pkg/sql/exec/vec_elem_to_datum.go, line 54 at r4 (raw file):
Previously, yuzefovich wrote…
I see. Can we test somehow that now the behavior is as desired?
I'm not sure how to test this -- nothing complained when I changed it, so I don't think it was tested in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @rohany, @solongordon, and @yuzefovich)
pkg/sql/exec/builtin_funcs.go, line 46 at r4 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Sorry, I was running the linter again before I pushed :D
:)
I usually don't run linter locally and just push, and the build will tell me whether something needs to be adjusted.
pkg/sql/exec/builtin_funcs_test.go, line 26 at r4 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Done.
Doesn't the linter complain about having a variable "types" override a package "types"? I thought that is why everyone uses "typs" or "t".
pkg/sql/exec/builtin_funcs_test.go, line 97 at r4 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Done.
Nice. I was actually overthinking things.
pkg/sql/exec/vec_elem_to_datum.go, line 54 at r4 (raw file):
Previously, rohany (Rohan Yadav) wrote…
I'm not sure how to test this -- nothing complained when I changed it, so I don't think it was tested in the first place?
Yeah, I'm not sure either. Maybe other folks will have suggestions.
ac9d836
to
e3ab7ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @rohany, @solongordon, and @yuzefovich)
pkg/sql/exec/builtin_funcs_test.go, line 26 at r4 (raw file):
Previously, yuzefovich wrote…
Doesn't the linter complain about having a variable "types" override a package "types"? I thought that is why everyone uses "typs" or "t".
I'm not sure it will, but I'll just change it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis, @solongordon, and @yuzefovich)
pkg/sql/exec/coldata/vec.go, line 111 at r8 (raw file):
Previously, rohany (Rohan Yadav) wrote…
it feels like a good abstraction for setting a datum into a position into a colvec -- otherwise we would have to duplicate the casting logic everywhere it's needed
I don't know, I feel like we've been adding very specific methods to this interface too easily (e.g. all the Copy
/Append
variants) and this case seems similar. I would agree if we used something like this widely but it seems like it's only used in one place. It's also hard to see a general use of it, since hopefully we'll never have to cast values whenever we set an element in a col vec + do a couple of type switches. It might also be confusing for callers to have this abstraction at this level given the ambiguity of whether this element will end up as null or not and how to make the choice between directly doing _TemplateType()[i] = val vs. .SetValueAt. I would lean towards just having a type switch and cast in the callsite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis, @solongordon, and @yuzefovich)
pkg/sql/exec/coldata/vec.go, line 111 at r8 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I don't know, I feel like we've been adding very specific methods to this interface too easily (e.g. all the
Copy
/Append
variants) and this case seems similar. I would agree if we used something like this widely but it seems like it's only used in one place. It's also hard to see a general use of it, since hopefully we'll never have to cast values whenever we set an element in a col vec + do a couple of type switches. It might also be confusing for callers to have this abstraction at this level given the ambiguity of whether this element will end up as null or not and how to make the choice between directly doing _TemplateType()[i] = val vs. .SetValueAt. I would lean towards just having a type switch and cast in the callsite.
It also used in materializer right now -- I think jordan also had a use in mind for the cfetcher
@yuzefovich how can I run that test locally? I'm having some trouble reproducing errors from the build too |
I think the quickest way for you to run the test locally is to modify Another way is to cherry-pick commits from my WIP branch (https://github.com/yuzefovich/cockroach/tree/vec-setting). |
Here is one interesting difference:
Maybe this is related to the change you made with |
Thats definitely due to my change -- I changed it back to what it was after jordan's comment |
With Jordan's help, I figured out which logic tests panic: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis, @solongordon, and @yuzefovich)
pkg/sql/exec/coldata/vec.go, line 111 at r8 (raw file):
Previously, rohany (Rohan Yadav) wrote…
It also used in materializer right now -- I think jordan also had a use in mind for the cfetcher
Where is it used in the materializer? I'm only seeing one usage in builtin_funcs
. I would also like to hear more about this cfetcher use case given my point above about general use. I just feel pretty strongly against adding more stuff that seems unnecessary to this interface given that I spent a while trying to reduce it in size. I think if you find an abstraction like this useful, please extract it into a helper function rather than making it an interface method and then we can make it so if we find it does have widespread usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis, @solongordon, and @yuzefovich)
pkg/sql/exec/coldata/vec.go, line 111 at r8 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Where is it used in the materializer? I'm only seeing one usage in
builtin_funcs
. I would also like to hear more about this cfetcher use case given my point above about general use. I just feel pretty strongly against adding more stuff that seems unnecessary to this interface given that I spent a while trying to reduce it in size. I think if you find an abstraction like this useful, please extract it into a helper function rather than making it an interface method and then we can make it so if we find it does have widespread usage.
Sorry, I misread my code. You're right that this is only used here. For cleanliness I had moved it into vec.go to templatize it, but for sake of not extending the interface with more functions I can move it back into here, and just templatize this part of the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis, @solongordon, and @yuzefovich)
pkg/sql/exec/coldata/vec.go, line 111 at r8 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Sorry, I misread my code. You're right that this is only used here. For cleanliness I had moved it into vec.go to templatize it, but for sake of not extending the interface with more functions I can move it back into here, and just templatize this part of the code
Thanks!
8527fbc
to
e945801
Compare
I cherry-picked this change and didn't see any new failures on logic tests with "streaming" by default, so I think we can merge this. |
Alright, ready for (another / final) look before I merge it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last thing that I should have asked for earlier is a benchmark. Ideally, all operators should have benchmarks. Just something simple that runs a builtin on some integers will do, so we can have a gut feel for the throughput.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @solongordon and @yuzefovich)
e945801
to
ec98793
Compare
BenchmarkBuiltinFunctions/useSel=true,hasNulls=true-8 10000 196757 ns/op 41.63 MB/s 127809 B/op 2281 allocs/op |
bors r=jordanlewis |
ec98793
to
bd0d17d
Compare
Canceled |
bors r=jordanlewis |
Build failed |
Added support for the vectorized engine to use all builtin functions that are supported by distsql. The mechanism for doing this is relatively ineffecient however -- it involves converting the needed row elements into datums, and the result back from a datum once the function application is complete. Future work could specialize these operators to specific functions to avoid these type switching operations. Implementation of this feature includes refactoring the materializer's batch to row code into a separate, reusable function. Release note: None
bd0d17d
to
996cf44
Compare
bors r=jordanlewis |
38826: exec: Add support for vectorized engine to use builtin functions. r=jordanlewis a=rohany Add support for the vectorized engine to use all builtin functions that are provided by distsql, but in a relatively slow manner. Future work can specialize some functions for improved performance. Co-authored-by: Rohan Yadav <[email protected]>
Thanks! As expected, this is very slow - in the 10s of mb/s. I expect this to be a bottleneck for the TPCH queries that use it, and we will want to consider enhancing this over time with specializations for builtins we care about. |
I have been thinking about an easy way to plug in specialized implementations for these builtins, but there doesn't seem like an easy way to do it. Specializing for specific functions will probably just involve implementing a new operator for now. |
Build succeeded |
Add support for the vectorized engine to use all builtin functions that are provided by distsql, but in a relatively slow manner. Future work can specialize some functions for improved performance.