sql: decrease vectorize_row_count_threshold to 0 #55713
Conversation
Force-pushed from b253b6f to 0671b98
Force-pushed from 0671b98 to d8d6c5c
Force-pushed from d8d6c5c to 8591ce0
Force-pushed from 8591ce0 to 11ceb86
The benchmarks (3-node roachprod cluster): …
I trust these numbers a lot more than the ones I got here, because benchmarking on a laptop can have significant variance in performance. My current guess is that this difference is mostly caused by the fact that we're pooling …
Thanks for running these benchmarks. What setup did you use? I guess we're not quite ready to merge this yet.
It was a 3-node roachprod cluster with the default hardware options in GCE; the load was coming from node 1. Is this what you mean by "setup"? I took a look at profiles, and nothing really stood out; the pooling behavior is the main difference I can see right now. I'll take a stab tomorrow at cleaning up the fetchers a bit.
Yes, I meant hardware setup. Thanks!
On TPCC we typically use a fourth node for the workload to simulate an application being separate from the database resources. It may be worth adjusting the testing setup in the future.
— Andy Woods
Force-pushed from 11ceb86 to d38f53e
One pretty obvious thing that I missed that is likely to explain the majority of the difference is that without … So I think that, apart from the pooling of objects, we'll need to do something about this check. I feel like the most viable option is to make it a lot more lightweight: instead of creating the whole fake flow with all of the components, we could probably just inspect the processor specs and see whether there are any that we refuse to wrap, and if we either support all of the cores "natively" or can wrap them, then we'll assume that we can successfully set up the whole flow. I think when …
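A minimal sketch of that lightweight check, assuming simplified stand-in spec types (the core names and the two support sets below are illustrative, not CockroachDB's actual `execinfrapb` definitions):

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real processor specs; the
// actual types live in CockroachDB's execinfrapb package.
type CoreType int

const (
	TableReader CoreType = iota
	HashJoiner
	Sorter
	Backfiller // example of a core the vectorized engine refuses to wrap
)

type ProcessorSpec struct{ Core CoreType }

// Illustrative support sets: cores the vectorized engine handles
// natively, and cores that can be wrapped in a row-based adapter.
var supportedNatively = map[CoreType]bool{TableReader: true, HashJoiner: true}
var wrappable = map[CoreType]bool{Sorter: true}

// canVectorize inspects the specs without constructing a fake flow: if
// every core is either supported natively or wrappable, assume the whole
// flow can be set up by the vectorized engine.
func canVectorize(specs []ProcessorSpec) bool {
	for _, s := range specs {
		if !supportedNatively[s.Core] && !wrappable[s.Core] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(canVectorize([]ProcessorSpec{{TableReader}, {Sorter}}))     // true
	fmt.Println(canVectorize([]ProcessorSpec{{TableReader}, {Backfiller}})) // false
}
```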
Hm, to my surprise, a prototype with some pooling and a more lightweight support check (#55883) shows a similar big drop in performance between …
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/cli/cli_test.go, line 531 at r1 (raw file):
c.RunWithArgs([]string{`sql`, `--set`, `unknownoption`, `-e`, `select 123 as "123"`})
// Check that partial results + error get reported together.
c.RunWithArgs([]string{`sql`, `-e`, `select 1/(@1-2) from generate_series(1,3)`})
Out of curiosity, why did you need to change this?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
pkg/cli/cli_test.go, line 531 at r1 (raw file):
Previously, knz (kena) wrote…
Out of curiosity, why did you need to change this?
Currently the test will fail if run via the vectorized engine because a division-by-zero error will occur sooner: it occurs on the third row, but we have batches of growing capacity starting at 1, so the batch sizes will be 1, 2, 4, ..., 1024, 1024, ..., and the error occurs when processing the second batch, at which point only 1 row has been returned to the client whereas the test expects 2 rows.
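A small self-contained simulation of that difference in delivery order, with the doubling batch capacities and the failing row hard-coded as assumptions taken from the description above (a toy model, not the actual engine code):

```go
package main

import "fmt"

// rowsBeforeError returns how many rows reach the client before a runtime
// error, for a row-at-a-time engine vs. a batched engine whose batch
// capacity starts at 1 and doubles up to 1024.
func rowsBeforeError(batched bool, errorRow int) int {
	if !batched {
		// The row engine streams each row as it is produced.
		return errorRow - 1
	}
	sent, next, capacity := 0, 1, 1
	for {
		// Fill one batch; if the error row is hit mid-batch, the partial
		// batch is never delivered to the client.
		for i := 0; i < capacity; i++ {
			if next == errorRow {
				return sent
			}
			next++
		}
		sent += capacity // batch fully built and flushed to the client
		if capacity < 1024 {
			capacity *= 2
		}
	}
}

func main() {
	// Error on the third row, as in the division-by-zero example:
	fmt.Println(rowsBeforeError(false, 3)) // row engine: client saw 2 rows
	fmt.Println(rowsBeforeError(true, 3))  // vectorized: client saw 1 row
}
```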
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/cli/cli_test.go, line 531 at r1 (raw file):
Previously, yuzefovich wrote…
Currently the test will fail if run via the vectorized engine because a division-by-zero error will occur sooner: it occurs on the third row, but we have batches of growing capacity starting at 1, so the batch sizes will be 1, 2, 4, ..., 1024, 1024, ..., and the error occurs when processing the second batch, at which point only 1 row has been returned to the client whereas the test expects 2 rows.
I think it's worth explaining this in a comment here.
Let's discuss this and how to move forward during the next meeting.
A newer prototype (that is almost production-ready) shows an improvement: the performance hit on KV95 if we remove …
Force-pushed from d38f53e to 132b429
More KV95 numbers: …
LGTM! Ship it!
Err, I guess we decided to also see TPCC numbers, right? I'm getting eager.
Yes, I'll be rerunning TPCC today as well as MovR to see what's up, and if those look good, we'll merge tomorrow, I think.
TPCC100 numbers again show no significant difference: …
The only interesting detail is that during a 60-minute run with the old config on AWS, apparently one node died and "new order" transactions were returning an error. I didn't notice it before wiping the cluster, so I'm not sure what happened. And here are the numbers of …
Force-pushed from 43166a8 to 266a4dc
Updated all the benchmark numbers and the commit message. PTAL, @asubiotto @jordanlewis
Force-pushed from 220b010 to a138cea
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz)
TFTR! bors r+
Build failed: …
Force-pushed from a138cea to 3003207
Force-pushed from 3003207 to 6203345
Ok, bors r+
Build succeeded: …
colexec: add context to errors from builtin functions
This commit wraps the errors that occur during builtin function evaluation to provide more context.
Release note: None
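As a generic illustration of this kind of error wrapping, here is a sketch using only the standard library; the builtin name and functions are made up for the example and are not the actual colexec code:

```go
package main

import (
	"errors"
	"fmt"
)

// evalBuiltin stands in for a builtin function evaluation that can fail.
func evalBuiltin(arg int) (int, error) {
	if arg == 0 {
		return 0, errors.New("division by zero")
	}
	return 100 / arg, nil
}

// evalWithContext wraps any evaluation error with the builtin's name, so
// the caller can tell which function failed while errors.Is/As still see
// the underlying error.
func evalWithContext(name string, arg int) (int, error) {
	res, err := evalBuiltin(arg)
	if err != nil {
		return 0, fmt.Errorf("error evaluating builtin %q: %w", name, err)
	}
	return res, nil
}

func main() {
	if _, err := evalWithContext("div", 0); err != nil {
		fmt.Println(err) // error evaluating builtin "div": division by zero
	}
}
```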
sql: decrease vectorize_row_count_threshold to 0
This commit decreases the default value for the `vectorize_row_count_threshold` setting to 0, which means that we will be using the vectorized engine for all supported queries. We intend to remove that setting entirely in the 21.1 release, but for now we choose the option of effectively disabling it, just in case.
The benchmarks have shown the following:
- -1.5% on KV95
- similar performance on TPCC
- -3% on movr
- -10% on miscellaneous operations (joins, aggregations) on small tables.
We think that such a gap is small enough to merge this change, and we intend to optimize the vectorized engine more before making the final call on the default value for the 21.1 release.
Additionally, this commit collects the trace metadata on the outboxes.
Informs: #53893.
Release note (sql change): The default value for the `vectorize_row_count_threshold` setting has been decreased from 1000 to 0, meaning that from now on we will always use the vectorized engine for all supported queries regardless of the row estimate (unless `vectorize=off` is set).
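For illustration, a sketch of how a client session could exercise the settings named in this release note over the PostgreSQL wire protocol; the connection string is a placeholder and the snippet assumes a locally running insecure cluster:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // pgwire driver; CockroachDB speaks the Postgres protocol
)

func main() {
	// Placeholder connection string for a local insecure cluster.
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// With this change the default threshold is 0, so all supported
	// queries run through the vectorized engine. The old default can be
	// restored per session:
	if _, err := db.Exec(`SET vectorize_row_count_threshold = 1000`); err != nil {
		log.Fatal(err)
	}
	// Or the vectorized engine can be turned off entirely:
	if _, err := db.Exec(`SET vectorize = off`); err != nil {
		log.Fatal(err)
	}

	var v string
	if err := db.QueryRow(`SHOW vectorize`).Scan(&v); err != nil {
		log.Fatal(err)
	}
	fmt.Println("vectorize =", v)
}
```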