Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: make some tweaks to using the vectorized engine #46080

Merged
merged 2 commits into from
Mar 17, 2020

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Mar 13, 2020

colexec: enable wrapping of unordered distinct in vectorized flow

Release justification: bug fixes and low-risk updates to new
functionality.

This commit enhances isSupported check during the vectorized execution
planning so that operators that cannot run in auto mode (unordered
distinct, percent_rank, and cume_dist) would be reported as
"unsupported" which will enable wrapping the processor cores into the
vectorized flow. Previously, we would refuse to vectorize the whole flow
which is a mistake.

Release note: None

sql: rename vectorize experimental_on to on

Release justification: bug fixes and low-risk updates to new
functionality.

This commit renames experimental_on option of vectorize variable to
on since we're now confident in the correctness. This commit also
changes the behavior of EXPLAIN (VEC) slightly - previously, we were
setting vectorize to (what was) experimental_on and then running
SupportsVectorized check. Now we will return an error if vectorize
is set to off and in other cases we will run the check with the
current vectorize mode. This is done so that EXPLAIN (VEC) better
reflects reality.

Release note (sql change): experimental_on option for vectorize
session variable has been renamed to on. The only things that will
not run with auto but will run with on are unordered distinct and
two window functions (percent_rank and cume_dist), otherwise, the
two options are identical.

@yuzefovich yuzefovich requested review from asubiotto and a team March 13, 2020 15:46
@yuzefovich yuzefovich requested a review from a team as a code owner March 13, 2020 15:46
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the execplan-auto-cleanup branch from a566b45 to 90e1811 Compare March 14, 2020 01:32
@yuzefovich yuzefovich changed the title colexec: enable wrapping of unordered distinct in vectorized flow sql: make some tweaks to using the vectorized engine Mar 14, 2020
@yuzefovich yuzefovich force-pushed the execplan-auto-cleanup branch from 90e1811 to 4f6028e Compare March 14, 2020 01:37
Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @ericharmeling on the rename

Reviewed 12 of 12 files at r1, 12 of 12 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/cmd/roachtest/tpchvec.go, line 515 at r2 (raw file):

				vectorizeSetting := "off"
				if vectorize {
					for versionPrefix, vectorizeOption := range vectorizeOptionByVersionPrefix {

nit: might be cleaner to get the prefix from the version and access the map using that.


pkg/sql/explain_vec.go, line 79 at r2 (raw file):

	if flowCtx.EvalCtx.SessionData.VectorizeMode == sessiondata.VectorizeOff {
		return errors.New("vectorize set to 'off'")

Why is this changing? Please also add a comment detailing the reasoning for this behavior.


pkg/sql/colexec/execplan.go, line 229 at r1 (raw file):

			return false, errors.Newf("distinct with error on duplicates not supported")
		}
		if mode != sessiondata.VectorizeExperimentalOn && mode != sessiondata.VectorizeExperimentalAlways {

Extract this check into a helper function and reuse for window fns.

@yuzefovich yuzefovich force-pushed the execplan-auto-cleanup branch from 4f6028e to 09011ff Compare March 16, 2020 15:36
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/cmd/roachtest/tpchvec.go, line 515 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

nit: might be cleaner to get the prefix from the version and access the map using that.

There is already a precedent in this file of doing this kind of map iteration and prefix matching. I'd keep it this way because it's hard to mess this up whereas if we switch to extracting a prefix from the version first, I think that would be more error-prone.


pkg/sql/explain_vec.go, line 79 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Why is this changing? Please also add a comment detailing the reasoning for this behavior.

Added a comment.


pkg/sql/colexec/execplan.go, line 229 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Extract this check into a helper function and reuse for window fns.

Good point, done.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: minus EXPLAIN comment.

Reviewed 13 of 13 files at r3, 13 of 13 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/explain_vec.go, line 79 at r2 (raw file):

Previously, yuzefovich wrote…

Added a comment.

So currently when running EXPLAIN vectorize will show false if vectorize is off. But now running an EXPLAIN will return an error? Or is this only for EXPLAIN (VEC)

Release justification: bug fixes and low-risk updates to new
functionality.

This commit enhances `isSupported` check during the vectorized execution
planning so that operators that cannot run in `auto` mode (unordered
distinct, percent_rank, and cume_dist) would be reported as
"unsupported" which will enable wrapping the processor cores into the
vectorized flow. Previously, we would refuse to vectorize the whole flow
which is a mistake.

Release note: None
Release justification: bug fixes and low-risk updates to new
functionality.

This commit renames `experimental_on` option of `vectorize` variable to
`on` since we're now confident in the correctness. This commit also
changes the behavior of `EXPLAIN (VEC)` slightly - previously, we were
setting `vectorize` to (what was) `experimental_on` and then running
`SupportsVectorized` check. Now we will return an error if `vectorize`
is set to `off` and in other cases we will run the check with the
current `vectorize` mode. This is done so that `EXPLAIN (VEC)` better
reflects reality.

Release note (sql change): `experimental_on` option for `vectorize`
session variable has been renamed to `on`. The only things that will
not run with `auto` but will run with `on` are unordered distinct and
two window functions (`percent_rank` and `cume_dist`), otherwise, the
two options are identical.
@yuzefovich yuzefovich force-pushed the execplan-auto-cleanup branch from 09011ff to c8cc60b Compare March 17, 2020 14:17
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/explain_vec.go, line 79 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

So currently when running EXPLAIN vectorize will show false if vectorize is off. But now running an EXPLAIN will return an error? Or is this only for EXPLAIN (VEC)

This change affects only EXPLAIN (VEC) variant:

root@:26257/tpch> set vectorize=off;
SET

Time: 292µs

root@:26257/tpch> explain (vec) select count(*) from lineitem;
ERROR: vectorize is set to 'off'
root@:26257/tpch> explain select count(*) from lineitem;
    tree    |    field    |  description
------------+-------------+----------------
            | distributed | true
            | vectorized  | false
  group     |             |
   │        | aggregate 0 | count_rows()
   │        | scalar      |
   └── scan |             |
            | table       | lineitem@l_ok
            | spans       | ALL
(8 rows)

Time: 669µs

root@:26257/tpch> set vectorize=auto;
SET

Time: 296µs

root@:26257/tpch> explain select count(*) from lineitem;
    tree    |    field    |  description
------------+-------------+----------------
            | distributed | true
            | vectorized  | true
  group     |             |
   │        | aggregate 0 | count_rows()
   │        | scalar      |
   └── scan |             |
            | table       | lineitem@l_ok
            | spans       | ALL
(8 rows)

Time: 879µs

root@:26257/tpch> explain (vec) select count(*) from lineitem;
             text
-------------------------------
  │
  └ Node 1
    └ *colexec.countOp
      └ *colexec.colBatchScan
(4 rows)

Regular EXPLAIN didn't override the vectorize setting, so no changes are needed in there.

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto)


pkg/sql/logictest/logic.go, line 564 at r6 (raw file):

		"local",
		"local-vec-off",
		"local-vec",

@asubiotto what do you think about removing local-vec, fakedist-vec, fakedist-vec-disk configs from the ist of the default ones because they are basically the same as without "-vec" suffix? We would explicitly put this configs into the "distinct" and "window" logic tests and probably would get almost the same coverage.

@asubiotto
Copy link
Contributor

asubiotto commented Mar 17, 2020

That's a good idea but I would probably wait until the 20.1 branch gets cut. My reasoning is that we might still want to switch off some new functionality but keep the testing coverage without adding explicit configs to the test files.

Thanks for sharing the EXPLAIN output! Looks good.

@yuzefovich
Copy link
Member Author

TFTR!

bors r+

@yuzefovich
Copy link
Member Author

Noticed a typo in the comment.

bors r-

@craig
Copy link
Contributor

craig bot commented Mar 17, 2020

Canceled

@yuzefovich
Copy link
Member Author

Never mind, I think the comment is good.

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 17, 2020

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Mar 17, 2020

Build succeeded

@craig craig bot merged commit 733563a into cockroachdb:master Mar 17, 2020
@yuzefovich yuzefovich deleted the execplan-auto-cleanup branch March 17, 2020 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants