-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: locality optimized scan for queries with a LIMIT clause #75431
Conversation
8851d37
to
89aadab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @msirek)
pkg/sql/opt/table_meta.go, line 328 at r1 (raw file):
} // FirstColumnIDOfIndex returns the ColumnID of the index with ordinal number
nit: returns the ColumnID of the first column of the index with ordinal number...
Feels like this function could be more generally useful if you allow passing in the column number too. So then instead of FirstColumnIDOfIndex
you could call it IndexColumnID
or something like that and pass in a param i int
.
If you make that change, I think there are a lot of places in the code we could clean up by using this function.
pkg/sql/opt/constraint/constraint.go, line 475 at r1 (raw file):
// whether the end of the last span or the start of the current span are in // the local region, but not both. func onlyOneSpanIsLocal(
Instead of using this function which relies on crdb_internal_region
(which as I said below, we should avoid if possible), can you do something similar to getLocalSpans
?
cockroach/pkg/sql/opt/xform/scan_funcs.go
Line 220 in 3916aa6
func (c *CustomFuncs) getLocalSpans( |
Maybe you can extract a helper function to be used by both?
pkg/sql/opt/constraint/constraint.go, line 477 at r1 (raw file):
func onlyOneSpanIsLocal( sp *Span, last *Span, localRegion string, ) (onlyOneSpanIsLocal bool, firstColIsRegionType bool) {
It's more idiomatic to just call this second bool ok
, unless you have a good reason not to
pkg/sql/opt/constraint/constraint.go, line 484 at r1 (raw file):
var ok bool if spanStartRegion, ok = sp.start.GetInternalRegionNameFromPrefix(); !ok { firstColIsRegionType = false
nit: I'd just return false here directly (here and below)
pkg/sql/opt/constraint/constraint.go, line 495 at r1 (raw file):
} else { onlyOneSpanIsLocal = false }
this can be simplified to:
onlyOneSpanIsLocal = spanStartRegionIsLocal != lastSpanEndRegionIsLocal
pkg/sql/opt/constraint/constraint.go, line 514 at r1 (raw file):
// name, for whatever reason. This prevents the comparison of the span // prefix region names in the code below to an uninitialized region name. localRegion, firstColIsRegionType = evalCtx.Locality.Find("region")
I'd rather give this boolean firstColIsRegionType
a different name that corresponds to its actual meaning, which is something like shouldCheckSpanRegions
pkg/sql/opt/constraint/constraint.go, line 530 at r1 (raw file):
// An example query on a LOCALITY REGIONAL BY ROW table which this // benefits is: // SELECT * FROM regional_by_row_table WHERE pk <> 4 LIMIT 3;
I see you added a test for this in the logic tests, but if possible, it would be good to also include a test for this logic in the constraint-specific tests in opt/constraint
pkg/sql/opt/constraint/constraint.go, line 533 at r1 (raw file):
if last.endBoundary == IncludeBoundary && sp.startBoundary == IncludeBoundary && sp.start.IsNextKey(&keyCtx, last.end) && (!firstColIsRegionType || !localRemoteCrossover) {
Any way to simplify this?
pkg/sql/opt/constraint/key.go, line 280 at r1 (raw file):
} // GetInternalRegionNameFromPrefix returns the region name contained in the
I'm hoping we can remove both of these methods (see comments elsewhere). They also seem a bit too specific for the Key
type, which is otherwise quite general
pkg/sql/opt/xform/scan_funcs.go, line 115 at r1 (raw file):
if scanPrivate.Constraint == nil { // If not a limited scan, then we must have a constraint to analyze spans.
nit: This comment is a bit confusing the way it's written. Since we're already inside the block with scanPrivate.Constraint == nil
, I'd just say
// Since we have no constraint, we must have a limit to use this optimization.
// We also require the limit to be less than the kv batch size, since it's probably
// better to use DistSQL once we're scanning multiple batches.
// TODO(rytaft): Revisit this when we have a more accurate cost model for data
// distribution.
(copied the rest of the comment from below for consistency)
pkg/sql/opt/xform/scan_funcs.go, line 164 at r1 (raw file):
maxRows := rowinfra.KeyLimit(grp.Relational().Cardinality.Max) hardLimit := rowinfra.KeyLimit(scanPrivate.HardLimit) if hardLimit > 0 && hardLimit < maxRows {
I don't think this check is necessary -- the Cardinality should be derived from the limit if it exists.
pkg/sql/opt/xform/scan_funcs.go, line 203 at r1 (raw file):
} } else { constraint = scanPrivate.Constraint
nit: for simplicity, you could just do:
constraint := scanPrivate.Constraint
if constraint == nil {
var ok bool
if constraint, ok = c.getCrdbRegionCheckConstraint(scanPrivate); !ok {
return
}
}
pkg/sql/opt/xform/scan_funcs.go, line 247 at r1 (raw file):
// table specified by scanPrivate and returns the Constraint which references // crdb_internal_region, if such a Constraint exists. func (c *CustomFuncs) getCrdbRegionCheckConstraint(
I don't think this function should be necessary. We already have functions c.checkConstraintFilters
and c.computedColFilters
, and then you can use c.tryConstrainIndex
to get the constraint. You can see how they are currently used inside of GenerateConstrainedScans
in combination with other filters:
cockroach/pkg/sql/opt/xform/select_funcs.go
Line 220 in a0da5e2
computedColFilters := c.computedColFilters(scanPrivate, explicitFilters, optionalFilters) |
pkg/sql/types/types.go, line 2748 at r1 (raw file):
return false } return typ.UserDefined() && typ.TypeMeta.Name.Name == "crdb_internal_region"
I'd like to avoid using this type if possible inside the optimizer, since it means our optimizations won't work at all for customers using the old style of multi-region features (e.g. partitioning). It would be better to rely on the zone configs of the partitioning columns as I've done for the existing locality optimized search stuff.
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2264 at r1 (raw file):
############################################## # Locality optimized scans with LIMIT clause #
There are a lot of really great tests here, but it would be good to think about:
- Can some of these tests be moved to
opt/xform
to explicitly test the rule (those tests are also more efficient to run since they don't require starting a whole cluster)? - Are all of these tests really necessary? Tests aren't free to add, since they increase cognitive load for reviewers and maintainers, increase the time to run the test files during development, CI, etc. Obviously we want good test coverage, but I think some of these tests seem unrelated to your new feature, e.g., the ones that add redundant check constraints in the schema. Is there a reason you think we need those here?
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2421 at r1 (raw file):
query T SELECT * FROM [EXPLAIN SELECT * FROM regional_by_row_table a WHERE pk IN (SELECT pk FROM regional_by_row_table b LIMIT 3)] OFFSET 2
for this type of formatting, check out the -rewrite-sql
option in the logic tests, which formats SQL statements according to this formatter: https://sqlfum.pt
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2453 at r1 (raw file):
6 6 5 -5 # Correlated semijoin with LIMIT in outer query block should not enable locality optimized scan.
nit: comments should be <= 80 columns wide
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2455 at r1 (raw file):
# Correlated semijoin with LIMIT in outer query block should not enable locality optimized scan. query T SELECT * FROM [EXPLAIN (DISTSQL) SELECT * FROM child WHERE EXISTS (SELECT * FROM parent WHERE p_id = c_p_id) LIMIT 3] OFFSET 2
why are you using explain (DISTSQL) here?
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2481 at r1 (raw file):
30 30 # Uncorrelated semijoin with LIMIT can't enable locality optimized scan in the outer query block yet.
nit: comments should be 80 columns wide or less
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2809 at r1 (raw file):
3 NULL NULL ca-central-1 us-east-1 # Adding a filter disallows locality optimized scan.
Maybe say why? Because the limit cannot be pushed into the scan, right?
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 3167 at r1 (raw file):
statement ok SET vectorize=on
why do you need to change the value of vectorize here? I think the test will automatically be run with both configurations
89aadab
to
7f5259d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)
pkg/sql/opt/table_meta.go, line 328 at r1 (raw file):
FirstColumnIDOfIndex
In looking into this, I found there is already the following function which seems to have similar functionality:
// IndexColumnID returns the metadata id of the idxOrd-th index column in the
// given index.
func (t TableID) IndexColumnID(idx cat.Index, idxOrd int) ColumnID {
return t.ColumnID(idx.Column(idxOrd).Ordinal())
}
I've modified and renamed my function to IdxColumnID
and have it call IndexColumnID
. I also added a sanity check.
pkg/sql/opt/constraint/constraint.go, line 475 at r1 (raw file):
getLocalSpans
pkg/sql/opt/constraint/constraint.go, line 475 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Instead of using this function which relies on
crdb_internal_region
(which as I said below, we should avoid if possible), can you do something similar togetLocalSpans
?cockroach/pkg/sql/opt/xform/scan_funcs.go
Line 220 in 3916aa6
func (c *CustomFuncs) getLocalSpans( Maybe you can extract a helper function to be used by both?
Done. Refactored into new function HasMixOfLocalAndRemotePartitions.
pkg/sql/opt/constraint/constraint.go, line 477 at r1 (raw file):
HasMixOfLocalAndRemotePartitions
OK, this function is replaced with HasMixOfLocalAndRemotePartitions in which I'm now usingok
.
pkg/sql/opt/constraint/constraint.go, line 484 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: I'd just return false here directly (here and below)
This function is removed.
pkg/sql/opt/constraint/constraint.go, line 495 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
this can be simplified to:
onlyOneSpanIsLocal = spanStartRegionIsLocal != lastSpanEndRegionIsLocal
Thanks, updated it: localRemoteCrossover = spanIsLocal != lastSpanIsLocal
. I fixed up the local/remote checking to check the whole span instead of the span boundaries.
pkg/sql/opt/constraint/constraint.go, line 514 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I'd rather give this boolean
firstColIsRegionType
a different name that corresponds to its actual meaning, which is something likeshouldCheckSpanRegions
Refactored out.
pkg/sql/opt/constraint/constraint.go, line 530 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I see you added a test for this in the logic tests, but if possible, it would be good to also include a test for this logic in the constraint-specific tests in
opt/constraint
This was a really good tip. After adding a test here I saw we could cover more test cases by building a Constraint which covered and consolidated all local partitions.
See makeLocalPartitionsConstraint
. For example, with 2 integer partition values (1,2) a span of [/1 - /2] cannot currently be identified as local unless you add a check constraint.
It should be efficient too because the call to Constraint.ContainsSpan
does binary search. I think this check could subsume new code I put in to make the PrefixSorter do binary search, however, I can't enable span-based checking because we'd need to subtract out remote partition spans (prefixes) of longer length. I put a note about it next to makeLocalPartitionsConstraint
.
pkg/sql/opt/constraint/constraint.go, line 533 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Any way to simplify this?
Simplified it to: && !localRemoteCrossover
pkg/sql/opt/xform/scan_funcs.go, line 115 at r1 (raw file):
// Since we have no constraint, we must have a limit to use this optimization.
// We also require the limit to be less than the kv batch size, since it's probably
// better to use DistSQL once we're scanning multiple batches.
// TODO(rytaft): Revisit this when we have a more accurate cost model for data
// distribution.
Replaced comment.
pkg/sql/opt/xform/scan_funcs.go, line 164 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I don't think this check is necessary -- the Cardinality should be derived from the limit if it exists.
OK, removed the updating of maxRows.
Code quote:
if hardLimit > 0 && hardLimit < maxRows {
pkg/sql/opt/xform/scan_funcs.go, line 203 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: for simplicity, you could just do:
constraint := scanPrivate.Constraint if constraint == nil { var ok bool if constraint, ok = c.getCrdbRegionCheckConstraint(scanPrivate); !ok { return } }
Done.
pkg/sql/opt/xform/scan_funcs.go, line 247 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I don't think this function should be necessary. We already have functions
c.checkConstraintFilters
andc.computedColFilters
, and then you can usec.tryConstrainIndex
to get the constraint. You can see how they are currently used inside ofGenerateConstrainedScans
in combination with other filters:cockroach/pkg/sql/opt/xform/select_funcs.go
Line 220 in a0da5e2
computedColFilters := c.computedColFilters(scanPrivate, explicitFilters, optionalFilters)
Thanks, removed this function.
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2264 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
There are a lot of really great tests here, but it would be good to think about:
- Can some of these tests be moved to
opt/xform
to explicitly test the rule (those tests are also more efficient to run since they don't require starting a whole cluster)?- Are all of these tests really necessary? Tests aren't free to add, since they increase cognitive load for reviewers and maintainers, increase the time to run the test files during development, CI, etc. Obviously we want good test coverage, but I think some of these tests seem unrelated to your new feature, e.g., the ones that add redundant check constraints in the schema. Is there a reason you think we need those here?
Sorry about that. I've pared down the number of tests and put some tests into opt/xform
. I left some tests in regional_by_row_query_behavior
just to verify correct execution. If you think it's still too much I can remove them.
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2421 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
for this type of formatting, check out the
-rewrite-sql
option in the logic tests, which formats SQL statements according to this formatter: https://sqlfum.pt
Thanks, I reformatted the SQL in this file.
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2433 at r1 (raw file):
-rewrite-sql
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2453 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: comments should be <= 80 columns wide
Fixed.
Code quote:
# Correlated semijoin with LIMIT in outer query block should not enable locality optimized scan.
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2455 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
why are you using explain (DISTSQL) here?
It's just a remnant from a copied test and I had left it in for some randomness, in case it shows different changes than regular EXPLAIN. I've removed it.
Code quote:
SELECT * FROM [EXPLAIN (DISTSQL) SELECT * FROM child WHERE EXISTS (SELECT * FROM parent WHERE p_id = c_p_id) LIMIT 3] OFFSET 2
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2481 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: comments should be 80 columns wide or less
Fixed.
Code quote:
# Uncorrelated semijoin with LIMIT can't enable locality optimized scan in the outer query block yet.
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2809 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Maybe say why? Because the limit cannot be pushed into the scan, right?
Yes. Updated it to:
# Adding a filter disallows locality optimized scan because we can't push the
# LIMIT into the scan because it may result in returning less rows than
# requested by the LIMIT clause.
Code quote:
Adding a filter disallows locality optimized scan.
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 3167 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
why do you need to change the value of vectorize here? I think the test will automatically be run with both configurations
The kv trace results are slightly different between vectorized and non-vectorized, so results will only match with an explicit setting. I'm copying other examples in this test file with slight modification.
pkg/sql/opt/constraint/key.go, line 280 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I'm hoping we can remove both of these methods (see comments elsewhere). They also seem a bit too specific for the
Key
type, which is otherwise quite general
Removed them.
pkg/sql/types/types.go, line 2748 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I'd like to avoid using this type if possible inside the optimizer, since it means our optimizations won't work at all for customers using the old style of multi-region features (e.g. partitioning). It would be better to rely on the zone configs of the partitioning columns as I've done for the existing locality optimized search stuff.
Removed it. The latest code is a full rework plus some refactoring of the PrefixSorter to be accessible via the cat.Index, and only built once per query (or at least once per each building of optIndex). Currently I'm initializing the PrefixSorter whenever the optIndex struct is built, but I'm wondering if it would be better to build it lazily, the first time it's needed.
7f5259d
to
e69c477
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking a lot better! As we discussed offline, I think the PrefixSorter
stuff should probably be moved out of cat
. The logic that doesn't rely on the constraint
library can be moved to its own new library (e.g., opt/partition
or something), and the resulting PrefixSorter
object can be cached in TableMeta
.
Otherwise I think most of my comments are pretty minor.
Reviewed 26 of 26 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @msirek and @rytaft)
pkg/sql/opt/table_meta.go, line 53 at r2 (raw file):
// IdxColumnID returns the "i"th ColumnID of the index with ordinal number // "index" of the table described by tm. Both index and i are zero-based. func (tm *TableMeta) IdxColumnID(index cat.IndexOrdinal, i int) ColumnID {
nit: I'd just call this IndexColumnID
, which should be fine since it's a different receiver type. Also, maybe move this method to the rest of the TableMeta
methods.
pkg/sql/opt/table_meta.go, line 57 at r2 (raw file):
tm.Table.Name() if i < 0 || i >= idx.ColumnCount() { panic(fmt.Sprintf("Attempt to lookup column %d in index %d of table %s. But it only has %d columns.",
I think we usually just rely on the array index out of bounds panic throughout this file
pkg/sql/opt/cat/index.go, line 200 at r2 (raw file):
// not have at least one local partition and one remote partition, then the // result is nil, false. PrefixSorter(evalCtx *tree.EvalContext) (*PrefixSorter, bool)
The catalog interfaces are designed to be simple interfaces to just what the optimizer needs from the catalog. This doesn't feel like it belongs, since it's a derived field, built from info that is already available from PartitionCount
and Partition
pkg/sql/opt/cat/utils.go, line 380 at r2 (raw file):
} const regionKey = "region"
All this stuff feels a bit too specific to be in cat
, and especially in the utils.go
file. Is there a reason you moved it out of the rule code?
Also, FYI, when you're moving big chunks of code like this, it helps reviewers if you create one commit for the code movement without making any additional changes, and then make changes in a follow-on commit. You can squash before merging.
Edit: I think I see why you moved the code -- looks like you're using it from the constraint library. In that case, I think this probably deserves to be in its own library with partitioning-related utilities. Perhaps you should create a new directory opt/partition
(or something like that) and put this stuff there. Also, please name it something other than utils.go
if possible so it's more obvious what's contained in the file.
pkg/sql/opt/cat/utils.go, line 450 at r2 (raw file):
nonInclusiveEndIndex := ps.idx[i] + 1 if (nonInclusiveEndIndex < inclusiveStartIndex) || (nonInclusiveEndIndex > len(ps.Entry)) { panic(fmt.Sprintf("Internal error finding partition prefix slice. inclusiveStartIndex = %d, nonInclusiveEndIndex = %d",
this should be panic(errors.AssertionFailedf(...
. You don't need to add the "internal error" part, since that will be automatically included in the error produced.
pkg/sql/opt/cat/utils.go, line 555 at r2 (raw file):
func HasMixOfLocalAndRemotePartitions( evalCtx *tree.EvalContext, index Index, ) (localPartitions *util.FastIntSet, ok bool) {
nit: FastIntSet
s can usually be passed around by value
pkg/sql/opt/constraint/constraint.go, line 473 at r2 (raw file):
} // makeLocalPartitionsConstraint builds a constraint with spans that cover the
This comment seems to be for dead code. Is it still relevant? Or would it be better to put this into a GitHub issue and remove it from here?
pkg/sql/opt/constraint/constraint.go, line 537 at r2 (raw file):
if indexHasLocalAndRemoteParts { // TODO(msirek): Enable span-based locality checking. //localPartitionsConstraint = c.makeLocalPartitionsConstraint(evalCtx, ps)
nit: better not to leave dead code if possible
pkg/sql/opt/constraint/constraint.go, line 545 at r2 (raw file):
} // TODO(msirek): Enable span-based locality checking. //if !lastSpanIsLocal {
ditto
pkg/sql/opt/constraint/constraint.go, line 561 at r2 (raw file):
} // TODO(msirek): Enable span-based locality checking. //if !spanIsLocal {
ditto
pkg/sql/opt/constraint/utils.go, line 93 at r2 (raw file):
return index + startIndex } continue
you don't need a continue
here
pkg/sql/opt/constraint/utils.go, line 120 at r2 (raw file):
// longest-length prefix possible, because that reflects the actual locality // of the span's owning range. func searchUnitLengthAndShorterPrefixes(spanDatum tree.Datum, ps *cat.PrefixSorter) int {
Feels like there's a lot of duplication here. Any way to create a helper function used by this function and searchPrefixes
?
pkg/sql/opt/xform/scan_funcs.go, line 234 at r2 (raw file):
// buildAllPartitionsConstraint retrieves the partition filters and in between // filters for the "index" belonging to the table described by "tabMeta", and // builds the full set spans covering both defined partitions and rows belonging
nit: set spans -> set of spans
pkg/sql/opt/xform/scan_funcs.go, line 240 at r2 (raw file):
// suboptimal spans may be produced which don't maximize the number of rows // accessed as a 100% local operation. //For example:
nit: add a space between // and For
pkg/sql/opt/xform/scan_funcs.go, line 298 at r2 (raw file):
checkConstraints := c.checkConstraintFilters(sp.Table) partitionFilters, inBetweenFilters := c.partitionValuesFilters(sp.Table, index)
I think you should probably use similar logic that we use in GenerateConstrainedScans
to avoid calling this function if we already have everything we need from the check constraint filters and computedColFilters
. That logic is here:
cockroach/pkg/sql/opt/xform/select_funcs.go
Lines 295 to 300 in a0da5e2
indexColumns := tabMeta.IndexKeyColumns(index.Ordinal()) | |
firstIndexCol := scanPrivate.Table.IndexColumnID(index, 0) | |
if !filterColumns.Contains(firstIndexCol) && indexColumns.Intersects(filterColumns) { | |
// Calculate any partition filters if appropriate (see below). | |
partitionFilters, inBetweenFilters = c.partitionValuesFilters(scanPrivate.Table, index) | |
} |
Also -- I'm not sure, but I think there might be cases where you'd want to include computedColFilters
, e.g., if the partitioning column is computed.
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2264 at r1 (raw file):
Previously, msirek (Mark Sirek) wrote…
Sorry about that. I've pared down the number of tests and put some tests into
opt/xform
. I left some tests inregional_by_row_query_behavior
just to verify correct execution. If you think it's still too much I can remove them.
I think it's good to leave some tests here. Looks good to me.
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 11 at r2 (raw file):
statement ok CREATE TABLE regional_by_row_table ( pk
nit: let's not completely reformat the whole file. The rewrite-sql
command is useful to see how we generally like to format SQL, but there's no need to rewrite the stuff that's not connected to your change. I'd revert most of the changes that aren't in your test cases.
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 3246 at r2 (raw file):
a IN ( 1,
lol, this is one of those cases where I'm not convinced that the SQL formatter made the right choice. Feel free to consolidate these numbers back onto a couple of lines if you want.
pkg/sql/opt/constraint/constraint_test.go, line 534 at r2 (raw file):
s string // expected value e string
nit: I'd make these names a bit more self-documenting. If you don't want to spell out "expected", maybe "exp" would still be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)
pkg/sql/opt/table_meta.go, line 53 at r2 (raw file):
IndexColumnID
Renamed it. There are no uses of this function in the latest revision.
pkg/sql/opt/table_meta.go, line 57 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think we usually just rely on the array index out of bounds panic throughout this file
Removed the sanity check.
pkg/sql/opt/constraint/constraint.go, line 473 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
This comment seems to be for dead code. Is it still relevant? Or would it be better to put this into a GitHub issue and remove it from here?
Removed the dead code. Opened github issue #75887.
pkg/sql/opt/constraint/constraint.go, line 537 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: better not to leave dead code if possible
removed
pkg/sql/opt/constraint/constraint.go, line 545 at r2 (raw file):
removed
removed
pkg/sql/opt/constraint/constraint.go, line 561 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
ditto
removed
pkg/sql/opt/xform/scan_funcs.go, line 234 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: set spans -> set of spans
Done
pkg/sql/opt/xform/scan_funcs.go, line 240 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: add a space between // and For
Done.
Code quote:
to no defined partition (or partitions defined as DEFAULT). Partition spans
pkg/sql/opt/xform/scan_funcs.go, line 298 at r2 (raw file):
computedColFilters
OK, thanks. I've mirrored the logic inselect_funcs.go
, includingcomputedColFilters
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2264 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think it's good to leave some tests here. Looks good to me.
OK, thanks
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 11 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: let's not completely reformat the whole file. The
rewrite-sql
command is useful to see how we generally like to format SQL, but there's no need to rewrite the stuff that's not connected to your change. I'd revert most of the changes that aren't in your test cases.
OK, I've restored the formatting on pre-existing tests.
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 3246 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
lol, this is one of those cases where I'm not convinced that the SQL formatter made the right choice. Feel free to consolidate these numbers back onto a couple of lines if you want.
Consolidated.
pkg/sql/opt/constraint/constraint_test.go, line 534 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: I'd make these names a bit more self-documenting. If you don't want to spell out "expected", maybe "exp" would still be enough.
Done.
pkg/sql/opt/cat/index.go, line 200 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
The catalog interfaces are designed to be simple interfaces to just what the optimizer needs from the catalog. This doesn't feel like it belongs, since it's a derived field, built from info that is already available from
PartitionCount
andPartition
Removed it.
pkg/sql/opt/cat/utils.go, line 380 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
All this stuff feels a bit too specific to be in
cat
, and especially in theutils.go
file. Is there a reason you moved it out of the rule code?Also, FYI, when you're moving big chunks of code like this, it helps reviewers if you create one commit for the code movement without making any additional changes, and then make changes in a follow-on commit. You can squash before merging.
Edit: I think I see why you moved the code -- looks like you're using it from the constraint library. In that case, I think this probably deserves to be in its own library with partitioning-related utilities. Perhaps you should create a new directory
opt/partition
(or something like that) and put this stuff there. Also, please name it something other thanutils.go
if possible so it's more obvious what's contained in the file.
Refactored. I left details in the commit message.
pkg/sql/opt/cat/utils.go, line 450 at r2 (raw file):
panic(errors.AssertionFailedf
Fixed
pkg/sql/opt/cat/utils.go, line 555 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit:
FastIntSet
s can usually be passed around by value
But in the ok == false
case it may be better not to construct an object we're not going to use (minimize memory usage), and returning nil needs a pointer.
pkg/sql/opt/constraint/utils.go, line 93 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
you don't need a
continue
here
removed
Code quote:
if compare(prefixSlice[index], span, ps) == 0 {
pkg/sql/opt/constraint/utils.go, line 120 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Feels like there's a lot of duplication here. Any way to create a helper function used by this function and
searchPrefixes
?
I've removed searchUnitLengthAndShorterPrefixes
, added another parameter to SearchPrefixes
and wrapped the datum to search for in a Span.
Code quote:
func searchUnitLengthAndShorterPrefixes(spanDatum tree.Datum, ps *cat.PrefixSorter) int {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 27 of 27 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @msirek and @rytaft)
pkg/sql/opt_catalog.go, line 655 at r3 (raw file):
stats []*stats.TableStatistic, tblZone *zonepb.ZoneConfig, evalCtx *tree.EvalContext,
do you still need this new evalCtx
param?
pkg/sql/opt_catalog.go, line 1217 at r3 (raw file):
partZones map[string]*zonepb.ZoneConfig, invertedColOrd int, evalCtx *tree.EvalContext,
ditto
pkg/sql/opt/table_meta.go, line 53 at r2 (raw file):
Previously, msirek (Mark Sirek) wrote…
IndexColumnID
Renamed it. There are no uses of this function in the latest revision.
If there are no uses of the function, I'd just remove it
pkg/sql/opt/table_meta.go, line 178 at r3 (raw file):
// indexPartitionLocalities is a map from an index ordinal on the table to a // *PrefixSorter representing the PARTITION BY LIST values of the index. If an // index is partitioned BY LIST, and has both local and remote partitions, it
nit: define local and remote (I know we do this in a lot of places, but I think it's helpful for people who don't have the context)
pkg/sql/opt/cat/locality.go, line 11 at r3 (raw file):
// licenses/APL.txt. package cat
This shouldn't be in cat
-- can you put it in the new partition
library? You could also name that library locality
if you prefer.
pkg/sql/opt/cat/locality.go, line 128 at r3 (raw file):
} foundLocal := false foundRemote := false
super nit: I would usually just default construct these, e.g. var foundLocal, foundRemote bool
pkg/sql/opt/constraint/locality.go, line 59 at r3 (raw file):
func searchPrefixes(span *Span, ps *partition.PrefixSorter, prefixSearchUpperBound int) int { if prefixSearchUpperBound <= 0 { prefixSearchUpperBound = math.MaxInt32
nit: since you're just going to reset this from -1 to MaxInt32, why not just pass MaxInt32 directly?
pkg/sql/opt/testutils/testcat/test_catalog.go, line 987 at r3 (raw file):
// SetPartitions manually sets the partitions. func (ti *Index) SetPartitions(partitions []Partition) {
Why do you need this function?
pkg/sql/opt/testutils/testcat/test_catalog.go, line 1016 at r3 (raw file):
// SetDatums manually sets the partitioning values. func (p *Partition) SetDatums(datums []tree.Datums) {
ditto
pkg/sql/opt/xform/scan_funcs.go, line 298 at r2 (raw file):
Previously, msirek (Mark Sirek) wrote…
computedColFilters
OK, thanks. I've mirrored the logic inselect_funcs.go
, includingcomputedColFilters
Doesn't look like you've exactly mirrored that logic. Why not just copy it exactly? Or better yet, create a helper function that can be called by both. I might put this whole block into a helper function, which takes as params index
, filters
, and optionalFilters
:
cockroach/pkg/sql/opt/xform/select_funcs.go
Lines 232 to 336 in a0da5e2
// We only consider the partition values when a particular index can otherwise | |
// not be constrained. For indexes that are constrained, the partitioned values | |
// add no benefit as they don't really constrain anything. | |
// Furthermore, if the filters don't take advantage of the index (use any of the | |
// index columns), using the partition values add no benefit. | |
// | |
// If the index is partitioned (by list), we generate two constraints and | |
// union them: the "main" constraint and the "in-between" constraint.The | |
// "main" constraint restricts the index to the known partition ranges. The | |
// "in-between" constraint restricts the index to the rest of the ranges | |
// (i.e. everything that falls in-between the main ranges); the in-between | |
// constraint is necessary for correctness (there can be rows outside of the | |
// partitioned ranges). | |
// | |
// For both constraints, the partition-related filters are passed as | |
// "optional" which guarantees that they return no remaining filters. This | |
// allows us to merge the remaining filters from both constraints. | |
// | |
// Consider the following index and its partition: | |
// | |
// CREATE INDEX orders_by_seq_num | |
// ON orders (region, seq_num, id) | |
// STORING (total) | |
// PARTITION BY LIST (region) | |
// ( | |
// PARTITION us_east1 VALUES IN ('us-east1'), | |
// PARTITION us_west1 VALUES IN ('us-west1'), | |
// PARTITION europe_west2 VALUES IN ('europe-west2') | |
// ) | |
// | |
// The constraint generated for the query: | |
// SELECT sum(total) FROM orders WHERE seq_num >= 100 AND seq_num < 200 | |
// is: | |
// [/'europe-west2'/100 - /'europe-west2'/199] | |
// [/'us-east1'/100 - /'us-east1'/199] | |
// [/'us-west1'/100 - /'us-west1'/199] | |
// | |
// The spans before europe-west2, after us-west1 and in between the defined | |
// partitions are missing. We must add these spans now, appropriately | |
// constrained using the filters. | |
// | |
// It is important that we add these spans after the partition spans are generated | |
// because otherwise these spans would merge with the partition spans and would | |
// disallow the partition spans (and the in between ones) to be constrained further. | |
// Using the partitioning example and the query above, if we added the in between | |
// spans at the same time as the partitioned ones, we would end up with a span that | |
// looked like: | |
// [ - /'europe-west2'/99] | |
// | |
// Allowing the partition spans to be constrained further and then adding | |
// the spans give us a more constrained index scan as shown below: | |
// [ - /'europe-west2') | |
// [/'europe-west2'/100 - /'europe-west2'/199] | |
// [/e'europe-west2\x00'/100 - /'us-east1') | |
// [/'us-east1'/100 - /'us-east1'/199] | |
// [/e'us-east1\x00'/100 - /'us-west1') | |
// [/'us-west1'/100 - /'us-west1'/199] | |
// [/e'us-west1\x00'/100 - ] | |
// | |
// Notice how we 'skip' all the europe-west2 rows with seq_num < 100. | |
// | |
var partitionFilters, inBetweenFilters memo.FiltersExpr | |
indexColumns := tabMeta.IndexKeyColumns(index.Ordinal()) | |
firstIndexCol := scanPrivate.Table.IndexColumnID(index, 0) | |
if !filterColumns.Contains(firstIndexCol) && indexColumns.Intersects(filterColumns) { | |
// Calculate any partition filters if appropriate (see below). | |
partitionFilters, inBetweenFilters = c.partitionValuesFilters(scanPrivate.Table, index) | |
} | |
// Check whether the filter (along with any partitioning filters) can constrain the index. | |
constraint, remainingFilters, ok := c.tryConstrainIndex( | |
filters, | |
append(optionalFilters, partitionFilters...), | |
scanPrivate.Table, | |
index.Ordinal(), | |
) | |
if !ok { | |
return | |
} | |
if len(partitionFilters) > 0 { | |
inBetweenConstraint, inBetweenRemainingFilters, ok := c.tryConstrainIndex( | |
filters, | |
append(optionalFilters, inBetweenFilters...), | |
scanPrivate.Table, | |
index.Ordinal(), | |
) | |
if !ok { | |
panic(errors.AssertionFailedf("in-between filters didn't yield a constraint")) | |
} | |
constraint.UnionWith(c.e.evalCtx, inBetweenConstraint) | |
// Even though the partitioned constraints and the inBetween constraints | |
// were consolidated, we must make sure their Union is as well. | |
constraint.ConsolidateSpans(c.e.evalCtx) | |
// Add all remaining filters that need to be present in the | |
// inBetween spans. Some of the remaining filters are common | |
// between them, so we must deduplicate them. | |
remainingFilters = c.ConcatFilters(remainingFilters, inBetweenRemainingFilters) | |
remainingFilters.Sort() | |
remainingFilters.Deduplicate() | |
} |
We can call that helper function here with filters=nil
pkg/sql/opt/xform/scan_funcs.go, line 298 at r3 (raw file):
) (*constraint.Constraint, bool) { optionalFilters := c.checkConstraintFilters(sp.Table)
do you have tests for this logic? e.g. some tables with partitions matching REGIONAL BY ROW, and some tables without check constraints
pkg/sql/opt/idxconstraint/index_constraints_test.go, line 128 at r3 (raw file):
ic.Init( filters, optionalFilters, indexCols, sv.NotNullCols(), computedCols, true /* consolidate */, &evalCtx, &f, nil, /* ps */
nit: ps
doesn't really clarify what this param is. I'd either change the param name to prefixSorter
or just drop this comment
1d81020
to
154f11e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)
pkg/sql/opt/table_meta.go, line 53 at r2 (raw file):
IndexColumnID
Done
pkg/sql/opt/table_meta.go, line 178 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: define local and remote (I know we do this in a lot of places, but I think it's helpful for people who don't have the context)
Added:
// Local partitions are those where all row ranges they own have a preferred
// region for leaseholder nodes the same as the gateway region of the current
// connection that is running the query.
// Remote partitions have at least one row range with a leaseholder preferred
// region which is different from the gateway region.
Code quote:
// indexPartitionLocalities is a map from an index ordinal on the table to a
pkg/sql/opt/constraint/locality.go, line 59 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: since you're just going to reset this from -1 to MaxInt32, why not just pass MaxInt32 directly?
Non-positives were chosen to mean unbounded for ease of use. MaxInt32
is just an internal detail that the caller need not be aware of. The only reason MaxInt32 is used internally is because it's more performant to have a single comparison in the if statement than to write it as two comparisons:if prefixSearchUpperBound >= 0 && len(prefixSlice[0].Prefix) > prefixSearchUpperBound
. But sometimes code clarity is more important than performance. I could change it to two comparisons if that seems more clear.
pkg/sql/opt/testutils/testcat/test_catalog.go, line 987 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Why do you need this function?
It's needed for faking partitioning in constraint_test.go
because GetSortedPrefixes
looks at the partitions in index
:
cockroach/pkg/sql/opt/constraint/constraint_test.go
Lines 615 to 619 in 1d81020
// Make the index | |
index := &testcat.Index{} | |
index.SetPartitions(partitions) | |
// Make the PrefixSorter. | |
ps := partition.GetSortedPrefixes(index, localPartitions, &evalCtx) |
Code quote:
func (ti *Index) SetPartitions(partitions []Partition) {
pkg/sql/opt/testutils/testcat/test_catalog.go, line 1016 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
ditto
This is needed for faking the PARTITION BY LIST values which are inspected by cat.Partition.PartitionByListPrefixes()
, called by GetSortedPrefixes
in constraint_test.go.
pkg/sql/opt/xform/scan_funcs.go, line 298 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Doesn't look like you've exactly mirrored that logic. Why not just copy it exactly? Or better yet, create a helper function that can be called by both. I might put this whole block into a helper function, which takes as params
index
,filters
, andoptionalFilters
:cockroach/pkg/sql/opt/xform/select_funcs.go
Lines 232 to 336 in a0da5e2
// We only consider the partition values when a particular index can otherwise // not be constrained. For indexes that are constrained, the partitioned values // add no benefit as they don't really constrain anything. // Furthermore, if the filters don't take advantage of the index (use any of the // index columns), using the partition values add no benefit. // // If the index is partitioned (by list), we generate two constraints and // union them: the "main" constraint and the "in-between" constraint.The // "main" constraint restricts the index to the known partition ranges. The // "in-between" constraint restricts the index to the rest of the ranges // (i.e. everything that falls in-between the main ranges); the in-between // constraint is necessary for correctness (there can be rows outside of the // partitioned ranges). // // For both constraints, the partition-related filters are passed as // "optional" which guarantees that they return no remaining filters. This // allows us to merge the remaining filters from both constraints. // // Consider the following index and its partition: // // CREATE INDEX orders_by_seq_num // ON orders (region, seq_num, id) // STORING (total) // PARTITION BY LIST (region) // ( // PARTITION us_east1 VALUES IN ('us-east1'), // PARTITION us_west1 VALUES IN ('us-west1'), // PARTITION europe_west2 VALUES IN ('europe-west2') // ) // // The constraint generated for the query: // SELECT sum(total) FROM orders WHERE seq_num >= 100 AND seq_num < 200 // is: // [/'europe-west2'/100 - /'europe-west2'/199] // [/'us-east1'/100 - /'us-east1'/199] // [/'us-west1'/100 - /'us-west1'/199] // // The spans before europe-west2, after us-west1 and in between the defined // partitions are missing. We must add these spans now, appropriately // constrained using the filters. // // It is important that we add these spans after the partition spans are generated // because otherwise these spans would merge with the partition spans and would // disallow the partition spans (and the in between ones) to be constrained further. // Using the partitioning example and the query above, if we added the in between // spans at the same time as the partitioned ones, we would end up with a span that // looked like: // [ - /'europe-west2'/99] // // Allowing the partition spans to be constrained further and then adding // the spans give us a more constrained index scan as shown below: // [ - /'europe-west2') // [/'europe-west2'/100 - /'europe-west2'/199] // [/e'europe-west2\x00'/100 - /'us-east1') // [/'us-east1'/100 - /'us-east1'/199] // [/e'us-east1\x00'/100 - /'us-west1') // [/'us-west1'/100 - /'us-west1'/199] // [/e'us-west1\x00'/100 - ] // // Notice how we 'skip' all the europe-west2 rows with seq_num < 100. // var partitionFilters, inBetweenFilters memo.FiltersExpr indexColumns := tabMeta.IndexKeyColumns(index.Ordinal()) firstIndexCol := scanPrivate.Table.IndexColumnID(index, 0) if !filterColumns.Contains(firstIndexCol) && indexColumns.Intersects(filterColumns) { // Calculate any partition filters if appropriate (see below). partitionFilters, inBetweenFilters = c.partitionValuesFilters(scanPrivate.Table, index) } // Check whether the filter (along with any partitioning filters) can constrain the index. constraint, remainingFilters, ok := c.tryConstrainIndex( filters, append(optionalFilters, partitionFilters...), scanPrivate.Table, index.Ordinal(), ) if !ok { return } if len(partitionFilters) > 0 { inBetweenConstraint, inBetweenRemainingFilters, ok := c.tryConstrainIndex( filters, append(optionalFilters, inBetweenFilters...), scanPrivate.Table, index.Ordinal(), ) if !ok { panic(errors.AssertionFailedf("in-between filters didn't yield a constraint")) } constraint.UnionWith(c.e.evalCtx, inBetweenConstraint) // Even though the partitioned constraints and the inBetween constraints // were consolidated, we must make sure their Union is as well. constraint.ConsolidateSpans(c.e.evalCtx) // Add all remaining filters that need to be present in the // inBetween spans. Some of the remaining filters are common // between them, so we must deduplicate them. remainingFilters = c.ConcatFilters(remainingFilters, inBetweenRemainingFilters) remainingFilters.Sort() remainingFilters.Deduplicate() } We can call that helper function here with
filters=nil
Created helper functions
pkg/sql/opt/xform/scan_funcs.go, line 298 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
do you have tests for this logic? e.g. some tables with partitions matching REGIONAL BY ROW, and some tables without check constraints
Added checking of query plans and rule firing for REGIONAL BY ROW, REGIONAL BY ROW AS, and tables without CHECK constraints.
pkg/sql/opt/idxconstraint/index_constraints_test.go, line 128 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit:
ps
doesn't really clarify what this param is. I'd either change the param name toprefixSorter
or just drop this comment
Changed to /* prefixSorter */
pkg/sql/opt_catalog.go, line 655 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
do you still need this new
evalCtx
param?
Removed
pkg/sql/opt_catalog.go, line 1217 at r3 (raw file):
Removed
Removed
pkg/sql/opt/cat/locality.go, line 11 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
This shouldn't be in
cat
-- can you put it in the newpartition
library? You could also name that librarylocality
if you prefer.
OK, moved it to partition
and eliminated this file. This library is small right now, so I'll keep the name the same so maybe it'll start collecting more partition-related logic. If it gets too big in the future we could split off the locality piece.
pkg/sql/opt/cat/locality.go, line 128 at r3 (raw file):
var foundLocal, foundRemote bool
Done
@mgartner You may be interested in reviewing some of these interface updates (new helper functions in select_funcs.go, etc.) |
0d2d7da
to
68c25df
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a few nits remaining. Nice job getting this across the finish line.
Reviewed 8 of 14 files at r4, 3 of 3 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @msirek, and @rytaft)
pkg/sql/opt/constraint/locality.go, line 59 at r3 (raw file):
Previously, msirek (Mark Sirek) wrote…
Non-positives were chosen to mean unbounded for ease of use.
MaxInt32
is just an internal detail that the caller need not be aware of. The only reason MaxInt32 is used internally is because it's more performant to have a single comparison in the if statement than to write it as two comparisons:if prefixSearchUpperBound >= 0 && len(prefixSlice[0].Prefix) > prefixSearchUpperBound
. But sometimes code clarity is more important than performance. I could change it to two comparisons if that seems more clear.
Ok up to you -- I was personally confused and would have found it clearer to pass math.MaxInt32
in the first place, but it doesn't really matter either way since this is a private function (i.e. not exported)
pkg/sql/opt/xform/scan_funcs.go, line 210 at r5 (raw file):
localScanPrivate.SetConstraint(c.e.evalCtx, &localConstraint) localScanPrivate.HardLimit = scanPrivate.HardLimit localScanPrivate.PartitionConstrainedScan = true
Not convinced we want to be setting this boolean -- I think this was added for telemetry purposes for the specific optimization where we could use partitions to constrain the scan even if there were no check constraints.
pkg/sql/opt/xform/scan_funcs.go, line 219 at r5 (raw file):
remoteScanPrivate.SetConstraint(c.e.evalCtx, &remoteConstraint) remoteScanPrivate.HardLimit = scanPrivate.HardLimit remoteScanPrivate.PartitionConstrainedScan = true
ditto
pkg/sql/opt/xform/select_funcs.go, line 150 at r5 (raw file):
// MakeCombinedFiltersConstraint builds a constraint from explicitFilters, // optionalFilters and conditionally an IN list filter generated from the // index's PARTITION BY LIST values in both of these conditions are true:
nit: in -> if
pkg/sql/opt/xform/select_funcs.go, line 151 at r5 (raw file):
// optionalFilters and conditionally an IN list filter generated from the // index's PARTITION BY LIST values in both of these conditions are true: // 1) The first partitioning column is not referenced in optionalFilters AND
it's not just optional filters -- it should also include columns from explicit filters
So I think you should change the param optionalFiltersColumns
to filterColumns
68c25df
to
3350612
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the great comments. Very instructive
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)
pkg/sql/opt/constraint/locality.go, line 59 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Ok up to you -- I was personally confused and would have found it clearer to pass
math.MaxInt32
in the first place, but it doesn't really matter either way since this is a private function (i.e. not exported)
OK, if you thought it's confusing, maybe someone else will too. I changed it to pass math.MaxInt32
. Since it is possible to match on the zero-length default partition, I'll change the interface to allow prefixSearchUpperBound to be 0 which will only match on the default partition.
pkg/sql/opt/xform/scan_funcs.go, line 210 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Not convinced we want to be setting this boolean -- I think this was added for telemetry purposes for the specific optimization where we could use partitions to constrain the scan even if there were no check constraints.
Got it. And it's true that the two scans together scan all partitions, so it's really not constrained. Removed these lines.
pkg/sql/opt/xform/scan_funcs.go, line 219 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
ditto
Removed.
pkg/sql/opt/xform/select_funcs.go, line 150 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: in -> if
changed
pkg/sql/opt/xform/select_funcs.go, line 151 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
it's not just optional filters -- it should also include columns from explicit filters
So I think you should change the param
optionalFiltersColumns
tofilterColumns
Good catch. This also affects the description of the function. Fixed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 glad to help
Reviewed 3 of 3 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 7 files at r1, 1 of 26 files at r2, 11 of 27 files at r3, 5 of 14 files at r4, 1 of 3 files at r5, 1 of 3 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @msirek, and @rytaft)
pkg/sql/opt/table_meta.go, line 47 at r6 (raw file):
// given index. func (t TableID) IndexColumnID(index cat.Index, i int) ColumnID { return t.ColumnID(index.Column(i).Ordinal())
nit: revert the changes to this function
pkg/sql/opt/table_meta.go, line 324 at r6 (raw file):
ps = partition.GetSortedPrefixes(index, *localPartitions, evalCtx) } tm.AddIndexPartitionLocality(ord, ps)
Is there a reason to lazily populate this map? It might make sense to populate this during the optbuilder stage like we do with computed column and partial index predicate metadata:
cockroach/pkg/sql/opt/optbuilder/select.go
Line 594 in 0ba8455
b.addComputedColsForTable(tabMeta) |
pkg/sql/opt/partition/locality.go, line 11 at r6 (raw file):
// licenses/APL.txt. package partition
Would it be worthwhile to add unit tests for any of these functions?
pkg/sql/opt/partition/locality.go, line 26 at r6 (raw file):
// PrefixIsLocal contains a PARTITION BY LIST Prefix, a boolean indicating // whether the Prefix is from a local partition, and the name of the partition. type PrefixIsLocal struct {
nit: Would Prefix
be a better name? From other packages it seems like it would read nicely: partition.Prefix
.
pkg/sql/opt/partition/locality.go, line 44 at r6 (raw file):
idx []int // The set of local partitions
nit: Can you expand this comment. What do the numbers in this set reference? Is is indexes of elements in the Entry
field?
pkg/sql/opt/partition/locality.go, line 106 at r6 (raw file):
evalCtx *tree.EvalContext, index cat.Index, ) (localPartitions *util.FastIntSet, ok bool) { if index == nil || index.PartitionCount() < 2 {
nit: I don't think the check for index == nil
is required - that seems like it would be incorrect usage of this function.
pkg/sql/opt/partition/locality.go, line 191 at r6 (raw file):
if constraint.GetKey() != regionKey { // We only care about constraints on the region. return false /* isLocal */, false /* ok */
nit: probably don't need to label these return values here and below
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2309 at r6 (raw file):
1 ---- 1 1 2 3 ap-southeast-2
This test file doesn't run in tenant mode lke region_by_row
does (see b86022e).
So maybe we should move tests with logical query results there, and leave EXPLAIN and tracing tests here.
I think you can change this PR description to say "Fixes #64862", since this is the last remaining piece of work for that issue! |
3350612
to
ba7619a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)
pkg/sql/opt/table_meta.go, line 47 at r6 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: revert the changes to this function
Done
Code quote:
IndexColumnID(ind
pkg/sql/opt/table_meta.go, line 324 at r6 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Is there a reason to lazily populate this map? It might make sense to populate this during the optbuilder stage like we do with computed column and partial index predicate metadata:
cockroach/pkg/sql/opt/optbuilder/select.go
Line 594 in 0ba8455
b.addComputedColsForTable(tabMeta)
The idea is just to avoid the extra processing needed to build the map until its required. For example, if we're not processing a Select, the map might not be required if that operation doesn't deal with all indexes. I've added populateIndexPartitionLocalitiesForTable here to populate the map for Select statements.
pkg/sql/opt/partition/locality.go, line 11 at r6 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Would it be worthwhile to add unit tests for any of these functions?
Verifying valid PrefixSorter
creation and returning the correct Slice()
may be good. Added tests for these.
pkg/sql/opt/partition/locality.go, line 26 at r6 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: Would
Prefix
be a better name? From other packages it seems like it would read nicely:partition.Prefix
.
changed PrefixIsLocal
-> Prefix
pkg/sql/opt/partition/locality.go, line 44 at r6 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: Can you expand this comment. What do the numbers in this set reference? Is is indexes of elements in the
Entry
field?
Added:
// The set of ordinal numbers indexing into the Entry slice, representing
// which Prefixes (partitions) are 100% local to the gateway region
Code quote:
// The set of local partitions
pkg/sql/opt/partition/locality.go, line 106 at r6 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: I don't think the check for
index == nil
is required - that seems like it would be incorrect usage of this function.
Removed the index == nil
check
Code quote:
HasMixOfLocalAndRemotePartitions
pkg/sql/opt/partition/locality.go, line 191 at r6 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: probably don't need to label these return values here and below
Removed the labels
Code quote:
isConstraintLocal(
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2309 at r6 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
This test file doesn't run in tenant mode lke
region_by_row
does (see b86022e).So maybe we should move tests with logical query results there, and leave EXPLAIN and tracing tests here.
Moved the query results tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've updated the PR description.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)
8f12640
to
2526c12
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay! after addressing some minor nits
Reviewed 1 of 3 files at r5, 2 of 3 files at r6, 8 of 12 files at r7, 8 of 8 files at r8, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @msirek, and @rytaft)
pkg/sql/opt/table_meta.go, line 324 at r6 (raw file):
Previously, msirek (Mark Sirek) wrote…
The idea is just to avoid the extra processing needed to build the map until its required. For example, if we're not processing a Select, the map might not be required if that operation doesn't deal with all indexes. I've added populateIndexPartitionLocalitiesForTable here to populate the map for Select statements.
I see. I don't see any need for the addition of populateIndexPartitionLocalitiesForTable
, I was just curious why this diverged. I suppose there could be similar cases where we don't need to build partial index predicates, but those cases probably aren't all that important.
pkg/sql/opt/table_meta.go, line 324 at r8 (raw file):
ps = partition.GetSortedPrefixes(index, *localPartitions, evalCtx) } tm.AddIndexPartitionLocality(ord, ps)
nit: add a comment to explain why we cache a nil ps. Also, handling the successful cache lookup case first may be more idiomatic, but up to you:
if ps, ok = tm.indexPartitionLocalities[ord]; ok {
return ps, ps != nil
}
if localPartition, ok := parition.HasMixOfLocalAndRemotePartitions(..); ok {
ps = partition.GetSortedPrefixes(index, *localPartitions, evalCtx)
}
// Cache ps even if it is nil to avoid extra work when the partitioning scheme
// does not have a mix of local and remote partitions.
tm.AddIndexParitionLocality(ord, ps)
return ps, ps != nil
pkg/sql/opt/optbuilder/select.go, line 721 at r8 (raw file):
// populateIndexPartitionLocalitiesForTable populates the map in tabMeta // describing the localities of partitions in partitioned indexes. func (b *Builder) populateIndexPartitionLocalitiesForTable(tabMeta *opt.TableMeta) {
nit: I don't think this is necessary, let's remove it. See my other comment.
pkg/sql/opt/partition/locality.go, line 191 at r6 (raw file):
Previously, msirek (Mark Sirek) wrote…
Removed the labels
Did you forget to push this change? I still see the labels.
pkg/sql/opt/partition/prefixSorter_test.go, line 1 at r8 (raw file):
// Copyright 2022 The Cockroach Authors.
nit: typically test file names are named after the files of the tested code, in this case the file name would be locality_test.go
pkg/sql/opt/partition/prefixSorter_test.go, line 34 at r8 (raw file):
testData := []struct { // Partition Keys // The space-separated keys containing the PARTITION BY LIST values
nit: end comment sentences with a period here and below.
pkg/sql/opt/partition/prefixSorter_test.go, line 37 at r8 (raw file):
partitionKeys string // Partition Localities: true == local, false == remote
nit: the tests might be more readable if you assign some constants like const local = true
and const remote = false
.
2526c12
to
e8d9eb1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries. TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner and @rytaft)
pkg/sql/opt/table_meta.go, line 324 at r6 (raw file):
populateIndexPartitionLocalitiesForTable
OK, removed it.
pkg/sql/opt/table_meta.go, line 324 at r8 (raw file):
ps != nil
Added:
// A nil ps means that the entry in the map for this index indicates that the
// index was not partitioned, or the index did not have a mix of local and
// remote partitions, so no PrefixSorter is built for it. We return ok=false
// to the caller to indicate no PrefixSorter is available for this index.
pkg/sql/opt/partition/locality.go, line 191 at r6 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Did you forget to push this change? I still see the labels.
Thanks, I must have had some code blocks collapsed in Goland. Removed all the labels now.
Code quote:
isConstraintLocal(constraint cat.Constraint, localRegion string) (isLocal bool, ok bool) {
pkg/sql/opt/optbuilder/select.go, line 721 at r8 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: I don't think this is necessary, let's remove it. See my other comment.
Removed
Code quote:
populateIndexPartitionLocalitiesForTable
pkg/sql/opt/partition/prefixSorter_test.go, line 1 at r8 (raw file):
…pkg/sql/opt/partition/prefixSorter_test.go
Thanks, renamed it.
pkg/sql/opt/partition/prefixSorter_test.go, line 34 at r8 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: end comment sentences with a period here and below.
Typically titles and phrases that aren't complete sentences don't end in a period.
Code quote:
// Partition Localities: true == local, false == remote
pkg/sql/opt/partition/prefixSorter_test.go, line 37 at r8 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: the tests might be more readable if you assign some constants like
const local = true
andconst remote = false
.
Done
e8d9eb1
to
22bb9c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 12 files at r7, 1 of 8 files at r8, 12 of 12 files at r9, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @msirek, and @rytaft)
pkg/sql/opt/partition/locality.go, line 191 at r6 (raw file):
Previously, msirek (Mark Sirek) wrote…
Thanks, I must have had some code blocks collapsed in Goland. Removed all the labels now.
I kind of like the labels to distinguish between the two bools 🤷♀️ but I won't stand in the way
pkg/sql/opt/partition/locality.go, line 34 at r9 (raw file):
} // String returns a string representation of the Prefix, e.g. [/1]
nit: I wouldn't bother giving an example unless it matches what would really be returned. You'll always have "local" or "remote" included here. But honestly, I don't think String
usually needs any comment at all (although I guess it's not bad to include one).
pkg/sql/opt/partition/locality.go, line 191 at r6 (raw file): Previously, rytaft (Rebecca Taft) wrote…
Instead of just returning, you could also directly assign the return values and return at the bottom, e.g. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 12 of 12 files at r9, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @mgartner, @msirek, and @rytaft)
pkg/sql/opt/partition/locality.go, line 191 at r6 (raw file):
I kind of like the labels to distinguish between the two bools 🤷♀️ but I won't stand in the way
That's fair. I shouldn't have been so nitpicky, sorry. Do whatever you prefer here Mark.
This commit adds locality optimized scan support for queries which place a hard limit on the number of rows returned via the LIMIT clause. This optimization benefits tables with REGIONAL BY ROW locality by splitting the spans accessed into a local spans set and a remote spans set, combined via a UNION ALL operation where each branch of the UNION ALL has the same hard limit as the original SELECT query block. If the limit is reached by scanning just the local spans, then latency is improved. The optimization is not applied if the LIMIT is more than the KV batch size of 100000 rows or if the number of spans in the scan exceeds 10000. This commit also adds an improvement to span merging to avoid merging local spans with remote spans in order to maximize the number of queries that can utilize locality optimized scan. Fixes cockroachdb#64862 Release note (Performance Improvement): Queries with a LIMIT clause applied against a single table, either explicitly written, or implicit such as in an uncorrelated EXISTS subquery, now scan that table with improved latency if the table is defined with LOCALITY REGIONAL BY ROW and the number of qualified rows residing in the local region is less than or equal to the hard limit (sum of the LIMIT clause and optional OFFSET clause values). This optimization is only applied if the hard limit is 100000 or less.
22bb9c3
to
2fdd9ae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
pkg/sql/opt/partition/locality.go, line 191 at r6 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I kind of like the labels to distinguish between the two bools 🤷♀️ but I won't stand in the way
That's fair. I shouldn't have been so nitpicky, sorry. Do whatever you prefer here Mark.
No worries. I received a review comment before that we don't typically assign return values and return at the end, so I'm not sure what to do in this regard. Anyway, I just restored the code to what it was before (this was a pre-existing function that I just moved).
pkg/sql/opt/partition/locality.go, line 34 at r9 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: I wouldn't bother giving an example unless it matches what would really be returned. You'll always have "local" or "remote" included here. But honestly, I don't think
String
usually needs any comment at all (although I guess it's not bad to include one).
Thanks, added some correct examples.
Code quote:
// String returns a string representation of the Prefix, e.g. [/1]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner and @rytaft)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r10, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner and @rytaft)
pkg/sql/opt/partition/locality.go, line 191 at r6 (raw file):
Previously, msirek (Mark Sirek) wrote…
No worries. I received a review comment before that we don't typically assign return values and return at the end, so I'm not sure what to do in this regard. Anyway, I just restored the code to what it was before (this was a pre-existing function that I just moved).
Hmm not sure what that review comment was (maybe it was from me?) but there's definitely a gray area here. Anyway, this looks good to me.
bors r+ |
Build failed (retrying...): |
Build succeeded: |
This commit adds locality optimized scan support for queries which
place a hard limit on the number of rows returned via the LIMIT clause.
This optimization benefits tables with REGIONAL BY ROW locality by
splitting the spans accessed into a local spans set and a remote spans
set, combined via a UNION ALL operation where each branch of the UNION
ALL has the same hard limit as the original SELECT query block. If the
limit is reached by scanning just the local spans, then latency is
improved.
The optimization is not applied if the LIMIT is more than the KV batch
size of 100000 rows or if the number of spans in the scan exceeds 10000.
This commit also adds an improvement to span merging to avoid merging
local spans with remote spans in order to maximize the number of queries
that can utilize locality optimized scan.
Fixes #64862
Release note (Performance Improvement): Queries with a LIMIT clause
applied against a single table, either explicitly written, or implicit
such as in an uncorrelated EXISTS subquery, now scan that table with
improved latency if the table is defined with LOCALITY REGIONAL BY ROW
and the number of qualified rows residing in the local region is less
than or equal to the hard limit (sum of the LIMIT clause and optional
OFFSET clause values). This optimization is only applied if the hard
limit is 100000 or less.