-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
apiutil, roachpb: create utilities to map descriptors to ranges #133840
apiutil, roachpb: create utilities to map descriptors to ranges #133840
Conversation
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
// RangeToTableSpans is a simple utility function which converts a set of ranges | ||
// to a set of spans bound to the codec's SQL table space, and removed if the bound | ||
// span is zero length. | ||
func RangesToTableSpans(codec keys.SQLCodec, ranges []roachpb.RangeDescriptor) []roachpb.Span { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Utility 1: A function which turns a range into a span, and clamps it
to its tenant's table space.
|
||
// SpansToOrderedTableDescriptors uses the transaction's collection to turn a set of | ||
// spans to a set of descriptors which describe the table space in which those spans lie. | ||
func SpansToOrderedTableDescriptors( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Utility 2: A function which takes the above spans and uses the catalog
and new descriptor by span utility to turn those spans into a set of
table descriptors ordered by id (no tests for this func).
// database, table, index combinations within. It assumes that every table | ||
// has at least one index, the descriptors input are ordered, and that | ||
// there can be duplicates of the descriptors. | ||
func TableDescriptorsToIndexNames( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Utility 3: A function which transforms those table descriptors into a
set of (database, table, index) names which deduplicate and identify
each index uniquely.
// MapRangesToIndexes is a utility function which iterates over two lists, | ||
// one consisting of ordered ranges, and the other consisting of ordered index names | ||
// and outputs a mapping from range to index. | ||
func MapRangesToIndexes( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Utility 4: A function, which merges the ranges and indexes into a map
keyed by RangeID whose values are the above index names.
// 2. Get the table descriptors that fall within the given spans. | ||
// 3. Get the database, table and index name for all indexes found in the descriptors. | ||
// 4. Return a mapping of the indexes which appear in each range. | ||
func GetRangeIndexMapping( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Primary Entrypoint for callers: A function in which a set of ranges can be
passed in and a mapping from those ranges to indexes can be
returned.
// into an upper bound descriptor ID. It does this by turning the key into a | ||
// descriptor ID, and then moving it upwards if and only if it is not the prefix | ||
// of the current index / table span. | ||
func getDescriptorIDFromExclusiveKey(codec keys.SQLCodec, key roachpb.Key) (uint32, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a part of this PR
@@ -203,6 +205,24 @@ func TestDataDriven(t *testing.T) { | |||
return cr.GetByNames(ctx, txn, nis) | |||
} | |||
return h.doCatalogQuery(ctx, q) | |||
case "scan_descriptors_in_span": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a part of this PR
@@ -217,6 +237,58 @@ type testHelper struct { | |||
ucr, ccr catkv.CatalogReader | |||
} | |||
|
|||
func (h testHelper) parseSpanFromArgKey(argkey string) roachpb.Span { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a part of this PR
@@ -16,6 +16,18 @@ COMMENT ON TABLE kv IS 'this is a table'; | |||
COMMENT ON INDEX mv@idx IS 'this is an index'; | |||
COMMENT ON CONSTRAINT ck ON kv IS 'this is a check constraint'; | |||
COMMENT ON CONSTRAINT kv_pkey ON kv IS 'this is a primary key constraint'; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a part of this PR
@@ -16,6 +16,18 @@ COMMENT ON TABLE kv IS 'this is a table'; | |||
COMMENT ON INDEX mv@idx IS 'this is an index'; | |||
COMMENT ON CONSTRAINT ck ON kv IS 'this is a check constraint'; | |||
COMMENT ON CONSTRAINT kv_pkey ON kv IS 'this is a primary key constraint'; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a part of this PR
pkg/sql/catalog/descs/collection.go
Outdated
@@ -743,6 +743,13 @@ func (tc *Collection) GetAll(ctx context.Context, txn *kv.Txn) (nstree.Catalog, | |||
return ret.Catalog, nil | |||
} | |||
|
|||
// GetDescriptorsInSpans returns all descriptors within a given span. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a part of this PR
0b83d7d
to
109219d
Compare
109219d
to
1c3a347
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r1, 9 of 9 files at r3, 10 of 11 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons and @kyle-a-wong)
-- commits
line 134 at r4:
nit: replace tabs with spaces.
-- commits
line 145 at r4:
💯 thank you for this description.
-- commits
line 147 at r4:
nit: "caveat"
nit: what is "it" here?
-- commits
line 154 at r4:
doesn't need to be part of the PR, but should we return some list of "unexplained spans" when someone asks about a range?
pkg/roachpb/data.go
line 224 at r4 (raw file):
func (k Key) Less(b Key) bool { return bytes.Compare(k, b) == -1 }
You have a compare on keys here:
Lines 215 to 219 in 07423f4
// Compare compares the two Keys. | |
func (k Key) Compare(b Key) int { | |
return bytes.Compare(k, b) | |
} |
pkg/roachpb/metadata.go
line 1010 at r4 (raw file):
} type RangeDescriptorsByStartKey []RangeDescriptor
I don't see where this is used.
pkg/roachpb/metadata.go
line 1022 at r4 (raw file):
tmp := r[i] r[i] = r[j] r[j] = tmp
you can swap on one line via r[i], r[j] = r[j], r[i]
pkg/server/apiutil/rangeutil.go
line 20 at r4 (raw file):
) type IndexNames struct {
nit: docstring. I'm mostly curious why there's no construct already that combines the db/table/index names together.
pkg/server/apiutil/rangeutil.go
line 44 at r4 (raw file):
txn descs.Txn, codec keys.SQLCodec, databases map[descpb.ID]catalog.DatabaseDescriptor,
it's expected that the caller can produce this map?
pkg/server/apiutil/rangeutil.go
line 54 at r4 (raw file):
} indexes, err := TableDescriptorsToIndexNames(codec, databases, tables)
This part's a bit confusing to me. If a span maps to a table descriptor then we'll grab all the indexes from that table? We can't definitively state whether one or the other is in the span?
AHH I'm keeping this comment because I think I understand now: we produce the Span
field on every index and then intersect it with the spans from the ranges. Is that right?
pkg/server/apiutil/rangeutil.go
line 72 at r4 (raw file):
flushToResults := func(rangeID roachpb.RangeID) { results[rangeID] = contents contents = []IndexNames{}
There's a new clear
builtin you can use to do this without allocating a new slice: https://pkg.go.dev/builtin#clear
pkg/server/apiutil/rangeutil.go
line 124 at r4 (raw file):
// spans to a set of descriptors which describe the table space in which those spans lie. func SpansToOrderedTableDescriptors( ctx context.Context, txn descs.Txn, codec keys.SQLCodec, spans []roachpb.Span,
It appears that codec
is unused in this function.
pkg/server/apiutil/rangeutil.go
line 151 at r4 (raw file):
tables []catalog.TableDescriptor, ) ([]IndexNames, error) { seen := map[string]bool{}
you can use struct{}
as the value type in the map to avoid allocating booleans, since you never read them.
pkg/server/apiutil/rangeutil.go
line 178 at r4 (raw file):
func spanFromRange(rangeDesc roachpb.RangeDescriptor) roachpb.Span { return rangeDesc.KeySpan().AsRawSpanWithNoLocals()
I think one-liner functions like this make the code harder to read. I'd just inline this.
pkg/sql/catalog/internal/catkv/catalog_reader.go
line 168 at r2 (raw file):
Previously, angles-n-daemons (Brian Dillmann) wrote…
Not a part of this PR
No need for per-file comments like this. You can just have a note in the PR description if you want the reviewer to skip a commit because it's part of a stack that's being reviewed separately. In Reviewable, I can easily filter the diffs by commit.
pkg/roachpb/key_test.go
line 31 at r4 (raw file):
{"low tenant codec gets clamped to lower bound", lowTp(5), tp(1), tp(10), tp(1)}, {"high tenant codec gets clamped to upper bound", highTp(5), tp(1), tp(10), tp(10)}, {"system codec im not sure", sysTp(5), tp(1), tp(10), tp(1)},
I think system
tenant should be unchanged if the clamp a/b are also in system tenant, otherwise clamp to the start key.
pkg/roachpb/span_test.go
line 4 at r4 (raw file):
// // Use of this software is governed by the CockroachDB Software License // included in the /LICENSE file.
Any reason why you're not using the keys_test.go
file? I think if the code is in X the tests should be in X_test.
pkg/server/apiutil/rangeutil_test.go
line 76 at r4 (raw file):
} } }
there might be some utils in the require
package that can make these checks easier.x
694b4dc
to
0ee1828
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @kyle-a-wong)
Previously, dhartunian (David Hartunian) wrote…
nit: replace tabs with spaces.
done.
Previously, dhartunian (David Hartunian) wrote…
💯 thank you for this description.
np!
Previously, dhartunian (David Hartunian) wrote…
nit: "caveat"
nit: what is "it" here?
👀 "caveat"
"It" is "this approach", is there a clearer way to describe that this approach has drawbacks?
Previously, dhartunian (David Hartunian) wrote…
doesn't need to be part of the PR, but should we return some list of "unexplained spans" when someone asks about a range?
We could, I'll create a ticket if it sounds like something we want to support.
pkg/roachpb/data.go
line 224 at r4 (raw file):
Previously, dhartunian (David Hartunian) wrote…
You have a compare on keys here:
Lines 215 to 219 in 07423f4
// Compare compares the two Keys. func (k Key) Compare(b Key) int { return bytes.Compare(k, b) }
Yeah, not sure how this got here, I was intending on using that function. Updated to use the key func.
pkg/server/apiutil/rangeutil.go
line 20 at r4 (raw file):
Previously, dhartunian (David Hartunian) wrote…
nit: docstring. I'm mostly curious why there's no construct already that combines the db/table/index names together.
I'm not sure if there is or not - it didn't seem like in the protobuf there was. I asked in the #ask-an-engineer to see if anyone had a concept for this construct.
https://cockroachlabs.slack.com/archives/CHVV403F0/p1729004210052159
pkg/server/apiutil/rangeutil.go
line 44 at r4 (raw file):
Previously, dhartunian (David Hartunian) wrote…
it's expected that the caller can produce this map?
Kind of, the the last PR created a GetAllDatabaseDescriptorsMap
function on the collection - it's assumed that the caller will do the following:
Create that object so it's only read once.
Order the range descriptors once.
Process them in ordered batches by passing the above object in.
pkg/server/apiutil/rangeutil.go
line 54 at r4 (raw file):
Previously, dhartunian (David Hartunian) wrote…
This part's a bit confusing to me. If a span maps to a table descriptor then we'll grab all the indexes from that table? We can't definitively state whether one or the other is in the span?
AHH I'm keeping this comment because I think I understand now: we produce the
Span
field on every index and then intersect it with the spans from the ranges. Is that right?
That's exactly correct, since the tables
slice is a set of tables, of whom have a table space inclusive of all indexes, we apply this translation function to get an ordered list of indexes (which is at a finer span granularity than the input tables
slice) and compute their spans as part of that translation.
The latter part of this, does the actual mapping from range to index.
pkg/server/apiutil/rangeutil.go
line 72 at r4 (raw file):
Previously, dhartunian (David Hartunian) wrote…
There's a new
clear
builtin you can use to do this without allocating a new slice: https://pkg.go.dev/builtin#clear
oh nice, changed.
actually, I can't do this. clear somehow garbage collects the contents, even if they're still referenced by other objects in the application.
pkg/server/apiutil/rangeutil.go
line 124 at r4 (raw file):
Previously, dhartunian (David Hartunian) wrote…
It appears that
codec
is unused in this function.
Huh, good catch. Removed.
pkg/server/apiutil/rangeutil.go
line 151 at r4 (raw file):
Previously, dhartunian (David Hartunian) wrote…
you can use
struct{}
as the value type in the map to avoid allocating booleans, since you never read them.
changed.
pkg/server/apiutil/rangeutil.go
line 178 at r4 (raw file):
Previously, dhartunian (David Hartunian) wrote…
I think one-liner functions like this make the code harder to read. I'd just inline this.
agreed, this is more of a pass through than anything, removed.
pkg/roachpb/key_test.go
line 31 at r4 (raw file):
Previously, dhartunian (David Hartunian) wrote…
I think
system
tenant should be unchanged if the clamp a/b are also in system tenant, otherwise clamp to the start key.
This test is mostly to check how the system codec behaves wrt a tenant codec. I meant to update the test string, I'll change that now.
pkg/roachpb/span_test.go
line 4 at r4 (raw file):
Previously, dhartunian (David Hartunian) wrote…
Any reason why you're not using the
keys_test.go
file? I think if the code is in X the tests should be in X_test.
X is data in this case. I felt like data_test.go
was a bit of a dumping ground for this package, which is why this PR adds keys_test
and span_test
pkg/server/apiutil/rangeutil_test.go
line 76 at r4 (raw file):
Previously, dhartunian (David Hartunian) wrote…
there might be some utils in the
require
package that can make these checks easier.x
👀 added some, for checks on the IndexNames like this one however - this approach is still favorable:
pkg/roachpb/metadata.go
line 1010 at r4 (raw file):
Previously, dhartunian (David Hartunian) wrote…
I don't see where this is used.
This was originally used in the next PR, but has been scrapped in favor of this comparator function:
pkg/roachpb/metadata.go
line 1022 at r4 (raw file):
Previously, dhartunian (David Hartunian) wrote…
you can swap on one line via
r[i], r[j] = r[j], r[i]
removing this.
0ee1828
to
d0e2005
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just one request for yet another docstring.
Reviewed 1 of 2 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @angles-n-daemons and @kyle-a-wong)
Previously, angles-n-daemons (Brian Dillmann) wrote…
👀 "caveat"
"It" is "this approach", is there a clearer way to describe that this approach has drawbacks?
Ah I get it, all good. I must have spaced out between the sentences.
Previously, angles-n-daemons (Brian Dillmann) wrote…
We could, I'll create a ticket if it sounds like something we want to support.
Nah, we can leave it for now.
pkg/server/apiutil/rangeutil.go
line 20 at r4 (raw file):
Previously, angles-n-daemons (Brian Dillmann) wrote…
I'm not sure if there is or not - it didn't seem like in the protobuf there was. I asked in the #ask-an-engineer to see if anyone had a concept for this construct.
https://cockroachlabs.slack.com/archives/CHVV403F0/p1729004210052159
I like the comment you added so def keep it, but also just add a docstring to the exported struct to explain its purpose.
pkg/server/apiutil/rangeutil.go
line 44 at r4 (raw file):
Previously, angles-n-daemons (Brian Dillmann) wrote…
Kind of, the the last PR created a
GetAllDatabaseDescriptorsMap
function on the collection - it's assumed that the caller will do the following:Create that object so it's only read once.
Order the range descriptors once.
Process them in ordered batches by passing the above object in.
👍
pkg/server/apiutil/rangeutil.go
line 72 at r4 (raw file):
Previously, angles-n-daemons (Brian Dillmann) wrote…
oh nice, changed.
actually, I can't do this. clear somehow garbage collects the contents, even if they're still referenced by other objects in the application.
I don't think clear
will garbage collect. I don't think you can trigger GC at will in golang. Anyway, fine to leave as-is. We can investigate/discuss separately.
pkg/roachpb/span_test.go
line 4 at r4 (raw file):
Previously, angles-n-daemons (Brian Dillmann) wrote…
X is data in this case. I felt like
data_test.go
was a bit of a dumping ground for this package, which is why this PR addskeys_test
andspan_test
ah! makes sense. thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, forgot about that! adding it now
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @dhartunian and @kyle-a-wong)
Previously each range correlated to a single table, or even a single index in a database, so all that was required to identify which tables, indexes were in the range were to look at the start key of the range and map it accordingly. With range coalescing however, it's possible for one, or many, tables, indexes and the like to reside within the same range. To properly identify the contents of a range, this PR adds the following utilities: 1. A utility function which turns a range into a span, and clamps it to its tenant's table space. 2. A utility function which takes the above spans and uses the catalog and new descriptor by span utility to turn those spans into a set of table descriptors ordered by id. 3. A utility function which transforms those table descriptors into a set of (database, table, index) names which deduplicate and identify each index uniquely. 4. A utility function, which merges the ranges and indexes into a map keyed by RangeID whose values are the above index names. 5. A primary entrypoint for consumers from which a set of ranges can be passed in and a mapping from those ranges to indexes can be returned. A variety of caveats come with this approach. It attempts to scan the desciptors all at once, but it still will scan a sizable portion of the descriptors table if the request is large enough. This makes no attempt to describe system information which does not have a descriptor. It will describe system tables which appear in the descriptors table, but it will not try to explain "tables" which do not have descriptors (example tsdb), or any other information stored in the keyspace without a descriptor (PseudoTableIDs, GossipKeys for example). Throughout this work, many existing utilities were duplicated, and then un-duplicated (`keys.TableDataMin`, `roachpb.Span.Overlap`, etc). If you see anything that seems to already exist, feel free to point it out accordingly. Epic: CRDB-43151 Fixes: cockroachdb#130997 Release note: None
d0e2005
to
37f422d
Compare
bors r+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @angles-n-daemons, @dhartunian, and @kyle-a-wong)
pkg/sql/catalog/descs/collection.go
line 1152 at r8 (raw file):
} // but as a map with the database ID as the key.
nit: what happened to the comment here?
apiutil, roachpb: create utilities to map descriptors to ranges
Previously each range correlated to a single table, or even a single
index in a database, so all that was required to identify which tables,
indexes were in the range were to look at the start key of the range and
map it accordingly.
With range coalescing however, it's possible for one, or many tables,
indexes and the like to reside within the same range. To properly
identify the contents of a range, this PR adds the following utilities:
to its tenant's table space.
and new descriptor by span utility to turn those spans into a set of
table descriptors ordered by id.
set of (database, table, index) names which deduplicate and identify
each index uniquely.
keyed by RangeID whose values are the above index names.
passed in and a mapping from those ranges to indexes can be
returned.
A variety of caveats come with this approach. It attempts to scan the
desrciptors all at once, but it still will scan a sizable portion of the
descriptors table if the request is large enough. This makes no attempt
to describe system information which does not have a descriptor. It will
describe system tables which appear in the descriptors table, but it
will not try to explain "tables" which do not have descriptors (example
tsdb), or any other information stored in the keyspace without a
descriptor (PseudoTableIDs, GossipKeys for example).
Throughout this work, many existing utilities were duplicated, and then
un-duplicated (
keys.TableDataMin
,roachpb.Span.Overlap
, etc). If yousee anything that seems to already exist, feel free to point it out
accordingly.
Epic: CRDB-43151
Fixes: #130997
Release note: None