Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apiutil, roachpb: create utilities to map descriptors to ranges #133840

Merged
merged 1 commit into from
Nov 11, 2024

Conversation

angles-n-daemons
Copy link
Contributor

@angles-n-daemons angles-n-daemons commented Oct 30, 2024

apiutil, roachpb: create utilities to map descriptors to ranges

Previously each range correlated to a single table, or even a single
index in a database, so all that was required to identify which tables,
indexes were in the range were to look at the start key of the range and
map it accordingly.

With range coalescing however, it's possible for one, or many tables,
indexes and the like to reside within the same range. To properly
identify the contents of a range, this PR adds the following utilities:

  1. A utility function which turns a range into a span, and clamps it
    to its tenant's table space.
  2. A utility function which takes the above spans and uses the catalog
    and new descriptor by span utility to turn those spans into a set of
    table descriptors ordered by id.
  3. A utility function which transforms those table descriptors into a
    set of (database, table, index) names which deduplicate and identify
    each index uniquely.
  4. A utility function, which merges the ranges and indexes into a map
    keyed by RangeID whose values are the above index names.
  5. A primary entrypoint for consumers from which a set of ranges can be
    passed in and a mapping from those ranges to indexes can be
    returned.

A variety of caveats come with this approach. It attempts to scan the
desrciptors all at once, but it still will scan a sizable portion of the
descriptors table if the request is large enough. This makes no attempt
to describe system information which does not have a descriptor. It will
describe system tables which appear in the descriptors table, but it
will not try to explain "tables" which do not have descriptors (example
tsdb), or any other information stored in the keyspace without a
descriptor (PseudoTableIDs, GossipKeys for example).

Throughout this work, many existing utilities were duplicated, and then
un-duplicated (keys.TableDataMin, roachpb.Span.Overlap, etc). If you
see anything that seems to already exist, feel free to point it out
accordingly.

Epic: CRDB-43151
Fixes: #130997

Release note: None

@angles-n-daemons angles-n-daemons requested review from a team as code owners October 30, 2024 14:23
Copy link

blathers-crl bot commented Oct 30, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

// RangeToTableSpans is a simple utility function which converts a set of ranges
// to a set of spans bound to the codec's SQL table space, and removed if the bound
// span is zero length.
func RangesToTableSpans(codec keys.SQLCodec, ranges []roachpb.RangeDescriptor) []roachpb.Span {
Copy link
Contributor Author

@angles-n-daemons angles-n-daemons Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Utility 1: A function which turns a range into a span, and clamps it
to its tenant's table space.


// SpansToOrderedTableDescriptors uses the transaction's collection to turn a set of
// spans to a set of descriptors which describe the table space in which those spans lie.
func SpansToOrderedTableDescriptors(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Utility 2: A function which takes the above spans and uses the catalog
and new descriptor by span utility to turn those spans into a set of
table descriptors ordered by id (no tests for this func).

// database, table, index combinations within. It assumes that every table
// has at least one index, the descriptors input are ordered, and that
// there can be duplicates of the descriptors.
func TableDescriptorsToIndexNames(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Utility 3: A function which transforms those table descriptors into a
set of (database, table, index) names which deduplicate and identify
each index uniquely.

// MapRangesToIndexes is a utility function which iterates over two lists,
// one consisting of ordered ranges, and the other consisting of ordered index names
// and outputs a mapping from range to index.
func MapRangesToIndexes(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Utility 4: A function, which merges the ranges and indexes into a map
keyed by RangeID whose values are the above index names.

// 2. Get the table descriptors that fall within the given spans.
// 3. Get the database, table and index name for all indexes found in the descriptors.
// 4. Return a mapping of the indexes which appear in each range.
func GetRangeIndexMapping(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Primary Entrypoint for callers: A function in which a set of ranges can be
passed in and a mapping from those ranges to indexes can be
returned.

// into an upper bound descriptor ID. It does this by turning the key into a
// descriptor ID, and then moving it upwards if and only if it is not the prefix
// of the current index / table span.
func getDescriptorIDFromExclusiveKey(codec keys.SQLCodec, key roachpb.Key) (uint32, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a part of this PR

@@ -203,6 +205,24 @@ func TestDataDriven(t *testing.T) {
return cr.GetByNames(ctx, txn, nis)
}
return h.doCatalogQuery(ctx, q)
case "scan_descriptors_in_span":
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a part of this PR

@@ -217,6 +237,58 @@ type testHelper struct {
ucr, ccr catkv.CatalogReader
}

func (h testHelper) parseSpanFromArgKey(argkey string) roachpb.Span {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a part of this PR

@@ -16,6 +16,18 @@ COMMENT ON TABLE kv IS 'this is a table';
COMMENT ON INDEX mv@idx IS 'this is an index';
COMMENT ON CONSTRAINT ck ON kv IS 'this is a check constraint';
COMMENT ON CONSTRAINT kv_pkey ON kv IS 'this is a primary key constraint';

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a part of this PR

@@ -16,6 +16,18 @@ COMMENT ON TABLE kv IS 'this is a table';
COMMENT ON INDEX mv@idx IS 'this is an index';
COMMENT ON CONSTRAINT ck ON kv IS 'this is a check constraint';
COMMENT ON CONSTRAINT kv_pkey ON kv IS 'this is a primary key constraint';

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a part of this PR

@@ -743,6 +743,13 @@ func (tc *Collection) GetAll(ctx context.Context, txn *kv.Txn) (nstree.Catalog,
return ret.Catalog, nil
}

// GetDescriptorsInSpans returns all descriptors within a given span.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a part of this PR

@angles-n-daemons angles-n-daemons force-pushed the find-indexes-in-ranges branch 5 times, most recently from 0b83d7d to 109219d Compare October 30, 2024 15:33
@angles-n-daemons angles-n-daemons requested review from a team and kyle-a-wong and removed request for a team October 30, 2024 15:45
Copy link
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r1, 9 of 9 files at r3, 10 of 11 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons and @kyle-a-wong)


-- commits line 134 at r4:
nit: replace tabs with spaces.


-- commits line 145 at r4:
💯 thank you for this description.


-- commits line 147 at r4:
nit: "caveat"

nit: what is "it" here?


-- commits line 154 at r4:
doesn't need to be part of the PR, but should we return some list of "unexplained spans" when someone asks about a range?


pkg/roachpb/data.go line 224 at r4 (raw file):

func (k Key) Less(b Key) bool {
	return bytes.Compare(k, b) == -1
}

You have a compare on keys here:

// Compare compares the two Keys.
func (k Key) Compare(b Key) int {
return bytes.Compare(k, b)
}


pkg/roachpb/metadata.go line 1010 at r4 (raw file):

}

type RangeDescriptorsByStartKey []RangeDescriptor

I don't see where this is used.


pkg/roachpb/metadata.go line 1022 at r4 (raw file):

	tmp := r[i]
	r[i] = r[j]
	r[j] = tmp

you can swap on one line via r[i], r[j] = r[j], r[i]


pkg/server/apiutil/rangeutil.go line 20 at r4 (raw file):

)

type IndexNames struct {

nit: docstring. I'm mostly curious why there's no construct already that combines the db/table/index names together.


pkg/server/apiutil/rangeutil.go line 44 at r4 (raw file):

	txn descs.Txn,
	codec keys.SQLCodec,
	databases map[descpb.ID]catalog.DatabaseDescriptor,

it's expected that the caller can produce this map?


pkg/server/apiutil/rangeutil.go line 54 at r4 (raw file):

	}

	indexes, err := TableDescriptorsToIndexNames(codec, databases, tables)

This part's a bit confusing to me. If a span maps to a table descriptor then we'll grab all the indexes from that table? We can't definitively state whether one or the other is in the span?

AHH I'm keeping this comment because I think I understand now: we produce the Span field on every index and then intersect it with the spans from the ranges. Is that right?


pkg/server/apiutil/rangeutil.go line 72 at r4 (raw file):

	flushToResults := func(rangeID roachpb.RangeID) {
		results[rangeID] = contents
		contents = []IndexNames{}

There's a new clear builtin you can use to do this without allocating a new slice: https://pkg.go.dev/builtin#clear


pkg/server/apiutil/rangeutil.go line 124 at r4 (raw file):

// spans to a set of descriptors which describe the table space in which those spans lie.
func SpansToOrderedTableDescriptors(
	ctx context.Context, txn descs.Txn, codec keys.SQLCodec, spans []roachpb.Span,

It appears that codec is unused in this function.


pkg/server/apiutil/rangeutil.go line 151 at r4 (raw file):

	tables []catalog.TableDescriptor,
) ([]IndexNames, error) {
	seen := map[string]bool{}

you can use struct{} as the value type in the map to avoid allocating booleans, since you never read them.


pkg/server/apiutil/rangeutil.go line 178 at r4 (raw file):

func spanFromRange(rangeDesc roachpb.RangeDescriptor) roachpb.Span {
	return rangeDesc.KeySpan().AsRawSpanWithNoLocals()

I think one-liner functions like this make the code harder to read. I'd just inline this.


pkg/sql/catalog/internal/catkv/catalog_reader.go line 168 at r2 (raw file):

Previously, angles-n-daemons (Brian Dillmann) wrote…

Not a part of this PR

No need for per-file comments like this. You can just have a note in the PR description if you want the reviewer to skip a commit because it's part of a stack that's being reviewed separately. In Reviewable, I can easily filter the diffs by commit.


pkg/roachpb/key_test.go line 31 at r4 (raw file):

		{"low tenant codec gets clamped to lower bound", lowTp(5), tp(1), tp(10), tp(1)},
		{"high tenant codec gets clamped to upper bound", highTp(5), tp(1), tp(10), tp(10)},
		{"system codec im not sure", sysTp(5), tp(1), tp(10), tp(1)},

I think system tenant should be unchanged if the clamp a/b are also in system tenant, otherwise clamp to the start key.


pkg/roachpb/span_test.go line 4 at r4 (raw file):

//
// Use of this software is governed by the CockroachDB Software License
// included in the /LICENSE file.

Any reason why you're not using the keys_test.go file? I think if the code is in X the tests should be in X_test.


pkg/server/apiutil/rangeutil_test.go line 76 at r4 (raw file):

			}
		}
	}

there might be some utils in the require package that can make these checks easier.x

@angles-n-daemons angles-n-daemons force-pushed the find-indexes-in-ranges branch 4 times, most recently from 694b4dc to 0ee1828 Compare November 5, 2024 14:49
Copy link
Contributor Author

@angles-n-daemons angles-n-daemons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @kyle-a-wong)


-- commits line 134 at r4:

Previously, dhartunian (David Hartunian) wrote…

nit: replace tabs with spaces.

done.


-- commits line 145 at r4:

Previously, dhartunian (David Hartunian) wrote…

💯 thank you for this description.

np!


-- commits line 147 at r4:

Previously, dhartunian (David Hartunian) wrote…

nit: "caveat"

nit: what is "it" here?

👀 "caveat"

"It" is "this approach", is there a clearer way to describe that this approach has drawbacks?


-- commits line 154 at r4:

Previously, dhartunian (David Hartunian) wrote…

doesn't need to be part of the PR, but should we return some list of "unexplained spans" when someone asks about a range?

We could, I'll create a ticket if it sounds like something we want to support.


pkg/roachpb/data.go line 224 at r4 (raw file):

Previously, dhartunian (David Hartunian) wrote…

You have a compare on keys here:

// Compare compares the two Keys.
func (k Key) Compare(b Key) int {
return bytes.Compare(k, b)
}

Yeah, not sure how this got here, I was intending on using that function. Updated to use the key func.


pkg/server/apiutil/rangeutil.go line 20 at r4 (raw file):

Previously, dhartunian (David Hartunian) wrote…

nit: docstring. I'm mostly curious why there's no construct already that combines the db/table/index names together.

I'm not sure if there is or not - it didn't seem like in the protobuf there was. I asked in the #ask-an-engineer to see if anyone had a concept for this construct.

https://cockroachlabs.slack.com/archives/CHVV403F0/p1729004210052159


pkg/server/apiutil/rangeutil.go line 44 at r4 (raw file):

Previously, dhartunian (David Hartunian) wrote…

it's expected that the caller can produce this map?

Kind of, the the last PR created a GetAllDatabaseDescriptorsMap function on the collection - it's assumed that the caller will do the following:

Create that object so it's only read once.
Order the range descriptors once.
Process them in ordered batches by passing the above object in.


pkg/server/apiutil/rangeutil.go line 54 at r4 (raw file):

Previously, dhartunian (David Hartunian) wrote…

This part's a bit confusing to me. If a span maps to a table descriptor then we'll grab all the indexes from that table? We can't definitively state whether one or the other is in the span?

AHH I'm keeping this comment because I think I understand now: we produce the Span field on every index and then intersect it with the spans from the ranges. Is that right?

That's exactly correct, since the tables slice is a set of tables, of whom have a table space inclusive of all indexes, we apply this translation function to get an ordered list of indexes (which is at a finer span granularity than the input tables slice) and compute their spans as part of that translation.

The latter part of this, does the actual mapping from range to index.


pkg/server/apiutil/rangeutil.go line 72 at r4 (raw file):

Previously, dhartunian (David Hartunian) wrote…

There's a new clear builtin you can use to do this without allocating a new slice: https://pkg.go.dev/builtin#clear

oh nice, changed.

actually, I can't do this. clear somehow garbage collects the contents, even if they're still referenced by other objects in the application.


pkg/server/apiutil/rangeutil.go line 124 at r4 (raw file):

Previously, dhartunian (David Hartunian) wrote…

It appears that codec is unused in this function.

Huh, good catch. Removed.


pkg/server/apiutil/rangeutil.go line 151 at r4 (raw file):

Previously, dhartunian (David Hartunian) wrote…

you can use struct{} as the value type in the map to avoid allocating booleans, since you never read them.

changed.


pkg/server/apiutil/rangeutil.go line 178 at r4 (raw file):

Previously, dhartunian (David Hartunian) wrote…

I think one-liner functions like this make the code harder to read. I'd just inline this.

agreed, this is more of a pass through than anything, removed.


pkg/roachpb/key_test.go line 31 at r4 (raw file):

Previously, dhartunian (David Hartunian) wrote…

I think system tenant should be unchanged if the clamp a/b are also in system tenant, otherwise clamp to the start key.

This test is mostly to check how the system codec behaves wrt a tenant codec. I meant to update the test string, I'll change that now.


pkg/roachpb/span_test.go line 4 at r4 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Any reason why you're not using the keys_test.go file? I think if the code is in X the tests should be in X_test.

X is data in this case. I felt like data_test.go was a bit of a dumping ground for this package, which is why this PR adds keys_testand span_test


pkg/server/apiutil/rangeutil_test.go line 76 at r4 (raw file):

Previously, dhartunian (David Hartunian) wrote…

there might be some utils in the require package that can make these checks easier.x

👀 added some, for checks on the IndexNames like this one however - this approach is still favorable:

stretchr/testify#1616


pkg/roachpb/metadata.go line 1010 at r4 (raw file):

Previously, dhartunian (David Hartunian) wrote…

I don't see where this is used.

This was originally used in the next PR, but has been scrapped in favor of this comparator function:

https://github.com/cockroachdb/cockroach/pull/134106/files#diff-d9b23e79aff8a828ba7694102c0a012fb7313fe99ffb4185455d7741871fcb8dR22


pkg/roachpb/metadata.go line 1022 at r4 (raw file):

Previously, dhartunian (David Hartunian) wrote…

you can swap on one line via r[i], r[j] = r[j], r[i]

removing this.

Copy link
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: just one request for yet another docstring.

Reviewed 1 of 2 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @angles-n-daemons and @kyle-a-wong)


-- commits line 147 at r4:

Previously, angles-n-daemons (Brian Dillmann) wrote…

👀 "caveat"

"It" is "this approach", is there a clearer way to describe that this approach has drawbacks?

Ah I get it, all good. I must have spaced out between the sentences.


-- commits line 154 at r4:

Previously, angles-n-daemons (Brian Dillmann) wrote…

We could, I'll create a ticket if it sounds like something we want to support.

Nah, we can leave it for now.


pkg/server/apiutil/rangeutil.go line 20 at r4 (raw file):

Previously, angles-n-daemons (Brian Dillmann) wrote…

I'm not sure if there is or not - it didn't seem like in the protobuf there was. I asked in the #ask-an-engineer to see if anyone had a concept for this construct.

https://cockroachlabs.slack.com/archives/CHVV403F0/p1729004210052159

I like the comment you added so def keep it, but also just add a docstring to the exported struct to explain its purpose.


pkg/server/apiutil/rangeutil.go line 44 at r4 (raw file):

Previously, angles-n-daemons (Brian Dillmann) wrote…

Kind of, the the last PR created a GetAllDatabaseDescriptorsMap function on the collection - it's assumed that the caller will do the following:

Create that object so it's only read once.
Order the range descriptors once.
Process them in ordered batches by passing the above object in.

👍


pkg/server/apiutil/rangeutil.go line 72 at r4 (raw file):

Previously, angles-n-daemons (Brian Dillmann) wrote…

oh nice, changed.

actually, I can't do this. clear somehow garbage collects the contents, even if they're still referenced by other objects in the application.

I don't think clear will garbage collect. I don't think you can trigger GC at will in golang. Anyway, fine to leave as-is. We can investigate/discuss separately.


pkg/roachpb/span_test.go line 4 at r4 (raw file):

Previously, angles-n-daemons (Brian Dillmann) wrote…

X is data in this case. I felt like data_test.go was a bit of a dumping ground for this package, which is why this PR adds keys_testand span_test

ah! makes sense. thanks.

Copy link
Contributor Author

@angles-n-daemons angles-n-daemons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, forgot about that! adding it now

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dhartunian and @kyle-a-wong)

Previously each range correlated to a single table, or even a single
index in a database, so all that was required to identify which tables,
indexes were in the range were to look at the start key of the range and
map it accordingly.

With range coalescing however, it's possible for one, or many, tables,
indexes and the like to reside within the same range. To properly
identify the contents of a range, this PR adds the following utilities:

 1. A utility function which turns a range into a span, and clamps it
    to its tenant's table space.
 2. A utility function which takes the above spans and uses the catalog
    and new descriptor by span utility to turn those spans into a set of
    table descriptors ordered by id.
 3. A utility function which transforms those table descriptors into a
    set of (database, table, index) names which deduplicate and identify
    each index uniquely.
 4. A utility function, which merges the ranges and indexes into a map
    keyed by RangeID whose values are the above index names.
 5. A primary entrypoint for consumers from which a set of ranges can be
    passed in and a mapping from those ranges to indexes can be
    returned.

A variety of caveats come with this approach. It attempts to scan the
desciptors all at once, but it still will scan a sizable portion of the
descriptors table if the request is large enough. This makes no attempt
to describe system information which does not have a descriptor. It will
describe system tables which appear in the descriptors table, but it
will not try to explain "tables" which do not have descriptors (example
tsdb), or any other information stored in the keyspace without a
descriptor (PseudoTableIDs, GossipKeys for example).

Throughout this work, many existing utilities were duplicated, and then
un-duplicated (`keys.TableDataMin`, `roachpb.Span.Overlap`, etc). If you
see anything that seems to already exist, feel free to point it out
accordingly.

Epic: CRDB-43151
Fixes: cockroachdb#130997

Release note: None
@angles-n-daemons
Copy link
Contributor Author

bors r+

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @angles-n-daemons, @dhartunian, and @kyle-a-wong)


pkg/sql/catalog/descs/collection.go line 1152 at r8 (raw file):

}

// but as a map with the database ID as the key.

nit: what happened to the comment here?

@craig craig bot merged commit c62b41d into cockroachdb:master Nov 11, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

List multiple table names in the Hot Ranges table for coalesced ranges containing different tables
4 participants