-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
descs: unify uncommitted and kv descriptors #84442
Conversation
e523fd4
to
0d0cd32
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nicely done! This is outstanding, and very welcome. I only have minor comments.
Reviewed 18 of 18 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, and @jasonmchan)
pkg/sql/catalog/descs/collection.go
line 208 at r1 (raw file):
// copy is performed in this call. func (tc *Collection) AddUncommittedDescriptor(desc catalog.MutableDescriptor) error { return tc.stored.checkIn(desc)
For better or for worse, checkIn
assumes that the descriptor already exists. You're going to have to modify this a little bit, if only because new descriptors won't have their memory accounted for in tc.stored
.
pkg/sql/catalog/descs/descriptor.go
line 242 at r1 (raw file):
} } log.VEventf(q.ctx, 2, "found stored descriptor %d", id)
s/found stored/found cached/
?
pkg/sql/catalog/descs/descriptor.go
line 339 at r1 (raw file):
ud := tc.stored.getCachedByName(parentID, parentSchemaID, name) if ud != nil { log.VEventf(ctx, 2, "found stored descriptor %d", ud.GetID())
s/found stored/found cached/
?
pkg/sql/catalog/descs/hydrate.go
line 137 at r1 (raw file):
// descriptor up to this layer. if tc.hydratedTables != nil && tc.stored.descs.GetByID(desc.GetID()) == nil &&
Shouldn't this be a cached lookup with getCachedByID
here? Or is this deliberate?
pkg/sql/catalog/descs/stored_descriptors.go
line 1 at r1 (raw file):
// Copyright 2021 The Cockroach Authors.
s/2021/2022/
pkg/sql/catalog/descs/stored_descriptors.go
line 170 at r1 (raw file):
// memAcc is the actual account of an injected, upstream monitor // to track memory usage of unDescriptors.
typo
pkg/sql/catalog/descs/stored_descriptors.go
line 364 at r1 (raw file):
} // Monitor memory usage of the catalog recently read from storage. // TODO(jchan): Add memory monitoring for other storage reads.
You might as well do so in this commit, it should be quite straightforward: in add
, before you Upsert
, check if an entry already exists, if not, Grow
. If it isn't, please file an issue and tag it to our board, where you explain why, so that we can remember to do this. TODO comments aren't enforced in any way, they're just ways to signal "oh we should probably do this at some point perhaps", versus "we definitely have a follow-up task", if that makes sense.
pkg/sql/catalog/descs/stored_descriptors.go
line 392 at r1 (raw file):
// TODO(jchan): There's a lot of wasted effort here in reconstructing a // catalog and hydrating every descriptor. We could do better by caching the // catalog and invalidating it upon descriptor check-out.
Following up on my earlier comment about TODOs: this is a very good use of a TODO comment.
pkg/sql/catalog/descs/stored_descriptors.go
line 603 at r1 (raw file):
// We can then consider making the descriptor collection aware of // uncommitted namespace operations. return nil, nil
I must have missed it when doing #84446 recently. Can you remove this if
block please? It should be safe to do so.
pkg/bench/rttanalysis/testdata/benchmark_expectations
line 96 at r1 (raw file):
1,VirtualTableQueries/select_crdb_internal.tables_with_1_fk 4,VirtualTableQueries/virtual_table_cache_with_point_lookups 12,VirtualTableQueries/virtual_table_cache_with_schema_change
Is it possible to have an earlier commit containing these new entries prior to the changes which unify kv and uncommitted into stored?
pkg/bench/rttanalysis/virtual_table_bench_test.go
line 64 at r1 (raw file):
SELECT * FROM t1; SELECT * FROM t2;`, },
This is out of scope for this PR and is intended more as a general comment, perhaps more for @ajwerner 's attention: what kind of other test cases would we want here? Presumably stuff that's relevant to schema migrations, so pg_catalog queries perhaps?
pkg/sql/catalog/descs/collection_test.go
line 785 at r1 (raw file):
})) }) }
Nice tests!
7a35acd
to
e62b9a9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, @jasonmchan, @postamar, and @rhu713)
pkg/sql/catalog/descs/collection.go
line 208 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
For better or for worse,
checkIn
assumes that the descriptor already exists. You're going to have to modify this a little bit, if only because new descriptors won't have their memory accounted for intc.stored
.
Agreed, I think add
makes more sense than checkIn
here. I rewrote the checkIn
method to explicitly check that the descriptor already exists.
pkg/sql/catalog/descs/descriptor.go
line 242 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
s/found stored/found cached/
?
Fixed.
pkg/sql/catalog/descs/descriptor.go
line 339 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
s/found stored/found cached/
?
Fixed.
pkg/sql/catalog/descs/hydrate.go
line 137 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
Shouldn't this be a cached lookup with
getCachedByID
here? Or is this deliberate?
This is a cached lookup since it's the GetByID
method on the nstree.Map
. On this note, maybe it would be clearer if I renamed the storedDescriptors
methods from getByIDs
/getByName
to getByIDsFromKV
/getByNameFromKV
.
pkg/sql/catalog/descs/stored_descriptors.go
line 1 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
s/2021/2022/
Fixed.
pkg/sql/catalog/descs/stored_descriptors.go
line 170 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
typo
Fixed.
pkg/sql/catalog/descs/stored_descriptors.go
line 364 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
You might as well do so in this commit, it should be quite straightforward: in
add
, before youUpsert
, check if an entry already exists, if not,Grow
. If it isn't, please file an issue and tag it to our board, where you explain why, so that we can remember to do this. TODO comments aren't enforced in any way, they're just ways to signal "oh we should probably do this at some point perhaps", versus "we definitely have a follow-up task", if that makes sense.
Done--your point about TODOs makes sense.
pkg/sql/catalog/descs/stored_descriptors.go
line 603 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
I must have missed it when doing #84446 recently. Can you remove this
if
block please? It should be safe to do so.
Done. I had to get rid of a test for this--see below.
pkg/bench/rttanalysis/testdata/benchmark_expectations
line 96 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
Is it possible to have an earlier commit containing these new entries prior to the changes which unify kv and uncommitted into stored?
Done.
pkg/bench/rttanalysis/virtual_table_bench_test.go
line 64 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
This is out of scope for this PR and is intended more as a general comment, perhaps more for @ajwerner 's attention: what kind of other test cases would we want here? Presumably stuff that's relevant to schema migrations, so pg_catalog queries perhaps?
I would also be interested in seeing schema migration test cases. I haven't seen real-world usages of virtual tables in schema change transactions, so I'm not sure how realistic the test cases I put up are.
pkg/sql/catalog/descs/collection_test.go
line 262 at r1 (raw file):
require.NoError(t, err) // Try to get the database descriptor by the new name and fail.
To confirm: can we always expect the namespace to be updated atomically with descriptor name changes? Is it safe to remove this test (it failed due to the removing the above if statement)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 19 of 19 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, @jasonmchan, @postamar, and @rhu713)
pkg/sql/catalog/descs/hydrate.go
line 137 at r1 (raw file):
Previously, jasonmchan (Jason Chan) wrote…
This is a cached lookup since it's the
GetByID
method on thenstree.Map
. On this note, maybe it would be clearer if I renamed thestoredDescriptors
methods fromgetByIDs
/getByName
togetByIDsFromKV
/getByNameFromKV
.
Oh, my bad, I misread tc.stored.GetByID
instead of tc.stored.descs.GetByID
. never mind. Please don't bother renaming the methods unless you feel strongly about this. If anything needs renaming it's perhaps descs
to inMemoryDescs
or something like that.
pkg/sql/catalog/descs/stored_descriptors.go
line 603 at r1 (raw file):
Previously, jasonmchan (Jason Chan) wrote…
Done. I had to get rid of a test for this--see below.
It turns out that I was wrong, see below. Sorry for misleading you.
pkg/bench/rttanalysis/testdata/benchmark_expectations
line 96 at r1 (raw file):
Previously, jasonmchan (Jason Chan) wrote…
Done.
Thanks! It's always nice to see these numbers go down.
pkg/bench/rttanalysis/virtual_table_bench_test.go
line 64 at r1 (raw file):
Previously, jasonmchan (Jason Chan) wrote…
I would also be interested in seeing schema migration test cases. I haven't seen real-world usages of virtual tables in schema change transactions, so I'm not sure how realistic the test cases I put up are.
I'm not sure, but I'm confident that the tests which you put up are fine and provide enough coverage for the the purposes of this PR.
pkg/sql/catalog/descs/collection_test.go
line 262 at r1 (raw file):
Previously, jasonmchan (Jason Chan) wrote…
To confirm: can we always expect the namespace to be updated atomically with descriptor name changes? Is it safe to remove this test (it failed due to the removing the above if statement)?
That's a great question. It turns out that we can't.
First of all even though the answer to your question might effectively be "yes" there's no mechanism in place to enforce this atomicity. Furthermore, what's missing is making the Collection
aware of system.namespace
entries the same way it's aware of system.descriptor
: we don't have the machinery yet, we just use the names in the descriptors. I think the guarantee we have there is that the name in the descriptor is also present in namespace
but the reverse isn't true (because we used to have old names linger in there for a while! that's what's meant by "draining names").
Really, the Collection
should provide first-class treatment to all catalog system tables, not just descriptor
but also namespace
, zones
, comments
, even perhaps users
or roles
, who knows. That's for a more distant future, however.
tl;dr: sorry put the if-block I told you to remove back plz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, @jasonmchan, @postamar, and @rhu713)
pkg/sql/catalog/descs/collection_test.go
line 262 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
That's a great question. It turns out that we can't.
First of all even though the answer to your question might effectively be "yes" there's no mechanism in place to enforce this atomicity. Furthermore, what's missing is making the
Collection
aware ofsystem.namespace
entries the same way it's aware ofsystem.descriptor
: we don't have the machinery yet, we just use the names in the descriptors. I think the guarantee we have there is that the name in the descriptor is also present innamespace
but the reverse isn't true (because we used to have old names linger in there for a while! that's what's meant by "draining names").Really, the
Collection
should provide first-class treatment to all catalog system tables, not justdescriptor
but alsonamespace
,zones
,comments
, even perhapsusers
orroles
, who knows. That's for a more distant future, however.tl;dr: sorry put the if-block I told you to remove back plz
Thanks for the explanation! Reverted this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, @jasonmchan, @postamar, and @rhu713)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 18 files at r1, 14 of 19 files at r3, 2 of 2 files at r4.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, @jasonmchan, @postamar, and @rhu713)
pkg/bench/rttanalysis/rtt_analysis_bench.go
line 127 at r5 (raw file):
if err != nil { panic(err) }
require.NoError(b, err)
?
Code quote:
if err != nil {
panic(err)
}
pkg/sql/catalog/descs/stored_descriptors.go
line 397 at r4 (raw file):
// TODO(jchan): There's a lot of wasted effort here in reconstructing a // catalog and hydrating every descriptor. We could do better by caching the // catalog and invalidating it upon descriptor check-out.
Can you file an issue to track this TODO while you have context?
Code quote:
// TODO(jchan): There's a lot of wasted effort here in reconstructing a
// catalog and hydrating every descriptor. We could do better by caching the
// catalog and invalidating it upon descriptor check-out.
cat := nstree.MutableCatalog{}
pkg/sql/catalog/descs/stored_descriptors.go
line 413 at r4 (raw file):
} // There could be tables with user defined types that need hydrating. if err := HydrateGivenDescriptors(ctx, cat.OrderedDescriptors()); err != nil {
The way HydrateGivenDescriptors
works seems silly given we have the nstree.Catalog
. It predates that object. As a follow-up task, any interest in making a version that takes an nstree.Catalog
and just hydrates in place. I think it'll be pretty trivial.
I think you should also rebase master to pick up some flakey test fixes. |
This commit adds just tests without the caching changes, so we will see round-trip reductions in a subsequent commit. For test cases that execute multiple statements, the rttanalysis framework now aggregates the round-trip count of each statement. Release note: None
Relates to cockroachdb#64673. Previously, the collection maintained two separate sources: `uncommittedDescriptors` and `kvDescriptors`. This was confusing because `uncommittedDescriptors` was intended to contain modified descriptors, but it also held descriptors cached from KV point lookups. This commit fixes this by introducing `storedDescriptors`, which is the combination of the above sources. This commit works towards a model where `storedDescriptors` is a mirror of the descriptors in KV. These descriptors were either cached after KV reads, or were modified by the associated transaction and should be written to KV upon commit. Now, all descriptors read from storage are stored in the same btree, eliminating possible duplication between `kvDescriptors` and `uncommittedDescriptors`. As a byproduct of this change, we are able to better leverage caching due to virtual table lookups within a transaction. First, we no longer need to invalidate batches of cached descriptors after schema changes. Second, point lookups after a range lookup will properly check the descriptors cached due to the range lookup. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @chengxiong-ruan, @jasonmchan, @postamar, and @rhu713)
pkg/bench/rttanalysis/rtt_analysis_bench.go
line 127 at r5 (raw file):
Previously, ajwerner wrote…
require.NoError(b, err)
?
Fixed.
pkg/sql/catalog/descs/stored_descriptors.go
line 397 at r4 (raw file):
Previously, ajwerner wrote…
Can you file an issue to track this TODO while you have context?
Done.
pkg/sql/catalog/descs/stored_descriptors.go
line 413 at r4 (raw file):
Previously, ajwerner wrote…
The way
HydrateGivenDescriptors
works seems silly given we have thenstree.Catalog
. It predates that object. As a follow-up task, any interest in making a version that takes annstree.Catalog
and just hydrates in place. I think it'll be pretty trivial.
Yes this seems fairly straightforward, I'll take a stab at it.
bors r+ |
Build failed (retrying...): |
Build succeeded: |
Relates to #64673.
Previously, the collection maintained two separate sources:
uncommittedDescriptors
andkvDescriptors
. This was confusing becauseuncommittedDescriptors
was intended to contain modified descriptors,but it also held descriptors cached from KV point lookups. This commit
fixes this by introducing
storedDescriptors
, which is the combinationof the above sources.
This commit works towards a model where
storedDescriptors
is a mirrorof the descriptors in KV. These descriptors were either cached after KV
reads, or were modified by the associated transaction and should be
written to KV upon commit. Now, all descriptors read from storage are
stored in the same btree, eliminating possible duplication between
kvDescriptors
anduncommittedDescriptors
.As a byproduct of this change, we are able to better leverage caching
due to virtual table lookups within a transaction. First, we no longer
need to invalidate batches of cached descriptors after schema changes.
Second, point lookups after a range lookup will properly check the
descriptors cached due to the range lookup. New rttanalysis cases
demonstrate round-trip reductions.
Release note: None