Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

descs: unify uncommitted and kv descriptors #84442

Merged
merged 2 commits into from
Jul 20, 2022

Conversation

jasonmchan
Copy link
Contributor

@jasonmchan jasonmchan commented Jul 14, 2022

Relates to #64673.

Previously, the collection maintained two separate sources:
uncommittedDescriptors and kvDescriptors. This was confusing because
uncommittedDescriptors was intended to contain modified descriptors,
but it also held descriptors cached from KV point lookups. This commit
fixes this by introducing storedDescriptors, which is the combination
of the above sources.

This commit works towards a model where storedDescriptors is a mirror
of the descriptors in KV. These descriptors were either cached after KV
reads, or were modified by the associated transaction and should be
written to KV upon commit. Now, all descriptors read from storage are
stored in the same btree, eliminating possible duplication between
kvDescriptors and uncommittedDescriptors.

As a byproduct of this change, we are able to better leverage caching
due to virtual table lookups within a transaction. First, we no longer
need to invalidate batches of cached descriptors after schema changes.
Second, point lookups after a range lookup will properly check the
descriptors cached due to the range lookup. New rttanalysis cases
demonstrate round-trip reductions.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jasonmchan jasonmchan force-pushed the stored-descs branch 3 times, most recently from e523fd4 to 0d0cd32 Compare July 15, 2022 17:23
@jasonmchan jasonmchan marked this pull request as ready for review July 15, 2022 17:50
@jasonmchan jasonmchan requested a review from a team July 15, 2022 17:50
Copy link
Contributor

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nicely done! This is outstanding, and very welcome. I only have minor comments.

Reviewed 18 of 18 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, and @jasonmchan)


pkg/sql/catalog/descs/collection.go line 208 at r1 (raw file):

// copy is performed in this call.
func (tc *Collection) AddUncommittedDescriptor(desc catalog.MutableDescriptor) error {
	return tc.stored.checkIn(desc)

For better or for worse, checkIn assumes that the descriptor already exists. You're going to have to modify this a little bit, if only because new descriptors won't have their memory accounted for in tc.stored.


pkg/sql/catalog/descs/descriptor.go line 242 at r1 (raw file):

		}
	}
	log.VEventf(q.ctx, 2, "found stored descriptor %d", id)

s/found stored/found cached/?


pkg/sql/catalog/descs/descriptor.go line 339 at r1 (raw file):

		ud := tc.stored.getCachedByName(parentID, parentSchemaID, name)
		if ud != nil {
			log.VEventf(ctx, 2, "found stored descriptor %d", ud.GetID())

s/found stored/found cached/?


pkg/sql/catalog/descs/hydrate.go line 137 at r1 (raw file):

		// descriptor up to this layer.
		if tc.hydratedTables != nil &&
			tc.stored.descs.GetByID(desc.GetID()) == nil &&

Shouldn't this be a cached lookup with getCachedByID here? Or is this deliberate?


pkg/sql/catalog/descs/stored_descriptors.go line 1 at r1 (raw file):

// Copyright 2021 The Cockroach Authors.

s/2021/2022/


pkg/sql/catalog/descs/stored_descriptors.go line 170 at r1 (raw file):

	// memAcc is the actual account of an injected, upstream monitor
	// to track memory usage of unDescriptors.

typo


pkg/sql/catalog/descs/stored_descriptors.go line 364 at r1 (raw file):

		}
		// Monitor memory usage of the catalog recently read from storage.
		// TODO(jchan): Add memory monitoring for other storage reads.

You might as well do so in this commit, it should be quite straightforward: in add, before you Upsert, check if an entry already exists, if not, Grow. If it isn't, please file an issue and tag it to our board, where you explain why, so that we can remember to do this. TODO comments aren't enforced in any way, they're just ways to signal "oh we should probably do this at some point perhaps", versus "we definitely have a follow-up task", if that makes sense.


pkg/sql/catalog/descs/stored_descriptors.go line 392 at r1 (raw file):

	// TODO(jchan): There's a lot of wasted effort here in reconstructing a
	// catalog and hydrating every descriptor. We could do better by caching the
	// catalog and invalidating it upon descriptor check-out.

Following up on my earlier comment about TODOs: this is a very good use of a TODO comment.


pkg/sql/catalog/descs/stored_descriptors.go line 603 at r1 (raw file):

		// We can then consider making the descriptor collection aware of
		// uncommitted namespace operations.
		return nil, nil

I must have missed it when doing #84446 recently. Can you remove this if block please? It should be safe to do so.


pkg/bench/rttanalysis/testdata/benchmark_expectations line 96 at r1 (raw file):

1,VirtualTableQueries/select_crdb_internal.tables_with_1_fk
4,VirtualTableQueries/virtual_table_cache_with_point_lookups
12,VirtualTableQueries/virtual_table_cache_with_schema_change

Is it possible to have an earlier commit containing these new entries prior to the changes which unify kv and uncommitted into stored?


pkg/bench/rttanalysis/virtual_table_bench_test.go line 64 at r1 (raw file):

SELECT * FROM t1;
SELECT * FROM t2;`,
		},

This is out of scope for this PR and is intended more as a general comment, perhaps more for @ajwerner 's attention: what kind of other test cases would we want here? Presumably stuff that's relevant to schema migrations, so pg_catalog queries perhaps?


pkg/sql/catalog/descs/collection_test.go line 785 at r1 (raw file):

		}))
	})
}

Nice tests!

@jasonmchan jasonmchan requested a review from a team July 18, 2022 20:45
@jasonmchan jasonmchan requested a review from a team as a code owner July 18, 2022 20:45
@jasonmchan jasonmchan requested a review from rhu713 July 18, 2022 20:45
@jasonmchan jasonmchan force-pushed the stored-descs branch 2 times, most recently from 7a35acd to e62b9a9 Compare July 18, 2022 20:56
Copy link
Contributor Author

@jasonmchan jasonmchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, @jasonmchan, @postamar, and @rhu713)


pkg/sql/catalog/descs/collection.go line 208 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

For better or for worse, checkIn assumes that the descriptor already exists. You're going to have to modify this a little bit, if only because new descriptors won't have their memory accounted for in tc.stored.

Agreed, I think add makes more sense than checkIn here. I rewrote the checkIn method to explicitly check that the descriptor already exists.


pkg/sql/catalog/descs/descriptor.go line 242 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

s/found stored/found cached/?

Fixed.


pkg/sql/catalog/descs/descriptor.go line 339 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

s/found stored/found cached/?

Fixed.


pkg/sql/catalog/descs/hydrate.go line 137 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

Shouldn't this be a cached lookup with getCachedByID here? Or is this deliberate?

This is a cached lookup since it's the GetByID method on the nstree.Map. On this note, maybe it would be clearer if I renamed the storedDescriptors methods from getByIDs/getByName to getByIDsFromKV/getByNameFromKV.


pkg/sql/catalog/descs/stored_descriptors.go line 1 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

s/2021/2022/

Fixed.


pkg/sql/catalog/descs/stored_descriptors.go line 170 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

typo

Fixed.


pkg/sql/catalog/descs/stored_descriptors.go line 364 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

You might as well do so in this commit, it should be quite straightforward: in add, before you Upsert, check if an entry already exists, if not, Grow. If it isn't, please file an issue and tag it to our board, where you explain why, so that we can remember to do this. TODO comments aren't enforced in any way, they're just ways to signal "oh we should probably do this at some point perhaps", versus "we definitely have a follow-up task", if that makes sense.

Done--your point about TODOs makes sense.


pkg/sql/catalog/descs/stored_descriptors.go line 603 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

I must have missed it when doing #84446 recently. Can you remove this if block please? It should be safe to do so.

Done. I had to get rid of a test for this--see below.


pkg/bench/rttanalysis/testdata/benchmark_expectations line 96 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

Is it possible to have an earlier commit containing these new entries prior to the changes which unify kv and uncommitted into stored?

Done.


pkg/bench/rttanalysis/virtual_table_bench_test.go line 64 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

This is out of scope for this PR and is intended more as a general comment, perhaps more for @ajwerner 's attention: what kind of other test cases would we want here? Presumably stuff that's relevant to schema migrations, so pg_catalog queries perhaps?

I would also be interested in seeing schema migration test cases. I haven't seen real-world usages of virtual tables in schema change transactions, so I'm not sure how realistic the test cases I put up are.


pkg/sql/catalog/descs/collection_test.go line 262 at r1 (raw file):

			require.NoError(t, err)

			// Try to get the database descriptor by the new name and fail.

To confirm: can we always expect the namespace to be updated atomically with descriptor name changes? Is it safe to remove this test (it failed due to the removing the above if statement)?

Copy link
Contributor

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 19 of 19 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, @jasonmchan, @postamar, and @rhu713)


pkg/sql/catalog/descs/hydrate.go line 137 at r1 (raw file):

Previously, jasonmchan (Jason Chan) wrote…

This is a cached lookup since it's the GetByID method on the nstree.Map. On this note, maybe it would be clearer if I renamed the storedDescriptors methods from getByIDs/getByName to getByIDsFromKV/getByNameFromKV.

Oh, my bad, I misread tc.stored.GetByID instead of tc.stored.descs.GetByID. never mind. Please don't bother renaming the methods unless you feel strongly about this. If anything needs renaming it's perhaps descs to inMemoryDescs or something like that.


pkg/sql/catalog/descs/stored_descriptors.go line 603 at r1 (raw file):

Previously, jasonmchan (Jason Chan) wrote…

Done. I had to get rid of a test for this--see below.

It turns out that I was wrong, see below. Sorry for misleading you.


pkg/bench/rttanalysis/testdata/benchmark_expectations line 96 at r1 (raw file):

Previously, jasonmchan (Jason Chan) wrote…

Done.

Thanks! It's always nice to see these numbers go down.


pkg/bench/rttanalysis/virtual_table_bench_test.go line 64 at r1 (raw file):

Previously, jasonmchan (Jason Chan) wrote…

I would also be interested in seeing schema migration test cases. I haven't seen real-world usages of virtual tables in schema change transactions, so I'm not sure how realistic the test cases I put up are.

I'm not sure, but I'm confident that the tests which you put up are fine and provide enough coverage for the the purposes of this PR.


pkg/sql/catalog/descs/collection_test.go line 262 at r1 (raw file):

Previously, jasonmchan (Jason Chan) wrote…

To confirm: can we always expect the namespace to be updated atomically with descriptor name changes? Is it safe to remove this test (it failed due to the removing the above if statement)?

That's a great question. It turns out that we can't.

First of all even though the answer to your question might effectively be "yes" there's no mechanism in place to enforce this atomicity. Furthermore, what's missing is making the Collection aware of system.namespace entries the same way it's aware of system.descriptor: we don't have the machinery yet, we just use the names in the descriptors. I think the guarantee we have there is that the name in the descriptor is also present in namespace but the reverse isn't true (because we used to have old names linger in there for a while! that's what's meant by "draining names").

Really, the Collection should provide first-class treatment to all catalog system tables, not just descriptor but also namespace, zones, comments, even perhaps users or roles, who knows. That's for a more distant future, however.

tl;dr: sorry put the if-block I told you to remove back plz

Copy link
Contributor Author

@jasonmchan jasonmchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, @jasonmchan, @postamar, and @rhu713)


pkg/sql/catalog/descs/collection_test.go line 262 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

That's a great question. It turns out that we can't.

First of all even though the answer to your question might effectively be "yes" there's no mechanism in place to enforce this atomicity. Furthermore, what's missing is making the Collection aware of system.namespace entries the same way it's aware of system.descriptor: we don't have the machinery yet, we just use the names in the descriptors. I think the guarantee we have there is that the name in the descriptor is also present in namespace but the reverse isn't true (because we used to have old names linger in there for a while! that's what's meant by "draining names").

Really, the Collection should provide first-class treatment to all catalog system tables, not just descriptor but also namespace, zones, comments, even perhaps users or roles, who knows. That's for a more distant future, however.

tl;dr: sorry put the if-block I told you to remove back plz

Thanks for the explanation! Reverted this change.

Copy link
Contributor

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong:

Reviewed 2 of 2 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, @jasonmchan, @postamar, and @rhu713)

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 18 files at r1, 14 of 19 files at r3, 2 of 2 files at r4.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, @jasonmchan, @postamar, and @rhu713)


pkg/bench/rttanalysis/rtt_analysis_bench.go line 127 at r5 (raw file):

	if err != nil {
		panic(err)
	}

require.NoError(b, err)?

Code quote:

	if err != nil {
		panic(err)
	}

pkg/sql/catalog/descs/stored_descriptors.go line 397 at r4 (raw file):

	// TODO(jchan): There's a lot of wasted effort here in reconstructing a
	// catalog and hydrating every descriptor. We could do better by caching the
	// catalog and invalidating it upon descriptor check-out.

Can you file an issue to track this TODO while you have context?

Code quote:

	// TODO(jchan): There's a lot of wasted effort here in reconstructing a
	// catalog and hydrating every descriptor. We could do better by caching the
	// catalog and invalidating it upon descriptor check-out.
	cat := nstree.MutableCatalog{}

pkg/sql/catalog/descs/stored_descriptors.go line 413 at r4 (raw file):

	}
	// There could be tables with user defined types that need hydrating.
	if err := HydrateGivenDescriptors(ctx, cat.OrderedDescriptors()); err != nil {

The way HydrateGivenDescriptors works seems silly given we have the nstree.Catalog. It predates that object. As a follow-up task, any interest in making a version that takes an nstree.Catalog and just hydrates in place. I think it'll be pretty trivial.

@ajwerner
Copy link
Contributor

I think you should also rebase master to pick up some flakey test fixes.

Jason Chan added 2 commits July 19, 2022 17:16
This commit adds just tests without the caching changes, so we will see
round-trip reductions in a subsequent commit. For test cases that
execute multiple statements, the rttanalysis framework now aggregates
the round-trip count of each statement.

Release note: None
Relates to cockroachdb#64673.

Previously, the collection maintained two separate sources:
`uncommittedDescriptors` and `kvDescriptors`. This was confusing because
`uncommittedDescriptors` was intended to contain modified descriptors,
but it also held descriptors cached from KV point lookups. This commit
fixes this by introducing `storedDescriptors`, which is the combination
of the above sources.

This commit works towards a model where `storedDescriptors` is a mirror
of the descriptors in KV. These descriptors were either cached after KV
reads, or were modified by the associated transaction and should be
written to KV upon commit. Now, all descriptors read from storage are
stored in the same btree, eliminating possible duplication between
`kvDescriptors` and `uncommittedDescriptors`.

As a byproduct of this change, we are able to better leverage caching
due to virtual table lookups within a transaction. First, we no longer
need to invalidate batches of cached descriptors after schema changes.
Second, point lookups after a range lookup will properly check the
descriptors cached due to the range lookup.

Release note: None
Copy link
Contributor Author

@jasonmchan jasonmchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @chengxiong-ruan, @jasonmchan, @postamar, and @rhu713)


pkg/bench/rttanalysis/rtt_analysis_bench.go line 127 at r5 (raw file):

Previously, ajwerner wrote…

require.NoError(b, err)?

Fixed.


pkg/sql/catalog/descs/stored_descriptors.go line 397 at r4 (raw file):

Previously, ajwerner wrote…

Can you file an issue to track this TODO while you have context?

Done.


pkg/sql/catalog/descs/stored_descriptors.go line 413 at r4 (raw file):

Previously, ajwerner wrote…

The way HydrateGivenDescriptors works seems silly given we have the nstree.Catalog. It predates that object. As a follow-up task, any interest in making a version that takes an nstree.Catalog and just hydrates in place. I think it'll be pretty trivial.

Yes this seems fairly straightforward, I'll take a stab at it.

@jasonmchan
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 20, 2022

Build failed (retrying...):

@craig craig bot merged commit 200cfaf into cockroachdb:master Jul 20, 2022
@craig
Copy link
Contributor

craig bot commented Jul 20, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants