
server: notify tenant servers of metadata changes #105441

Merged: 5 commits merged into cockroachdb:master from the 20230623-tenant-watch branch on Jul 26, 2023

Conversation

@knz knz (Contributor) commented Jun 23, 2023

Informs (towards fixing) #98431.
Fixes #105721.
Epic: CRDB-26691

See the individual commits for details.

@knz knz added the A-multitenancy (Related to multi-tenancy) label Jun 23, 2023
@cockroach-teamcity (Member): This change is Reviewable

@knz knz requested a review from yuzefovich June 23, 2023 16:53
@knz knz (Contributor, Author) commented Jun 23, 2023

Note to self:

  • complete the connector-side of the change
  • rename all the things

@yuzefovich yuzefovich (Member) left a comment

Seems good to me. As always, thanks for the great commit messages.

Reviewed 4 of 4 files at r1, 12 of 12 files at r2, 9 of 9 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


-- commits line 2 at r1:
There is another log.Fatal call in the pkg/multitenant directory (in decoder.go); should that be fixed too?


-- commits line 26 at r3:
nit: s/medata/metadata/.


-- commits line 44 at r3:
nit: I think you meant pkg/server/tenantsettingswatcher/watcher.go.


-- commits line 45 at r3:
This seems like an opportunity to extract a helper to avoid code duplication. I'm assuming you considered it but decided not to; what's the rationale?


pkg/kv/kvpb/api.proto line 3131 at r3 (raw file):

  // Note that this field is advisory: the server may know of a more
  // recent (and different) set of capabilities, and server-side
  // capabilities check always prevail.

nit: s/prevail/prevails/.


pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/BUILD.bazel line 4 at r2 (raw file):

go_library(
    name = "tenantcapabilitiestestutils",

nit: we could add testonly = 1, while we're here.


pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go line 58 at r2 (raw file):

) (
	name *roachpb.TenantName,
	serviceMode mtinfopb.TenantServiceMode,

nit: should we switch the order of serviceMode and dataState here and in watcher_test.go to match the order of columns in the system table as well as in the Entry struct? Up to you.


pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/decoder.go line 92 at r2 (raw file):

	var name roachpb.TenantName
	if len(datums) > 3 {

Why do we need length checks on datums here? My reading of the code says that Decoder.Decode should return a datum for each column in the table, and system.tenants has 6 columns.


pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/decoder.go line 102 at r2 (raw file):

		if i := datums[4]; i != tree.DNull {
			rawDataState := tree.MustBeDInt(i)
			if rawDataState >= 0 && rawDataState <= tree.DInt(mtinfopb.MaxDataState) {

It's unexpected when rawDataState is outside of the [0, mtinfopb.MaxDataState] range, right? Should we return an error or at least log a warning? The only valid scenario I can think of where this can happen is when a user manually inserts a row with these values into the system table; are there other cases?


pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/watcher.go line 129 at r2 (raw file):

}

// capabilityEntrySize is an estimate for a (tenantID, capability) pair that the

nit: the comment is now stale.


pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/watcher.go line 158 at r3 (raw file):

	id roachpb.TenantID,
) (*tenantcapabilitiespb.TenantCapabilities, bool) {
	cp := w.getInternal(id)

This call to getInternal increases the mutex contention in the case when the entry is not found (previously we only had a single RLock, but now we'll have an RLock followed by an RWLock). Is this concerning?

@knz knz force-pushed the 20230623-tenant-watch branch from a5bdedc to 52d9c5c on June 27, 2023 13:00
@knz knz (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


-- commits line 2 at r1:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

There is another log.Fatal call in the pkg/multitenant directory (in decoder.go); should that be fixed too?

Good idea. Done.


-- commits line 26 at r3:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/medata/metadata/.

Fixed.


-- commits line 44 at r3:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: I think you meant pkg/server/tenantsettingswatcher/watcher.go.

Indeed. Fixed.


-- commits line 45 at r3:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

This seems like an opportunity to extract a helper to avoid code duplication. I'm assuming you considered it but decided not to; what's the rationale?

Alas, the shared concept here is the programming pattern -- the repeated code is interleaved with the rest. It's not yet clear (to me) how to extract that abstraction into a library.


pkg/kv/kvpb/api.proto line 3131 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/prevail/prevails/.

Fixed.


pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/BUILD.bazel line 4 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: we could add testonly = 1, while we're here.

Nice TIL.


pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go line 58 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: should we switch the order of serviceMode and dataState here and in watcher_test.go to match the order of columns in the system table as well as in the Entry struct? Up to you.

Sure, why not.


pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/decoder.go line 92 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Why do we need length checks on datums here? My reading of the code says that Decoder.Decode should return a datum for each column in the table, and system.tenants has 6 columns.

Clarified in a comment. This also needs to work with servers that haven't yet run the V23_1TenantNamesStateAndServiceMode upgrade migration, which adds these columns.
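In case it helps future readers, the pattern boils down to something like the following sketch (the column index, return type, and helper name here are illustrative, not the actual decoder.go code):

```go
package example

import "github.com/cockroachdb/cockroach/pkg/sql/sem/tree"

// decodeOptionalName sketches the defensive pattern discussed above: columns
// added by a later migration may be absent from rows written by an older
// server, so the decoder guards on the row length before reading them.
func decodeOptionalName(datums tree.Datums) (name tree.DString, found bool) {
	if len(datums) > 3 {
		if d := datums[3]; d != tree.DNull {
			return tree.MustBeDString(d), true
		}
	}
	return "", false
}
```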


pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/decoder.go line 102 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

It's unexpected when rawDataState is outside of the [0, mtinfopb.MaxDataState] range, right? Should we return an error or at least log a warning? The only valid scenario I can think of where this can happen is when a user manually inserts a row with these values into the system table; are there other cases?

Let's do a warning. Indeed, your understanding is right: this is defensive programming.
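For illustration, the resulting shape might look roughly like this (a sketch with hypothetical names; maxDataState stands in for mtinfopb.MaxDataState):

```go
package example

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/util/log"
)

// decodeDataState sketches the "warn, don't fail" handling agreed on above:
// an out-of-range value most likely means the system table was edited by
// hand, so it is reported and skipped rather than turned into a hard error.
func decodeDataState(ctx context.Context, raw, maxDataState int64) (int64, bool) {
	if raw >= 0 && raw <= maxDataState {
		return raw, true
	}
	log.Warningf(ctx, "unknown data state %d in tenant record, ignoring", raw)
	return 0, false
}
```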


pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/watcher.go line 129 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: the comment is now stale.

Fixed.


pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/watcher.go line 158 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

This call to getInternal increases the mutex contention in the case when the entry is not found (previously we only had a single RLock, but now we'll have an RLock followed by an RWLock). Is this concerning?

The rate at which capabilities are updated is presumably very low (it's supposed to be a manual operation).

Also, FYI, this is the same pattern used in tenantsettingswatcher; see (s *overridesStore) GetTenantOverrides() in overrides_store.go.
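For readers unfamiliar with that pattern, it is essentially a read-locked lookup with a write-locked fallback that re-checks before creating the entry; a generic sketch (not the actual watcher code, names are illustrative):

```go
package example

import "sync"

// entry holds cached per-tenant state in this sketch.
type entry struct{}

// store sketches the lookup pattern referenced above: a fast path under a
// read lock, and a slower write-locked path that re-checks before creating
// the missing entry.
type store struct {
	mu      sync.RWMutex
	entries map[int64]*entry
}

func (s *store) get(id int64) *entry {
	s.mu.RLock()
	e, ok := s.entries[id]
	s.mu.RUnlock()
	if ok {
		return e
	}
	// Slow path: take the write lock and re-check, since another goroutine
	// may have created the entry in the meantime.
	s.mu.Lock()
	defer s.mu.Unlock()
	if e, ok := s.entries[id]; ok {
		return e
	}
	if s.entries == nil {
		s.entries = make(map[int64]*entry)
	}
	e = &entry{}
	s.entries[id] = e
	return e
}
```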

@yuzefovich yuzefovich (Member) left a comment

Reviewed 21 of 21 files at r4, 12 of 12 files at r5, 15 of 16 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/kv/kvpb/api.proto line 3099 at r6 (raw file):

  enum EventType {
    // The event is about an update to cluster setting overrides.
    // This must be zero for backward-compatibility with pre-v23.1

nit: should this be pre-v23.1? Or are you intending to backport this to 23.1?


pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/decoder.go line 92 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Clarified in a comment. This also needs to work with servers that haven't yet run the V23_1TenantNamesStateAndServiceMode upgrade migration, which adds these columns.

Ack, makes sense.

@knz knz force-pushed the 20230623-tenant-watch branch from 52d9c5c to 2e3c106 on June 28, 2023 15:18
@knz knz (Contributor, Author) commented Jun 28, 2023

@yuzefovich could you have another look before I embark on the project to rename all the things?

@yuzefovich yuzefovich (Member) left a comment

LGTM, I only have some minor questions / comments.

Reviewed 23 of 23 files at r7, 1 of 1 files at r8, 7 of 7 files at r9, 13 of 13 files at r10, 10 of 10 files at r11, 4 of 4 files at r12, 6 of 6 files at r13, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


-- commits line 102 at r11:
The protocol seems sound to me, but I feel like this deserves a mixed-version unit test; is it possible to add one?


pkg/kv/kvclient/kvtenant/BUILD.bazel line 25 at r13 (raw file):

        "//pkg/multitenant/mtinfopb",
        "//pkg/multitenant/tenantcapabilities",
        "//pkg/multitenant/tenantcapabilities/tenantcapabilitiespb",

When renaming things, WDYT about making tenantcapabilities shorter, perhaps tenantcaps? Or I guess with the renaming it'll be something like tenantinfo or tenantmetadata?


pkg/kv/kvclient/kvtenant/setting_overrides.go line 89 at r13 (raw file):

				// Signal that startup is complete once we have enough events
				// to start. Note: we do not connect this condition to
				// receiving the tenant medata (via processMetadataEvent) for

nit: s/medata/metadata/.


pkg/kv/kvpb/api.proto line 3139 at r11 (raw file):

  // The type of event. For backward-compatibility with servers that
  // do not check the event_type field, all events of other types

nit: accessing event_type field should be allowed once the cluster has been upgraded to 23.2, right? Should we leave a TODO or adjust the comment to mention this?


pkg/kv/kvpb/api.proto line 3202 at r11 (raw file):

  // ServiceMode is the tenant's current service mode.
  // TODO(knz): This should really be casted to go type mtinfopb.TenantServicemode but we

nit: s/Servicemode/ServiceMode/.


pkg/server/node.go line 2163 at r11 (raw file):

	// - For compatibility with older version tenant clients,
	//   all non-setting event types must fake being a no-op
	//   setting event (see the docstring on event_type below).

nit: there is no event_type below.


pkg/kv/kvclient/kvtenant/setting_overrides_test.go line 241 at r13 (raw file):

	}
	eventCh <- ev
	select {

What is the rationale for blocking for 10ms after each event? I'd assume we could just send as many as we want and only block once, after all have been sent, for up to 10s.

@knz knz force-pushed the 20230623-tenant-watch branch 2 times, most recently from c588664 to 754aa4a on June 29, 2023 21:38
@knz knz force-pushed the 20230623-tenant-watch branch from 754aa4a to 952e767 on July 7, 2023 17:20
@knz knz (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


-- commits line 102 at r11:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

The protocol seems sound to me, but I feel like this deserves a mixed-version unit test; is it possible to add one?

Done.


pkg/kv/kvclient/kvtenant/BUILD.bazel line 25 at r13 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

When renaming things, WDYT about making tenantcapabilities shorter, perhaps tenantcaps? Or I guess with the renaming it'll be something like tenantinfo or tenantmetadata?

Sounds good. I'll do this in a separate PR.


pkg/kv/kvclient/kvtenant/setting_overrides.go line 89 at r13 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/medata/metadata/.

Fixed.


pkg/kv/kvpb/api.proto line 3099 at r6 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: should this be pre-v23.1? Or are you intending to backport this to 23.1?

Yes, this should remain backportable.


pkg/kv/kvpb/api.proto line 3139 at r11 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: accessing event_type field should be allowed once the cluster has been upgraded to 23.2, right? Should we leave a TODO or adjust the comment to mention this?

I don't understand your question?


pkg/kv/kvpb/api.proto line 3202 at r11 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/Servicemode/ServiceMode/.

Fixed.


pkg/server/node.go line 2163 at r11 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: there is no event_type below.

Fixed.


pkg/kv/kvclient/kvtenant/setting_overrides_test.go line 241 at r13 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

What is the rationale for blocking for 10ms after each event? I'd assume we could just send as many as we want and only block once, after all have been sent, for up to 10s.

I was following the pattern already established in the rest of this file. The purpose of this pattern is that it ensures the test becomes flaky if the connector mistakenly reports readiness too early.
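In other words, the pattern asserts a negative for a short window after each such event; a sketch with hypothetical names (not the actual test code):

```go
package example

import (
	"testing"
	"time"
)

// assertNotStarted sketches the pattern discussed above: after sending an
// event that should NOT complete startup, the test waits a short window and
// fails if the connector signals readiness anyway, so premature readiness
// shows up as a (flaky) failure.
func assertNotStarted(t *testing.T, startedCh <-chan struct{}) {
	t.Helper()
	select {
	case <-startedCh:
		t.Fatal("connector reported readiness too early")
	case <-time.After(10 * time.Millisecond):
		// Expected: startup is still pending.
	}
}
```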

@knz knz marked this pull request as ready for review July 7, 2023 17:25
@knz knz requested review from a team as code owners July 7, 2023 17:25
@knz knz requested a review from stevendanna July 7, 2023 17:25
@knz knz force-pushed the 20230623-tenant-watch branch from 952e767 to 5d0bcf2 on July 7, 2023 19:13
@knz knz requested a review from a team as a code owner July 7, 2023 19:13
@knz knz requested review from smg260 and renatolabs and removed request for a team July 7, 2023 19:13
@knz knz force-pushed the 20230623-tenant-watch branch 2 times, most recently from f51d1bc to 32eb147 on July 7, 2023 20:51
@knz knz force-pushed the 20230623-tenant-watch branch 2 times, most recently from 9cbca3d to 7760dc4 on July 11, 2023 16:18
@knz knz force-pushed the 20230623-tenant-watch branch from 7760dc4 to ced460c on July 11, 2023 17:39
craig bot pushed a commit that referenced this pull request Jul 23, 2023
106414: server: properly check existence of tenant record r=yuzefovich,stevendanna a=knz

PR extracted from #105441.

Informs #83650.
Part of #98431.
Epic: CRDB-26691

Prior to this patch, the API on the tenant info watcher would return
an "undefined" metadata payload if called before the tenant record
was created.

This was most visible in the following scenario:

1. new cluster starts. watcher+rangefeed start successfully (tenant
   table empty)
2. tenant client connects. At this time there is no metadata for
   its tenant ID, so the metadata payload is available but empty.
3. CREATE TENANT is executed for the new tenant.
4. only then (later) does the rangefeed introduce the metadata
   into the cache.

This is insufficient for use by the KV tenant client RPCs: there we
only want to accept incoming requests from tenant clients after we
actually have seen metadata for them.

This patch improves the situation by checking whether the tenant
record is present before returning bogus data to the SQL tenant
client.

Simultaneously, we add error handling logic in the SQL tenant client
to handle this case gracefully.

In a mixed-version deployment (with and without this patch applied),
the following behaviors will occur if one starts the SQL tenant server
without a tenant record defined:


- Unpatched server: Late startup error in client with "database 1 does not exist".
- Patched server: Client loops forever, waiting for tenant record.

Behavior when the tenant record is created shortly *after* the SQL
tenant server starts up:

- Unpatched server: Inconsistent / non-deterministic behavior.
- Patched server: Clean startup of client after tenant record appears.
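To make the intended client behavior concrete, here is a minimal sketch of the waiting loop implied by the description (names are hypothetical, not the actual kvtenant connector code):

```go
package example

import (
	"context"
	"time"
)

// tenantInfo stands in for the tenant metadata payload in this sketch.
type tenantInfo struct{ Name string }

// infoWatcher sketches the watcher API shape described above: GetInfo
// reports (info, true) only once a record for the tenant has actually been
// observed via the rangefeed.
type infoWatcher interface {
	GetInfo(tenantID uint64) (tenantInfo, bool)
}

// waitForTenantRecord keeps retrying instead of proceeding with an empty
// metadata payload, matching the "client loops, waiting for tenant record"
// behavior of the patched server.
func waitForTenantRecord(ctx context.Context, w infoWatcher, tenantID uint64) (tenantInfo, error) {
	for {
		if info, ok := w.GetInfo(tenantID); ok {
			return info, nil
		}
		select {
		case <-ctx.Done():
			return tenantInfo{}, ctx.Err()
		case <-time.After(time.Second):
			// Retry: CREATE TENANT may simply not have run yet.
		}
	}
}
```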



Co-authored-by: Raphael 'kena' Poss <[email protected]>
@knz knz force-pushed the 20230623-tenant-watch branch 2 times, most recently from a424f49 to 205370e on July 23, 2023 15:33
@knz knz requested a review from yuzefovich July 23, 2023 15:34
@knz knz force-pushed the 20230623-tenant-watch branch from 205370e to 9829038 on July 25, 2023 10:52
@stevendanna stevendanna (Collaborator) left a comment

Thanks for the productive review iteration here. I like where we've ended up.

Comment on lines 303 to 307
var buf strings.Builder
fmt.Fprintf(&buf, "tid=%v name=%q data=%v service=%v caps=%+v",
info.TenantID, info.Name, info.DataState, info.ServiceMode, info.TenantCapabilities)

str := buf.String()
@stevendanna (Collaborator) commented:

Fine as is since this is nice if we ever have to extend it. But if you fancy, this could probably be a single fmt.Sprintf.

@knz (Contributor, Author) replied:

Done.
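For reference, the single fmt.Sprintf form suggested above would look roughly like this (same fields as the hunk quoted in the review):

```go
str := fmt.Sprintf("tid=%v name=%q data=%v service=%v caps=%+v",
	info.TenantID, info.Name, info.DataState, info.ServiceMode, info.TenantCapabilities)
```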

@knz knz force-pushed the 20230623-tenant-watch branch from 8ebfacd to 28568e1 on July 25, 2023 14:21
@knz knz (Contributor, Author) commented Jul 25, 2023

TFYR!

bors r=stevendanna

@knz knz (Contributor, Author) commented Jul 25, 2023

CI is telling me something went wrong during the last rebase.

bors r-

@craig craig bot (Contributor) commented Jul 25, 2023

Canceled.

@knz knz force-pushed the 20230623-tenant-watch branch from 28568e1 to 5a15d76 on July 25, 2023 19:10
knz added 5 commits July 26, 2023 02:49
The previous commit made the server send the metadata through the
TenantSettingEvent streaming RPC.

This commit makes the tenant connector receive and store the metadata.

Like in the previous commits, we acknowledge that the whole
infrastructure has become more generic than just "cluster settings"
and so the components involved are now misnamed. The appropriate
renaming will be performed in a later stage.

Release note: None
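For illustration only, the connector-side dispatch sketched by this commit might look roughly as follows (all names here are assumptions, not the actual kvtenant types): metadata events are stored separately from setting overrides, and unrecognized event types are ignored for forward compatibility.

```go
package example

// eventType mirrors the idea of an event-type discriminator on the stream;
// the zero value corresponds to the legacy setting-override event.
type eventType int

const (
	settingsEvent eventType = iota
	metadataEvent
)

type event struct {
	typ       eventType
	overrides []string // setting overrides, if any
	metadata  string   // tenant metadata payload, if any
}

type connector struct {
	overrides []string
	metadata  string
}

// processEvent stores each kind of payload in its own place and silently
// ignores event types it does not know about, so newer servers can add
// more types without breaking older clients.
func (c *connector) processEvent(ev event) {
	switch ev.typ {
	case settingsEvent:
		c.overrides = append(c.overrides, ev.overrides...)
	case metadataEvent:
		c.metadata = ev.metadata
	default:
		// Forward compatibility: ignore unknown event types.
	}
}
```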
This commit introduces a new test to check that old-version tenant
clients can operate with a new-version KV server, and vice-versa.

Release note: None
This ensures that `WaitForTenantReadiness` doesn't have to wait for
the full closed timestamp interval.

Release note: None
@knz knz force-pushed the 20230623-tenant-watch branch from 5a15d76 to 2c7848b on July 26, 2023 00:52
@knz knz (Contributor, Author) commented Jul 26, 2023

bors r=stevendanna

@knz knz (Contributor, Author) commented Jul 26, 2023

bors r-

@craig craig bot (Contributor) commented Jul 26, 2023

Canceled.

@knz knz (Contributor, Author) commented Jul 26, 2023

bors r=yuzefovich,stevendanna

@craig craig bot merged commit 68985a2 into cockroachdb:master Jul 26, 2023
@craig craig bot (Contributor) commented Jul 26, 2023

Build succeeded.

@knz knz deleted the 20230623-tenant-watch branch July 26, 2023 09:35
craig bot pushed a commit that referenced this pull request Jul 26, 2023
96144: server: honor and validate the service mode for SQL pods r=yuzefovich,stevendanna a=knz

Rebased on top of #105441.
Fixes #93145.
Fixes #83650.

Epic: CRDB-26691

TLDR: this makes SQL servers start only if their data state
is "ready" and their deployment type (e.g. separate process) matches
the configured service mode in the tenant record.
Additionally, the SQL server terminates spontaneously if
its service mode or data state is no longer valid (e.g.
as a result of DROP TENANT or ALTER TENANT STOP SERVICE).

----

Prior to this patch, there wasn't a good story about the lifecycle
of separate-process SQL servers ("SQL pods"):

- if a SQL server was started against a non-existent tenant,
  an obscure error would be raised (`database "[1]" does not exist`)
- if a SQL server was started while a tenant was being added
  (keyspace not yet valid), no check would be performed and
  data corruption could ensue.
- if the tenant record/keyspace was dropped while the SQL server was
  running, SQL clients would start encountering obscure errors.

This commit fixes the situation by checking the tenant metadata:

- once, during SQL server startup, at which point server startup
  is prevented if the service check fails;
- then, asynchronously, whenever the metadata is updated, such that
  any service check failure results in a graceful shutdown of the SQL
  service.

The check proper validates:
- that the tenant record exists;
- that the data state is "ready";
- that the configured service mode matches that requested by the SQL
  server.

Example outputs upon error:

- non-existent tenant:
  ```
  tenant service check failed: missing record
  ```

- attempting to start separate-process server while
  tenant is running as shared-process:
  ```
  tenant service check failed: service mode check failed: expected external, record says shared
  ```

- after ALTER TENANT STOP SERVICE:
  ```
  tenant service check failed: service mode check failed: expected external, record says none
  ```

- after DROP TENANT:
  ```
  tenant service check failed: service mode check failed: expected external, record says dropping
  ```
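A minimal sketch of the shape of such a check, producing errors like the examples above (names are hypothetical; the real logic lives in the server packages):

```go
package example

import (
	"errors"
	"fmt"
)

// tenantRecord is a stand-in for the tenant metadata consulted by the check.
type tenantRecord struct {
	found       bool
	dataReady   bool
	serviceMode string // e.g. "external", "shared", "none", "dropping"
}

// checkServiceMode sketches the service check described above: the SQL
// server is only allowed to start (and keep running) if the tenant record
// exists, its data state is ready, and the recorded service mode matches
// the kind of server being started.
func checkServiceMode(rec tenantRecord, expected string) error {
	if !rec.found {
		return errors.New("tenant service check failed: missing record")
	}
	if !rec.dataReady {
		return errors.New("tenant service check failed: data state is not ready")
	}
	if rec.serviceMode != expected {
		return fmt.Errorf(
			"tenant service check failed: service mode check failed: expected %s, record says %s",
			expected, rec.serviceMode)
	}
	return nil
}
```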

Release note: None



107124: server: avoid deadlock when initing additional stores r=erikgrinaker a=tbg

We need to start node liveness before waiting for additional store init.

Otherwise, we can end up in a situation where each node is sitting on the
channel and nobody has started their liveness yet. The sender to the channel
will first have to get an Increment through KV, but if nobody acquires the
lease (since nobody's heartbeat loop is running), this will never happen.

In practice, *most of the time*, there is no deadlock because the lease
acquisition path usually performs a synchronous heartbeat to the node's own
liveness entry (ignoring the fact that liveness hasn't been started yet). But
there is also another path, where someone else's epoch needs to be incremented,
and this path also checks whether the node itself is live - which it won't
necessarily be (the liveness loop is not running yet).

Fixes #106706

Epic: None
Release note (bug fix): a rare (!) situation in which nodes would get stuck
during start-up was addressed. This is unlikely to have been encountered by
production users. If it was, it would manifest itself through a stack frame
sitting on a select in `waitForAdditionalStoreInit` for extended periods of
time (i.e. minutes).
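Schematically, the fix amounts to reordering two startup steps; a sketch with hypothetical function names, not the actual server code:

```go
package example

// startStores sketches the ordering fix described above: node liveness must
// be running before any node blocks waiting for additional-store
// initialization, because completing that wait ultimately requires a lease
// acquisition, which in turn requires live liveness records.
func startStores(startNodeLiveness, waitForAdditionalStoreInit func()) {
	startNodeLiveness() // must come first; this ordering is the fix
	// Before the fix, this wait could be reached first and block forever.
	waitForAdditionalStoreInit()
}
```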


107216: sql: disallow schema changes for READ COMMITTED transactions r=michae2 a=rafiss

Due to how descriptor leasing works, schema changes are not safe in weaker isolation transactions. Until they are safe, we disallow them.

fixes #100143
Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>