server: notify tenant servers of metadata changes #105441
Conversation
Note to self:
Seems good to me. As always, thanks for great commit messages.
Reviewed 4 of 4 files at r1, 12 of 12 files at r2, 9 of 9 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
-- commits
line 2 at r1:
There is another log.Fatal call in the pkg/multitenant directory (in decoder.go); should that be fixed too?
-- commits
line 26 at r3:
nit: s/medata/metadata/.
-- commits
line 44 at r3:
nit: I think you meant pkg/server/tenantsettingswatcher/watcher.go.
-- commits
line 45 at r3:
This seems like an opportunity to extract a helper to avoid code duplication. I assume you considered it but decided not to; what's the rationale?
pkg/kv/kvpb/api.proto
line 3131 at r3 (raw file):
// Note that this field is advisory: the server may know of a more
// recent (and different) set of capabilities, and server-side
// capabilities check always prevail.
nit: s/prevail/prevails/.
pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/BUILD.bazel
line 4 at r2 (raw file):
go_library(
    name = "tenantcapabilitiestestutils",
nit: we could add testonly = 1, while we're here.
pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go
line 58 at r2 (raw file):
) (
    name *roachpb.TenantName, serviceMode mtinfopb.TenantServiceMode,
nit: should we switch the order of serviceMode and dataState here and in watcher_test.go to match the order of columns in the system table as well as in the Entry struct? Up to you.
pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/decoder.go
line 92 at r2 (raw file):
var name roachpb.TenantName
if len(datums) > 3 {
Why do we need length checks on datums here? My reading of the code says that Decoder.Decode should return a datum for each column in the table, and system.tenants has 6 columns.
pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/decoder.go
line 102 at r2 (raw file):
if i := datums[4]; i != tree.DNull {
    rawDataState := tree.MustBeDInt(i)
    if rawDataState >= 0 && rawDataState <= tree.DInt(mtinfopb.MaxDataState) {
It's unexpected for rawDataState to fall outside the [0, mtinfopb.MaxDataState] range, right? Should we return an error, or at least log a warning? The only valid scenario I can think of where this can happen is a user manually inserting a row with such values into the system table; are there other cases?
pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/watcher.go
line 129 at r2 (raw file):
}
// capabilityEntrySize is an estimate for a (tenantID, capability) pair that the
nit: the comment is now stale.
pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/watcher.go
line 158 at r3 (raw file):
	id roachpb.TenantID,
) (*tenantcapabilitiespb.TenantCapabilities, bool) {
	cp := w.getInternal(id)
This call to getInternal increases mutex contention when the entry is not found (previously we only had a single RLock, but now we'll have an RLock followed by an RWLock). Is this concerning?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
There is another log.Fatal call in the pkg/multitenant directory (in decoder.go); should that be fixed too?
Good idea. Done.
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: s/medata/metadata/.
Fixed.
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: I think you meant pkg/server/tenantsettingswatcher/watcher.go.
Indeed. Fixed.
Previously, yuzefovich (Yahor Yuzefovich) wrote…
This seems like an opportunity to extract a helper to avoid code duplication. I assume you considered it but decided not to; what's the rationale?
Alas, the shared concept here lives in the programming pattern; the repeated code is interleaved with the rest. It's not yet clear (to me) how to extract that abstraction into a library.
pkg/kv/kvpb/api.proto
line 3131 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: s/prevail/prevails/.
Fixed.
pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/BUILD.bazel
line 4 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: we could add testonly = 1, while we're here.
Nice TIL.
pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go
line 58 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: should we switch the order of serviceMode and dataState here and in watcher_test.go to match the order of columns in the system table as well as in the Entry struct? Up to you.
Sure, why not.
pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/decoder.go
line 92 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Why do we need length checks on datums here? My reading of the code says that Decoder.Decode should return a datum for each column in the table, and system.tenants has 6 columns.
Clarified in a comment. This also needs to work with servers that haven't yet run the V23_1TenantNamesStateAndServiceMode upgrade migration, which is what adds these columns.
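For illustration, here is a minimal Go sketch of that defensive decoding pattern; the helper name decodeName is a hypothetical stand-in, not the actual decoder.go code:

```go
import (
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/sql/sem/tree"
)

// decodeName is a hypothetical helper showing why the length check is
// needed: on a cluster that has not yet run the migration adding the
// name/data_state/service_mode columns, the decoded row carries fewer
// datums, so indexing past the original columns must be guarded.
func decodeName(datums tree.Datums) roachpb.TenantName {
	var name roachpb.TenantName
	// Columns 0-2 exist in all versions; column 3 (name) only exists
	// once the upgrade migration has run, and may still be NULL.
	if len(datums) > 3 && datums[3] != tree.DNull {
		name = roachpb.TenantName(tree.MustBeDString(datums[3]))
	}
	return name
}
```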
pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/decoder.go
line 102 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
It's unexpected for rawDataState to fall outside the [0, mtinfopb.MaxDataState] range, right? Should we return an error, or at least log a warning? The only valid scenario I can think of where this can happen is a user manually inserting a row with such values into the system table; are there other cases?
Let's do a warning. Indeed, your understanding is right: this is defensive programming.
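A hedged sketch of what that warning branch could look like, extending the snippet quoted above (ctx, datums, and dataState are assumed to come from the surrounding decoder):

```go
if i := datums[4]; i != tree.DNull {
	rawDataState := tree.MustBeDInt(i)
	if rawDataState >= 0 && rawDataState <= tree.DInt(mtinfopb.MaxDataState) {
		dataState = mtinfopb.TenantDataState(rawDataState)
	} else {
		// Defensive programming: this should only be reachable if a row
		// was inserted manually into system.tenants. Surface the bad
		// value, but keep processing the rangefeed.
		log.Warningf(ctx, "ignoring unknown data state %d in tenant record", rawDataState)
	}
}
```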
pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/watcher.go
line 129 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: the comment is now stale.
Fixed.
pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/watcher.go
line 158 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
This call to getInternal increases mutex contention when the entry is not found (previously we only had a single RLock, but now we'll have an RLock followed by an RWLock). Is this concerning?
The rate at which capabilities are updated is presumably very low (it's supposed to be a manual operation).
Also, FYI, this is the same pattern used in tenantsettingswatcher; see (s *overridesStore) GetTenantOverrides() in overrides_store.go.
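For readers unfamiliar with that pattern, here is a generic, self-contained Go sketch of the read-lock fast path with a write-lock slow path; the names are illustrative, not the actual watcher code:

```go
package main

import "sync"

type entry struct{ value string }

type store struct {
	mu sync.RWMutex
	m  map[int]*entry
}

// get returns the entry for id, lazily creating a placeholder on a miss.
// Hits take only the read lock; only misses escalate to the write lock,
// which is acceptable when misses are rare and updates are infrequent.
func (s *store) get(id int) *entry {
	s.mu.RLock()
	e, ok := s.m[id]
	s.mu.RUnlock()
	if ok {
		return e // fast path under the shared (read) lock.
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	// Re-check: another goroutine may have created the entry between
	// releasing the read lock and acquiring the write lock.
	if e, ok := s.m[id]; ok {
		return e
	}
	e = &entry{}
	s.m[id] = e
	return e
}
```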
Reviewed 21 of 21 files at r4, 12 of 12 files at r5, 15 of 16 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
pkg/kv/kvpb/api.proto
line 3099 at r6 (raw file):
enum EventType {
  // The event is about an update to cluster setting overrides.
  // This must be zero for backward-compatibility with pre-v23.1
nit: should this be pre-v23.1? Or are you intending to backport this to 23.1?
pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/decoder.go
line 92 at r2 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Clarified in a comment. This also needs to work with servers that haven't yet run the V23_1TenantNamesStateAndServiceMode upgrade migration, which is what adds these columns.
Ack, makes sense.
@yuzefovich could you have another look, before I embark on the project to rename all the things?
LGTM, I only have some minor questions / comments.
Reviewed 23 of 23 files at r7, 1 of 1 files at r8, 7 of 7 files at r9, 13 of 13 files at r10, 10 of 10 files at r11, 4 of 4 files at r12, 6 of 6 files at r13, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
-- commits
line 102 at r11:
The protocol seems sound to me, but I feel like this deserves a mixed-version unit test; is it possible to add one?
pkg/kv/kvclient/kvtenant/BUILD.bazel
line 25 at r13 (raw file):
"//pkg/multitenant/mtinfopb", "//pkg/multitenant/tenantcapabilities", "//pkg/multitenant/tenantcapabilities/tenantcapabilitiespb",
When renaming things, WDYT about making tenantcapabilities
shorter, perhaps tenantcaps
? Or I guess with the renaming it'll be something like tenantinfo
or tenantmetadata
?
pkg/kv/kvclient/kvtenant/setting_overrides.go
line 89 at r13 (raw file):
// Signal that startup is complete once we have enough events
// to start. Note: we do not connect this condition to
// receiving the tenant medata (via processMetadataEvent) for
nit: s/medata/metadata/.
pkg/kv/kvpb/api.proto
line 3139 at r11 (raw file):
// The type of event. For backward-compatibility with servers that
// do not check the event_type field, all events of other types
nit: accessing the event_type field should be allowed once the cluster has been upgraded to 23.2, right? Should we leave a TODO or adjust the comment to mention this?
pkg/kv/kvpb/api.proto
line 3202 at r11 (raw file):
// ServiceMode is the tenant's current service mode.
// TODO(knz): This should really be casted to go type mtinfopb.TenantServicemode but we
nit: s/Servicemode/ServiceMode/.
pkg/server/node.go
line 2163 at r11 (raw file):
// - For compatibility with older version tenant clients,
//   all non-setting event types must fake being a no-op
//   setting event (see the docstring on event_type below).
nit: there is no event_type below.
pkg/kv/kvclient/kvtenant/setting_overrides_test.go
line 241 at r13 (raw file):
}
eventCh <- ev
select {
What is the rationale for blocking for 10ms after each event? I'd assume we could just send as many as we want and only block once, after all have been sent, for up to 10s.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
The protocol seems sound to me, but I feel like this deserves a mixed-version unit test; is it possible to add one?
Done.
pkg/kv/kvclient/kvtenant/BUILD.bazel
line 25 at r13 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
When renaming things, WDYT about making tenantcapabilities shorter, perhaps tenantcaps? Or I guess with the renaming it'll be something like tenantinfo or tenantmetadata?
Sounds good. I'll do this in a separate PR.
pkg/kv/kvclient/kvtenant/setting_overrides.go
line 89 at r13 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: s/medata/metadata/.
Fixed.
pkg/kv/kvpb/api.proto
line 3099 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: should this be pre-v23.1? Or are you intending to backport this to 23.1?
Yes, this should remain backportable.
pkg/kv/kvpb/api.proto
line 3139 at r11 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: accessing the event_type field should be allowed once the cluster has been upgraded to 23.2, right? Should we leave a TODO or adjust the comment to mention this?
I don't understand your question?
pkg/kv/kvpb/api.proto
line 3202 at r11 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: s/Servicemode/ServiceMode/.
Fixed.
pkg/server/node.go
line 2163 at r11 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: there is no event_type below.
Fixed.
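To make the compatibility trick from the quoted docstring concrete, here is a hedged Go sketch; the struct below only loosely mirrors the real protobuf event, and all field names are illustrative assumptions:

```go
// tenantSettingsEvent loosely mirrors the streamed event. Pre-v23.1
// clients predate the EventType field entirely, so they treat every
// event as a settings event.
type tenantSettingsEvent struct {
	EventType   int  // 0 = settings event; old clients never read this.
	Incremental bool // true = apply as a delta on top of current state.
	Overrides   []string
	// ... metadata fields, read only by new clients ...
}

// makeMetadataEvent shapes a metadata update so that an old client,
// which ignores EventType, sees an incremental settings update with no
// overrides: a harmless no-op.
func makeMetadataEvent() tenantSettingsEvent {
	return tenantSettingsEvent{
		EventType:   1,    // non-zero: a metadata event for new clients.
		Incremental: true, // old clients apply an empty delta (no-op).
		Overrides:   nil,
	}
}
```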
pkg/kv/kvclient/kvtenant/setting_overrides_test.go
line 241 at r13 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
What is the rationale for blocking for 10ms after each event? I'd assume we could just send as many as we want and only block once, after all have been sent, for up to 10s.
I was following the pattern already established in the rest of this file. The purpose of this pattern is to ensure that the test becomes flaky if the connector mistakenly reports readiness too early.
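A self-contained Go sketch of that test pattern, with a stub connector standing in for the real one (all names here are illustrative):

```go
package kvtenant_test

import (
	"testing"
	"time"
)

type event struct{ last bool }

// fakeConnector stands in for the connector under test: it consumes
// events and signals readiness only after seeing the final one.
func fakeConnector(eventCh <-chan event, startedCh chan<- struct{}) {
	for ev := range eventCh {
		if ev.last {
			close(startedCh)
			return
		}
	}
}

func TestReadinessNotReportedEarly(t *testing.T) {
	eventCh := make(chan event)
	startedCh := make(chan struct{})
	go fakeConnector(eventCh, startedCh)

	// After each non-final event, block briefly and verify that readiness
	// has NOT been signaled. This short wait is what makes the test flaky,
	// rather than silently green, if readiness is ever reported too early.
	for i := 0; i < 3; i++ {
		eventCh <- event{}
		select {
		case <-startedCh:
			t.Fatal("connector reported readiness too early")
		case <-time.After(10 * time.Millisecond):
			// Expected: still waiting.
		}
	}

	eventCh <- event{last: true}
	select {
	case <-startedCh:
		// Readiness arrived once all required events were delivered.
	case <-time.After(10 * time.Second):
		t.Fatal("connector never reported readiness")
	}
}
```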
106414: server: properly check existence of tenant record r=yuzefovich,stevendanna a=knz

PR extracted from #105441. Informs #83650. Part of #98431. Epic: CRDB-26691

Prior to this patch, the API on the tenant info watcher would return an "undefined" metadata payload if called prior to the tenant record being created. This was most visible in the following scenario:

1. A new cluster starts. The watcher and rangefeed start successfully (tenant table empty).
2. A tenant client connects. At this time there is no metadata for its tenant ID, so the metadata payload is available but empty.
3. CREATE TENANT is executed for the new tenant.
4. Only then (later) does the rangefeed introduce the metadata into the cache.

This is insufficient for use by the KV tenant client RPCs: there we only want to accept incoming requests from tenant clients after we have actually seen metadata for them.

This patch improves the situation by checking whether the tenant record is present before returning bogus data to the SQL tenant client. Simultaneously, we add error handling logic in the SQL tenant client to handle this case gracefully.

In a mixed-version deployment (with and without this patch applied), the following behaviors occur if one starts the SQL tenant server without a tenant record defined:

- Unpatched server: late startup error in the client with "database 1 does not exist".
- Patched server: the client loops forever, waiting for the tenant record.

Behavior when the tenant record is created shortly *after* the SQL tenant server starts up:

- Unpatched server: inconsistent / non-deterministic behavior.
- Patched server: clean startup of the client after the tenant record appears.

Co-authored-by: Raphael 'kena' Poss <[email protected]>
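The "client loops, waiting for the tenant record" behavior described above can be sketched as a simple retry loop. This is a hedged illustration, not the actual connector code; tenantMetadata and fetch are stand-ins:

```go
package main

import (
	"context"
	"time"
)

type tenantMetadata struct{ dataStateReady bool }

// waitForTenantRecord polls until the server reports that the tenant
// record exists, instead of failing startup with an obscure error such
// as `database "[1]" does not exist`.
func waitForTenantRecord(
	ctx context.Context, fetch func(context.Context) (tenantMetadata, error),
) (tenantMetadata, error) {
	for {
		md, err := fetch(ctx)
		if err == nil {
			return md, nil // record exists; startup can proceed.
		}
		// Record not there yet (e.g. CREATE TENANT has not run): retry.
		select {
		case <-ctx.Done():
			return tenantMetadata{}, ctx.Err()
		case <-time.After(time.Second):
		}
	}
}
```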
Thanks for the productive review iteration. I like where we've ended up here.
var buf strings.Builder
fmt.Fprintf(&buf, "tid=%v name=%q data=%v service=%v caps=%+v",
	info.TenantID, info.Name, info.DataState, info.ServiceMode, info.TenantCapabilities)

str := buf.String()
Fine as is since this is nice if we ever have to extend it. But if you fancy, this could probably be a single fmt.Sprintf.
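For reference, the single-Sprintf form suggested here would be (equivalent output, assuming the same info fields):

```go
str := fmt.Sprintf("tid=%v name=%q data=%v service=%v caps=%+v",
	info.TenantID, info.Name, info.DataState, info.ServiceMode, info.TenantCapabilities)
```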
Done.
TFYR! bors r=stevendanna
CI is telling me something went wrong during the last rebase. bors r-
Canceled.
Release note: None
The previous commit made the server send the metadata through the TenantSettingEvent streaming RPC. This commit makes the tenant connector receive and store the metadata. As in the previous commits, we acknowledge that the whole infrastructure has become more generic than just "cluster settings", so the components involved are now misnamed. The appropriate renaming will be performed at a later stage. Release note: None
This commit introduces a new test to check that old-version tenant clients can operate with a new-version KV server, and vice-versa. Release note: None
Release note: None
This ensures that `WaitForTenantReadiness` doesn't have to wait for the full closed timestamp interval. Release note: None
bors r=stevendanna
bors r-
Canceled.
bors r=yuzefovich,stevendanna
Build succeeded:
96144: server: honor and validate the service mode for SQL pods r=yuzefovich,stevendanna a=knz

Rebased on top of #105441. Fixes #93145. Fixes #83650. Epic: CRDB-26691

TLDR: this makes SQL servers only accept to start if their data state is "ready" and their deployment type (e.g. separate process) matches the configured service mode in the tenant record. Additionally, the SQL server spontaneously terminates if its service mode or data state is no longer valid (e.g. as a result of DROP TENANT or ALTER TENANT STOP SERVICE).

----

Prior to this patch, there wasn't a good story about the lifecycle of separate-process SQL servers ("SQL pods"):

- if a SQL server was started against a non-existent tenant, an obscure error would be raised (`database "[1]" does not exist`);
- if a SQL server was started while a tenant was being added (keyspace not yet valid), no check would be performed and data corruption could ensue;
- if the tenant record/keyspace was dropped while the SQL server was running, SQL clients would start encountering obscure errors.

This commit fixes the situation by checking the tenant metadata:

- once, during SQL server startup, at which point server startup is prevented if the service check fails;
- then, asynchronously, whenever the metadata is updated, such that any service check failure results in a graceful shutdown of the SQL service.

The check proper validates:

- that the tenant record exists;
- that the data state is "ready";
- that the configured service mode matches that requested by the SQL server.

Example output upon error:

- non-existent tenant: `tenant service check failed: missing record`
- attempting to start a separate-process server while the tenant runs as shared-process: `tenant service check failed: service mode check failed: expected external, record says shared`
- after ALTER TENANT STOP SERVICE: `tenant service check failed: service mode check failed: expected external, record says none`
- after DROP TENANT: `tenant service check failed: service mode check failed: expected external, record says dropping`

Release note: None

107124: server: avoid deadlock when initing additional stores r=erikgrinaker a=tbg

We need to start node liveness before waiting for additional store init. Otherwise, we can end up in a situation where each node is sitting on the channel and nobody has started their liveness yet. The sender to the channel will first have to get an Increment through KV, but if nobody acquires the lease (since nobody's heartbeat loop is running), this will never happen.

In practice, *most of the time*, there is no deadlock, because the lease acquisition path performs a synchronous heartbeat of the node's own liveness entry in most cases (ignoring the fact that liveness hasn't been started yet). But there is also another path where someone else's epoch needs to be incremented, and this path also checks whether the node itself is live, which it won't necessarily be (the liveness loop is not running yet).

Fixes #106706

Epic: None

Release note (bug fix): a rare (!) situation in which nodes would get stuck during start-up was addressed. This is unlikely to have been encountered by production users. If so, it would manifest itself through a stack frame sitting on a select in `waitForAdditionalStoreInit` for extended periods of time (i.e. minutes).
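Returning to the #96144 service check described above, here is a hedged Go sketch of that validation logic; the types and names (tenantRecord, checkTenantService, the mode constants) are illustrative assumptions, not the actual implementation:

```go
package main

import "fmt"

type dataState int
type serviceMode int

const (
	dataStateReady dataState = iota
	dataStateDropping
)

const (
	serviceModeNone serviceMode = iota
	serviceModeShared
	serviceModeExternal
)

type tenantRecord struct {
	DataState   dataState
	ServiceMode serviceMode
}

// checkTenantService validates the tenant record before (and while) a
// SQL server runs: the record must exist, the data state must be
// "ready", and the record's service mode must match the deployment type.
func checkTenantService(rec *tenantRecord, want serviceMode) error {
	if rec == nil {
		return fmt.Errorf("tenant service check failed: missing record")
	}
	if rec.DataState != dataStateReady {
		return fmt.Errorf("tenant service check failed: data state is %v, not ready", rec.DataState)
	}
	if rec.ServiceMode != want {
		return fmt.Errorf(
			"tenant service check failed: service mode check failed: expected %v, record says %v",
			want, rec.ServiceMode)
	}
	return nil
}
```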
107216: sql: disallow schema changes for READ COMMITTED transactions r=michae2 a=rafiss

Due to how descriptor leasing works, schema changes are not safe in weaker isolation transactions. Until they are safe, we disallow them.

Fixes #100143

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Informs (towards fixing) #98431.
Fixes #105721.
Epic: CRDB-26691
See the individual commits for details.