Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql,*: make some or all system tables LOCALITY GLOBAL #63365

Closed
Tracked by #81009
ajwerner opened this issue Apr 9, 2021 · 14 comments · Fixed by #121293
Closed
Tracked by #81009

sql,*: make some or all system tables LOCALITY GLOBAL #63365

ajwerner opened this issue Apr 9, 2021 · 14 comments · Fixed by #121293
Assignees
Labels
A-multiregion Related to multi-region C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@ajwerner
Copy link
Contributor

ajwerner commented Apr 9, 2021

Is your feature request related to a problem? Please describe.

This idea has come up a few times recently and it seems worthwhile to centralize the discussion somewhere. Most recently #36160 (comment). It occurs to me that several other big problems towards which we've considered investing considerable engineering efforts could also be mitigated or solved.

Today's virtual tables are powered by an in-memory cache of all descriptors. The latency requirements to evict from such a cache makes it infeasible. If the data were local and low-latency, then it's plausible to implement these tables in a streaming fashion. This memory overhead today has not been much of a concern given other bottlenecks which generally make creating a schema of a problematic size unlikely (#63206).

Another consideration which only just occurred to me is the commit-to-emit latency of CHANGEFEEDs. The dominant source of latency in CHANGEFEED is waiting to "prove" the schema for a row (#36289). In the past we have explored leasing protocols by which changefeeds might coordinate with / hold off schema changes and thus be free to emit rows so long as they have a lease. This approach was demonstrated to work and is, on some level, viable. However, it's far from trivial and would even further complicate transactional schema changes. If the system.descriptor table were a global table, a resolved timestamp corresponding to the present could be emitted to each node hosting a CHANGEFEED around the time that rows are written. This does reveal another interesting problem that CHANGEFEEDs are going to need to deal with is that rows committed in the future due to being part of a transaction touching a global table are likely to block rows due to non-global tables. That can be mitigated using some buffering.

Describe the solution you'd like

The table I'm most interested in making global is system.descriptor. This, today, would mean making the whole system config span global. It seems plausible one day to break up that whole concept once we have a new zone configuration architecture.

One thing I haven't thought through is what happens to the system.lease table and its relevant protocols if the leasing transaction needs to interact with writes which carried synthetic timestamps.

Jira issue: CRDB-6547

Epic CRDB-33032

@ajwerner ajwerner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 9, 2021
@rafiss rafiss added the O-postmortem Originated from a Postmortem action item. label May 12, 2021
@awoods187 awoods187 added A-multiregion Related to multi-region T-multiregion labels May 20, 2021
@rafiss
Copy link
Collaborator

rafiss commented May 26, 2021

@arulajmani could you drop in a comment about your idea for how we might do this manually with a zone config as a stop-gap until we have true GLOBAL system tables?

@arulajmani
Copy link
Collaborator

Yeah -- GLOBAL tables are ~just syntactic sugar around the global_reads zone config attribute. My suggestion was to try and turn this on for the authentication tables you were interested in and see if that works as a stop gap until we figure out the details around making the system database mulit-region.

@ajwerner
Copy link
Contributor Author

I think that it would be system.users and system.role_options. Unfortunately (and this is a big unfortunately), system.users is in the system config span. We'd need to split that out to make this work. Extra unfortunately, its hard-code table ID is between descriptors and zones.

However, this raises something extra interesting, we are already propagating all of the hash passwords via gossip and we didn't even realize! That being said, they'd be the stale passwords and we don't know when they'd be flushed.

Making the whole system config span global is likely to have very large impacts on the performance of a lot of schema changes. I don't know if we can stomach that.

@ajwerner
Copy link
Contributor Author

All this being said, if we finish the new zone config work such that we don't need to gossip the system config span, then we could configure these tables independently. I think having users and user_role_options as global tables would be great.

@arulajmani
Copy link
Collaborator

Unfortunately (and this is a big unfortunately), system.users is in the system config span.

Extra unfortunately, its hard-code table ID is between descriptors and zones.

Ughgh I didn't know this, this is very unfortunate indeed. Never mind my suggestion @rafiss, I'm not sure if the zone config work getting finished fits with your timeline for making system.users and system.role_options accesses faster.

@knz
Copy link
Contributor

knz commented Jun 4, 2021

Here are other categories of things that would benefit from global locality:

  • the meta1 ranges (and meta2 when we add them)
  • the system tables needed for user log in: system.users, system.role_members, system.web_sessions, system.role_options.
  • system.localities

@knz
Copy link
Contributor

knz commented Jun 4, 2021

@bobvawter asks:

We had a customer outage last night where a storage array oops in one region brought down the entire cluster. Would globalized system ranges have allowed the cluster to at least continue to serve reads?

Steven Hand asks:

I have had a couple customers for which access to system ranges was a major problem. One workaround was to (temporarily) pin system ranges to one region (at the cost of access from other regions). The other workaround was to enhance the Ruby Active Record ORM adapter to support stale reads for such system metadata.

@exalate-issue-sync exalate-issue-sync bot added T-sql-schema-deprecated Use T-sql-foundations instead and removed T-multiregion labels Jun 16, 2021
@rafiss
Copy link
Collaborator

rafiss commented Jul 29, 2021

Related question from the community: #67109

@rafiss
Copy link
Collaborator

rafiss commented Mar 30, 2022

Here's an example that came up in the context of Prisma (and also would affect most other users of user-defined types): prisma/prisma#11317 (comment)

@ajwerner
Copy link
Contributor Author

loosely relates to 79043

@irfansharif
Copy link
Contributor

I think that it would be system.users and system.role_options. Unfortunately (and this is a big unfortunately), system.users is in the system config span. We'd need to split that out to make this work. Extra unfortunately, its hard-code table ID is between descriptors and zones.

BTW this is trivial now in 22.2, if we want to pull on this thread further. We're no longer gossiping the system config span and there's no reason to not split within it. To do so, I think all you need is deleting these lines + updating a few test files:

// We don't want to split within the system config span while we're still
// also using it to disseminate zone configs.
//
// TODO(irfansharif): Once we've fully phased out the system config span, we
// can get rid of this special handling.
if keys.SystemConfigSpan.Contains(sp) {
return nil, nil
}
if keys.SystemConfigSpan.ContainsKey(sp.Key) {
return roachpb.RKey(keys.SystemConfigSpan.EndKey), nil
}

@andy-kimball
Copy link
Contributor

@irfansharif, I'm really happy to hear that we're unblocked due to the zone config work. The architectural improvements that came along with that work continue to pay dividends.

jasminejsun added a commit to jasminejsun/cockroach that referenced this issue Apr 1, 2024
Previously, on a multi-region setup the system database could not be modified to be multi-region and it was blocked from being made multi-region aware. To address this, we are now allowing ALTER DATABASE
PRIMARY REGION to work on system tenants.

Fixes: cockroachdb#63365
Epic: CRDB-33032

Release note (sql change): Previously, we added support for settings reegion on the system database, which was limited to tenants only. We lifted this limitation to allow ALTER DATABASE PRIMARY REGION to work on system tenants.
To support preview status, we created a cluster setting called sql.multiregion.preview_multiregion_system_database that will give users the option to set up their system database as multi-region for Cockroach dedicated
(this cluster setting is not enabled by default). Note that after adding non-primary regions, we recommend that users do a rolling restart to propogate region information.
jasminejsun added a commit to jasminejsun/cockroach that referenced this issue Apr 3, 2024
Previously, on a multi-region setup the system database could not be modified to be multi-region and it was blocked from being made multi-region aware. To address this, we are now allowing ALTER DATABASE
PRIMARY REGION to work on system tenants.

Fixes: cockroachdb#63365
Epic: CRDB-33032

Release note (sql change): Previously, we added support for settings reegion on the system database, which was limited to tenants only. We lifted this limitation to allow ALTER DATABASE PRIMARY REGION to work on system tenants.
To support preview status, we created a cluster setting called sql.multiregion.preview_multiregion_system_database that will give users the option to set up their system database as multi-region for Cockroach dedicated
(this cluster setting is not enabled by default). Note that after adding non-primary regions, we recommend that users do a rolling restart to propogate region information.
craig bot pushed a commit that referenced this issue Apr 8, 2024
121245: storage: support disk stall tracing r=aadityasondhi a=CheranMahalingam

Currently, in the event of a disk stall we don't have visibility into the sequence of disk events that led up to the failure due to the 10s frequency at which we export disk metrics. This commit adds support for storing a history of disk events from the previous 30s and in the event of a stall, logs a trace.

Fixes: #120506.
Informs: #89786.

Epic: None.
Release note: None.

121293: sql: allow ALTER DATABASE PRIMARY REGION to work on system tenants r=jasminejsun a=jasminejsun

Previously, on a multi-region setup the system database could not be modified to be multi-region and it was blocked from being made multi-region aware. To address this, we are now allowing ALTER DATABASE PRIMARY REGION to work on system tenants.

Fixes: #63365
Epic: CRDB-33032

Release note (sql change): Previously, we added support for settings reegion on the system database, which was limited to tenants only. We lifted this limitation to allow ALTER DATABASE PRIMARY REGION to work on system tenants. To support preview status, we created a cluster setting called sql.multiregion.preview_multiregion_system_database that will give users the option to set up their system database as multi-region for Cockroach dedicated (this cluster setting is not enabled by default). Note that after adding non-primary regions, we recommend that users do a rolling restart to propogate region information.

121860: roachtest: simpler preempted instance names r=herkolategan a=renatolabs

Small change to make them easier for a person to read.

Epic: none

Release note: None

121903: sql: avoid an allocation in checkExprForDistSQL r=yuzefovich a=yuzefovich

This commit removes an allocation of `distSQLExprCheckVisitor` that previously happened on each `checkExprForDistSQL` call. We now reuse the same visitor that is stored on the `planner`. In some profiles I looked at recently this allocation accounted for over 1% of all allocations.

Fixes: #121302.

Release note: None

Co-authored-by: Cheran Mahalingam <[email protected]>
Co-authored-by: Jasmine Sun <[email protected]>
Co-authored-by: Renato Costa <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
@craig craig bot closed this as completed in 6f3cae0 Apr 8, 2024
craig bot pushed a commit that referenced this issue May 8, 2024
122285: sql,multiregionccl: rename multiregion system DB cluster setting r=rafiss a=rafiss

Avoid using the word "preview" in the name. That shouldn't be needed, since the cluster setting is hidden, so users would only know about it if we tell them to enable it. Keeping "preview" in the name isn't a huge problem, but it's just slightly annoying if we decide that we want to keep the setting around even after the feature leaves the "preview" status, since it requires updating docs and any customer code that was using the old name.

Also, this removes the mention of the setting in error messages. An error should not refer to an undocumented setting, since a user would have no way of learning more about it.

informs: #63365
Epic: CRDB-33032
Release note: None

Co-authored-by: Rafi Shamim <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-multiregion Related to multi-region C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
No open projects
Status: Done