sql,*: make some or all system tables LOCALITY GLOBAL #63365

ajwerner · 2021-04-09T03:44:49Z

Is your feature request related to a problem? Please describe.

This idea has come up a few times recently and it seems worthwhile to centralize the discussion somewhere. Most recently #36160 (comment). It occurs to me that several other big problems towards which we've considered investing considerable engineering efforts could also be mitigated or solved.

Today's virtual tables are powered by an in-memory cache of all descriptors. The latency requirements make evicting from such a cache infeasible. If the data were local and low-latency, it would be plausible to implement these tables in a streaming fashion. This memory overhead has not been much of a concern so far, given other bottlenecks that generally make creating a schema of a problematic size unlikely (#63206).

Another consideration which only just occurred to me is the commit-to-emit latency of CHANGEFEEDs. The dominant source of latency in a CHANGEFEED is waiting to "prove" the schema for a row (#36289). In the past we have explored leasing protocols by which changefeeds might coordinate with / hold off schema changes and thus be free to emit rows so long as they have a lease. This approach was demonstrated to work and is, on some level, viable. However, it's far from trivial and would even further complicate transactional schema changes. If the system.descriptor table were a global table, a resolved timestamp corresponding to the present could be emitted to each node hosting a CHANGEFEED around the time that rows are written. This does reveal another interesting problem that CHANGEFEEDs are going to need to deal with: rows committed in the future because they are part of a transaction touching a global table are likely to block rows from non-global tables. That can be mitigated using some buffering.
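To make the buffering idea concrete, here is a minimal sketch, not the actual changefeed code. The names `row`, `buffer`, `add` and `release` are hypothetical, and timestamps are plain `int64` values rather than HLC timestamps. Rows are held in a min-heap keyed on commit timestamp and only released once the resolved timestamp has reached them, so a future-timestamped row from a global-table transaction sits in the buffer without holding back earlier rows.

```go
package main

import (
	"container/heap"
	"fmt"
)

// row is a hypothetical changefeed row: a key and its commit timestamp.
type row struct {
	key string
	ts  int64
}

// rowHeap is a min-heap of rows ordered by commit timestamp.
type rowHeap []row

func (h rowHeap) Len() int            { return len(h) }
func (h rowHeap) Less(i, j int) bool  { return h[i].ts < h[j].ts }
func (h rowHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *rowHeap) Push(x interface{}) { *h = append(*h, x.(row)) }
func (h *rowHeap) Pop() interface{} {
	old := *h
	n := len(old)
	r := old[n-1]
	*h = old[:n-1]
	return r
}

// buffer holds rows until the resolved timestamp catches up with them.
type buffer struct{ rows rowHeap }

// add buffers a row regardless of how far in the future it was committed.
func (b *buffer) add(r row) { heap.Push(&b.rows, r) }

// release returns, in timestamp order, every buffered row whose
// timestamp is at or below the resolved timestamp.
func (b *buffer) release(resolved int64) []row {
	var out []row
	for b.rows.Len() > 0 && b.rows[0].ts <= resolved {
		out = append(out, heap.Pop(&b.rows).(row))
	}
	return out
}

func main() {
	b := &buffer{}
	b.add(row{key: "global-txn-row", ts: 150}) // committed "in the future"
	b.add(row{key: "regional-row", ts: 100})
	fmt.Println(b.release(100)) // only the regional row is emitted
	fmt.Println(b.release(150)) // the future row follows once resolved
}
```

The cost of this approach is memory proportional to the number of rows waiting on the resolved timestamp, which is the trade-off the "some buffering" remark implies.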

Describe the solution you'd like

The table I'm most interested in making global is system.descriptor. This, today, would mean making the whole system config span global. It seems plausible one day to break up that whole concept once we have a new zone configuration architecture.

One thing I haven't thought through is what happens to the system.lease table and its relevant protocols if the leasing transaction needs to interact with writes which carried synthetic timestamps.

Jira issue: CRDB-6547

Epic CRDB-33032


rafiss · 2021-05-26T21:19:05Z

@arulajmani could you drop in a comment about your idea for how we might do this manually with a zone config as a stop-gap until we have true GLOBAL system tables?

arulajmani · 2021-05-27T13:47:39Z

Yeah -- GLOBAL tables are ~just syntactic sugar around the global_reads zone config attribute. My suggestion was to try and turn this on for the authentication tables you were interested in and see if that works as a stop-gap until we figure out the details around making the system database multi-region.
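As a rough illustration of what that manual step could look like, the sketch below builds a `CONFIGURE ZONE` statement that sets `global_reads` on a table. The helper `globalReadsStmt` and the table name `mydb.auth_table` are hypothetical placeholders. Whether such a statement is accepted for a given system table is an assumption that would need to be checked against the cluster version in use.

```go
package main

import "fmt"

// globalReadsStmt builds a zone config statement that enables the
// global_reads attribute on the given table. The table name is
// interpolated as-is, so it must come from a trusted, fixed list.
func globalReadsStmt(table string) string {
	return fmt.Sprintf("ALTER TABLE %s CONFIGURE ZONE USING global_reads = true", table)
}

func main() {
	// mydb.auth_table is a placeholder, not a table named in this thread.
	fmt.Println(globalReadsStmt("mydb.auth_table"))
}
```

The generated statement would then be run through any SQL client. Reverting would mean setting the attribute back to false or discarding the zone config.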

ajwerner · 2021-05-27T15:35:04Z

I think that it would be system.users and system.role_options. Unfortunately (and this is a big unfortunately), system.users is in the system config span. We'd need to split that out to make this work. Extra unfortunately, its hard-coded table ID is between descriptors and zones.

However, this raises something extra interesting: we are already propagating all of the hashed passwords via gossip and we didn't even realize! That being said, they'd be stale passwords and we don't know when they'd be flushed.

Making the whole system config span global is likely to have very large impacts on the performance of a lot of schema changes. I don't know if we can stomach that.

ajwerner · 2021-05-27T15:35:53Z

All this being said, if we finish the new zone config work such that we don't need to gossip the system config span, then we could configure these tables independently. I think having users and user_role_options as global tables would be great.

arulajmani · 2021-05-27T19:01:28Z

Unfortunately (and this is a big unfortunately), system.users is in the system config span.

Extra unfortunately, its hard-coded table ID is between descriptors and zones.

Ughgh I didn't know this, this is very unfortunate indeed. Never mind my suggestion @rafiss, I'm not sure if the zone config work getting finished fits with your timeline for making system.users and system.role_options accesses faster.

knz · 2021-06-04T19:09:58Z

Here are other categories of things that would benefit from global locality:

the meta1 ranges (and meta2 when we add them)
the system tables needed for user log in: system.users, system.role_members, system.web_sessions, system.role_options.
system.localities

knz · 2021-06-04T19:12:59Z

@bobvawter asks:

We had a customer outage last night where a storage array oops in one region brought down the entire cluster. Would globalized system ranges have allowed the cluster to at least continue to serve reads?

Steven Hand asks:

I have had a couple customers for which access to system ranges was a major problem. One workaround was to (temporarily) pin system ranges to one region (at the cost of access from other regions). The other workaround was to enhance the Ruby Active Record ORM adapter to support stale reads for such system metadata.

rafiss · 2021-07-29T15:43:36Z

Related question from the community: #67109

rafiss · 2022-03-30T16:54:27Z

Here's an example that came up in the context of Prisma (and also would affect most other users of user-defined types): prisma/prisma#11317 (comment)

ajwerner · 2022-03-30T17:01:11Z

loosely relates to 79043

irfansharif · 2022-05-11T18:40:32Z

I think that it would be system.users and system.role_options. Unfortunately (and this is a big unfortunately), system.users is in the system config span. We'd need to split that out to make this work. Extra unfortunately, its hard-coded table ID is between descriptors and zones.

BTW this is trivial now in 22.2, if we want to pull on this thread further. We're no longer gossiping the system config span and there's no reason not to split within it. To do so, I think all you need to do is delete these lines and update a few test files:

cockroach/pkg/spanconfig/spanconfigstore/span_store.go

Lines 113 to 123 in 4124128

    
    // We don't want to split within the system config span while we're still
    // also using it to disseminate zone configs.
    //
    // TODO(irfansharif): Once we've fully phased out the system config span, we
    // can get rid of this special handling.
    if keys.SystemConfigSpan.Contains(sp) {
    	return nil, nil
    }
    if keys.SystemConfigSpan.ContainsKey(sp.Key) {
    	return roachpb.RKey(keys.SystemConfigSpan.EndKey), nil
    }
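For readers unfamiliar with what this special case does, here is a self-contained sketch of the same decision. It uses a hypothetical `span` type and `splitKey` helper rather than the real `roachpb` and `keys` packages, and the containment semantics are a simplification that may differ from the real methods in edge cases such as an empty end key.

```go
package main

import (
	"bytes"
	"fmt"
)

// span is a hypothetical stand-in for a key span [key, endKey).
type span struct{ key, endKey []byte }

// contains reports whether other lies entirely within s.
func (s span) contains(other span) bool {
	return bytes.Compare(s.key, other.key) <= 0 &&
		bytes.Compare(other.endKey, s.endKey) <= 0
}

// containsKey reports whether k falls within [key, endKey).
func (s span) containsKey(k []byte) bool {
	return bytes.Compare(s.key, k) <= 0 && bytes.Compare(k, s.endKey) < 0
}

// splitKey models the special handling for a protected span. special is
// true when the special case applied: a nil key means "do not split",
// a non-nil key is the first split point allowed. special is false when
// the caller would fall through to the regular split-key computation.
func splitKey(protected, sp span) (key []byte, special bool) {
	if protected.contains(sp) {
		return nil, true // never split inside the protected span
	}
	if protected.containsKey(sp.key) {
		return protected.endKey, true // split only at the protected span's end
	}
	return nil, false
}

func main() {
	protected := span{key: []byte("b"), endKey: []byte("f")}
	for _, sp := range []span{
		{key: []byte("c"), endKey: []byte("d")}, // fully inside
		{key: []byte("c"), endKey: []byte("z")}, // starts inside, ends outside
		{key: []byte("g"), endKey: []byte("z")}, // fully outside
	} {
		k, special := splitKey(protected, sp)
		fmt.Printf("[%s, %s): key=%q special=%v\n", sp.key, sp.endKey, k, special)
	}
}
```

Deleting the special case, as suggested above, corresponds to always taking the fall-through path, which is what would let tables inside the former system config span be split out and configured on their own.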

andy-kimball · 2022-06-01T18:45:34Z

@irfansharif, I'm really happy to hear that we're unblocked due to the zone config work. The architectural improvements that came along with that work continue to pay dividends.

Previously, on a multi-region setup the system database could not be modified to be multi-region and it was blocked from being made multi-region aware. To address this, we are now allowing ALTER DATABASE PRIMARY REGION to work on system tenants.

Fixes: cockroachdb#63365 Epic: CRDB-33032

Release note (sql change): Previously, we added support for setting the region on the system database, which was limited to tenants only. We lifted this limitation to allow ALTER DATABASE PRIMARY REGION to work on system tenants. To support preview status, we created a cluster setting called sql.multiregion.preview_multiregion_system_database that will give users the option to set up their system database as multi-region for Cockroach dedicated (this cluster setting is not enabled by default). Note that after adding non-primary regions, we recommend that users do a rolling restart to propagate region information.

121245: storage: support disk stall tracing r=aadityasondhi a=CheranMahalingam

Currently, in the event of a disk stall we don't have visibility into the sequence of disk events that led up to the failure due to the 10s frequency at which we export disk metrics. This commit adds support for storing a history of disk events from the previous 30s and in the event of a stall, logs a trace.

Fixes: #120506. Informs: #89786. Epic: None. Release note: None.

121293: sql: allow ALTER DATABASE PRIMARY REGION to work on system tenants r=jasminejsun a=jasminejsun

Previously, on a multi-region setup the system database could not be modified to be multi-region and it was blocked from being made multi-region aware. To address this, we are now allowing ALTER DATABASE PRIMARY REGION to work on system tenants.

Fixes: #63365 Epic: CRDB-33032

Release note (sql change): Previously, we added support for setting the region on the system database, which was limited to tenants only. We lifted this limitation to allow ALTER DATABASE PRIMARY REGION to work on system tenants. To support preview status, we created a cluster setting called sql.multiregion.preview_multiregion_system_database that will give users the option to set up their system database as multi-region for Cockroach dedicated (this cluster setting is not enabled by default). Note that after adding non-primary regions, we recommend that users do a rolling restart to propagate region information.

121860: roachtest: simpler preempted instance names r=herkolategan a=renatolabs

Small change to make them easier for a person to read.

Epic: none Release note: None

121903: sql: avoid an allocation in checkExprForDistSQL r=yuzefovich a=yuzefovich

This commit removes an allocation of `distSQLExprCheckVisitor` that previously happened on each `checkExprForDistSQL` call. We now reuse the same visitor that is stored on the `planner`. In some profiles I looked at recently this allocation accounted for over 1% of all allocations.

Fixes: #121302. Release note: None

Co-authored-by: Cheran Mahalingam
Co-authored-by: Jasmine Sun
Co-authored-by: Renato Costa
Co-authored-by: Yahor Yuzefovich

122285: sql,multiregionccl: rename multiregion system DB cluster setting r=rafiss a=rafiss

Avoid using the word "preview" in the name. That shouldn't be needed, since the cluster setting is hidden, so users would only know about it if we tell them to enable it. Keeping "preview" in the name isn't a huge problem, but it's just slightly annoying if we decide that we want to keep the setting around even after the feature leaves the "preview" status, since it requires updating docs and any customer code that was using the old name.

Also, this removes the mention of the setting in error messages. An error should not refer to an undocumented setting, since a user would have no way of learning more about it.

informs: #63365 Epic: CRDB-33032 Release note: None

Co-authored-by: Rafi Shamim

ajwerner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 9, 2021

ajwerner mentioned this issue Apr 9, 2021

pgwire: verifying user passwords is too aggressive #36160

Closed

rafiss mentioned this issue Apr 29, 2021

sql/kv: use duplicate index pattern for system tables #45225

Closed

rafiss added the O-postmortem Originated from a Postmortem action item. label May 12, 2021

mlazowik mentioned this issue May 18, 2021

Clarify which parts of 21.1 multi-region are enterprise cockroachdb/docs#10601

Closed

awoods187 added A-multiregion Related to multi-region T-multiregion labels May 20, 2021

awoods187 mentioned this issue May 20, 2021

sql: allow system tables to be regional by row tables #65536

Closed

exalate-issue-sync bot added T-sql-schema-deprecated Use T-sql-foundations instead and removed T-multiregion labels Jun 16, 2021

rafiss mentioned this issue Mar 31, 2022

CockroachDB: Highly Variable Query Response Times prisma/prisma#11317

Closed

irfansharif mentioned this issue May 11, 2022

spanconfig: miscellaneous improvements/TODOs #81009

Open

13 tasks

jlinder added sync-me-3 and removed sync-me-3 labels May 24, 2022

jasminejsun mentioned this issue Mar 28, 2024

sql: allow ALTER DATABASE PRIMARY REGION to work on system tenants #121293

Merged

craig bot closed this as completed in 6f3cae0 Apr 8, 2024

rafiss mentioned this issue Apr 24, 2024

sql,multiregionccl: rename multiregion system DB cluster setting #122285

Merged

rafiss mentioned this issue May 31, 2024

sqlclustersettings: make sql.multiregion.system_database_multiregion.enabled public #124944

Open

rafiss mentioned this issue Jul 2, 2024

sql/multi-region: warn user with the system db zone config is different than user db #126015

Open

github-project-automation bot added this to SQL Foundations Aug 28, 2024

github-project-automation bot moved this to Done in SQL Foundations Aug 28, 2024
