-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server,sql: subsystem to provide unique instance IDs and routing metadata for sql pods #62818
Comments
@knz here's a first take on the |
oh sorry I hadn't seen that issue. thanks |
So this is the overarching issue, which we have decided to split into 3 sub-projects:
|
66600: sql: Store instance ID and network addresses of SQL pods r=rimadeodhar a=rimadeodhar This change adds a new system table, sql_instances, for storing SQL instance information in a multi-tenant environment. This table will be used to store and propagate unique instance ids and network address information on individual SQL pods. This change also implements a storage layer for accessing the sql_instances system table using the KV API. We need to use KV API as this table will be accessed and updated prior to the construction of a SQL server for the tenant. It also implements the provider interface for using this new sqlinstance subsystem to assign unique instance ids per pod and propagate their network information. This is a part of #62818, resolves #64475 Release Note (general change): Add sqlinstance subsystem for creating unique instance ids per SQL pod and for propagating network address for SQL pods within a single tenant. 67281: clusterversion: set `binaryMinSupportedVersion` to `v21_1` r=ajwerner a=ajwerner This probably should have been done when `Start21_2` was added. Fixes #67277. Release note: None 68280: SQL: Reset database zone configuration builtin r=arulajmani a=ajstorm The new(ish) multi-region syntax introduced in 21.1 sets a zone configuration at the database level. This zone configuration can diverge from its original state either because the user has manually overridden it, or because the system hasn't kept it up to date as the database has evolved (though there are no known issues where this occurs). In either case, the user may wish to revert the zone configurations for a given databases back to its original state. The new builtin takes a database ID, and resets its underlying zone configuration to its original state (as specified by the underlying RegionConfig). For non-multi-region databases, the builtin no-ops. Resolves #66855. Release note (sql change): Creates a builtin to reset the zone configuration of a multi-region database. This builtin can be helpful in cases where the user has overridden the zone configuration for a given database and wishes to revert back to the original system-specified state. 68400: ci: enlarge boot disks for `roachtest` clusters from 10GiB->32GiB r=jlinder a=rickystewart These tests are sometimes failing due to running out of storage space on the `roachprod` clusters used by `roachtest` (see #68166). Try to resolve it by bumping up the size of the boot disks. Release note: None Co-authored-by: rimadeodhar <[email protected]> Co-authored-by: Andrew Werner <[email protected]> Co-authored-by: Adam Storm <[email protected]> Co-authored-by: Ricky Stewart <[email protected]>
Completed. |
This is an epic (meta-issue)
Sub-issues:
(*statusServer).iterateNodes()
with a newiterateSQLPods()
#64477Is your feature request related to a problem? Please describe.
Relates to #60584, #50047.
Cockroach has long operated under the assumption that each node has a handle to a cluster-unique, integer identifier. This identifier, originally the
NodeID
, was assigned in the joining protocol. This ID was stored durably on the node's disk and was generated using a distributed transaction.With the advent of multi-tenant, we carved out a new abstraction, the
SQLInstanceID
to represent a similar concept but to encapsulate in the type system that the integer does not carry the same durable semantics. We've been able to make this work by using the node ID as the SQLInstanceID on a system tenant server by just using a hard-coded constant.cockroach/pkg/server/testserver.go
Line 583 in 8905b9b
These unique IDs get used for the following things:
unique_rowid()
uniquenessNodeID
only today, hence this projectIn summary, what we want is a subsystem that provides a mechanism for SQL pods to lease a tenant-unique
SQLInstanceID
and to associate a network address with this ID.Describe the solution you'd like
In 20.2 we introduced the
sqlliveness
subsystem as a mechanism to provide a distributed leasing primitive akin to node-liveness epochs used by the KV layer. These leases power job leases since 20.2 and in 21.1 the sql schema team will be working to adopt this subsystem for schema leases.Today, a server will create a new session whenever its session fails. This is akin to a kv node having its epoch pushed but not restarting as a process. This solution proposes that we change this behavior for SQL servers to instead crash upon (prior to) the expiration of its session. This is more acceptable because of the stateless nature of SQL pods.
Sketch of the steps:
Readers can read from the above subsystem with a scan and a filter on
crdb_internal.sql_liveness_is_alive
which is cheap. Also, something about cleanup of removed sessions, but I think I'd push that into the leasing of a new session.Straw man schema
Inserts query:
Describe alternatives you've considered
We could involve
SQLInstanceID
in thesqlliveness
subsystem directly. There's something appealing about that; however,Epic CRDB-2576
The text was updated successfully, but these errors were encountered: