
server,sql: subsystem to provide unique instance IDs and routing metadata for sql pods #62818

Closed
4 tasks done
ajwerner opened this issue Mar 30, 2021 · 4 comments
Labels
A-cc-enablement Pertains to current CC production issues or short-term projects A-logging In and around the logging infrastructure. A-multitenancy Related to multi-tenancy C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-server-and-security DB Server & Security

Comments


ajwerner commented Mar 30, 2021

This is an epic (meta-issue)

Sub-issues:

Is your feature request related to a problem? Please describe.

Relates to #60584, #50047.

Cockroach has long operated under the assumption that each node has a handle to a cluster-unique, integer identifier. This identifier, originally the NodeID, was assigned in the joining protocol. This ID was stored durably on the node's disk and was generated using a distributed transaction.

With the advent of multi-tenancy, we carved out a new abstraction, the SQLInstanceID, to represent a similar concept while encoding in the type system that the integer does not carry the same durable semantics. We've been able to make this work by using the node ID as the SQLInstanceID on a system tenant server, and elsewhere by just using a hard-coded constant:

const sqlInstanceID = base.SQLInstanceID(10001)

These unique IDs get used for the following things:

  • unique_rowid() uniqueness
  • RPC routing
    • The node-dialer abstraction and all RPCs today assume that we use integers to address machines
    • This mapping was propagated via gossip
    • Note that this ability to fan out to SQL processes powers many observability features
    • This is NodeID only today, hence this project
  • SQL descriptor leasing
    • In 21.2 we'd like to replace this with sqlliveness session-based leases.
  • Others?

In summary, what we want is a subsystem that provides a mechanism for SQL pods to lease a tenant-unique SQLInstanceID and to associate a network address with this ID.

Describe the solution you'd like

In 20.2 we introduced the sqlliveness subsystem as a mechanism to provide a distributed leasing primitive akin to the node-liveness epochs used by the KV layer. These leases have powered job leases since 20.2, and in 21.1 the SQL schema team will be working to adopt this subsystem for schema leases.

Today, a server will create a new session whenever its current session fails. This is akin to a KV node having its epoch pushed without restarting as a process. This solution proposes that we change this behavior for SQL servers: instead, the server crashes prior to the expiration of its session. This is acceptable because of the stateless nature of SQL pods.

Sketch of the steps:
  • Add a channel to sqlliveness.Session which closes when you expire
    • Probably a bit more conservative than this, even: e.g. treat the session as expired if it fails to renew within 2x the maximum clock offset of its deadline
  • Add a new subsystem for sql servers that exposes a sql instance ID (on real kv nodes this will be the node ID)
    • Implement this thing by taking a sqlliveness.Session
    • Also take a network address
    • Add a backing system table
    • Find an ID and lease it
    • Literally crash the server when you receive on the expiration channel
  • Start the above subsystem in the SQL server startup

Readers can read from the above subsystem with a scan and a filter on crdb_internal.sql_liveness_is_alive which is cheap. Also, something about cleanup of removed sessions, but I think I'd push that into the leasing of a new session.

Straw man schema
CREATE TABLE sql_instances (
   id INT PRIMARY KEY,
   addr STRING NOT NULL,
   -- maybe something with locality addrs 
   -- maybe other metadata
   session_id BYTES NOT NULL
);

Inserts query:

WITH instances AS (SELECT * FROM sql_instances FOR UPDATE),
          deleted AS (
                    DELETE FROM sql_instances
                          WHERE NOT crdb_internal.sql_liveness_is_alive(session_id)
                      RETURNING id
                  ),
          id AS (
                SELECT id
                  FROM (
                         SELECT id FROM deleted
                         UNION ALL SELECT id
                                     FROM (
                                            SELECT generate_series(1, existing + 1, 1) AS id
                                              FROM (SELECT count(id) AS existing FROM instances)
                                          )
                                    WHERE id NOT IN (SELECT id FROM instances)
                       )
                 LIMIT 1
             )
   INSERT
     INTO sql_instances
   SELECT id, $1 AS addr, $2 AS session_id
     FROM id
RETURNING id;

Describe alternatives you've considered

We could involve SQLInstanceID in the sqlliveness subsystem directly. There's something appealing about that; however,

Epic CRDB-2576

@ajwerner ajwerner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Mar 30, 2021
ajwerner commented:
@knz here's a first take on the SQLInstanceID and basis for pod2pod communication. It deserves some close reading and iteration.


knz commented Apr 30, 2021

oh sorry I hadn't seen that issue. thanks


knz commented Apr 30, 2021

So this is the overarching issue, which we have decided to split into 3 sub-projects:

  1. server: ensure SQL pods have unique instance IDs within the tenant #64474

  2. server: store instance ID and network addresses of SQL pods #64475

  3. iterator:

@knz knz added the A-multitenancy Related to multi-tenancy label Apr 30, 2021
@knz knz added A-logging In and around the logging infrastructure. A-cc-enablement Pertains to current CC production issues or short-term projects labels Jul 29, 2021
craig bot pushed a commit that referenced this issue Aug 3, 2021
66600: sql: Store instance ID and network addresses of SQL pods r=rimadeodhar a=rimadeodhar

This change adds a new system table, sql_instances, 
for storing SQL instance information in a multi-tenant 
environment. This table will be used to store and 
propagate unique instance ids and network address 
information on individual SQL pods.

This change also implements a storage layer for 
accessing the sql_instances system table using 
the KV API. We need to use the KV API because this table 
will be accessed and updated prior to the construction
of a SQL server for the tenant. It also implements the 
provider interface for using this new sqlinstance 
subsystem to assign unique instance ids per pod and 
propagate their network information. 

This is a part of #62818, 
resolves #64475

Release Note (general change): Add sqlinstance subsystem for creating 
unique instance ids per SQL pod and for propagating network address 
for SQL pods within a single tenant. 

67281: clusterversion: set `binaryMinSupportedVersion` to `v21_1` r=ajwerner a=ajwerner

This probably should have been done when `Start21_2` was added.

Fixes #67277.

Release note: None

68280: SQL: Reset database zone configuration builtin r=arulajmani a=ajstorm

The new(ish) multi-region syntax introduced in 21.1 sets a zone
configuration at the database level. This zone configuration
can diverge from its original state either because the user has
manually overridden it, or because the system hasn't kept it up
to date as the database has evolved (though there are no known
issues where this occurs). In either case, the user may wish to
revert the zone configuration for a given database back to its
original state.

The new builtin takes a database ID, and resets its underlying zone
configuration to its original state (as specified by the underlying
RegionConfig). For non-multi-region databases, the builtin no-ops.

Resolves #66855.

Release note (sql change): Creates a builtin to reset the zone configuration
of a multi-region database. This builtin can be helpful in
cases where the user has overridden the zone configuration for a given
database and wishes to revert back to the original system-specified
state.

68400: ci: enlarge boot disks for `roachtest` clusters from 10GiB->32GiB r=jlinder a=rickystewart

These tests are sometimes failing due to running out of storage space on
the `roachprod` clusters used by `roachtest` (see #68166). Try to
resolve it by bumping up the size of the boot disks.

Release note: None

Co-authored-by: rimadeodhar <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Adam Storm <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>

knz commented Sep 17, 2021

Completed.

@knz knz closed this as completed Sep 17, 2021