
server,sql: subsystem to provide unique instance IDs and routing metadata for sql pods #62818

Closed
4 tasks done
ajwerner opened this issue Mar 30, 2021 · 4 comments
Labels
A-cc-enablement Pertains to current CC production issues or short-term projects A-logging In and around the logging infrastructure. A-multitenancy Related to multi-tenancy C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-server-and-security DB Server & Security

Comments


ajwerner commented Mar 30, 2021

This is an epic (meta-issue)

Sub-issues:

Is your feature request related to a problem? Please describe.

Relates to #60584, #50047.

Cockroach has long operated under the assumption that each node has a handle to a cluster-unique, integer identifier. This identifier, originally the NodeID, was assigned in the joining protocol. This ID was stored durably on the node's disk and was generated using a distributed transaction.

With the advent of multi-tenancy, we carved out a new abstraction, the SQLInstanceID, to represent a similar concept while encoding in the type system that the integer does not carry the same durable semantics. We've been able to make this work by using the node ID as the SQLInstanceID on a system tenant server, and elsewhere by just using a hard-coded constant:

const sqlInstanceID = base.SQLInstanceID(10001)

These unique IDs get used for the following things:

  • unique_rowid() uniqueness
  • RPC routing
    • The node-dialer abstraction and all RPCs today assume that we use integers to address machines
    • This mapping was propagated via gossip
    • Note that this ability to fan out to SQL processes powers many observability features
    • This is NodeID only today, hence this project
  • SQL descriptor leasing
    • In 21.2 we'd like to replace this with sqlliveness session-based leases.
  • Others?

In summary, what we want is a subsystem that provides a mechanism for SQL pods to lease a tenant-unique SQLInstanceID and to associate a network address with this ID.

Describe the solution you'd like

In 20.2 we introduced the sqlliveness subsystem as a mechanism to provide a distributed leasing primitive akin to the node-liveness epochs used by the KV layer. These leases have powered job leases since 20.2, and in 21.1 the SQL schema team will be working to adopt this subsystem for schema leases.

Today, a server will create a new session whenever its current session fails. This is akin to a KV node having its epoch pushed without restarting as a process. This solution proposes that we change this behavior for SQL servers: instead, the server crashes prior to the expiration of its session. This is acceptable because of the stateless nature of SQL pods.

Sketch of the steps:
  • Add a channel to sqlliveness.Session which closes when you expire
    • Probably a bit more conservative than this, even: e.g. treat the session as expired if it fails to renew within 2x the maximum clock offset of its deadline
  • Add a new subsystem for sql servers that exposes a sql instance ID (on real kv nodes this will be the node ID)
    • Implement this thing by taking a sqlliveness.Session
    • Also take a network address
    • Add a backing system table
    • Find an ID and lease it
    • Literally crash the server when you receive on the expiration channel
  • Start the above subsystem in the SQL server startup

Readers can read from the above subsystem with a scan and a filter on crdb_internal.sql_liveness_is_alive which is cheap. Also, something about cleanup of removed sessions, but I think I'd push that into the leasing of a new session.

Straw man schema
CREATE TABLE sql_instances (
   id INT PRIMARY KEY,
   addr STRING NOT NULL,
   -- maybe something with locality addrs 
   -- maybe other metadata
   session_id BYTES NOT NULL
);

Inserts query:

WITH instances AS (SELECT * FROM sql_instances FOR UPDATE),
          deleted AS (
                    DELETE FROM sql_instances
                          WHERE NOT crdb_internal.sql_liveness_is_alive(session_id)
                      RETURNING id
                  ),
          id AS (
                SELECT id
                  FROM (
                         SELECT id FROM deleted
                         UNION ALL SELECT id
                                     FROM (
                                            SELECT generate_series(1, existing + 1, 1) AS id
                                              FROM (SELECT count(id) AS existing FROM instances)
                                          )
                                    WHERE id NOT IN (SELECT id FROM instances)
                       )
                 LIMIT 1
             )
   INSERT
     INTO sql_instances
   SELECT id, $1 AS addr, $2 AS session_id
     FROM id
RETURNING id;

Describe alternatives you've considered

We could involve SQLInstanceID in the sqlliveness subsystem directly. There's something appealing about that; however,

Epic CRDB-2576

@ajwerner ajwerner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Mar 30, 2021
ajwerner commented:
@knz here's a first take on the SQLInstanceID and basis for pod2pod communication. It deserves some close reading and iteration.


knz commented Apr 30, 2021

oh sorry I hadn't seen that issue. thanks


knz commented Apr 30, 2021

So this is the overarching issue, which we have decided to split into 3 sub-projects:

  1. server: ensure SQL pods have unique instance IDs within the tenant #64474

  2. server: store instance ID and network addresses of SQL pods #64475

  3. iterator:

@knz knz added the A-multitenancy Related to multi-tenancy label Apr 30, 2021
@knz knz added A-logging In and around the logging infrastructure. A-cc-enablement Pertains to current CC production issues or short-term projects labels Jul 29, 2021
craig bot pushed a commit that referenced this issue Aug 3, 2021
66600: sql: Store instance ID and network addresses of SQL pods r=rimadeodhar a=rimadeodhar

This change adds a new system table, sql_instances, 
for storing SQL instance information in a multi-tenant 
environment. This table will be used to store and 
propagate unique instance ids and network address 
information on individual SQL pods.

This change also implements a storage layer for 
accessing the sql_instances system table using 
the KV API. We need to use the KV API because this table 
will be accessed and updated prior to the construction
of a SQL server for the tenant. It also implements the 
provider interface for using this new sqlinstance 
subsystem to assign unique instance ids per pod and 
propagate their network information. 

This is a part of #62818, 
resolves #64475

Release Note (general change): Add sqlinstance subsystem for creating 
unique instance ids per SQL pod and for propagating network address 
for SQL pods within a single tenant. 

67281: clusterversion: set `binaryMinSupportedVersion` to `v21_1` r=ajwerner a=ajwerner

This probably should have been done when `Start21_2` was added.

Fixes #67277.

Release note: None

68280: SQL: Reset database zone configuration builtin r=arulajmani a=ajstorm

The new(ish) multi-region syntax introduced in 21.1 sets a zone
configuration at the database level. This zone configuration
can diverge from its original state either because the user has
manually overridden it, or because the system hasn't kept it up
to date as the database has evolved (though there are no known
issues where this occurs). In either case, the user may wish to
revert the zone configuration for a given database back to its
original state.

The new builtin takes a database ID, and resets its underlying zone
configuration to its original state (as specified by the underlying
RegionConfig). For non-multi-region databases, the builtin no-ops.

Resolves #66855.

Release note (sql change): Creates a builtin to reset the zone configuration
of a multi-region database. This builtin can be helpful in
cases where the user has overridden the zone configuration for a given
database and wishes to revert back to the original system-specified
state.

68400: ci: enlarge boot disks for `roachtest` clusters from 10GiB->32GiB r=jlinder a=rickystewart

These tests are sometimes failing due to running out of storage space on
the `roachprod` clusters used by `roachtest` (see #68166). Try to
resolve it by bumping up the size of the boot disks.

Release note: None

Co-authored-by: rimadeodhar <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Adam Storm <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>

knz commented Sep 17, 2021

Completed.

@knz knz closed this as completed Sep 17, 2021