Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: replacement for NodeID #48009

Closed
tbg opened this issue Apr 24, 2020 · 4 comments
Closed

sql: replacement for NodeID #48009

tbg opened this issue Apr 24, 2020 · 4 comments
Assignees
Labels
A-multitenancy Related to multi-tenancy

Comments

@tbg
Copy link
Member

tbg commented Apr 24, 2020

See here for background

SQL uses the NodeID in dozens of places, but it won't be available to SQL tenants since they're not running on a Node, and since we can't (don't want to) hand out new cluster-wide NodeIDs for each tenant process. We can use a bogus ID, but we need to be careful about semantics and so I would rather audit and replace it with something that is well-documented so that incorrect usage is avoided.

Uses of NodeID generally fall into a few categories:

The items in the first category will essentially not work under multi-tenants and will be disabled for Phase 2.
The third category is out of scope for Phase 2 but we're aware of there being a real technical problem to solve (#48008).

The second category needs a tenant instance ID instead of a NodeID. The simplest implementation here could be allocating these IDs off a per-tenant counter, similar to the table descriptor ID generator. Once we are starting tenant SQL servers with some optional persistence, we can persist the tenant instance ID locally and allocate fewer of them; however an int64 per tenant will last us over all of the pod restarts that we will ever need.

The strawman plan I have here is to made the NodeID optional with a deprecation process similar to #47972 (we expect stuff in category 1 to stick around for some time since most of it will be disabled in phase 2) and to introduce the tenant instance ID which will use the NodeID when it is available.

@nvanbenschoten and @asubiotto, is "(tenant) instance ID" a name that resonates? I'm avoiding "gateway ID" since a "gateway" in our definition is the SQL server serving a given transaction, and that does not matter here. Also, would you do something differently from what I've proposed here?

Also cc @ajwerner since jobs is affected

@tbg tbg added A-multitenancy Related to multi-tenancy A-multitenancy-blocker labels Apr 24, 2020
@ajwerner
Copy link
Contributor

This SGTM. In the short term should we just shove the tenant ID into the node ID spot and set a high order bit in it to make it clear that it's not a real node ID? Is that too hacky for our tastes?

@tbg
Copy link
Member Author

tbg commented Apr 28, 2020

At the time of writing, a NodeID of 9999 is hard-coded to tide us over. But one of my next PRs will replace it with an abstraction. Roughly you'll get a

interface {
  DeprecatedNodeID() roachpb.NodeID
  OptionalNodeID() (roachpb.NodeID, bool)
  TenantInstanceID() something.TenantInstanceID
}

The goal is then to remove all DeprecatedNodeID() calls. OptionalNodeID() sticks around for something like crdb_internal.node_metrics which has the nodeID as a column (or so I think, anyway, you get the idea - the system tenant probably has legit uses for the NodeID).

@nvanbenschoten
Copy link
Member

is "(tenant) instance ID" a name that resonates?

I'm fine with either "tenant instance ID" or "tenant process ID".

Also, would you do something differently from what I've proposed here?

No, I think what you laid out makes sense. Items in category 1 will be disabled initially (other than #47892, which is tracked separately). Items in category 2 can use some other form of a per-tenant unique identifier. Items in category 3 can be disabled indefinitely.

an int64 per tenant

The only question I'm left with is whether we want to make this tenant instance ID globally unique or only unique per tenant. I suspect that the former would actually be easier to implement and will make things a little easier to debug. The only downsides I can see to that are:

  1. faster ID usage - but we're still never going to overflow an int64.
  2. a single global hotspot for ID allocation - can be addressed by some block allocation of IDs if this ever becomes a problem.

@tbg tbg assigned nvanbenschoten and tbg and unassigned nvanbenschoten Apr 29, 2020
tbg added a commit to tbg/cockroach that referenced this issue Apr 29, 2020
With the advent of multi-tenancy, SQL servers can optionally be
decoupled from the (and share a) KV backend. As a consequence, the
NodeID may not be available and its use across all components that need
to function in a multi-tenant setup is deprecated. Yet, some components
require a similar unique identifier for the SQL server they're a part
of.

This new identifier is the SQLInstanceID. At the time of writing, since
we're not yet able to start true SQL tenant servers and only exercise
parts of this functionality in unit testing, it is a randomized integer.
It's possible that we'll replace it with a per-tenant or even global
counter in production, putting it truly on par with the NodeID, to be
able to give the simple guarantee that these IDs are never reused
between different incarnations of the SQL process for a given tenant.
This final decision is kicked down the road, see cockroachdb#48009.

This commit introduces an IDContainer which will aid this process by
wrapping both the NodeID and SQLInstanceID, very similar to the work
carried out for Gossip in cockroachdb#47972. We do not yet audit all of the call
sites to keep the initial diff container. Instead, IDContainer has a
Get() method that makes it a drop-in replacement for the previously used
NodeIDContainer. `Get ` will be removed in a future commit while moving
all callers to either the `Optional{,Err}` or `DeprecatedNodeID`
methods.

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue May 1, 2020
With the advent of multi-tenancy, SQL servers can optionally be
decoupled from the (and share a) KV backend. As a consequence, the
NodeID may not be available and its use across all components that need
to function in a multi-tenant setup is deprecated. Yet, some components
require a similar unique identifier for the SQL server they're a part
of.

This new identifier is the SQLInstanceID. At the time of writing, since
we're not yet able to start true SQL tenant servers and only exercise
parts of this functionality in unit testing, it is a randomized integer.
It's possible that we'll replace it with a per-tenant or even global
counter in production, putting it truly on par with the NodeID, to be
able to give the simple guarantee that these IDs are never reused
between different incarnations of the SQL process for a given tenant.
This final decision is kicked down the road, see cockroachdb#48009.

This commit introduces an IDContainer which will aid this process by
wrapping both the NodeID and SQLInstanceID, very similar to the work
carried out for Gossip in cockroachdb#47972. We do not yet audit all of the call
sites to keep the initial diff container. Instead, IDContainer has a
Get() method that makes it a drop-in replacement for the previously used
NodeIDContainer. `Get ` will be removed in a future commit while moving
all callers to either the `Optional{,Err}` or `DeprecatedNodeID`
methods.

Release note: None
@tbg
Copy link
Member Author

tbg commented Jul 8, 2020

We use the SQL Instance ID where needed and various cleanup issues are filed, so closing.

@tbg tbg closed this as completed Jul 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-multitenancy Related to multi-tenancy
Projects
None yet
Development

No branches or pull requests

3 participants