-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: replacement for NodeID #48009
Comments
This SGTM. In the short term should we just shove the tenant ID into the node ID spot and set a high order bit in it to make it clear that it's not a real node ID? Is that too hacky for our tastes? |
At the time of writing, a NodeID of 9999 is hard-coded to tide us over. But one of my next PRs will replace it with an abstraction. Roughly you'll get a interface {
DeprecatedNodeID() roachpb.NodeID
OptionalNodeID() (roachpb.NodeID, bool)
TenantInstanceID() something.TenantInstanceID
} The goal is then to remove all |
I'm fine with either "tenant instance ID" or "tenant process ID".
No, I think what you laid out makes sense. Items in category 1 will be disabled initially (other than #47892, which is tracked separately). Items in category 2 can use some other form of a per-tenant unique identifier. Items in category 3 can be disabled indefinitely.
The only question I'm left with is whether we want to make this tenant instance ID globally unique or only unique per tenant. I suspect that the former would actually be easier to implement and will make things a little easier to debug. The only downsides I can see to that are:
|
With the advent of multi-tenancy, SQL servers can optionally be decoupled from the (and share a) KV backend. As a consequence, the NodeID may not be available and its use across all components that need to function in a multi-tenant setup is deprecated. Yet, some components require a similar unique identifier for the SQL server they're a part of. This new identifier is the SQLInstanceID. At the time of writing, since we're not yet able to start true SQL tenant servers and only exercise parts of this functionality in unit testing, it is a randomized integer. It's possible that we'll replace it with a per-tenant or even global counter in production, putting it truly on par with the NodeID, to be able to give the simple guarantee that these IDs are never reused between different incarnations of the SQL process for a given tenant. This final decision is kicked down the road, see cockroachdb#48009. This commit introduces an IDContainer which will aid this process by wrapping both the NodeID and SQLInstanceID, very similar to the work carried out for Gossip in cockroachdb#47972. We do not yet audit all of the call sites to keep the initial diff container. Instead, IDContainer has a Get() method that makes it a drop-in replacement for the previously used NodeIDContainer. `Get ` will be removed in a future commit while moving all callers to either the `Optional{,Err}` or `DeprecatedNodeID` methods. Release note: None
With the advent of multi-tenancy, SQL servers can optionally be decoupled from the (and share a) KV backend. As a consequence, the NodeID may not be available and its use across all components that need to function in a multi-tenant setup is deprecated. Yet, some components require a similar unique identifier for the SQL server they're a part of. This new identifier is the SQLInstanceID. At the time of writing, since we're not yet able to start true SQL tenant servers and only exercise parts of this functionality in unit testing, it is a randomized integer. It's possible that we'll replace it with a per-tenant or even global counter in production, putting it truly on par with the NodeID, to be able to give the simple guarantee that these IDs are never reused between different incarnations of the SQL process for a given tenant. This final decision is kicked down the road, see cockroachdb#48009. This commit introduces an IDContainer which will aid this process by wrapping both the NodeID and SQLInstanceID, very similar to the work carried out for Gossip in cockroachdb#47972. We do not yet audit all of the call sites to keep the initial diff container. Instead, IDContainer has a Get() method that makes it a drop-in replacement for the previously used NodeIDContainer. `Get ` will be removed in a future commit while moving all callers to either the `Optional{,Err}` or `DeprecatedNodeID` methods. Release note: None
We use the SQL Instance ID where needed and various cleanup issues are filed, so closing. |
See here for background
SQL uses the NodeID in dozens of places, but it won't be available to SQL tenants since they're not running on a Node, and since we can't (don't want to) hand out new cluster-wide NodeIDs for each tenant process. We can use a bogus ID, but we need to be careful about semantics and so I would rather audit and replace it with something that is well-documented so that incorrect usage is avoided.
Uses of NodeID generally fall into a few categories:
node_id()
builtin (easily disabled on tenants)unique_rowid
The items in the first category will essentially not work under multi-tenants and will be disabled for Phase 2.
The third category is out of scope for Phase 2 but we're aware of there being a real technical problem to solve (#48008).
The second category needs a tenant instance ID instead of a NodeID. The simplest implementation here could be allocating these IDs off a per-tenant counter, similar to the table descriptor ID generator. Once we are starting tenant SQL servers with some optional persistence, we can persist the tenant instance ID locally and allocate fewer of them; however an int64 per tenant will last us over all of the pod restarts that we will ever need.
The strawman plan I have here is to made the NodeID optional with a deprecation process similar to #47972 (we expect stuff in category 1 to stick around for some time since most of it will be disabled in phase 2) and to introduce the tenant instance ID which will use the NodeID when it is available.
@nvanbenschoten and @asubiotto, is "(tenant) instance ID" a name that resonates? I'm avoiding "gateway ID" since a "gateway" in our definition is the SQL server serving a given transaction, and that does not matter here. Also, would you do something differently from what I've proposed here?
Also cc @ajwerner since jobs is affected
The text was updated successfully, but these errors were encountered: