-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: DistSender for SQL tenants #47909
Comments
@nvanbenschoten and I discussed this a bit yesterday. Nathan made the good point that by moving DistSender into the KV layer, there is an extra (AZ-local) hop in every request. This doesn't matter much for throughput (tenants are going to be rate limited anyway) but it does matter for latency. Roughly, with DistSender in the tenant, we get There were two reasons to originally want to move DistSender into KV: One was that we won't initially have a very good upgrade story around multi-tenancy (see #47919), and one argument for moving DistSender into KV (in multi-tenant deploys) was making this story simpler. The other argument was that DistSender has some pretty KV-ey dependencies: it wants the Gossip network (to translate NodeID->NodeAddress and Meta1Descriptor) and reads from the meta ranges, which are global cluster data (i.e. not per-tenant). Neither of these seems particularly compelling on its own. I have looked into the upgrade story a bit and while there may be a snag or two for some migrations, overall assuming we establish clear APIs between the tenant and KV (which we'll absolutely do) we should manage, given that we are for now the only operator of multi-tenant deploys. The dependencies are also dealt with relatively straightforwardly:
There are variations on this idea that create more homogeneity between single- and multi-tenant deployments. We can (and should) ensure that we use essentially the same mechanisms in both, i.e. have Lastly on top of the latency improvement there is another straightforward argument for leaving DistSender on the tenant, namely caching and resource consumption. On the tenant, the DistSender cache is going to cache exclusively the tenant's range descriptors, whereas in the KV layer they would all get mixed up. Besides, DistSender is a complex piece of code with many moving parts. It is better if any bug or resource exhaustion occurs on the SQL tenant. And, for another important one: DistSQL relies on leaseholder locations and more generally knowledge of ranges for setting up flows, which we eventually want it to do across multiple SQL containers for the same tenant. Moving DistSender into KV will make this harder. All things considered, it now seems preferable to leave DistSender in the tenant and abstract away its KV dependencies behind RPCs instead. I expect @nvanbenschoten to look into the details soon at which point we'll have a more concrete plan of action. |
Something I forgot to add: we were also considering whether we should stop using gossip to translate id to address altogether (in the comment above, the The idea here would be to have a "sign-in table" in which nodes post their address when they start (this is already in the store status, I think, but with a lot of junk we don't want to stream out, so it may have to be a new table). Every node would then have a RangeFeed on that table. My feeling is that the id->ip mapping is one of the few very legitimate uses of Gossip, so I'm not really interested in making that change. However, it is a future change we could make if we wanted to reduce our dependency on Gossip. On the other hand, @andreimatei is looking into improved DistSender range descriptor/leaseholder cache coherence by broadcasting updates after splits/lease transfers, for which the rangefeed solution falls flat. So I think there's an additional reason to stick for Gossip for now (and probably ever). |
We were previously using the Gossip instance of the TestServer against which the tenant was initialized. This commit trims the dependency further by initializing its own Gossip instance which is never written to (i.e. `AddInfo` is never called) and which does not accept incoming connections. As a reminder, the remaining problematic uses of Gossip as of this commit are: - making a `nodeDialer` (for `DistSender`), tracked in: cockroachdb#47909 - access to the system config: - `(schemaChangeGCResumer).Resume`, tracked: cockroachdb#49691 - `database.Cache`, tracked: cockroachdb#49692 - `(physicalplan).spanResolver` (for replica oracle). This is likely not a blocker as we can use a "dumber" oracle in this case; the oracle is used for distsql physical planning of which tenants will do none. Tracked in: cockroachdb#48432 Release note: None
We were previously using the Gossip instance of the TestServer against which the tenant was initialized. This commit trims the dependency further by initializing its own Gossip instance which is never written to (i.e. `AddInfo` is never called) and which does not accept incoming connections. As a reminder, the remaining problematic uses of Gossip as of this commit are: - making a `nodeDialer` (for `DistSender`), tracked in: cockroachdb#47909 - access to the system config: - `(schemaChangeGCResumer).Resume`, tracked: cockroachdb#49691 - `database.Cache`, tracked: cockroachdb#49692 - `(physicalplan).spanResolver` (for replica oracle). This is likely not a blocker as we can use a "dumber" oracle in this case; the oracle is used for distsql physical planning of which tenants will do none. Tracked in: cockroachdb#48432 Release note: None
We were previously using the Gossip instance of the TestServer against which the tenant was initialized. This commit trims the dependency further by initializing its own Gossip instance which is never written to (i.e. `AddInfo` is never called) and which does not accept incoming connections. As a reminder, the remaining problematic uses of Gossip as of this commit are: - making a `nodeDialer` (for `DistSender`), tracked in: cockroachdb#47909 - access to the system config: - `(schemaChangeGCResumer).Resume`, tracked: cockroachdb#49691 - `database.Cache`, tracked: cockroachdb#49692 - `(physicalplan).spanResolver` (for replica oracle). This is likely not a blocker as we can use a "dumber" oracle in this case; the oracle is used for distsql physical planning of which tenants will do none. Tracked in: cockroachdb#48432 Release note: None
49693: server: use "read-only" Gossip for tenants r=asubiotto,nvanbenschoten a=tbg We were previously using the Gossip instance of the TestServer against which the tenant was initialized. This commit trims the dependency further by initializing its own Gossip instance which is never written to (i.e. `AddInfo` is never called) and which does not accept incoming connections. As a reminder, the remaining problematic uses of Gossip as of this commit are: - making a `nodeDialer` (for `DistSender`), tracked in: #47909 - access to the system config: - `(schemaChangeGCResumer).Resume`, tracked: #49691 - `database.Cache`, tracked: #49692 - `(physicalplan).spanResolver` (for replica oracle). This is likely not a blocker as we can use a "dumber" oracle in this case; the oracle is used for distsql physical planning of which tenants will do none. Tracked in: #48432 cc @ajwerner Release note: None 49724: sql: clean up of scanNode and some other things r=yuzefovich a=yuzefovich **sql: unify PlanningCtx constructors into one** Release note: None **sql: remove separate scanVisilibity struct** This commit removes `sql.scanVisibility` in favor of protobuf-generated `execinfrapb.ScanVisibility`. It also introduces prettier aliases for the two values into `execinfra` package that are now used throughout the code. Release note: None **sql: clean up of scan node and a few other things** This commit does the following cleanups of `scanNode`: 1. refactors `scanNode.initCols` method to be standalone (it will probably be reused later by distsql spec exec factory). 2. removes `numBackfillColumns`, `specifiedIndexReverse`, and `isSecondaryIndex` fields since they are no longer used. 3. refactors the code to remove `valNeededForCols` field which was always consecutive numbers in range `[0, len(n.cols)-1]`. 4. refactors `getIndexIdx` method to not depend on `scanNode`. Additionally, this commit removes `planDependencies` business from `planTop` since optimizer now handles CREATE VIEW and handles the plan dependencies on its own (and CREATE VIEW was the single user of that struct in the plan top). Also, it removes (which seems like) unnecessary call to `planColumns` when creating distinct spec and an unused parameter from `createTableReaders` method. Addresses: #47474. Release note: None Co-authored-by: Tobias Schottdorf <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]>
This commit decomposes the roles of Gossip as a dependency in DistSender into that of a NodeDescStore, a source of node descriptors, and that of a FirstRangeProvider, a provider of information on the first range in a cluster. This decomposition will be used to address cockroachdb#47909 by: 1. replacing Gossip with a TenantService as a NodeDescStore 2. providing a custom RangeDescriptorDB (also backed by a TenantService) instead of a FirstRangeProvider. Together, these changes will allow us to remove DistSender's dependency on Gossip for SQL-only tenant processes. The next step after this will be to introduce a TenantService that can satisfy these two dependencies (NodeDescStore and RangeDescriptorDB) and also use the new NodeDescStore-to-AddressResolver binding to replace the use of Gossip with the TenantService in nodedialer instances.
This commit decomposes the roles of Gossip as a dependency in DistSender into that of a NodeDescStore, a source of node descriptors, and that of a FirstRangeProvider, a provider of information on the first range in a cluster. This decomposition will be used to address cockroachdb#47909 by: 1. replacing Gossip with a TenantService as a NodeDescStore 2. providing a custom RangeDescriptorDB (also backed by a TenantService) instead of a FirstRangeProvider. Together, these changes will allow us to remove DistSender's dependency on Gossip for SQL-only tenant processes. The next step after this will be to introduce a TenantService that can satisfy these two dependencies (NodeDescStore and RangeDescriptorDB) and also use the new NodeDescStore-to-AddressResolver binding to replace the use of Gossip with the TenantService in nodedialer instances.
50116: kv: decompose roles of Gossip dependency in DistSender r=nvanbenschoten a=nvanbenschoten This change decomposes the roles of Gossip as a dependency in DistSender into that of a NodeDescStore, a source of node descriptors, and that of a FirstRangeProvider, a provider of information on the first range in a cluster. This decomposition will be used to address #47909 by: 1. replacing Gossip with a TenantService as a NodeDescStore 2. providing a custom RangeDescriptorDB (also backed by a TenantService) instead of a FirstRangeProvider. Together, these changes will allow us to remove DistSender's dependency on Gossip for SQL-only tenant processes. The next step after this will be to introduce a TenantService that can satisfy these two dependencies (NodeDescStore and RangeDescriptorDB) and also use the new NodeDescStore-to-AddressResolver binding to replace the use of Gossip with the TenantService in nodedialer instances. 50293: server: Wrap ExternalStorage factory methods in externalStorageBuilder struct r=adityamaru a=adityamaru Previously, we initialized the ExternalStorage factory methods on creation of a NewServer() as all the required config params were ready-to-use. With future work related to user scoped storage requiring access to the underlying storage.Engine, this change introduces a wrapper around these factory methods. Using a builder struct allows us to split the "creation" and "initialization" of the builder between the NewServer() and Start() methods respectively. This allows for params which are only initialized on server.Start() to be propogated to the builder for future use. This is part of a gradual refactor of the ExternalStorage factory interface and is primarily to unblock development of #47211. Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Aditya Maru <[email protected]>
Fixes cockroachdb#47909. This commit starts by adding two RPCs to the Internal service: ``` service Internal { ... rpc RangeLookup (RangeLookupRequest) returns (RangeLookupResponse) {} rpc NodeInfo (NodeInfoRequest) returns (stream NodeInfoResponse) {} } // RangeLookupRequest is a request to proxy a RangeLookup through a Tenant // service. Its fields correspond to a subset of the args of kv.RangeLookup. message RangeLookupRequest { ... } // NodeInfoRequest is a request to establish an indefinite stream on a Tenant // service that provides an initial NodeInfoResponse and a NodeInfoResponse // whenever the collection of KV nodes in a cluster changes. It effectively // proxies any updates to NodeDescriptors in the KV gossip network back to the // client of the request. message NodeInfoRequest {} ``` The commit then introduces new `kvtenant.Proxy` object. Proxy mediates the communication of cluster-wide state to sandboxed SQL-only tenant processes through a restricted interface. A Proxy is seeded with a set of one or more network addresses that reference existing KV nodes in the cluster (or a load-balancer which fans out to some/all KV nodes). On startup, it establishes contact with one of these nodes to learn about the topology of the cluster and bootstrap the rest of SQL <-> KV network communication. Proxy has two main roles: First, Proxy is capable of providing information on each of the KV nodes in the cluster in the form of NodeDescriptors. This obviates the need for SQL-only tenant processes to join the cluster-wide gossip network. In doing so, it satisfies the `NodeDescStore` interface and can be used as an `AddressResolver` with a small adapter. Second, Proxy is capable of providing Range addressing information in the form of RangeDescriptors through delegated RangeLookup requests. This is necessary because SQL-only tenants are restricted from reading Range Metadata keys directly. Instead, the RangeLookup requests are proxied through existing KV nodes while being subject to additional validation (e.g. is the Range being requested owned by the requesting tenant?). In doing so, it satisfies the `RangeDescriptorDB` interface and can be used to delegate all DistSender/RangeCache descriptor lookups to KV nodes. With this commit, we can mostly run a SQL-only tenant process without joining the KV cluster's gossip network. This works if I comment out a few of the uses of gossip due to cockroachdb#49692 and cockroachdb#47150 in SQL. Notably, with the call to `DeprecatedRegisterSystemConfigChannel` in `sql.Server.Start` removed, I can remove `Gossip` from `makeSQLServerArgs` entirely and things "just work".
Fixes cockroachdb#47909. This commit starts by adding two RPCs to the Internal service: ``` service Internal { ... rpc RangeLookup (RangeLookupRequest) returns (RangeLookupResponse) {} rpc NodeInfo (NodeInfoRequest) returns (stream NodeInfoResponse) {} } // RangeLookupRequest is a request to proxy a RangeLookup through a Tenant // service. Its fields correspond to a subset of the args of kv.RangeLookup. message RangeLookupRequest { ... } // NodeInfoRequest is a request to establish an indefinite stream on a Tenant // service that provides an initial NodeInfoResponse and a NodeInfoResponse // whenever the collection of KV nodes in a cluster changes. It effectively // proxies any updates to NodeDescriptors in the KV gossip network back to the // client of the request. message NodeInfoRequest {} ``` The commit then introduces new `kvtenant.Proxy` object. Proxy mediates the communication of cluster-wide state to sandboxed SQL-only tenant processes through a restricted interface. A Proxy is seeded with a set of one or more network addresses that reference existing KV nodes in the cluster (or a load-balancer which fans out to some/all KV nodes). On startup, it establishes contact with one of these nodes to learn about the topology of the cluster and bootstrap the rest of SQL <-> KV network communication. Proxy has two main roles: First, Proxy is capable of providing information on each of the KV nodes in the cluster in the form of NodeDescriptors. This obviates the need for SQL-only tenant processes to join the cluster-wide gossip network. In doing so, it satisfies the `NodeDescStore` interface and can be used as an `AddressResolver` with a small adapter. Second, Proxy is capable of providing Range addressing information in the form of RangeDescriptors through delegated RangeLookup requests. This is necessary because SQL-only tenants are restricted from reading Range Metadata keys directly. Instead, the RangeLookup requests are proxied through existing KV nodes while being subject to additional validation (e.g. is the Range being requested owned by the requesting tenant?). In doing so, it satisfies the `RangeDescriptorDB` interface and can be used to delegate all DistSender/RangeCache descriptor lookups to KV nodes. With this commit, we can mostly run a SQL-only tenant process without joining the KV cluster's gossip network. This works if I comment out a few of the uses of gossip due to cockroachdb#49692 and cockroachdb#47150 in SQL. Notably, with the call to `DeprecatedRegisterSystemConfigChannel` in `sql.Server.Start` removed, I can remove `Gossip` from `makeSQLServerArgs` entirely and things "just work".
Fixes cockroachdb#47909. This commit starts by adding two RPCs to the Internal service: ``` service Internal { ... rpc RangeFeed (RangeFeedRequest) returns (stream RangeFeedEvent) {} rpc GossipSubscription (GossipSubscriptionRequest) returns (stream GossipSubscriptionEvent) {} } // RangeLookupRequest is a request to proxy a RangeLookup through a Tenant // service. Its fields correspond to a subset of the args of kv.RangeLookup. message RangeLookupRequest { ... } // GossipSubscriptionRequest initiates a game of telephone. It establishes an // indefinite stream that proxies gossip information overheard by the recipient // node back to the caller. Gossip information is filtered down to just those // identified by a key matching any of the specified patterns. // // Upon establishment of the stream, all existing information that matches one // or more of the patterns is returned. After this point, only new information // matching the patterns is returned. message GossipSubscriptionRequest { ... } ``` The commit then introduces new `kvtenant.Proxy` object. Proxy mediates the communication of cluster-wide state to sandboxed SQL-only tenant processes through a restricted interface. A Proxy is seeded with a set of one or more network addresses that reference existing KV nodes in the cluster (or a load-balancer which fans out to some/all KV nodes). On startup, it establishes contact with one of these nodes to learn about the topology of the cluster and bootstrap the rest of SQL <-> KV network communication. Proxy has two main roles: First, Proxy is capable of providing information on each of the KV nodes in the cluster in the form of NodeDescriptors. This obviates the need for SQL-only tenant processes to join the cluster-wide gossip network. In doing so, it satisfies the `NodeDescStore` interface and can be used as an `AddressResolver` with a small adapter. Second, Proxy is capable of providing Range addressing information in the form of RangeDescriptors through delegated RangeLookup requests. This is necessary because SQL-only tenants are restricted from reading Range Metadata keys directly. Instead, the RangeLookup requests are proxied through existing KV nodes while being subject to additional validation (e.g. is the Range being requested owned by the requesting tenant?). In doing so, it satisfies the `RangeDescriptorDB` interface and can be used to delegate all DistSender/RangeCache descriptor lookups to KV nodes. With this commit, we can mostly run a SQL-only tenant process without joining the KV cluster's gossip network. This works if I comment out a few of the uses of gossip due to cockroachdb#49692 and cockroachdb#47150 in SQL. Notably, with the call to `DeprecatedRegisterSystemConfigChannel` in `sql.Server.Start` removed, I can remove `Gossip` from `makeSQLServerArgs` entirely and things "just work".
50520: kv/kvclient: introduce new tenant Proxy r=nvanbenschoten a=nvanbenschoten Closes #47909. This commit starts by adding two RPCs to the Internal service: ```proto service Internal { ... rpc RangeFeed (RangeFeedRequest) returns (stream RangeFeedEvent) {} rpc GossipSubscription (GossipSubscriptionRequest) returns (stream GossipSubscriptionEvent) {} } // RangeLookupRequest is a request to proxy a RangeLookup through a Tenant // service. Its fields correspond to a subset of the args of kv.RangeLookup. message RangeLookupRequest { ... } // GossipSubscriptionRequest initiates a game of telephone. It establishes an // indefinite stream that proxies gossip information overheard by the recipient // node back to the caller. Gossip information is filtered down to just those // identified by a key matching any of the specified patterns. // // Upon establishment of the stream, all existing information that matches one // or more of the patterns is returned. After this point, only new information // matching the patterns is returned. message GossipSubscriptionRequest { ... } ``` The commit then introduces new `kvtenant.Proxy` object. Proxy mediates the communication of cluster-wide state to sandboxed SQL-only tenant processes through a restricted interface. A Proxy is seeded with a set of one or more network addresses that reference existing KV nodes in the cluster (or a load-balancer which fans out to some/all KV nodes). On startup, it establishes contact with one of these nodes to learn about the topology of the cluster and bootstrap the rest of SQL <-> KV network communication. Proxy has two main roles: First, Proxy is capable of providing information on each of the KV nodes in the cluster in the form of NodeDescriptors. This obviates the need for SQL-only tenant processes to join the cluster-wide gossip network. In doing so, it satisfies the `NodeDescStore` interface and can be used as an `AddressResolver` with a small adapter. Second, Proxy is capable of providing Range addressing information in the form of RangeDescriptors through delegated RangeLookup requests. This is necessary because SQL-only tenants are restricted from reading Range Metadata keys directly. Instead, the RangeLookup requests are proxied through existing KV nodes while being subject to additional validation (e.g. is the Range being requested owned by the requesting tenant?). In doing so, it satisfies the `RangeDescriptorDB` interface and can be used to delegate all DistSender/RangeCache descriptor lookups to KV nodes. With this commit, we can mostly run a SQL-only tenant process without joining the KV cluster's gossip network. This works if I comment out a few of the uses of gossip due to #49692 and #47150 in SQL. Notably, with the call to `DeprecatedRegisterSystemConfigChannel` in `sql.Server.Start` removed, I can remove `Gossip` from `makeSQLServerArgs` entirely and things "just work". Co-authored-by: Nathan VanBenschoten <[email protected]>
Background
DistSender needs changes to work on SQL tenants. The two approaches we've discussed are either giving it an endpoint to do range lookups (and changing a bunch of stuff on how it uses gossip and other systems that won't be available) OR pushing it into the KV layer for these deployments.
Abstractly the second option seems cleaner and easier, but it's also new territory. Either way, we need to pick one and do it.
The text was updated successfully, but these errors were encountered: