Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: DistSender for SQL tenants #47909

Closed
tbg opened this issue Apr 22, 2020 · 2 comments · Fixed by #50520
Closed

kv: DistSender for SQL tenants #47909

tbg opened this issue Apr 22, 2020 · 2 comments · Fixed by #50520
Assignees
Labels
A-multitenancy Related to multi-tenancy

Comments

@tbg
Copy link
Member

tbg commented Apr 22, 2020

Background

DistSender needs changes to work on SQL tenants. The two approaches we've discussed are either giving it an endpoint to do range lookups (and changing a bunch of stuff on how it uses gossip and other systems that won't be available) OR pushing it into the KV layer for these deployments.

Abstractly the second option seems cleaner and easier, but it's also new territory. Either way, we need to pick one and do it.

@tbg
Copy link
Member Author

tbg commented May 8, 2020

@nvanbenschoten and I discussed this a bit yesterday.

Nathan made the good point that by moving DistSender into the KV layer, there is an extra (AZ-local) hop in every request. This doesn't matter much for throughput (tenants are going to be rate limited anyway) but it does matter for latency. Roughly, with DistSender in the tenant, we get latency(client,tenant)+latency(tenant,leaseholder). With DistSender in the KV layer, we get latency(client,tenant)+latency(tenant,nodeA)+latency(tenant,leaseholder) which is expected to be approximately latency(client,tenant)+2*latency(tenant,leaseholder). It's possibly not a huge deal now, but it's not pretty and could become important down the road.

There were two reasons to originally want to move DistSender into KV:

One was that we won't initially have a very good upgrade story around multi-tenancy (see #47919), and one argument for moving DistSender into KV (in multi-tenant deploys) was making this story simpler.

The other argument was that DistSender has some pretty KV-ey dependencies: it wants the Gossip network (to translate NodeID->NodeAddress and Meta1Descriptor) and reads from the meta ranges, which are global cluster data (i.e. not per-tenant).

Neither of these seems particularly compelling on its own. I have looked into the upgrade story a bit and while there may be a snag or two for some migrations, overall assuming we establish clear APIs between the tenant and KV (which we'll absolutely do) we should manage, given that we are for now the only operator of multi-tenant deploys.

The dependencies are also dealt with relatively straightforwardly:

  • add a TenantRangeLookup RPC to the KV tenant service which allows looking up range descriptors, but only those in the purview of the tenant
  • add a TenantNodeInfo endpoint which streams (NodeID,NodeAddress) pairs back to the tenant, essentially providing a RangeFeed of node addresses (with an update whenever an IP changes).
  • the tenant does not need meta1, since it can hit the TenantRangeLookup endpoint on any KV node, and we'll simply point it at a (AZ-preferring) round-robin.

There are variations on this idea that create more homogeneity between single- and multi-tenant deployments. We can (and should) ensure that we use essentially the same mechanisms in both, i.e. have DistSender always pull range descriptors from the specialized endpoint (which abstracts over the internal scan of the meta ranges that DistSender itself does today), and similarly for the id->ip mapping. This amounts to moving only the gossip and meta dependency into KV RPC.

Lastly on top of the latency improvement there is another straightforward argument for leaving DistSender on the tenant, namely caching and resource consumption. On the tenant, the DistSender cache is going to cache exclusively the tenant's range descriptors, whereas in the KV layer they would all get mixed up. Besides, DistSender is a complex piece of code with many moving parts. It is better if any bug or resource exhaustion occurs on the SQL tenant.

And, for another important one: DistSQL relies on leaseholder locations and more generally knowledge of ranges for setting up flows, which we eventually want it to do across multiple SQL containers for the same tenant. Moving DistSender into KV will make this harder.

All things considered, it now seems preferable to leave DistSender in the tenant and abstract away its KV dependencies behind RPCs instead.

I expect @nvanbenschoten to look into the details soon at which point we'll have a more concrete plan of action.

@tbg
Copy link
Member Author

tbg commented May 8, 2020

Something I forgot to add: we were also considering whether we should stop using gossip to translate id to address altogether (in the comment above, the TenantNodeInfo endpoint implementation would essentially just relay gossip callbacks to the tenant).

The idea here would be to have a "sign-in table" in which nodes post their address when they start (this is already in the store status, I think, but with a lot of junk we don't want to stream out, so it may have to be a new table). Every node would then have a RangeFeed on that table.

My feeling is that the id->ip mapping is one of the few very legitimate uses of Gossip, so I'm not really interested in making that change. However, it is a future change we could make if we wanted to reduce our dependency on Gossip. On the other hand, @andreimatei is looking into improved DistSender range descriptor/leaseholder cache coherence by broadcasting updates after splits/lease transfers, for which the rangefeed solution falls flat. So I think there's an additional reason to stick for Gossip for now (and probably ever).

tbg added a commit to tbg/cockroach that referenced this issue May 29, 2020
We were previously using the Gossip instance of the TestServer against
which the tenant was initialized.

This commit trims the dependency further by initializing its own Gossip
instance which is never written to (i.e. `AddInfo` is never called) and
which does not accept incoming connections.

As a reminder, the remaining problematic uses of Gossip as of this
commit are:

- making a `nodeDialer` (for `DistSender`), tracked in:
  cockroachdb#47909
- access to the system config:
  - `(schemaChangeGCResumer).Resume`, tracked:
    cockroachdb#49691
  - `database.Cache`, tracked:
    cockroachdb#49692
- `(physicalplan).spanResolver` (for replica oracle).
  This is likely not a blocker as we can use a "dumber" oracle in this case;
  the oracle is used for distsql physical planning of which tenants will
  do none. Tracked in: cockroachdb#48432

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Jun 2, 2020
We were previously using the Gossip instance of the TestServer against
which the tenant was initialized.

This commit trims the dependency further by initializing its own Gossip
instance which is never written to (i.e. `AddInfo` is never called) and
which does not accept incoming connections.

As a reminder, the remaining problematic uses of Gossip as of this
commit are:

- making a `nodeDialer` (for `DistSender`), tracked in:
  cockroachdb#47909
- access to the system config:
  - `(schemaChangeGCResumer).Resume`, tracked:
    cockroachdb#49691
  - `database.Cache`, tracked:
    cockroachdb#49692
- `(physicalplan).spanResolver` (for replica oracle).
  This is likely not a blocker as we can use a "dumber" oracle in this case;
  the oracle is used for distsql physical planning of which tenants will
  do none. Tracked in: cockroachdb#48432

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Jun 2, 2020
We were previously using the Gossip instance of the TestServer against
which the tenant was initialized.

This commit trims the dependency further by initializing its own Gossip
instance which is never written to (i.e. `AddInfo` is never called) and
which does not accept incoming connections.

As a reminder, the remaining problematic uses of Gossip as of this
commit are:

- making a `nodeDialer` (for `DistSender`), tracked in:
  cockroachdb#47909
- access to the system config:
  - `(schemaChangeGCResumer).Resume`, tracked:
    cockroachdb#49691
  - `database.Cache`, tracked:
    cockroachdb#49692
- `(physicalplan).spanResolver` (for replica oracle).
  This is likely not a blocker as we can use a "dumber" oracle in this case;
  the oracle is used for distsql physical planning of which tenants will
  do none. Tracked in: cockroachdb#48432

Release note: None
craig bot pushed a commit that referenced this issue Jun 2, 2020
49693: server: use "read-only" Gossip for tenants r=asubiotto,nvanbenschoten a=tbg

We were previously using the Gossip instance of the TestServer against
which the tenant was initialized.

This commit trims the dependency further by initializing its own Gossip
instance which is never written to (i.e. `AddInfo` is never called) and
which does not accept incoming connections.

As a reminder, the remaining problematic uses of Gossip as of this
commit are:

- making a `nodeDialer` (for `DistSender`), tracked in:
  #47909
- access to the system config:
  - `(schemaChangeGCResumer).Resume`, tracked:
    #49691
  - `database.Cache`, tracked:
    #49692
- `(physicalplan).spanResolver` (for replica oracle).
  This is likely not a blocker as we can use a "dumber" oracle in this case;
  the oracle is used for distsql physical planning of which tenants will
  do none. Tracked in: #48432

cc @ajwerner 

Release note: None

49724: sql: clean up of scanNode and some other things r=yuzefovich a=yuzefovich

**sql: unify PlanningCtx constructors into one**

Release note: None

**sql: remove separate scanVisilibity struct**

This commit removes `sql.scanVisibility` in favor of protobuf-generated
`execinfrapb.ScanVisibility`. It also introduces prettier aliases for
the two values into `execinfra` package that are now used throughout the
code.

Release note: None

**sql: clean up of scan node and a few other things**

This commit does the following cleanups of `scanNode`:
1. refactors `scanNode.initCols` method to be standalone (it will
probably be reused later by distsql spec exec factory).
2. removes `numBackfillColumns`, `specifiedIndexReverse`, and
`isSecondaryIndex` fields since they are no longer used.
3. refactors the code to remove `valNeededForCols` field which was
always consecutive numbers in range `[0, len(n.cols)-1]`.
4. refactors `getIndexIdx` method to not depend on `scanNode`.

Additionally, this commit removes `planDependencies` business
from `planTop` since optimizer now handles CREATE VIEW and handles
the plan dependencies on its own (and CREATE VIEW was the single
user of that struct in the plan top).

Also, it removes (which seems like) unnecessary call to `planColumns`
when creating distinct spec and an unused parameter from
`createTableReaders` method.

Addresses: #47474.

Release note: None

Co-authored-by: Tobias Schottdorf <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 11, 2020
This commit decomposes the roles of Gossip as a dependency in DistSender
into that of a NodeDescStore, a source of node descriptors, and that of
a FirstRangeProvider, a provider of information on the first range in a
cluster.

This decomposition will be used to address cockroachdb#47909 by:
1. replacing Gossip with a TenantService as a NodeDescStore
2. providing a custom RangeDescriptorDB (also backed by a TenantService)
   instead of a FirstRangeProvider.

Together, these changes will allow us to remove DistSender's dependency
on Gossip for SQL-only tenant processes.

The next step after this will be to introduce a TenantService that can
satisfy these two dependencies (NodeDescStore and RangeDescriptorDB) and
also use the new NodeDescStore-to-AddressResolver binding to replace the
use of Gossip with the TenantService in nodedialer instances.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 16, 2020
This commit decomposes the roles of Gossip as a dependency in DistSender
into that of a NodeDescStore, a source of node descriptors, and that of
a FirstRangeProvider, a provider of information on the first range in a
cluster.

This decomposition will be used to address cockroachdb#47909 by:
1. replacing Gossip with a TenantService as a NodeDescStore
2. providing a custom RangeDescriptorDB (also backed by a TenantService)
   instead of a FirstRangeProvider.

Together, these changes will allow us to remove DistSender's dependency
on Gossip for SQL-only tenant processes.

The next step after this will be to introduce a TenantService that can
satisfy these two dependencies (NodeDescStore and RangeDescriptorDB) and
also use the new NodeDescStore-to-AddressResolver binding to replace the
use of Gossip with the TenantService in nodedialer instances.
craig bot pushed a commit that referenced this issue Jun 18, 2020
50116: kv: decompose roles of Gossip dependency in DistSender r=nvanbenschoten a=nvanbenschoten

This change decomposes the roles of Gossip as a dependency in DistSender into that of a NodeDescStore, a source of node descriptors, and that of a FirstRangeProvider, a provider of information on the first range in a cluster.

This decomposition will be used to address #47909 by:
1. replacing Gossip with a TenantService as a NodeDescStore
2. providing a custom RangeDescriptorDB (also backed by a TenantService) instead of a FirstRangeProvider.

Together, these changes will allow us to remove DistSender's dependency on Gossip for SQL-only tenant processes.

The next step after this will be to introduce a TenantService that can satisfy these two dependencies (NodeDescStore and RangeDescriptorDB) and also use the new NodeDescStore-to-AddressResolver binding to replace the use of Gossip with the TenantService in nodedialer instances.

50293: server: Wrap ExternalStorage factory methods in externalStorageBuilder struct r=adityamaru a=adityamaru

Previously, we initialized the ExternalStorage factory methods on
creation of a NewServer() as all the required config params were
ready-to-use.

With future work related to user scoped storage requiring access to the
underlying storage.Engine, this change introduces a wrapper around these
factory methods. Using a builder struct allows us to split the
"creation" and "initialization" of the builder between the NewServer()
and Start() methods respectively. This allows for params which are only
initialized on server.Start() to be propogated to the builder for future
use.

This is part of a gradual refactor of the ExternalStorage factory
interface and is primarily to unblock development of #47211.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Aditya Maru <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 23, 2020
Fixes cockroachdb#47909.

This commit starts by adding two RPCs to the Internal service:
```
service Internal {
	...
	rpc RangeLookup (RangeLookupRequest) returns (RangeLookupResponse)     {}
	rpc NodeInfo    (NodeInfoRequest)    returns (stream NodeInfoResponse) {}
}

// RangeLookupRequest is a request to proxy a RangeLookup through a Tenant
// service. Its fields correspond to a subset of the args of kv.RangeLookup.
message RangeLookupRequest {
    ...
}

// NodeInfoRequest is a request to establish an indefinite stream on a Tenant
// service that provides an initial NodeInfoResponse and a NodeInfoResponse
// whenever the collection of KV nodes in a cluster changes. It effectively
// proxies any updates to NodeDescriptors in the KV gossip network back to the
// client of the request.
message NodeInfoRequest {}
```

The commit then introduces new `kvtenant.Proxy` object. Proxy mediates
the communication of cluster-wide state to sandboxed SQL-only tenant
processes through a restricted interface. A Proxy is seeded with a set
of one or more network addresses that reference existing KV nodes in the
cluster (or a load-balancer which fans out to some/all KV nodes). On
startup, it establishes contact with one of these nodes to learn about
the topology of the cluster and bootstrap the rest of SQL <-> KV network
communication.

Proxy has two main roles:

First, Proxy is capable of providing information on each of the KV nodes
in the cluster in the form of NodeDescriptors. This obviates the need
for SQL-only tenant processes to join the cluster-wide gossip network.
In doing so, it satisfies the `NodeDescStore` interface and can be used
as an `AddressResolver` with a small adapter.

Second, Proxy is capable of providing Range addressing information in
the form of RangeDescriptors through delegated RangeLookup requests.
This is necessary because SQL-only tenants are restricted from reading
Range Metadata keys directly. Instead, the RangeLookup requests are
proxied through existing KV nodes while being subject to additional
validation (e.g. is the Range being requested owned by the requesting
tenant?). In doing so, it satisfies the `RangeDescriptorDB` interface
and can be used to delegate all DistSender/RangeCache descriptor lookups
to KV nodes.

With this commit, we can mostly run a SQL-only tenant process without
joining the KV cluster's gossip network. This works if I comment out a
few of the uses of gossip due to cockroachdb#49692 and cockroachdb#47150 in SQL. Notably,
with the call to `DeprecatedRegisterSystemConfigChannel` in `sql.Server.Start`
removed, I can remove `Gossip` from `makeSQLServerArgs` entirely and
things "just work".
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jul 2, 2020
Fixes cockroachdb#47909.

This commit starts by adding two RPCs to the Internal service:
```
service Internal {
	...
	rpc RangeLookup (RangeLookupRequest) returns (RangeLookupResponse)     {}
	rpc NodeInfo    (NodeInfoRequest)    returns (stream NodeInfoResponse) {}
}

// RangeLookupRequest is a request to proxy a RangeLookup through a Tenant
// service. Its fields correspond to a subset of the args of kv.RangeLookup.
message RangeLookupRequest {
    ...
}

// NodeInfoRequest is a request to establish an indefinite stream on a Tenant
// service that provides an initial NodeInfoResponse and a NodeInfoResponse
// whenever the collection of KV nodes in a cluster changes. It effectively
// proxies any updates to NodeDescriptors in the KV gossip network back to the
// client of the request.
message NodeInfoRequest {}
```

The commit then introduces new `kvtenant.Proxy` object. Proxy mediates
the communication of cluster-wide state to sandboxed SQL-only tenant
processes through a restricted interface. A Proxy is seeded with a set
of one or more network addresses that reference existing KV nodes in the
cluster (or a load-balancer which fans out to some/all KV nodes). On
startup, it establishes contact with one of these nodes to learn about
the topology of the cluster and bootstrap the rest of SQL <-> KV network
communication.

Proxy has two main roles:

First, Proxy is capable of providing information on each of the KV nodes
in the cluster in the form of NodeDescriptors. This obviates the need
for SQL-only tenant processes to join the cluster-wide gossip network.
In doing so, it satisfies the `NodeDescStore` interface and can be used
as an `AddressResolver` with a small adapter.

Second, Proxy is capable of providing Range addressing information in
the form of RangeDescriptors through delegated RangeLookup requests.
This is necessary because SQL-only tenants are restricted from reading
Range Metadata keys directly. Instead, the RangeLookup requests are
proxied through existing KV nodes while being subject to additional
validation (e.g. is the Range being requested owned by the requesting
tenant?). In doing so, it satisfies the `RangeDescriptorDB` interface
and can be used to delegate all DistSender/RangeCache descriptor lookups
to KV nodes.

With this commit, we can mostly run a SQL-only tenant process without
joining the KV cluster's gossip network. This works if I comment out a
few of the uses of gossip due to cockroachdb#49692 and cockroachdb#47150 in SQL. Notably,
with the call to `DeprecatedRegisterSystemConfigChannel` in `sql.Server.Start`
removed, I can remove `Gossip` from `makeSQLServerArgs` entirely and
things "just work".
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jul 13, 2020
Fixes cockroachdb#47909.

This commit starts by adding two RPCs to the Internal service:
```
service Internal {
	...
	rpc RangeFeed          (RangeFeedRequest)          returns (stream RangeFeedEvent)          {}
	rpc GossipSubscription (GossipSubscriptionRequest) returns (stream GossipSubscriptionEvent) {}
}

// RangeLookupRequest is a request to proxy a RangeLookup through a Tenant
// service. Its fields correspond to a subset of the args of kv.RangeLookup.
message RangeLookupRequest {
    ...
}

// GossipSubscriptionRequest initiates a game of telephone. It establishes an
// indefinite stream that proxies gossip information overheard by the recipient
// node back to the caller. Gossip information is filtered down to just those
// identified by a key matching any of the specified patterns.
//
// Upon establishment of the stream, all existing information that matches one
// or more of the patterns is returned. After this point, only new information
// matching the patterns is returned.
message GossipSubscriptionRequest {
    ...
}
```

The commit then introduces new `kvtenant.Proxy` object. Proxy mediates
the communication of cluster-wide state to sandboxed SQL-only tenant
processes through a restricted interface. A Proxy is seeded with a set
of one or more network addresses that reference existing KV nodes in the
cluster (or a load-balancer which fans out to some/all KV nodes). On
startup, it establishes contact with one of these nodes to learn about
the topology of the cluster and bootstrap the rest of SQL <-> KV network
communication.

Proxy has two main roles:

First, Proxy is capable of providing information on each of the KV nodes
in the cluster in the form of NodeDescriptors. This obviates the need
for SQL-only tenant processes to join the cluster-wide gossip network.
In doing so, it satisfies the `NodeDescStore` interface and can be used
as an `AddressResolver` with a small adapter.

Second, Proxy is capable of providing Range addressing information in
the form of RangeDescriptors through delegated RangeLookup requests.
This is necessary because SQL-only tenants are restricted from reading
Range Metadata keys directly. Instead, the RangeLookup requests are
proxied through existing KV nodes while being subject to additional
validation (e.g. is the Range being requested owned by the requesting
tenant?). In doing so, it satisfies the `RangeDescriptorDB` interface
and can be used to delegate all DistSender/RangeCache descriptor lookups
to KV nodes.

With this commit, we can mostly run a SQL-only tenant process without
joining the KV cluster's gossip network. This works if I comment out a
few of the uses of gossip due to cockroachdb#49692 and cockroachdb#47150 in SQL. Notably,
with the call to `DeprecatedRegisterSystemConfigChannel` in `sql.Server.Start`
removed, I can remove `Gossip` from `makeSQLServerArgs` entirely and
things "just work".
craig bot pushed a commit that referenced this issue Jul 15, 2020
50520: kv/kvclient: introduce new tenant Proxy r=nvanbenschoten a=nvanbenschoten

Closes #47909.

This commit starts by adding two RPCs to the Internal service:
```proto
service Internal {
    ...
    rpc RangeFeed          (RangeFeedRequest)          returns (stream RangeFeedEvent)          {}
    rpc GossipSubscription (GossipSubscriptionRequest) returns (stream GossipSubscriptionEvent) {}
}

// RangeLookupRequest is a request to proxy a RangeLookup through a Tenant
// service. Its fields correspond to a subset of the args of kv.RangeLookup.
message RangeLookupRequest {
    ...
}

// GossipSubscriptionRequest initiates a game of telephone. It establishes an
// indefinite stream that proxies gossip information overheard by the recipient
// node back to the caller. Gossip information is filtered down to just those
// identified by a key matching any of the specified patterns.
//
// Upon establishment of the stream, all existing information that matches one
// or more of the patterns is returned. After this point, only new information
// matching the patterns is returned.
message GossipSubscriptionRequest {
    ...
}
```

The commit then introduces new `kvtenant.Proxy` object. Proxy mediates the communication of cluster-wide state to sandboxed SQL-only tenant processes through a restricted interface. A Proxy is seeded with a set of one or more network addresses that reference existing KV nodes in the cluster (or a load-balancer which fans out to some/all KV nodes). On startup, it establishes contact with one of these nodes to learn about the topology of the cluster and bootstrap the rest of SQL <-> KV network communication.

Proxy has two main roles:

First, Proxy is capable of providing information on each of the KV nodes in the cluster in the form of NodeDescriptors. This obviates the need for SQL-only tenant processes to join the cluster-wide gossip network. In doing so, it satisfies the `NodeDescStore` interface and can be used as an `AddressResolver` with a small adapter.

Second, Proxy is capable of providing Range addressing information in the form of RangeDescriptors through delegated RangeLookup requests. This is necessary because SQL-only tenants are restricted from reading Range Metadata keys directly. Instead, the RangeLookup requests are proxied through existing KV nodes while being subject to additional validation (e.g. is the Range being requested owned by the requesting tenant?). In doing so, it satisfies the `RangeDescriptorDB` interface and can be used to delegate all DistSender/RangeCache descriptor lookups to KV nodes.

With this commit, we can mostly run a SQL-only tenant process without joining the KV cluster's gossip network. This works if I comment out a few of the uses of gossip due to #49692 and #47150 in SQL. Notably, with the call to `DeprecatedRegisterSystemConfigChannel` in `sql.Server.Start` removed, I can remove `Gossip` from `makeSQLServerArgs` entirely and things "just work".

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig craig bot closed this as completed in b7430b5 Jul 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-multitenancy Related to multi-tenancy
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants