-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachprod: support managing shared process virtual clusters #111533
roachprod: support managing shared process virtual clusters #111533
Conversation
In practice, I'll wait for #111064 before merging this (and will update it accordingly since that PR adds |
f53b210
to
ab44448
Compare
ab44448
to
b27b344
Compare
start-sql
Apologies for the relatively large PR; it's hard to break it down. Creating and starting virtual clusters via SQL basically requires us to stop referencing them by ID as before, as we no longer have control of the |
b27b344
to
2775926
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong and @renatolabs)
pkg/cmd/roachprod/main.go
line 575 at r2 (raw file):
// clusters. We do not expose the functionality of choosing a // specific port for separate-process deployments; for // shared-process, the port always be based on the system tenant
Nit: the port will always...
pkg/roachprod/install/cluster_synced.go
line 463 at r2 (raw file):
} else { res, err := c.ExecSQL(ctx, l, c.Nodes[:1], "", 0, []string{ "-e", fmt.Sprintf("ALTER TENANT '%s' STOP SERVICE", virtualClusterName),
"ALTER VIRTUAL CLUSTER" may work here.
pkg/roachprod/install/cockroach.go
line 234 at r2 (raw file):
for _, serviceType := range []ServiceType{ServiceTypeSQL, ServiceTypeUI} { if _, ok := serviceMap[sharedProcessVirtualClusterNode][serviceType]; !ok { servicesToRegister = append(servicesToRegister, ServiceDesc{
I think we still need to provide the port here for the storage / system layer, as this will get registered against DNS with port 0, unless I'm missing something?
pkg/roachprod/install/cockroach.go
line 969 at r2 (raw file):
var tenantPrefix string if virtualCluster != "" && virtualCluster != SystemInterfaceName { tenantPrefix = fmt.Sprintf("ALTER TENANT '%s' ", virtualCluster)
I think "ALTER VIRTUAL CLUSTER" also works.
pkg/roachprod/install/cockroach.go
line 981 at r2 (raw file):
var clusterSettingsString string for name, value := range clusterSettings { clusterSettingsString += fmt.Sprintf("%sSET CLUSTER SETTING %s = '%s';\n", tenantPrefix, name, value)
Nice catch on the cluster settings!
pkg/roachprod/install/cockroach.go
line 1156 at r2 (raw file):
// If the virtual cluster metadata does not exist yet, create it. virtualClusterStmts = append(virtualClusterStmts, fmt.Sprintf("CREATE TENANT '%s'", startOpts.VirtualClusterName),
Alias "CREATE VIRTUAL CLUSTER" may also work.
pkg/roachprod/install/cockroach.go
line 1161 at r2 (raw file):
virtualClusterStmts = append(virtualClusterStmts, fmt.Sprintf( "ALTER TENANT '%s' START SERVICE %s",
Same as above, I think "ALTER VIRTUAL CLUSTER" is supported.
pkg/roachprod/install/services.go
line 181 at r2 (raw file):
services, err := c.DiscoverServices( ctx, virtualClusterName, serviceType, ServiceNodePredicate(node), ServiceModePredicate(ServiceModeExternal), ServiceInstancePredicate(sqlInstance),
Not sure if this matters here and just an observation; but since a virtual cluster (tenant) must exclusively run in shared or external mode, and is not allowed both at the same time I'm not sure if the predicate has an effect, but I could be missing some logic here. It might be a result of shared processes not having the port set requiring the more intensive branch logic below?
pkg/roachprod/install/services.go
line 217 at r2 (raw file):
// interface service. if len(services) > 0 { services[0].Port = systemServices[0].Port
Any reason for not setting the port when register a shared process service? I noticed it above, and was unsure of the reasoning. In a sense the service descriptors were designed to allow semi-transparent listing and the mode was included in the service registry so that the logic for including the cluster option (virtual cluster name) in pgurl is possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I left a few questions for my own understanding, and some minor improvements possibly.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong and @renatolabs)
Previously, this function would execute `cockroach sql`, log the output and return whether any errors were found. In this commit, we change the function to return the results to the caller instead. This is more flexible since 1) not every caller might want the results logged; and 2) allows callers to behave differently depending on the output of the statement executed. Epic: none Release note: None
This commit changes when the tenant metadata is created: instead of creating the system table entry out of band in `multitenant.go`, we now make this part of the `Start` function itself; this allows users of roachprod as a library to have this behaviour out of the box. Epic: none Release note: None
This unifies the handling of whether to pass a `cluster=` connection option in `NodeURL`. Previously, this was done out of band before every call to `NodeURL`, which was error prone: as an example, `roachprod sql` had code that correctly handled this (i.e., not setting a cluster option if service mode is shared); `roachprod pgurl`, on the other hand, would always set `cluster=`, generating an invalid connection string. Epic: none Release note: None
This commits changes how `start-sql` operates by introducing a new `--external-cluster` command line argument: when passed, it indicates that the virtual cluster should be serviced by separate SQL server processes. By default `start-sql` will create shared process tenants. The previous behaviour of `start-sql` was to always create separate processes. The "virtual-cluster-nodes" argument that used to be required is now only passed via `--external-cluster`. This commit also changes the `(*SyncedCluster).Start()` code to set default cluster settings for tenants as well, including organization name and license, if available. Epic: none Release note: None
Similar to the previous commit, we now add support to shared process virtual clusters to `stop-sql`. Since there is no process to be stopped in this case, the `stop` call performs a `ALTER TENANT %s STOP SERVICE` call in order to stop the virtual cluster. This should be transparent to the user. Epic: none Release note: None
This changes how virtual clusters are referenced in roachprod commands (`start-sql` and `stop-sql`). Specifically, where before a `cluster-id` was requested, we now request a virtual cluster name. This makes the API more user-friendly and removes the need for the caller to keep track of virtual cluster IDs. With this change we now remove the "default virtual cluster name" logic, which is no longer necessary. Epic: none Release note: None
2775926
to
42d27f7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong and @herkolategan)
pkg/cmd/roachprod/main.go
line 575 at r2 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: the port will always...
Done.
pkg/roachprod/install/cluster_synced.go
line 463 at r2 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
"ALTER VIRTUAL CLUSTER" may work here.
It does work; the reason l chose to use ALTER TENANT
is because I want this to work in upgrade tests (23.1+), where that syntax won't be available.
pkg/roachprod/install/cockroach.go
line 234 at r2 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
I think we still need to provide the port here for the storage / system layer, as this will get registered against DNS with port 0, unless I'm missing something?
The port is left at 0 intentionally. There's no safe way to set a port for a shared-process virtual cluster since it depends on the node. For now, the approach is as follows: when a shared process virtual cluster is created, we create a single service descriptor, with fixed values for node (sharedProcessVirtualClusterNode
), and Port (0
). During discovery time, if we find that the service is shared, we inherit these properties from the corresponding system service.
Consider the following:
- User starts system interface on nodes 1,2
- User creates shared-process virtual cluter
- User starts system interface on nodes 3,4
Now, if I want to connect to the virtual cluster through node 3, how do I route it? If we just use the information available during creation time, we won't be able to find the proper URL. To work around this type of scenario, handling of this logic exists during DiscoverServices
.
pkg/roachprod/install/services.go
line 181 at r2 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Not sure if this matters here and just an observation; but since a virtual cluster (tenant) must exclusively run in shared or external mode, and is not allowed both at the same time I'm not sure if the predicate has an effect, but I could be missing some logic here. It might be a result of shared processes not having the port set requiring the more intensive branch logic below?
See comment I made on the "port 0" thread. I also made this code a little simpler after some reflection and added some comments. Let me know if it's clearer of if there's still something that you feel should be better handled or documented.
In the near future, I want to handle the TODO you left below and return an error if there are no services to be discovered. We can't do that at the moment because there are still tests (c2c
) that create virtual clusters via SQL directly, and therefore there won't be any service descriptors for them.
pkg/roachprod/install/services.go
line 217 at r2 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Any reason for not setting the port when register a shared process service? I noticed it above, and was unsure of the reasoning. In a sense the service descriptors were designed to allow semi-transparent listing and the mode was included in the service registry so that the logic for including the cluster option (virtual cluster name) in pgurl is possible.
See comments above. Let me know if that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the responses!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong and @renatolabs)
pkg/roachprod/install/cluster_synced.go
line 463 at r2 (raw file):
Previously, renatolabs (Renato Costa) wrote…
It does work; the reason l chose to use
ALTER TENANT
is because I want this to work in upgrade tests (23.1+), where that syntax won't be available.
Ah good catch, that makes sense.
pkg/roachprod/install/cockroach.go
line 234 at r2 (raw file):
Previously, renatolabs (Renato Costa) wrote…
The port is left at 0 intentionally. There's no safe way to set a port for a shared-process virtual cluster since it depends on the node. For now, the approach is as follows: when a shared process virtual cluster is created, we create a single service descriptor, with fixed values for node (
sharedProcessVirtualClusterNode
), and Port (0
). During discovery time, if we find that the service is shared, we inherit these properties from the corresponding system service.Consider the following:
- User starts system interface on nodes 1,2
- User creates shared-process virtual cluter
- User starts system interface on nodes 3,4
Now, if I want to connect to the virtual cluster through node 3, how do I route it? If we just use the information available during creation time, we won't be able to find the proper URL. To work around this type of scenario, handling of this logic exists during
DiscoverServices
.
Right, I agree the underlying storage cluster port vs. shared service port is a bit of a conundrum. In a sense we can drop using the weight field as the qualifier for indicating a shared process and infer it from port 0. But we can do that in another PR, as it will likely also cause versioning issues with roachprod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r=herkolategan
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong and @herkolategan)
pkg/roachprod/install/cockroach.go
line 234 at r2 (raw file):
In a sense we can drop using the weight field as the qualifier for indicating a shared process and infer it from port 0
Hrmm true. I'll consider making this change in follow-up PRs, but it's also not too terrible to keep as-is IMO.
Build failed (retrying...): |
Build failed (retrying...): |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 0452c84 to blathers/backport-release-23.1-111533: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
This PR adds support for shared process virtual clusters using
start-sql
andstop-sql
; previously, these subcommands would always create separate process virtual clusters. After this PR,start-sql
will create shared process tenants by default. Alternatively, the--external-cluster
command line flag can be used to indicate where the separate processes should be deployed. Similarly,stop-sql
may be used to stop separate process virtual clusters, as before, by killing the corresponding OS process; or shared process virtual clusters by using SQL (STOP SERVICE
).A few other changes introduced in this PR, done to support this work:
ExecSQL
more flexible;pgurl
, where we would include the virtual cluster name in the connection url returned byroachprod pgurl
. This is not allowed since the URL pointed to the SQL server process, which does not handle that option.CREATE TENANT
instead ofcrdb_internal.create_tenant
. The tenant ID is computed automatically and used when starting the cockroach process for separate process deployments.Epic: none
Release note: None