Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachprod: support managing shared process virtual clusters #111533

Merged
merged 6 commits into from
Oct 19, 2023

Conversation

renatolabs
Copy link
Contributor

@renatolabs renatolabs commented Sep 29, 2023

This PR adds support for shared process virtual clusters using start-sql and stop-sql; previously, these subcommands would always create separate process virtual clusters. After this PR, start-sql will create shared process tenants by default. Alternatively, the --external-cluster command line flag can be used to indicate where the separate processes should be deployed. Similarly, stop-sql may be used to stop separate process virtual clusters, as before, by killing the corresponding OS process; or shared process virtual clusters by using SQL (STOP SERVICE).

A few other changes introduced in this PR, done to support this work:

  • Made ExecSQL more flexible;
  • Fixed a bug in pgurl, where we would include the virtual cluster name in the connection url returned by roachprod pgurl. This is not allowed since the URL pointed to the SQL server process, which does not handle that option.
  • Remove ability to create tenants by specifying a tenant ID. We are now creating tenants with CREATE TENANT instead of crdb_internal.create_tenant. The tenant ID is computed automatically and used when starting the cockroach process for separate process deployments.

Epic: none

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@renatolabs
Copy link
Contributor Author

In practice, I'll wait for #111064 before merging this (and will update it accordingly since that PR adds stop-sql). But getting this PR out here so that I can perform some further testing.

@renatolabs renatolabs force-pushed the rc/shared-process-tenants branch from f53b210 to ab44448 Compare October 2, 2023 20:10
@renatolabs renatolabs force-pushed the rc/shared-process-tenants branch from ab44448 to b27b344 Compare October 12, 2023 20:21
@renatolabs renatolabs changed the title roachprod: support starting shared process tenants using start-sql roachprod: support managing shared process virtual clusters Oct 12, 2023
@renatolabs renatolabs marked this pull request as ready for review October 12, 2023 20:38
@renatolabs renatolabs requested a review from a team as a code owner October 12, 2023 20:38
@renatolabs renatolabs requested review from herkolategan and DarrylWong and removed request for a team October 12, 2023 20:38
@renatolabs
Copy link
Contributor Author

Apologies for the relatively large PR; it's hard to break it down. Creating and starting virtual clusters via SQL basically requires us to stop referencing them by ID as before, as we no longer have control of the id column in system.tenants.

@renatolabs renatolabs force-pushed the rc/shared-process-tenants branch from b27b344 to 2775926 Compare October 13, 2023 18:20
Copy link
Collaborator

@herkolategan herkolategan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong and @renatolabs)


pkg/cmd/roachprod/main.go line 575 at r2 (raw file):

		// clusters. We do not expose the functionality of choosing a
		// specific port for separate-process deployments; for
		// shared-process, the port always be based on the system tenant

Nit: the port will always...


pkg/roachprod/install/cluster_synced.go line 463 at r2 (raw file):

	} else {
		res, err := c.ExecSQL(ctx, l, c.Nodes[:1], "", 0, []string{
			"-e", fmt.Sprintf("ALTER TENANT '%s' STOP SERVICE", virtualClusterName),

"ALTER VIRTUAL CLUSTER" may work here.


pkg/roachprod/install/cockroach.go line 234 at r2 (raw file):

		for _, serviceType := range []ServiceType{ServiceTypeSQL, ServiceTypeUI} {
			if _, ok := serviceMap[sharedProcessVirtualClusterNode][serviceType]; !ok {
				servicesToRegister = append(servicesToRegister, ServiceDesc{

I think we still need to provide the port here for the storage / system layer, as this will get registered against DNS with port 0, unless I'm missing something?


pkg/roachprod/install/cockroach.go line 969 at r2 (raw file):

	var tenantPrefix string
	if virtualCluster != "" && virtualCluster != SystemInterfaceName {
		tenantPrefix = fmt.Sprintf("ALTER TENANT '%s' ", virtualCluster)

I think "ALTER VIRTUAL CLUSTER" also works.


pkg/roachprod/install/cockroach.go line 981 at r2 (raw file):

	var clusterSettingsString string
	for name, value := range clusterSettings {
		clusterSettingsString += fmt.Sprintf("%sSET CLUSTER SETTING %s = '%s';\n", tenantPrefix, name, value)

Nice catch on the cluster settings!


pkg/roachprod/install/cockroach.go line 1156 at r2 (raw file):

		// If the virtual cluster metadata does not exist yet, create it.
		virtualClusterStmts = append(virtualClusterStmts,
			fmt.Sprintf("CREATE TENANT '%s'", startOpts.VirtualClusterName),

Alias "CREATE VIRTUAL CLUSTER" may also work.


pkg/roachprod/install/cockroach.go line 1161 at r2 (raw file):

	virtualClusterStmts = append(virtualClusterStmts, fmt.Sprintf(
		"ALTER TENANT '%s' START SERVICE %s",

Same as above, I think "ALTER VIRTUAL CLUSTER" is supported.


pkg/roachprod/install/services.go line 181 at r2 (raw file):

	services, err := c.DiscoverServices(
		ctx, virtualClusterName, serviceType,
		ServiceNodePredicate(node), ServiceModePredicate(ServiceModeExternal), ServiceInstancePredicate(sqlInstance),

Not sure if this matters here and just an observation; but since a virtual cluster (tenant) must exclusively run in shared or external mode, and is not allowed both at the same time I'm not sure if the predicate has an effect, but I could be missing some logic here. It might be a result of shared processes not having the port set requiring the more intensive branch logic below?


pkg/roachprod/install/services.go line 217 at r2 (raw file):

			// interface service.
			if len(services) > 0 {
				services[0].Port = systemServices[0].Port

Any reason for not setting the port when register a shared process service? I noticed it above, and was unsure of the reasoning. In a sense the service descriptors were designed to allow semi-transparent listing and the mode was included in the service registry so that the logic for including the cluster option (virtual cluster name) in pgurl is possible.

Copy link
Collaborator

@herkolategan herkolategan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I left a few questions for my own understanding, and some minor improvements possibly.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong and @renatolabs)

Previously, this function would execute `cockroach sql`, log the
output and return whether any errors were found. In this commit, we
change the function to return the results to the caller instead. This
is more flexible since 1) not every caller might want the results
logged; and 2) allows callers to behave differently depending on the
output of the statement executed.

Epic: none

Release note: None
This commit changes when the tenant metadata is created: instead of
creating the system table entry out of band in `multitenant.go`, we
now make this part of the `Start` function itself; this allows users
of roachprod as a library to have this behaviour out of the box.

Epic: none

Release note: None
This unifies the handling of whether to pass a `cluster=` connection
option in `NodeURL`. Previously, this was done out of band before
every call to `NodeURL`, which was error prone: as an example,
`roachprod sql` had code that correctly handled this (i.e., not
setting a cluster option if service mode is shared); `roachprod
pgurl`, on the other hand, would always set `cluster=`, generating an
invalid connection string.

Epic: none

Release note: None
This commits changes how `start-sql` operates by introducing a new
`--external-cluster` command line argument: when passed, it indicates
that the virtual cluster should be serviced by separate SQL server
processes. By default `start-sql` will create shared process tenants.

The previous behaviour of `start-sql` was to always create separate
processes. The "virtual-cluster-nodes" argument that used to be
required is now only passed via `--external-cluster`.

This commit also changes the `(*SyncedCluster).Start()` code to set
default cluster settings for tenants as well, including organization
name and license, if available.

Epic: none

Release note: None
Similar to the previous commit, we now add support to shared process
virtual clusters to `stop-sql`. Since there is no process to be
stopped in this case, the `stop` call performs a `ALTER TENANT %s STOP
SERVICE` call in order to stop the virtual cluster. This should be
transparent to the user.

Epic: none

Release note: None
This changes how virtual clusters are referenced in roachprod commands
(`start-sql` and `stop-sql`). Specifically, where before a
`cluster-id` was requested, we now request a virtual cluster
name. This makes the API more user-friendly and removes the need for
the caller to keep track of virtual cluster IDs.

With this change we now remove the "default virtual cluster name"
logic, which is no longer necessary.

Epic: none

Release note: None
@renatolabs renatolabs force-pushed the rc/shared-process-tenants branch from 2775926 to 42d27f7 Compare October 18, 2023 21:27
Copy link
Contributor Author

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong and @herkolategan)


pkg/cmd/roachprod/main.go line 575 at r2 (raw file):

Previously, herkolategan (Herko Lategan) wrote…

Nit: the port will always...

Done.


pkg/roachprod/install/cluster_synced.go line 463 at r2 (raw file):

Previously, herkolategan (Herko Lategan) wrote…

"ALTER VIRTUAL CLUSTER" may work here.

It does work; the reason l chose to use ALTER TENANT is because I want this to work in upgrade tests (23.1+), where that syntax won't be available.


pkg/roachprod/install/cockroach.go line 234 at r2 (raw file):

Previously, herkolategan (Herko Lategan) wrote…

I think we still need to provide the port here for the storage / system layer, as this will get registered against DNS with port 0, unless I'm missing something?

The port is left at 0 intentionally. There's no safe way to set a port for a shared-process virtual cluster since it depends on the node. For now, the approach is as follows: when a shared process virtual cluster is created, we create a single service descriptor, with fixed values for node (sharedProcessVirtualClusterNode), and Port (0). During discovery time, if we find that the service is shared, we inherit these properties from the corresponding system service.

Consider the following:

  • User starts system interface on nodes 1,2
  • User creates shared-process virtual cluter
  • User starts system interface on nodes 3,4

Now, if I want to connect to the virtual cluster through node 3, how do I route it? If we just use the information available during creation time, we won't be able to find the proper URL. To work around this type of scenario, handling of this logic exists during DiscoverServices.


pkg/roachprod/install/services.go line 181 at r2 (raw file):

Previously, herkolategan (Herko Lategan) wrote…

Not sure if this matters here and just an observation; but since a virtual cluster (tenant) must exclusively run in shared or external mode, and is not allowed both at the same time I'm not sure if the predicate has an effect, but I could be missing some logic here. It might be a result of shared processes not having the port set requiring the more intensive branch logic below?

See comment I made on the "port 0" thread. I also made this code a little simpler after some reflection and added some comments. Let me know if it's clearer of if there's still something that you feel should be better handled or documented.

In the near future, I want to handle the TODO you left below and return an error if there are no services to be discovered. We can't do that at the moment because there are still tests (c2c) that create virtual clusters via SQL directly, and therefore there won't be any service descriptors for them.


pkg/roachprod/install/services.go line 217 at r2 (raw file):

Previously, herkolategan (Herko Lategan) wrote…

Any reason for not setting the port when register a shared process service? I noticed it above, and was unsure of the reasoning. In a sense the service descriptors were designed to allow semi-transparent listing and the mode was included in the service registry so that the logic for including the cluster option (virtual cluster name) in pgurl is possible.

See comments above. Let me know if that makes sense.

Copy link
Collaborator

@herkolategan herkolategan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the responses!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong and @renatolabs)


pkg/roachprod/install/cluster_synced.go line 463 at r2 (raw file):

Previously, renatolabs (Renato Costa) wrote…

It does work; the reason l chose to use ALTER TENANT is because I want this to work in upgrade tests (23.1+), where that syntax won't be available.

Ah good catch, that makes sense.


pkg/roachprod/install/cockroach.go line 234 at r2 (raw file):

Previously, renatolabs (Renato Costa) wrote…

The port is left at 0 intentionally. There's no safe way to set a port for a shared-process virtual cluster since it depends on the node. For now, the approach is as follows: when a shared process virtual cluster is created, we create a single service descriptor, with fixed values for node (sharedProcessVirtualClusterNode), and Port (0). During discovery time, if we find that the service is shared, we inherit these properties from the corresponding system service.

Consider the following:

  • User starts system interface on nodes 1,2
  • User creates shared-process virtual cluter
  • User starts system interface on nodes 3,4

Now, if I want to connect to the virtual cluster through node 3, how do I route it? If we just use the information available during creation time, we won't be able to find the proper URL. To work around this type of scenario, handling of this logic exists during DiscoverServices.

Right, I agree the underlying storage cluster port vs. shared service port is a bit of a conundrum. In a sense we can drop using the weight field as the qualifier for indicating a shared process and infer it from port 0. But we can do that in another PR, as it will likely also cause versioning issues with roachprod.

@renatolabs renatolabs added backport-23.2.x Flags PRs that need to be backported to 23.2. backport-23.1.x Flags PRs that need to be backported to 23.1 labels Oct 19, 2023
Copy link
Contributor Author

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r=herkolategan

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong and @herkolategan)


pkg/roachprod/install/cockroach.go line 234 at r2 (raw file):

In a sense we can drop using the weight field as the qualifier for indicating a shared process and infer it from port 0

Hrmm true. I'll consider making this change in follow-up PRs, but it's also not too terrible to keep as-is IMO.

@craig
Copy link
Contributor

craig bot commented Oct 19, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Oct 19, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Oct 19, 2023

Build succeeded:

@craig craig bot merged commit 11cb0ae into cockroachdb:master Oct 19, 2023
3 checks passed
@blathers-crl
Copy link

blathers-crl bot commented Oct 19, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 0452c84 to blathers/backport-release-23.1-111533: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants