-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachprod: add virtual cluster support to the monitor #111064
roachprod: add virtual cluster support to the monitor #111064
Conversation
0349e34
to
9cbeda5
Compare
Open for review. A few runs so far:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Minor nits about tenant vs. virtual cluster naming and a few other things.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @renatolabs and @smg260)
-- commits
line 38 at r1:
Nit: Should this now be virtual cluster support?
-- commits
line 43 at r1:
Nit: Should this now be non virtual cluster deployments?
pkg/cmd/roachprod/main.go
line 561 at r1 (raw file):
} // Always pick a random available port when starting tenants. We
Nit: virtual clusters.
pkg/roachprod/multitenant.go
line 84 at r1 (raw file):
// StopServiceForVirtualCluster stops SQL instance processes on the virtualCluster given. func StopServiceForVirtualCluster( ctx context.Context, l *logger.Logger, tenantCluster string, stopOpts StopOpts,
Nit: Do we want to update this to virtual cluster as well?
pkg/roachprod/multitenant.go
line 99 at r1 (raw file):
// tenant with ID given. // // TODO(herko): Allow users to pass in a tenant name.
Nit: Few mentions of tenant here again. As stated in the TODO we'll probably want to be able to pass virtual cluster name in the future directly once processes can start with a virtual cluster name instead of a virtual cluster id.
pkg/roachprod/roachprod.go
line 716 at r1 (raw file):
MaxWait int // Options that only apply to StopTenant
Nit: StopVirtualCluster?
pkg/roachprod/install/cluster_synced.go
line 403 at r1 (raw file):
// It sends a signal to all processes that have been started with ROACHPROD env // var and optionally waits until the processes stop. If the virtualClusterLabel // is not empty, then only tenant processes with a matching label are stopped.
Nit: sql processes.
pkg/roachprod/install/cluster_synced.go
line 444 at r1 (raw file):
// for output/logging. If wait is true, the command will wait for the processes // to exit, up to maxWait seconds. // TODO(herko): This command does not support virtual clusters yet.
I suppose we can remove this comment now?
pkg/roachprod/install/cluster_synced.go
line 647 at r1 (raw file):
virtualClusterDesc := func(name string, instance int) string { if name == SystemInterfaceName { return "system tenant"
Nit: Wondering if we should call this system interface?
pkg/roachprod/install/cluster_synced.go
line 746 at r1 (raw file):
) if err := errors.CombineErrors(err, result.Err); err != nil { err := errors.Wrap(err, "failed to list tenants")
Nit: virtual clusters.
pkg/roachprod/install/cluster_synced.go
line 869 at r1 (raw file):
{{ end }} # make sure all tenant monitors quit when this script exits. In
Nit: sql process / virtual cluster monitors?
pkg/roachprod/install/cluster_synced.go
line 939 at r1 (raw file):
panic(fmt.Errorf("invalid output from monitor: %q", line)) } // Tenant name and instance are the first fields of every
Nit: virtual cluster name.
pkg/roachprod/install/cluster_synced.go
line 942 at r1 (raw file):
// event type. name, instanceStr := parts[0], parts[1] instance, _ := strconv.Atoi(instanceStr)
Nit: Should we maybe panic like above in the rare case that this string is not an integer?
pkg/roachprod/install/cluster_synced.go
line 949 at r1 (raw file):
}}) case runningMsg: pid := parts[3]
Nit: Should we add a len < 4
panic here as above?
pkg/roachprod/install/cluster_synced.go
line 954 at r1 (raw file):
}}) case deadMsg: exitCode := parts[3]
Nit: Should we add a len < 4
panic here as above?
pkg/roachprod/install/cockroach.go
line 602 at r1 (raw file):
} // TenantInfoFromLabel takes as parameter a tenant label produced with
Nit: VirtualClusterInfoFromLabel ... virtual cluster label.
pkg/roachprod/install/cockroach_test.go
line 27 at r1 (raw file):
}{ { name: "empty tenant name",
Nit: empty virtual cluster name.
pkg/roachprod/install/cockroach_test.go
line 32 at r1 (raw file):
}, { name: "system tenant name",
Nit: System interface name?
pkg/roachprod/install/cockroach_test.go
line 37 at r1 (raw file):
}, { name: "simple app tenant name",
Nit: simple app virtual cluster name.
pkg/roachprod/install/cockroach_test.go
line 43 at r1 (raw file):
}, { name: "tenant name with hyphens",
Nit: virtual cluster name with hyphens.
pkg/roachprod/install/cockroach_test.go
line 58 at r1 (raw file):
require.NoError(t, err) expectedTenantName := tc.virtualClusterName
Nit: expectedVirtualClusterName.
de188ba
to
c8d7695
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @herkolategan and @smg260)
Previously, herkolategan (Herko Lategan) wrote…
Nit: Should this now be virtual cluster support?
Done.
Previously, herkolategan (Herko Lategan) wrote…
Nit: Should this now be non virtual cluster deployments?
Done.
pkg/cmd/roachprod/main.go
line 561 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: virtual clusters.
Done.
pkg/roachprod/multitenant.go
line 84 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: Do we want to update this to virtual cluster as well?
Done.
pkg/roachprod/multitenant.go
line 99 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: Few mentions of tenant here again. As stated in the TODO we'll probably want to be able to pass virtual cluster name in the future directly once processes can start with a virtual cluster name instead of a virtual cluster id.
Done.
pkg/roachprod/roachprod.go
line 716 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: StopVirtualCluster?
Done.
pkg/roachprod/install/cluster_synced.go
line 403 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: sql processes.
Done.
pkg/roachprod/install/cluster_synced.go
line 444 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
I suppose we can remove this comment now?
Done.
pkg/roachprod/install/cluster_synced.go
line 647 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: Wondering if we should call this system interface?
Done.
pkg/roachprod/install/cluster_synced.go
line 746 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: virtual clusters.
Done.
pkg/roachprod/install/cluster_synced.go
line 869 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: sql process / virtual cluster monitors?
Done.
pkg/roachprod/install/cluster_synced.go
line 939 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: virtual cluster name.
Done.
pkg/roachprod/install/cluster_synced.go
line 942 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: Should we maybe panic like above in the rare case that this string is not an integer?
Done.
pkg/roachprod/install/cluster_synced.go
line 949 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: Should we add a
len < 4
panic here as above?
Done.
pkg/roachprod/install/cluster_synced.go
line 954 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: Should we add a
len < 4
panic here as above?
Done.
pkg/roachprod/install/cockroach.go
line 602 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: VirtualClusterInfoFromLabel ... virtual cluster label.
Done.
pkg/roachprod/install/cockroach_test.go
line 27 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: empty virtual cluster name.
Done.
pkg/roachprod/install/cockroach_test.go
line 32 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: System interface name?
Done.
pkg/roachprod/install/cockroach_test.go
line 37 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: simple app virtual cluster name.
Done.
pkg/roachprod/install/cockroach_test.go
line 43 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: virtual cluster name with hyphens.
Done.
pkg/roachprod/install/cockroach_test.go
line 58 at r1 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: expectedVirtualClusterName.
Done.
Rebased and addressed comments. Running another subset of tests just to be sure. If everything is well, plan to merge this tomorrow: https://teamcity.cockroachdb.com/viewQueued.html?itemId=12021678&tab=queuedBuildOverviewTab |
Build finished without any unexpected failures. Merging. TFTR! bors r=herkolategan |
Build failed: |
bors r- |
Build failed (retrying...): |
Build failed: |
The help string for this subcommand already mentioned support for the `tag` parameter; however, that was not implemented. In this commit, we update the `startInstanceAsSeparateProcessCmd` struct to accep the `--tag` flag and also apply a small fix to the help string (app and storage nodes no longer need to be distinct). Epic: none Release note: None
This adds a `roachprod stop-sql` command. It is similar to `roachprod stop` in the sense that it takes a similar set of flags: the signal to be sent to the processes, whether to wait for the process to finish, and for how long. However, one crucial difference is that `roachprod stop` stops *all* processes started by roachprod (cockroach or otherwise), whereas `stop-sql` was designed specifically for stopping tenant processes (SQL instances). To achieve that, we set a `ROACHPROD_VIRTUAL_CLUSTER` environment variable to the corresponding cockroach process when starting it. This label is then used to find the correct process to stop when requested. We also make use of the label to name the systemctl unit when starting cockroach processes on remote clusters. This allows multiple tenants to be co-located on the same VM. Epic: none Release note: None
This makes the roachprod monitor aware of virtual clusters. This means that the monitor is able to identify scenarios where multiple instances are running on the same VM, and monitor the lifecycle of sql instance processes just like it does for non virtual cluster deployments. Under the covers, this is achieved by leveraging the "ROACHPROD_VIRTUAL_CLUSTER" environment variable that is set when starting cockroach. The monitor then looks for processes that have that environment variable set, and parses it to identify the virtual cluster name and instance corresponding to that cockroach process. This change should go unnoticed by all current monitor callers: tests that monitor cockroach processes will continue to be notified of events just like they did before. The only thing that changes is that, if a test uses roachprod to start and stop sql instances, these events will be available through the monitor. Epic: none Release note: None
a348e68
to
fc4ff31
Compare
bors retry |
Build succeeded: |
After cockroachdb#111064, we started naming the systemd unit for cockroach processes differently, since we now might have multiple cockroach processes in the same VM in the context of SQL server processes and cluster virtualization. That change broke a few tests that directly referenced the `cockroach.service` unit. In this change, we create a util function that is reponsible for encapsulating that name and replace direct references to the systemd unit name with calls to this new function. Fixes: cockroachdb#111902 Release note: None
111915: roachtest: remove direct references to cockroach.service from tests r=herkolategan a=renatolabs After #111064, we started naming the systemd unit for cockroach processes differently, since we now might have multiple cockroach processes in the same VM in the context of SQL server processes and cluster virtualization. That change broke a few tests that directly referenced the `cockroach.service` unit. In this change, we create a util function that is reponsible for encapsulating that name and replace direct references to the systemd unit name with calls to this new function. Fixes: #111902 Release note: None 111937: kv: add conflicting txn_meta to error message r=nvanbenschoten a=aadityasondhi This patch adds back the conflicting txn_meta to the error message printed out from a `TransactionRetryError` if available. Fixes #110689. Release note: None Co-authored-by: Renato Costa <[email protected]> Co-authored-by: Aaditya Sondhi <[email protected]>
This call was added during work on cockroachdb#111064 as an attempt to fix issues during development. It should not be needed for correctness. It also makes backporting that work to 23.1 not possible, since the `WaitDelay` API was introduced in Go 1.20. Epic: none Release note: None
112714: roachprod: remove `WaitDelay` call in `localSession` r=herkolategan a=renatolabs This call was added during work on #111064 as an attempt to fix issues during development. It should not be needed for correctness. It also makes backporting that work to 23.1 not possible, since the `WaitDelay` API was introduced in Go 1.20. Epic: none Release note: None Co-authored-by: Renato Costa <[email protected]>
This call was added during work on #111064 as an attempt to fix issues during development. It should not be needed for correctness. It also makes backporting that work to 23.1 not possible, since the `WaitDelay` API was introduced in Go 1.20. Epic: none Release note: None
After cockroachdb#111064, we started naming the systemd unit for cockroach processes differently, since we now might have multiple cockroach processes in the same VM in the context of SQL server processes and cluster virtualization. That change broke a few tests that directly referenced the `cockroach.service` unit. In this change, we create a util function that is reponsible for encapsulating that name and replace direct references to the systemd unit name with calls to this new function. Fixes: cockroachdb#111902 Release note: None
This call was added during work on cockroachdb#111064 as an attempt to fix issues during development. It should not be needed for correctness. It also makes backporting that work to 23.1 not possible, since the `WaitDelay` API was introduced in Go 1.20. Epic: none Release note: None
This PR makes the roachprod monitor aware of multi-tenancy. This means
that the monitor is able to identify scenarios where multiple tenants
are running on the same VM, and monitor the lifecycle of tenant
processes just like it does for non multi-tenant deployments.
Under the covers, this is achieved by leveraging the
"ROACHPROD_VIRTUAL_CLUSTER" environment variable that is
set when starting cockroach. The monitor then looks for processes
that have that environment variable set, and parses it to identify the
tenant name and instance corresponding to that cockroach process.
This change should go unnoticed by all current monitor callers: tests
that monitor cockroach processes will continue to be notified of
events just like they did before. The only thing that changes is that,
if a test uses roachprod to start and stop tenants, these events will
be available through the monitor.
This PR also adds a
stop-tenant
command that uses the sameROACHPROD_TENANT
environment variable to allow the caller tostop a specific tenant process.
Epic: none
Release note: None