Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: update multitenant/distsql to use new roachprod service APIs #115599

Merged

Conversation

herkolategan
Copy link
Collaborator

@herkolategan herkolategan commented Dec 5, 2023

Previously the multitenant distsql roachtest relied on an internal util in roachtest to start virtual clusters. This PR updates the test to use the new official roachtest and roachprod APIs for starting virtual clusters.

Some additional changes were required to support upgrading the test. The cluster interface only exposed a method to start storage nodes, but that is insufficient to start virtual clusters that have a separate method on the roachprod API (for starting).

This change adds a new method StartServiceForVirtualCluster to the cluster interface to enable roachtests to start virtual clusters. Some refactoring was required to enable different sets of cluster settings, depending on what service
type is going to be started.

There are now two sets of cluster settings that can be utilised in test_runner. For virtual clusters virtualClusterSettings will be used, and for storage clusters clusterSettings will be utilised.

Fixes: #116019

Release Note: None
Epic: CRDB-31933

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@herkolategan herkolategan changed the title roachtest: add sql instance to functional options roachtest: update multitenant/distsql to use new roachprod service APIs Dec 5, 2023
@herkolategan herkolategan marked this pull request as ready for review December 7, 2023 13:27
@herkolategan herkolategan requested a review from a team as a code owner December 7, 2023 13:27
@herkolategan herkolategan requested review from srosenberg and renatolabs and removed request for a team December 7, 2023 13:27
@herkolategan herkolategan force-pushed the hbl/roachtest-vc-convert-mt-distsql branch from eedde77 to fd14aa1 Compare December 11, 2023 11:37
Copy link
Contributor

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very excited about this change! 🎉

Reviewed 1 of 1 files at r1, 4 of 4 files at r2, 1 of 1 files at r3, 2 of 2 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @srosenberg)


pkg/cmd/roachtest/cluster.go line 1929 at r2 (raw file):

func (c *clusterImpl) configureClusterSettingOptions(
	defaultEnv map[string]interface{},

What's the rationale for this defaultEnv parameter? From what I can see, we only use it for COCKROACH_CRASH_ON_SPAN_USE_AFTER_FINISH. Is there a reason why we wouldn't want that variable in SQL processes? I'm wondering if we should enforce all cockroach processes to have the same default variables (just like we're doing for the random seed var) and eliminate this argument.


pkg/cmd/roachtest/cluster.go line 2031 at r2 (raw file):

	if settings.Secure {
		if err := c.RefetchCertsFromNode(ctx, 1); err != nil {

Should this really be 1 and not first node in c.MakeNodes(opts...)?


pkg/cmd/roachtest/cluster.go line 2011 at r4 (raw file):

}

func (c *clusterImpl) StartServiceForVirtualClusterE(

This function could use some documentation, especially about externalNodes and in-memory vs external process tenants.


pkg/cmd/roachtest/tests/multitenant_distsql.go line 71 at r3 (raw file):

	c.Start(ctx, t.L(), option.DefaultStartOpts(), settings, c.Node(2))
	c.Start(ctx, t.L(), option.DefaultStartOpts(), settings, c.Node(3))
	storNodes := c.Range(1, 3)

Did you mean storageNodes?


pkg/cmd/roachtest/tests/multitenant_distsql.go line 81 at r3 (raw file):

		instStartOps.RoachprodOpts.Target = install.StartServiceForVirtualCluster
		instStartOps.RoachprodOpts.VirtualClusterName = tenantName
		instStartOps.RoachprodOpts.SQLInstance = sqlInstance

Interesting, we should probably find a way to package these into something higher level too.


pkg/cmd/roachtest/tests/multitenant_distsql.go line 85 at r3 (raw file):

		t.L().Printf("Starting instance %d on node %d", i, node)
		instStartOps.RoachprodOpts.SQLPort = 0
		instStartOps.RoachprodOpts.AdminUIPort = 0

Should we add a TODO to remove this once #111052 is resolved?


pkg/cmd/roachtest/tests/multitenant_distsql.go line 86 at r3 (raw file):

		instStartOps.RoachprodOpts.SQLPort = 0
		instStartOps.RoachprodOpts.AdminUIPort = 0
		err := c.StartServiceForVirtualClusterE(ctx, t.L(), c.Node(node), instStartOps, settings, storNodes)

Why not StartServiceForVirtualCluster?


pkg/cmd/roachtest/tests/multitenant_distsql.go line 101 at r3 (raw file):

	m := c.NewMonitor(ctx, c.Nodes(1, 2, 3))

	//inst1Conn, err := gosql.Open("postgres", c.PGUrl(ctx, 1)

Delete?


pkg/cmd/roachtest/tests/multitenant_distsql.go line 146 at r3 (raw file):

		m.Wait()
		wg.Done()
	}()

I don't understand this block, why do we need it? I also don't see a wg.Wait() call.

Copy link
Member

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan)


pkg/cmd/roachtest/tests/multitenant_distsql.go line 81 at r4 (raw file):

		instStartOps.RoachprodOpts.Target = install.StartServiceForVirtualCluster
		instStartOps.RoachprodOpts.VirtualClusterName = tenantName
		instStartOps.RoachprodOpts.SQLInstance = sqlInstance

Can someone remind me what SQLInstance denotes? I can see that it's expected to be 0 in case of StartSharedProcessForVirtualCluster, but is exact meaning is somewhat obscured by the implementation of service registry/discovery.

Copy link
Member

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan)


pkg/cmd/roachtest/tests/multitenant_distsql.go line 65 at r4 (raw file):

	// performance and resource limitations. We set the minimum range max bytes to
	// 1 byte to bypass the guardrails.
	install.MakeClusterSettings(install.SecureOption(true))

Remove?


pkg/cmd/roachtest/tests/multitenant_distsql.go line 81 at r4 (raw file):
Ok, I found the definition in the PR commit message [1] :).

A SQL Instance can be viewed as a unique identifier for the SQL Instance that is running on a VM in the event that there are more than one SQL Instance serving the same virtual cluster on that VM.

[1] #111582

@herkolategan herkolategan force-pushed the hbl/roachtest-vc-convert-mt-distsql branch 3 times, most recently from eb045d1 to b231bfa Compare December 12, 2023 14:19
Copy link
Collaborator Author

@herkolategan herkolategan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs and @srosenberg)


pkg/cmd/roachtest/cluster.go line 1929 at r2 (raw file):

Previously, renatolabs (Renato Costa) wrote…

What's the rationale for this defaultEnv parameter? From what I can see, we only use it for COCKROACH_CRASH_ON_SPAN_USE_AFTER_FINISH. Is there a reason why we wouldn't want that variable in SQL processes? I'm wondering if we should enforce all cockroach processes to have the same default variables (just like we're doing for the random seed var) and eliminate this argument.

Good point, removed defaultEnv and made the defaults common to both.


pkg/cmd/roachtest/cluster.go line 2031 at r2 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Should this really be 1 and not first node in c.MakeNodes(opts...)?

Copied this code over from StartE, but I think there is an implicit contract (in roachprod) that the absolute first node of any cluster will always carry the certs. Thus after the nodes are started, or any subset of nodes are started, node 1 will contain the latest and will need to be refetched for roachtest to operate from locally.


pkg/cmd/roachtest/cluster.go line 2011 at r4 (raw file):

Previously, renatolabs (Renato Costa) wrote…

This function could use some documentation, especially about externalNodes and in-memory vs external process tenants.

Good point, I'll add some docs.


pkg/cmd/roachtest/tests/multitenant_distsql.go line 71 at r3 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Did you mean storageNodes?

Naming is from before, but I like the more explicit storageNodes, will update.


pkg/cmd/roachtest/tests/multitenant_distsql.go line 81 at r3 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Interesting, we should probably find a way to package these into something higher level too.

True, I can add a convenience function for service start opts.


pkg/cmd/roachtest/tests/multitenant_distsql.go line 85 at r3 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Should we add a TODO to remove this once #111052 is resolved?

Good idea, done.


pkg/cmd/roachtest/tests/multitenant_distsql.go line 86 at r3 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Why not StartServiceForVirtualCluster?

Opted to try and keep the test similar to how it was with the same logging intact as previously, but can switch if it's cleaner.


pkg/cmd/roachtest/tests/multitenant_distsql.go line 101 at r3 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Delete?

yep, leftover refactoring comment. good catch.


pkg/cmd/roachtest/tests/multitenant_distsql.go line 146 at r3 (raw file):

Previously, renatolabs (Renato Costa) wrote…

I don't understand this block, why do we need it? I also don't see a wg.Wait() call.

Good catch, I did forget the wait; not that it helped unfortunately. This was introduced as a workaround because of a bug I intend to log an issue for, but just realised the workaround also does not work, because I forgot the wait group wait. So I'll remove all of this and log the bug for how this gets stuck on local clusters.


pkg/cmd/roachtest/tests/multitenant_distsql.go line 65 at r4 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Remove?

Oops, yip my eyes glanced over that one.


pkg/cmd/roachtest/tests/multitenant_distsql.go line 81 at r4 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Ok, I found the definition in the PR commit message [1] :).

A SQL Instance can be viewed as a unique identifier for the SQL Instance that is running on a VM in the event that there are more than one SQL Instance serving the same virtual cluster on that VM.

[1] #111582

Yes, not the clearest of terms, but denotes the ith SQL instance of an external process tenant on a specific VM.

@herkolategan herkolategan force-pushed the hbl/roachtest-vc-convert-mt-distsql branch from 5ce7353 to 1b14799 Compare December 12, 2023 17:10
Copy link
Collaborator Author

@herkolategan herkolategan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Updated a few things from comments, and for quickref latest distsql only TC run just for sanity checking (passed): https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestNightlyGceBazel/13101830

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs and @srosenberg)

Copy link
Contributor

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 6 of 6 files at r5, 4 of 4 files at r6.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan and @srosenberg)


pkg/cmd/roachtest/option/options.go line 61 at r14 (raw file):

// DefaultStartVirtualClusterOpts returns StartOpts for starting an external
// process virtual cluster with the given tenant name and SQL instance.
func DefaultStartVirtualClusterOpts(tenantName string, sqlInstance int) StartOpts {

Should we take the Target as argument? Or at least make it explicit in the function name that this refers to external process deployment?


pkg/cmd/roachtest/option/options.go line 71 at r14 (raw file):

	// TODO(herko): remove this once dynamic port assignment is the default.
	startOpts.RoachprodOpts.SQLPort = 0
	startOpts.RoachprodOpts.AdminUIPort = 0

The issue was closed today, we should be able to delete this now.

Previously the cluster interface only exposed a method to start storage nodes,
but that is insufficient to start virtual clusters that have a separate method
on the `roachprod` API (for starting).

This change adds a new method `StartServiceForVirtualCluster` to the cluster
interface to enable roachtests to start virtual clusters. Some refactoring was
required to enable different sets of cluster settings, depending on what service
type is going to be started.

There are now two sets of cluster settings that can be utilised in
`test_runner`. For virtual clusters `virtualClusterSettings` will be used, and
for storage clusters `clusterSettings` will be utilised.

Epic: None
Release Note: None
Previously the multitenant distsql roachtest relied on an internal util in
`roachtest` to start virtual clusters. This change updates the test to use the
new official `roachtest` and `roachprod` APIs for starting virtual clusters.

Fixes: cockroachdb#116019

Epic: None
Release Note: None
@herkolategan herkolategan force-pushed the hbl/roachtest-vc-convert-mt-distsql branch from 1b14799 to 9f2551b Compare December 14, 2023 09:07
Add a convenience function to return `StartOpts` for starting an external
process virtual cluster.

Epic: None
Release Note: None
@herkolategan herkolategan force-pushed the hbl/roachtest-vc-convert-mt-distsql branch from 9f2551b to 555437e Compare December 14, 2023 09:17
@herkolategan
Copy link
Collaborator Author

TFTRs!

bors r=renatolabs,srosenberg

@craig
Copy link
Contributor

craig bot commented Dec 14, 2023

Build succeeded:

@craig craig bot merged commit cbcdca3 into cockroachdb:master Dec 14, 2023
9 checks passed
DarrylWong added a commit to DarrylWong/fork that referenced this pull request Jan 8, 2024
This was originally removed in cockroachdb#115599 due to cockroachdb#114097 merging,
but adminui was reverted in cockroachdb#117141 and mistakenly did not
revert the special case for virtual clusters.

Release note: None
DarrylWong added a commit to DarrylWong/fork that referenced this pull request Jan 8, 2024
This was originally removed in cockroachdb#115599 due to cockroachdb#114097 merging,
but adminui was reverted in cockroachdb#117141 and mistakenly did not
revert the special case for virtual clusters.

Release note: None
craig bot pushed a commit that referenced this pull request Jan 10, 2024
117505: roachtest: assign adminui ports dynamically for virtual clusters r=srosenberg,renatolabs a=DarrylWong

This was originally removed in #115599 due to #114097 merging, but adminui was reverted in #117141 and mistakenly did not revert the special case for virtual clusters. We unskip the multitenant/distsql tests as well.

Release note: None
Fixes: #117150
Fixes: #117149
Epic: None

117545: rpc: rm rangefeed RPC stream window special case r=erikgrinaker,miretskiy a=pav-kv

The rangefeed stream window size tuning was introduced to mitigate OOM in rangefeeds caused by the excessive number of streams (one per `Range`). Since we now use mux rangefeeds (which multiplexes all the rangefeed traffic into a single stream), this setting is no longer needed, so this commit removes it.

Part of #108992

Release note (ops change): `COCKROACH_RANGEFEED_RPC_INITIAL_WINDOW_SIZE` env variable has been removed, and rangefeed connection now uses the same window size as other RPC connections.

117554: sqlproxyccl: improve authentication throttle error r=JeffSwenson a=JeffSwenson

The sql proxy will throttle connection attempts if a (client IP, tenant cluster) pair has too many authentication failures. The error is usually caused by a misconfigured password in a connection pool. This change replaces the "connection attempt throttled" error message with "too many failed authentication attempts". There is a hint that includes this message but not all drivers are configured to log hints.

Fixes #117552

Co-authored-by: DarrylWong <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
Co-authored-by: Jeff <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request Jan 10, 2024
This was originally removed in #115599 due to #114097 merging,
but adminui was reverted in #117141 and mistakenly did not
revert the special case for virtual clusters.

Release note: None
DarrylWong added a commit that referenced this pull request Jan 12, 2024
This was originally removed in #115599 due to #114097 merging,
but adminui was reverted in #117141 and mistakenly did not
revert the special case for virtual clusters.

Release note: None
DarrylWong added a commit to DarrylWong/fork that referenced this pull request Jan 16, 2024
This was originally removed in cockroachdb#115599 due to cockroachdb#114097 merging,
but adminui was reverted in cockroachdb#117141 and mistakenly did not
revert the special case for virtual clusters.

Release note: None
DarrylWong added a commit to DarrylWong/fork that referenced this pull request Feb 6, 2024
This was originally removed in cockroachdb#115599 due to cockroachdb#114097 merging,
but adminui was reverted in cockroachdb#117141 and mistakenly did not
revert the special case for virtual clusters.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: port multitenant/distsql to new APIs
4 participants