server: wait for active configuration profiles during startup #113666

Merged: 1 commit merged into cockroachdb:master from wait-for-config-profiles on Nov 6, 2023

Conversation

@stevendanna (Collaborator) commented Nov 2, 2023

Previously, we did not wait for the active configuration profiles to be applied at startup. This produced confusing behaviour: the behaviour of the cluster could change substantially a few moments after startup.

For instance, the replication-source configuration profile runs:

SET CLUSTER SETTING server.controller.default_target_cluster = 'application'

If this is delayed until after we start accepting connections, then for a few moments new connections will go to the system tenant before they start going to the application tenant.

Here, we narrow the window of confusion by waiting for the configuration profiles to be complete during the preStart sequence in SQL.
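
Conceptually, the wait amounts to something like the sketch below. It is
standard-library only; the Provider stand-in, the polling loop, and the
interval are illustrative assumptions, and only the ActiveEnvironments
method mirrors the real provider interface:

    package autoconfigwait

    import (
        "context"
        "time"
    )

    // Provider is a stand-in for the auto-config provider; the real method
    // returns []autoconfigpb.EnvironmentID rather than []string.
    type Provider interface {
        ActiveEnvironments() []string
    }

    // waitForAutoConfig blocks until no auto-configuration environments remain
    // active, i.e. all configuration profile tasks have been applied, or until
    // the startup context is cancelled.
    func waitForAutoConfig(ctx context.Context, p Provider) error {
        ticker := time.NewTicker(100 * time.Millisecond) // illustrative interval
        defer ticker.Stop()
        for {
            if len(p.ActiveEnvironments()) == 0 {
                return nil // nothing pending, or auto-configuration already complete
            }
            select {
            case <-ctx.Done():
                return ctx.Err()
            case <-ticker.C:
            }
        }
    }

In the server, a wait of this kind runs during preStart, before SQL
connections are accepted, which is what narrows the window described above.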

Note that this doesn't solve startup coordination. There are still at least two problems:

  1. Async server startup means the user may still get an error for a few moments until the server has fully started.

  2. Async settings propagation means that the default_target_cluster setting can still take effect with a delay.

Informs #111637

Release note: None

@stevendanna requested review from a team as code owners on November 2, 2023 13:15
@cockroach-teamcity (Member)

This change is Reviewable

@stevendanna added the backport-23.2.x label (Flags PRs that need to be backported to 23.2.) on Nov 2, 2023
@yuzefovich (Member) left a comment

Nice! I only have a few nits :lgtm:

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru and @stevendanna)


pkg/server/server_sql.go line 1774 at r1 (raw file):

	envs := s.execCfg.AutoConfigProvider.ActiveEnvironments()
	if len(envs) == 0 {
		log.Infof(ctx, "auto-configuration environments not set or already complete")

nit: preStart is called in both system and application tenants. Do we have any auto config changes in the system tenant? If not, should we consider omitting this log message for system tenants then?
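
One possible shape of that suggestion, continuing the snippet above; the
ForSystemTenant check is an assumption about how the system tenant would be
detected here, not necessarily what the code would use:

	envs := s.execCfg.AutoConfigProvider.ActiveEnvironments()
	if len(envs) == 0 {
		// Hypothetical: only log in application tenants, since the system
		// tenant is not expected to run auto-config environments.
		if !s.execCfg.Codec.ForSystemTenant() {
			log.Infof(ctx, "auto-configuration environments not set or already complete")
		}
	}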


pkg/server/autoconfig/auto_config_test.go line 109 at r1 (raw file):

}

func (p *testProvider) ActiveEnvironments() []autoconfigpb.EnvironmentID {

nit: should we adjust the contract on Provider.ActiveEnvironments to explicitly mention that nil (or zero length slice) is returned once there are no more tasks left to run?
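
For instance, the contract could be spelled out in the interface's doc
comment along these lines (a wording sketch only, assuming the existing
autoconfigpb import; other Provider methods elided):

	type Provider interface {
		// ActiveEnvironments returns the IDs of the auto-config environments
		// that still have tasks left to run. Once there are no more tasks, it
		// returns nil (or a zero-length slice), so callers may treat an empty
		// result as "auto-configuration complete".
		ActiveEnvironments() []autoconfigpb.EnvironmentID
	}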


pkg/server/autoconfig/auto_config_test.go line 348 at r1 (raw file):

				Provider: provider,
			},
			// Server: &server.TestingKnobs{

nit: leftover.

@stevendanna force-pushed the wait-for-config-profiles branch from a4ebf87 to 0969935 on November 6, 2023 12:46
@stevendanna (Collaborator, Author)

bors r=yuzefovich

@craig (Contributor) commented Nov 6, 2023

Build failed:

@stevendanna (Collaborator, Author)

bors retry

@craig (Contributor) commented Nov 6, 2023

Build succeeded:

@craig bot merged commit 56cf731 into cockroachdb:master on Nov 6, 2023
3 checks passed
craig bot pushed a commit that referenced this pull request Nov 7, 2023
113687: server: option to wait for default_target_cluster on connect r=yuzefovich a=stevendanna

Virtual cluster startup is asynchronous. As a result, it can be hard to
orchestrate bringing a virtual cluster up since the client needs to
retry until the cluster is available.

This makes it a little easier by optionally allowing connections to
the default_target_cluster to be held until the cluster is online.

As a result, a user who does:

    ALTER VIRTUAL CLUSTER app START SERVICE SHARED
    SET CLUSTER SETTING server.controller.default_target_cluster = 'app'

should be able to connect immediately to the same node without
having to handle retries while the virtual cluster starts up.

Note that we could extend this to wait for _any_ tenant. We restrict
the wait to the default_target_cluster for now because waiting on
arbitrary tenants would require threading a little more state into
the server controller to avoid pitfalls such as holding connections
open longer than needed for a tenant that will never become
available because it doesn't exist.
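
A rough sketch of the connection-hold idea is below. All of the names
(controller, started, maybeWaitForTenant) are hypothetical stand-ins for
server-controller internals, and the signalling mechanism is an assumption,
not this PR's actual implementation:

    package connwait

    import "context"

    // controller is a stand-in for the server controller. started is closed
    // once the default target cluster's SQL server is online.
    type controller struct {
        defaultTargetCluster string
        started              chan struct{}
    }

    // maybeWaitForTenant holds the caller until the requested tenant is
    // online, but only when it is the default target cluster; connections to
    // other tenants are not held, so we never wait on a tenant that may never
    // become available.
    func (c *controller) maybeWaitForTenant(ctx context.Context, name string) error {
        if name != c.defaultTargetCluster {
            return nil // no waiting for non-default tenants
        }
        select {
        case <-c.started:
            return nil
        case <-ctx.Done():
            return ctx.Err()
        }
    }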

In a future PR, we will set this option by default in the replication-source and replication-target
configuration profiles.

Built on #113666

Informs #111637

Release note: None

Co-authored-by: Steven Danna <[email protected]>