server: only run tenants with service mode 'shared' #95658

Merged
merged 1 commit into from
Jan 28, 2023

Conversation

knz
Contributor

@knz knz commented Jan 21, 2023

Fixes #92739.
Fixes #96146.

RFC for context: #96147

Previous PRs:

Prior to this patch, services for secondary tenants would be started
automatically upon first use by a client.

This commit changes this to auto-start services upfront for all
tenants with service mode SHARED. (And shut down services for tenants
with another service mode configured.)

Release note: None
Epic: CRDB-14537
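
The start/stop behavior described above amounts to reconciling two sets: the tenants whose service mode is shared, and the tenants whose services are currently running. A minimal sketch of that comparison follows. The `reconcile` function and the plain string-keyed maps are hypothetical names chosen for illustration; they are not taken from the patch.

```go
package main

import (
	"fmt"
	"sort"
)

// reconcile is a hypothetical helper (not from the patch). It compares the
// tenants that should be running (service mode shared) with the tenants that
// are currently running, and returns the names to start and the names to
// stop. Both results are sorted so the output is deterministic.
func reconcile(desired, running map[string]bool) (toStart, toStop []string) {
	for name := range desired {
		if !running[name] {
			toStart = append(toStart, name)
		}
	}
	for name := range running {
		if !desired[name] {
			toStop = append(toStop, name)
		}
	}
	sort.Strings(toStart)
	sort.Strings(toStop)
	return toStart, toStop
}

func main() {
	// Illustrative tenant names only.
	desired := map[string]bool{"app": true, "analytics": true}
	running := map[string]bool{"app": true, "legacy": true}
	toStart, toStop := reconcile(desired, running)
	fmt.Println("start:", toStart, "stop:", toStop)
}
```

Running the comparison on every polling tick keeps the logic idempotent: a tenant that is already in the right state appears in neither list.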

@blathers-crl

blathers-crl bot commented Jan 21, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

@knz knz force-pushed the 20230121-tenant-status-lowercase branch from b2b2fac to cb7c7b9 Compare January 21, 2023 22:38
@knz knz force-pushed the 20220121-server-lifecycle branch from 7825bab to 87a11e6 Compare January 21, 2023 23:03
@knz knz force-pushed the 20230121-tenant-status-lowercase branch from cb7c7b9 to e1b76bd Compare January 23, 2023 15:11
@knz knz force-pushed the 20220121-server-lifecycle branch from 87a11e6 to ff07b82 Compare January 23, 2023 15:12
@knz knz changed the base branch from 20230121-tenant-status-lowercase to 20230123-better-tenant-state January 24, 2023 13:40
@knz knz force-pushed the 20220121-server-lifecycle branch from ff07b82 to d4daf0c Compare January 24, 2023 13:49
@knz knz marked this pull request as ready for review January 24, 2023 13:50
@knz knz requested review from a team, herkolategan, renatolabs and andreimatei and removed request for a team January 24, 2023 13:50
@knz knz added the A-multitenancy Related to multi-tenancy label Jan 24, 2023
@knz knz force-pushed the 20230123-better-tenant-state branch from d13c9bc to 91ccefe Compare January 24, 2023 14:07
@knz knz requested a review from a team as a code owner January 24, 2023 14:07
@knz knz requested a review from a team January 24, 2023 14:07
@knz knz requested review from a team as code owners January 24, 2023 14:07
@knz knz requested a review from a team January 24, 2023 14:07
@knz knz requested a review from a team as a code owner January 24, 2023 14:07
@knz knz requested a review from cucaroach January 24, 2023 14:07
@knz knz force-pushed the 20230123-better-tenant-state branch from 1a0014b to 52787d8 Compare January 25, 2023 22:51
Collaborator

@stevendanna stevendanna left a comment

Reviewed the second commit only.

Overall, I think the polling approach will work for now. Do we need to add a test to ensure that the STOP SERVICE command does tear down the server?

Comment on lines 261 to 272
if _, err := c.createServer(ctx, tenantName, base.TestSharedProcessTenantArgs{}); err != nil {
	log.Warningf(ctx, "unable to start service for tenant: %v", err)
}
Collaborator

Do we want any backoff for tenants that we can't start for some reason? Currently, we'll retry every second which seems a bit aggressive if we've gotten an error.


rows, err := ie.QueryBuffered(ctx, "fetch-running-tenants",
	nil, /* txn */
	`SELECT name FROM system.tenants WHERE service_mode = $1 AND name IS NOT NULL`, mtinfopb.ServiceModeShared)
Collaborator

I know we prevent it on write, but do we want an extra condition here that the data_mode is also in the correct state?

if _, ok := reqTenants[name]; !ok {
// Tenant currently running, but should not be running. Shut it down.
if err := c.stopper.RunAsyncTask(ctx, "stop-tenant-async", func(ctx context.Context) {
srv.server.stop(context.Background())
Collaborator

I assume we pass context.Background() here so that the stop process isn't canceled when we start shutting down. If not, the real reason could be worth a comment.

Comment on lines 4234 to 4247
ie := s.internalExecutor
rowIter, err := ie.QueryIterator(ctx, "list-tenants", nil, /* txn */
	`SELECT name FROM system.tenants WHERE service_mode = $1 AND name IS NOT NULL`, mtinfopb.ServiceModeShared)
if err != nil {
	return nil, err
}
defer func() { _ = rowIter.Close() }()

var hasNext bool
for hasNext, err = rowIter.Next(ctx); hasNext && err == nil; hasNext, err = rowIter.Next(ctx) {
	row := rowIter.Cur()
	tenantName := tree.MustBeDString(row[0])
	tenantNames = append(tenantNames, roachpb.TenantName(tenantName))
}
Collaborator

I could see extracting this to a small function.

row := rowIter.Cur()
tenantName := tree.MustBeDString(row[0])
tenantNames = append(tenantNames, roachpb.TenantName(tenantName))
}
}

var tenantList []*serverpb.Tenant
Collaborator

I suppose we could pre-allocate this since in the common case the tenantList will be the same length as tenantNames.
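
As a rough illustration of the pre-allocation idea, the sketch below sizes the result slice from the input up front so that `append` never has to grow it in the common case. The `tenant` type and `buildTenantList` function are hypothetical simplifications, not the `serverpb.Tenant` type or the code in the patch.

```go
package main

import "fmt"

// tenant is a hypothetical, simplified stand-in for the response entry type.
type tenant struct {
	Name string
}

// buildTenantList allocates the result with capacity len(names), so the
// appends below do not trigger a reallocation.
func buildTenantList(names []string) []*tenant {
	list := make([]*tenant, 0, len(names))
	for _, n := range names {
		list = append(list, &tenant{Name: n})
	}
	return list
}

func main() {
	list := buildTenantList([]string{"app", "analytics"})
	fmt.Println("len:", len(list), "cap:", cap(list))
}
```

Using length zero with a fixed capacity, rather than a fixed length, means entries that end up being skipped simply leave unused capacity instead of nil elements.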

Comment on lines +4225 to +4308
// ListTenants returns a list of tenants that are served
// by shared-process services in this server.
Collaborator

I am assuming that existing callers of this API are OK with this change?

@knz knz force-pushed the 20230123-better-tenant-state branch from 52787d8 to 5fcce46 Compare January 26, 2023 13:57
@knz knz force-pushed the 20220121-server-lifecycle branch from 2e5316d to 273dc19 Compare January 26, 2023 14:50
Contributor Author

@knz knz left a comment

Do we need to add a test to ensure that the STOP SERVICE command does tear down the server?

Absolutely! I had forgotten. Good catch. Done.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @herkolategan, @renatolabs, @rhu713, and @stevendanna)


pkg/server/admin.go line 4226 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

I am assuming that existing callers of this API are OK with this change?

Yes, we chatted about it together.


pkg/server/admin.go line 4247 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

I could see extracting this to a small function.

Good idea. Done.


pkg/server/admin.go line 4250 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

I suppose we could pre-allocate this since in the common case the tenantList will be the same length as tenantNames.

Done.


pkg/server/server_controller.go line 247 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

I assume we pass context.Background() here so that the stop process isn't canceled when we start shutting down. If not, the real reason could be worth a comment.

Indeed - still added a comment to clarify.


pkg/server/server_controller.go line 263 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

Do we want any backoff for tenants that we can't start for some reason? Currently, we'll retry every second which seems a bit aggressive if we've gotten an error.

OK. I'll do that next.


pkg/server/server_controller.go line 288 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

I know we prevent it on write, but do we want an extra condition here that the data_mode is also in the correct state?

Let's do it. Done.

@knz knz force-pushed the 20220121-server-lifecycle branch from 273dc19 to 67c573c Compare January 26, 2023 23:43
@knz knz requested review from a team as code owners January 26, 2023 23:43
@knz knz requested a review from a team January 26, 2023 23:43
@knz knz requested a review from a team as a code owner January 26, 2023 23:43
@knz knz requested review from a team and bananabrick January 26, 2023 23:43
@knz knz changed the base branch from 20230123-better-tenant-state to master January 26, 2023 23:43
@knz knz force-pushed the 20220121-server-lifecycle branch 2 times, most recently from bcee01c to 2f56e7a Compare January 28, 2023 10:35
@knz knz force-pushed the 20220121-server-lifecycle branch from 2f56e7a to 3ff9bc9 Compare January 28, 2023 11:10
@knz
Contributor Author

knz commented Jan 28, 2023

TFYR!

bors r=stevendanna

@craig
Contributor

craig bot commented Jan 28, 2023

Build failed:

@knz
Contributor Author

knz commented Jan 28, 2023

unrelated flake #96136

bors r=stevendanna

@craig
Contributor

craig bot commented Jan 28, 2023

Build succeeded:

@craig craig bot merged commit 69dd453 into master Jan 28, 2023
@craig craig bot deleted the 20220121-server-lifecycle branch January 28, 2023 13:26
craig bot pushed a commit that referenced this pull request Jan 30, 2023
96143: bench: adjust the secondary tenant initialization r=yuzefovich a=knz

With the changes in #95658 this can now be simplified.

Release note: None
Epic: CRDB-14537

Co-authored-by: Raphael 'kena' Poss <[email protected]>