-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: clean up sql session and instance on shutdown #99941
server: clean up sql session and instance on shutdown #99941
Conversation
908db30
to
10b7e56
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jeffswenson)
pkg/server/drain.go
line 381 at r1 (raw file):
s.sqlServer.leaseMgr.SetDraining(ctx, true /* drain */, reporter) session, err := s.sqlServer.sqlLivenessProvider.Release(ctx)
Note: these two method calls (this one and the one below) need to be idempotent -- the Drain call is performed multiple times until it doesn't fail any more.
10b7e56
to
36a5322
Compare
98459: server,autoconfig: automatic configuration via config tasks r=adityamaru a=knz Epic: CRDB-23559 Informs #98431. All commits but the last are from #98993. This change introduces "auto config tasks", a mechanism through which configuration payloads ("tasks") can be injected into a running SQL service. This is driven via the "auto config runner" job that was introduced in the previous commit. The job listens for the arrival of new task definitions via a `Provider` interface. When new tasks are known, and previous tasks have completed, the runner creates a job for the first next task. Release note: None 100476: server/drain: shut down SQL subsystems gracefully before releasing table leases r=JeffSwenson,rytaft a=knz Needed for #99941 and #99958. Epic: CRDB-23559 See individual commits for details. 100511: sqlccl: deflake TestGCTenantJobWaitsForProtectedTimestamps r=adityamaru,arulajmani a=knz Fixes #94808 The tenant server must be shut down before the tenant record is removed; otherwise the tenant's SQL server will self-terminate by calling Stop() on its stopper, which in this case was shared with the parent cluster. Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]>
8d7e00d
to
ab2a286
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
pkg/server/drain.go
line 381 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Note: these two method calls (this one and the one below) need to be idempotent -- the Drain call is performed multiple times until it doesn't fail any more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 12 of 12 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jeffswenson)
pkg/sql/sqlliveness/slinstance/slinstance.go
line 319 at r2 (raw file):
select { case <-l.drain: log.Infof(ctx, "draining heartbeat loop")
Why not close l.mu.blockCh in this case?
pkg/sql/sqlliveness/slinstance/slinstance.go
line 437 at r2 (raw file):
l.wait.Add(1) err := l.stopper.RunAsyncTask(taskCtx, "slinstance", func(ctx context.Context) {
add defer l.wait.Done()
at the beginning which is more idiomatic than Add(-1)
.
pkg/sql/sqlinstance/instancestorage/instancestorage_test.go
line 192 at r2 (raw file):
} // Release an instance and verify the instance is available
nit: period at end.
pkg/sql/sqlinstance/instancestorage/instancestorage_test.go
line 196 at r2 (raw file):
toRelease := initialInstances[0] // Call twice to ensure it is idempotent
ditto
pkg/sql/sqlinstance/instancestorage/instancestorage_test.go
line 213 at r2 (raw file):
} // re-allocate the instance
nit: capital at start and period at end.
pkg/sql/sqlinstance/instancestorage/instancestorage_test.go
line 240 at r2 (raw file):
} // TODO(jeffswenson): verify release is idempotent
I believe you've done this elsewhere already. Maybe remove this comment.
pkg/sql/sqlliveness/slinstance/slinstance_test.go
line 132 at r2 (raw file):
require.NotEqual(t, 0, stopper.NumTasks()) // Make sure release is idempotent
nit: period at end. Ditto below.
6507651
to
f8f7733
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
pkg/sql/sqlliveness/slinstance/slinstance.go
line 319 at r2 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Why not close l.mu.blockCh in this case?
A side effect of clearSessionLocked
is it allocates a new blockCh
. blockCh
is expected to always be nil when the server is draining. If it were not nil, then clearSessionLocked would panic.
I added a comment describing where the blockCh
is coming from. The blockCh
is used by the SqlInstance.Session
method, so I updated the test to ensure the Session
method does the right thing after the session is released.
pkg/sql/sqlliveness/slinstance/slinstance.go
line 437 at r2 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
add
defer l.wait.Done()
at the beginning which is more idiomatic thanAdd(-1)
.
Done.
pkg/sql/sqlinstance/instancestorage/instancestorage_test.go
line 196 at r2 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
ditto
Done.
pkg/sql/sqlinstance/instancestorage/instancestorage_test.go
line 240 at r2 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I believe you've done this elsewhere already. Maybe remove this comment.
Done.
f8f7733
to
35b9f68
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the remaining CI failure (maybe_stress) isn't related to your PR and shouldn't be a blocker.
Reviewed 6 of 6 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jeffswenson)
Remove the sql server's system.sql_instance and sqlliveness.sqlliveness rows on sql server shut down. The original motivation for this change is it improves the behavior of DistSQL for serverless tenants. SQL servers would attempt to schedule DistSQL flows on sql_instance rows that belonged to shutdown servers. Cleaning up the session and instance have a hand full of small benefits. - The sql_instances pre-allocated in a region are less likely to run out and trigger the slow cold start path. - Resources leased by the session, like jobs, may be re-claimed more quickly after the server shuts down. Fixes: CC-9095 Release note: none
35b9f68
to
7cec096
Compare
bors r+ |
Build succeeded: |
Remove the sql server's system.sql_instance and sqlliveness.sqlliveness
rows on sql server shut down. The original motivation for this change is
it improves the behavior of DistSQL for serverless tenants. SQL servers
would attempt to schedule DistSQL flows on sql_instance rows that
belonged to shutdown servers.
Cleaning up the session and instance have a hand full of small benefits.
and trigger the slow cold start path.
quickly after the server shuts down.
Fixes: CC-9095
Release note: none