Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: clean up sql session and instance on shutdown #99941

Merged

Conversation

jeffswenson
Copy link
Collaborator

@jeffswenson jeffswenson commented Mar 29, 2023

Remove the sql server's system.sql_instance and sqlliveness.sqlliveness
rows on sql server shut down. The original motivation for this change is
it improves the behavior of DistSQL for serverless tenants. SQL servers
would attempt to schedule DistSQL flows on sql_instance rows that
belonged to shutdown servers.

Cleaning up the session and instance have a hand full of small benefits.

  • The sql_instances pre-allocated in a region are less likely to run out
    and trigger the slow cold start path.
  • Resources leased by the session, like jobs, may be re-claimed more
    quickly after the server shuts down.

Fixes: CC-9095

Release note: none

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jeffswenson)


pkg/server/drain.go line 381 at r1 (raw file):

	s.sqlServer.leaseMgr.SetDraining(ctx, true /* drain */, reporter)

	session, err := s.sqlServer.sqlLivenessProvider.Release(ctx)

Note: these two method calls (this one and the one below) need to be idempotent -- the Drain call is performed multiple times until it doesn't fail any more.

@jeffswenson jeffswenson force-pushed the jeffswenson-release-sql-instance branch from 10b7e56 to 36a5322 Compare April 3, 2023 16:30
craig bot pushed a commit that referenced this pull request Apr 3, 2023
98459: server,autoconfig: automatic configuration via config tasks r=adityamaru a=knz

Epic: CRDB-23559
Informs #98431.
All commits but the last are from #98993.

This change introduces "auto config tasks", a mechanism through which
configuration payloads ("tasks") can be injected into a running SQL
service.

This is driven via the "auto config runner" job that was introduced in
the previous commit. The job listens for the arrival of new task
definitions via a `Provider` interface. When new tasks are known, and
previous tasks have completed, the runner creates a job for the first
next task.

Release note: None

100476: server/drain: shut down SQL subsystems gracefully before releasing table leases r=JeffSwenson,rytaft a=knz

Needed for #99941 and #99958.
Epic: CRDB-23559

See individual commits for details.

100511: sqlccl: deflake TestGCTenantJobWaitsForProtectedTimestamps r=adityamaru,arulajmani a=knz

Fixes #94808

The tenant server must be shut down before the tenant record is removed; otherwise the tenant's SQL server will self-terminate by calling Stop() on its stopper, which in this case was shared with the parent cluster.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
@jeffswenson jeffswenson force-pushed the jeffswenson-release-sql-instance branch 3 times, most recently from 8d7e00d to ab2a286 Compare April 3, 2023 23:14
Copy link
Collaborator Author

@jeffswenson jeffswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/server/drain.go line 381 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Note: these two method calls (this one and the one below) need to be idempotent -- the Drain call is performed multiple times until it doesn't fail any more.

Done.

@jeffswenson jeffswenson changed the title Jeffswenson release sql instance server: clean up sql session and instance on shutdown Apr 3, 2023
@jeffswenson jeffswenson marked this pull request as ready for review April 3, 2023 23:18
@jeffswenson jeffswenson requested review from a team as code owners April 3, 2023 23:18
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 12 of 12 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jeffswenson)


pkg/sql/sqlliveness/slinstance/slinstance.go line 319 at r2 (raw file):

	select {
	case <-l.drain:
		log.Infof(ctx, "draining heartbeat loop")

Why not close l.mu.blockCh in this case?


pkg/sql/sqlliveness/slinstance/slinstance.go line 437 at r2 (raw file):

	l.wait.Add(1)
	err := l.stopper.RunAsyncTask(taskCtx, "slinstance", func(ctx context.Context) {

add defer l.wait.Done() at the beginning which is more idiomatic than Add(-1).


pkg/sql/sqlinstance/instancestorage/instancestorage_test.go line 192 at r2 (raw file):

		}

		// Release an instance and verify the instance is available

nit: period at end.


pkg/sql/sqlinstance/instancestorage/instancestorage_test.go line 196 at r2 (raw file):

			toRelease := initialInstances[0]

			// Call twice to ensure it is idempotent

ditto


pkg/sql/sqlinstance/instancestorage/instancestorage_test.go line 213 at r2 (raw file):

			}

			// re-allocate the instance

nit: capital at start and period at end.


pkg/sql/sqlinstance/instancestorage/instancestorage_test.go line 240 at r2 (raw file):

		}

		// TODO(jeffswenson): verify release is idempotent

I believe you've done this elsewhere already. Maybe remove this comment.


pkg/sql/sqlliveness/slinstance/slinstance_test.go line 132 at r2 (raw file):

	require.NotEqual(t, 0, stopper.NumTasks())

	// Make sure release is idempotent

nit: period at end. Ditto below.

@jeffswenson jeffswenson force-pushed the jeffswenson-release-sql-instance branch 2 times, most recently from 6507651 to f8f7733 Compare April 4, 2023 18:58
Copy link
Collaborator Author

@jeffswenson jeffswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/sql/sqlliveness/slinstance/slinstance.go line 319 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Why not close l.mu.blockCh in this case?

A side effect of clearSessionLocked is it allocates a new blockCh. blockCh is expected to always be nil when the server is draining. If it were not nil, then clearSessionLocked would panic.

I added a comment describing where the blockCh is coming from. The blockCh is used by the SqlInstance.Session method, so I updated the test to ensure the Session method does the right thing after the session is released.


pkg/sql/sqlliveness/slinstance/slinstance.go line 437 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

add defer l.wait.Done() at the beginning which is more idiomatic than Add(-1).

Done.


pkg/sql/sqlinstance/instancestorage/instancestorage_test.go line 196 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

ditto

Done.


pkg/sql/sqlinstance/instancestorage/instancestorage_test.go line 240 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I believe you've done this elsewhere already. Maybe remove this comment.

Done.

@jeffswenson jeffswenson force-pushed the jeffswenson-release-sql-instance branch from f8f7733 to 35b9f68 Compare April 4, 2023 19:49
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: very nice!

the remaining CI failure (maybe_stress) isn't related to your PR and shouldn't be a blocker.

Reviewed 6 of 6 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jeffswenson)

Remove the sql server's system.sql_instance and sqlliveness.sqlliveness
rows on sql server shut down. The original motivation for this change is
it improves the behavior of DistSQL for serverless tenants. SQL servers
would attempt to schedule DistSQL flows on sql_instance rows that
belonged to shutdown servers.

Cleaning up the session and instance have a hand full of small benefits.
- The sql_instances pre-allocated in a region are less likely to run out
  and trigger the slow cold start path.
- Resources leased by the session, like jobs, may be re-claimed more
  quickly after the server shuts down.

Fixes: CC-9095

Release note: none
@jeffswenson jeffswenson force-pushed the jeffswenson-release-sql-instance branch from 35b9f68 to 7cec096 Compare April 4, 2023 22:14
@jeffswenson
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented Apr 5, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants