Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: flush SQL stats during drain #76397

Merged
merged 1 commit into from
Feb 17, 2022

Conversation

cameronnunez
Copy link
Contributor

@cameronnunez cameronnunez commented Feb 10, 2022

Fixes #72045.
Fixes #74413.

Previously, SQL stats would be lost when a node drains. Now a drain
triggers a flush of the SQL stats into the statement statistics
system table while the SQL layer is being drained.

Release note (cli change): a drain of node now ensures that
SQL statistics are not lost during the process; they are now
preserved in the statement statistics system table.

@cameronnunez cameronnunez requested a review from a team as a code owner February 10, 2022 21:14
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@cameronnunez cameronnunez marked this pull request as draft February 10, 2022 21:15
@cameronnunez cameronnunez force-pushed the flush-sql-stats-drain branch 4 times, most recently from 69058a5 to e3b2a88 Compare February 10, 2022 21:34
@cameronnunez cameronnunez changed the title server: flush SQL stats during drain [WIP] server: flush SQL stats during drain Feb 10, 2022
@cameronnunez cameronnunez force-pushed the flush-sql-stats-drain branch 2 times, most recently from 2482acf to 9bcb617 Compare February 14, 2022 20:10
@cameronnunez cameronnunez marked this pull request as ready for review February 14, 2022 20:13
@cameronnunez cameronnunez changed the title [WIP] server: flush SQL stats during drain server: flush SQL stats during drain Feb 14, 2022
@cameronnunez cameronnunez force-pushed the flush-sql-stats-drain branch 2 times, most recently from 5cd3e7c to dd808f9 Compare February 15, 2022 14:19
@cameronnunez cameronnunez requested review from Azhng and knz February 15, 2022 14:22
Copy link
Contributor

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this! I have a quick question. This PR seems like a very explicit fix for #74413. So is the plan to have a more general solution for #72045 eventually?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez and @knz)


pkg/server/drain_test.go, line 143 at r1 (raw file):

	// Check that in-memory data was flushed into system tables during the drain.
	// Verify that the statement stats are in the reported stats pool.
	stats, err = ts.GetScrubbedReportingStats(ctx)

the reporting stats pool is an in-memory pool, so this means GetScrubbedReportingStats() doesn't actually read from the system table at all. My preference here would be a bit explicit and to ensure that the stats is flushed into system by reading them back. This is because the flush is only done in the best-effort basis and if it the flush fails, we simply just log a warning and it won't be retried. So this means it is possible for the stats to end up in the reporting pool but not in the system table.

@cameronnunez
Copy link
Contributor Author

@Azhng thanks for the review! I was actually wondering are there other use cases for which a more general solution is necessary?

@cameronnunez cameronnunez force-pushed the flush-sql-stats-drain branch 6 times, most recently from 7d0b350 to 9e0dc93 Compare February 15, 2022 21:06
@cameronnunez cameronnunez requested a review from Azhng February 15, 2022 21:07
@cameronnunez
Copy link
Contributor Author

RFAL

Copy link
Contributor

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. As of now, within SQL Observability, SQL Stats is the only subsystem that require this hook. Though as we build up our observability stack, we will have more similar subsystems in the future that requires this type of hook.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez and @knz)


pkg/server/drain_test.go, line 140 at r2 (raw file):

	// Issue a drain.
	drainCtx.sendDrainNoShutdown()

nit: i would do a sanity check before we drain the node to ensure that we have nothing in the system table.


pkg/server/drain_test.go, line 149 at r2 (raw file):

	require.NotEqualf(t,
		func() int {
			q := sqlDB.Query(t, `SELECT count(*) FROM system.statement_statistics WHERE node_id = 1`)

nit: sqlDB.CheckQueryResults can come in pretty handy here :)

Copy link
Contributor Author

@cameronnunez cameronnunez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha yeah I think it makes sense to eventually get to the general solution then.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @knz)


pkg/server/drain_test.go, line 143 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

the reporting stats pool is an in-memory pool, so this means GetScrubbedReportingStats() doesn't actually read from the system table at all. My preference here would be a bit explicit and to ensure that the stats is flushed into system by reading them back. This is because the flush is only done in the best-effort basis and if it the flush fails, we simply just log a warning and it won't be retried. So this means it is possible for the stats to end up in the reporting pool but not in the system table.

Done.


pkg/server/drain_test.go, line 140 at r2 (raw file):

Previously, Azhng (Archer Zhang) wrote…

nit: i would do a sanity check before we drain the node to ensure that we have nothing in the system table.

Done.


pkg/server/drain_test.go, line 149 at r2 (raw file):

Previously, Azhng (Archer Zhang) wrote…

nit: sqlDB.CheckQueryResults can come in pretty handy here :)

Would've gone that route but figured it would be best just to check for the count being non-zero so as to future-proof this test; if we record more stats in the future, this test would fail because the count would increase. What're your thoughts on this?

@cameronnunez cameronnunez requested a review from Azhng February 16, 2022 21:32
Copy link
Contributor

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @knz)


pkg/server/drain_test.go, line 149 at r2 (raw file):

Previously, cameronnunez (Cameron Nunez) wrote…

Would've gone that route but figured it would be best just to check for the count being non-zero so as to future-proof this test; if we record more stats in the future, this test would fail because the count would increase. What're your thoughts on this?

Hmm how about SELECT count(*) > 0 FROM ... and then just assert the output is true in the CheckQueryResults ?

Previously, SQL stats would be lost when a node drains. Now a drain
triggers a flush of the SQL stats into the statement statistics
system table while the SQL layer is being drained.

Release note (cli change): a drain of node now ensures that
SQL statistics are not lost during the process; they are now
preserved in the statement statistics system table.
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: simple, elegant, wow, amaze 🐶

Reviewed 2 of 3 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @cameronnunez)

@cameronnunez
Copy link
Contributor Author

TFYRs!

bors r=knz,Azhng

@craig
Copy link
Contributor

craig bot commented Feb 17, 2022

Build succeeded:

@craig craig bot merged commit f2a722f into cockroachdb:master Feb 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants