-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: flush SQL stats during drain #76397
Conversation
69058a5
to
e3b2a88
Compare
2482acf
to
9bcb617
Compare
5cd3e7c
to
dd808f9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this! I have a quick question. This PR seems like a very explicit fix for #74413. So is the plan to have a more general solution for #72045 eventually?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez and @knz)
pkg/server/drain_test.go, line 143 at r1 (raw file):
// Check that in-memory data was flushed into system tables during the drain. // Verify that the statement stats are in the reported stats pool. stats, err = ts.GetScrubbedReportingStats(ctx)
the reporting stats pool is an in-memory pool, so this means GetScrubbedReportingStats()
doesn't actually read from the system table at all. My preference here would be a bit explicit and to ensure that the stats is flushed into system by reading them back. This is because the flush is only done in the best-effort basis and if it the flush fails, we simply just log a warning and it won't be retried. So this means it is possible for the stats to end up in the reporting pool but not in the system table.
@Azhng thanks for the review! I was actually wondering are there other use cases for which a more general solution is necessary? |
7d0b350
to
9e0dc93
Compare
RFAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. As of now, within SQL Observability, SQL Stats is the only subsystem that require this hook. Though as we build up our observability stack, we will have more similar subsystems in the future that requires this type of hook.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez and @knz)
pkg/server/drain_test.go, line 140 at r2 (raw file):
// Issue a drain. drainCtx.sendDrainNoShutdown()
nit: i would do a sanity check before we drain the node to ensure that we have nothing in the system table.
pkg/server/drain_test.go, line 149 at r2 (raw file):
require.NotEqualf(t, func() int { q := sqlDB.Query(t, `SELECT count(*) FROM system.statement_statistics WHERE node_id = 1`)
nit: sqlDB.CheckQueryResults
can come in pretty handy here :)
9e0dc93
to
e5e1609
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha yeah I think it makes sense to eventually get to the general solution then.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @knz)
pkg/server/drain_test.go, line 143 at r1 (raw file):
Previously, Azhng (Archer Zhang) wrote…
the reporting stats pool is an in-memory pool, so this means
GetScrubbedReportingStats()
doesn't actually read from the system table at all. My preference here would be a bit explicit and to ensure that the stats is flushed into system by reading them back. This is because the flush is only done in the best-effort basis and if it the flush fails, we simply just log a warning and it won't be retried. So this means it is possible for the stats to end up in the reporting pool but not in the system table.
Done.
pkg/server/drain_test.go, line 140 at r2 (raw file):
Previously, Azhng (Archer Zhang) wrote…
nit: i would do a sanity check before we drain the node to ensure that we have nothing in the system table.
Done.
pkg/server/drain_test.go, line 149 at r2 (raw file):
Previously, Azhng (Archer Zhang) wrote…
nit:
sqlDB.CheckQueryResults
can come in pretty handy here :)
Would've gone that route but figured it would be best just to check for the count being non-zero so as to future-proof this test; if we record more stats in the future, this test would fail because the count would increase. What're your thoughts on this?
e5e1609
to
b4036e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz)
pkg/server/drain_test.go, line 149 at r2 (raw file):
Previously, cameronnunez (Cameron Nunez) wrote…
Would've gone that route but figured it would be best just to check for the count being non-zero so as to future-proof this test; if we record more stats in the future, this test would fail because the count would increase. What're your thoughts on this?
Hmm how about SELECT count(*) > 0 FROM ...
and then just assert the output is true
in the CheckQueryResults
?
Previously, SQL stats would be lost when a node drains. Now a drain triggers a flush of the SQL stats into the statement statistics system table while the SQL layer is being drained. Release note (cli change): a drain of node now ensures that SQL statistics are not lost during the process; they are now preserved in the statement statistics system table.
b4036e5
to
56ff1ac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 3 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @cameronnunez)
TFYRs! bors r=knz,Azhng |
Build succeeded: |
Fixes #72045.
Fixes #74413.
Previously, SQL stats would be lost when a node drains. Now a drain
triggers a flush of the SQL stats into the statement statistics
system table while the SQL layer is being drained.
Release note (cli change): a drain of node now ensures that
SQL statistics are not lost during the process; they are now
preserved in the statement statistics system table.