-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: fix bug where SQL stats were not flushed into reportable stats #49392
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm though I’m not really in the sql code enough these days for my assessment to be very well informed
Ok -- @knz do you mind taking a quick look? You reviewed the initial PR where I performed this separation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a unit test perhaps?
Do we have a preferred way of testing this sort of background loop? Not sure if injecting some stats knobs here is the best way to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember you can override the target URL of telemetry reports using the env var (or, optionally, a test override).
You can use that to redirect the requests to a HTTP server ran by your test code. The test HTTP server leaves its incoming requests hanging or returns errors (or both), so that the telemetry reporter reliably hangs/fails.
Then you can configure the reporting interval to be short, so that the test doesn't have to wait an hour to get a guaranteed HTTP request.
then you can make the test wait until the first or second iteration (you'll know it's happening, because the http server will see an incoming request)
then you can check that the reported stats are properly cleared
A usual, you can write the test, apply it to the master
branch to verify the test fails, then check in your branch that it succeeds (i.e. you've written an actual regression test).
21233cc
to
7ca7716
Compare
Updated with a unit test (caught a sneaky race along the way). |
lesson learned: tests are hard to write but they're damn useful |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rohany)
pkg/server/stats_test.go, line 79 at r1 (raw file):
func TestEnsureSQLStatsAreFlushedForTelemetry(t *testing.T) { defer leaktest.AfterTest(t)()
nit: empty line after the defer (for consistency after other tests)
pkg/server/stats_test.go, line 118 at r1 (raw file):
return errors.New("expected to find query stats, but didn't") } return nil
This test merely checks that the statement makes it to telemetry - where is the test that the stats get reset?
Touches cockroachdb#49388. This was caused by the background loops use an incorrect setup to reset collected stats. Also, fix a bug where statements issued at the same time that the SQL stats are flushed would not be reported. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
pkg/server/stats_test.go, line 118 at r1 (raw file):
Previously, knz (kena) wrote…
This test merely checks that the statement makes it to telemetry - where is the test that the stats get reset?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I trust that you ran the new test without the bug fix, and found the test failing as expected?
LGTM further than that
Reviewed 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained
This added test isn't going to fail on master (but it is good to have anyway here). The problem wasn't that stats weren't getting cleared, its that they weren't getting flushed into the reporting pool in the process of clearing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... why did you not add a test for that then?
Reviewable status: complete! 0 of 0 LGTMs obtained
I'm confused, thats what this is testing. |
Sorry, I completely misunderstood your point. Yes, this whole test does not pass on master. What i meant was that the small addition I made (to test that stats were cleared) was still going to work on master. |
yay! carry on |
TFTR! bors r=knz |
Build succeeded |
Touches #49388.
This was caused by the background loops use an incorrect setup to reset
collected stats.
Also, fix a bug where statements issued at the same time that the SQL
stats are flushed would not be reported.
Release note: None