Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: fix bug where SQL stats were not flushed into reportable stats #49392

Merged
merged 1 commit into from
May 28, 2020

Conversation

rohany
Copy link
Contributor

@rohany rohany commented May 21, 2020

Touches #49388.

This was caused by the background loops use an incorrect setup to reset
collected stats.

Also, fix a bug where statements issued at the same time that the SQL
stats are flushed would not be reported.

Release note: None

@rohany rohany requested a review from dt May 21, 2020 19:39
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm though I’m not really in the sql code enough these days for my assessment to be very well informed

@rohany
Copy link
Contributor Author

rohany commented May 27, 2020

Ok -- @knz do you mind taking a quick look? You reviewed the initial PR where I performed this separation.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a unit test perhaps?

@rohany
Copy link
Contributor Author

rohany commented May 27, 2020

Do we have a preferred way of testing this sort of background loop? Not sure if injecting some stats knobs here is the best way to go.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember you can override the target URL of telemetry reports using the env var (or, optionally, a test override).

You can use that to redirect the requests to a HTTP server ran by your test code. The test HTTP server leaves its incoming requests hanging or returns errors (or both), so that the telemetry reporter reliably hangs/fails.

Then you can configure the reporting interval to be short, so that the test doesn't have to wait an hour to get a guaranteed HTTP request.

then you can make the test wait until the first or second iteration (you'll know it's happening, because the http server will see an incoming request)

then you can check that the reported stats are properly cleared

A usual, you can write the test, apply it to the master branch to verify the test fails, then check in your branch that it succeeds (i.e. you've written an actual regression test).

@rohany rohany force-pushed the stats-fic branch 4 times, most recently from 21233cc to 7ca7716 Compare May 28, 2020 02:59
@rohany
Copy link
Contributor Author

rohany commented May 28, 2020

Updated with a unit test (caught a sneaky race along the way).

@knz
Copy link
Contributor

knz commented May 28, 2020

caught a sneaky race along the way

lesson learned: tests are hard to write but they're damn useful

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rohany)


pkg/server/stats_test.go, line 79 at r1 (raw file):

func TestEnsureSQLStatsAreFlushedForTelemetry(t *testing.T) {
	defer leaktest.AfterTest(t)()

nit: empty line after the defer (for consistency after other tests)


pkg/server/stats_test.go, line 118 at r1 (raw file):

			return errors.New("expected to find query stats, but didn't")
		}
		return nil

This test merely checks that the statement makes it to telemetry - where is the test that the stats get reset?

Touches cockroachdb#49388.

This was caused by the background loops use an incorrect setup to reset
collected stats.

Also, fix a bug where statements issued at the same time that the SQL
stats are flushed would not be reported.

Release note: None
Copy link
Contributor Author

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/server/stats_test.go, line 118 at r1 (raw file):

Previously, knz (kena) wrote…

This test merely checks that the statement makes it to telemetry - where is the test that the stats get reset?

Done.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I trust that you ran the new test without the bug fix, and found the test failing as expected?

LGTM further than that

Reviewed 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@rohany
Copy link
Contributor Author

rohany commented May 28, 2020

This added test isn't going to fail on master (but it is good to have anyway here). The problem wasn't that stats weren't getting cleared, its that they weren't getting flushed into the reporting pool in the process of clearing.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... why did you not add a test for that then?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@rohany
Copy link
Contributor Author

rohany commented May 28, 2020

I'm confused, thats what this is testing.

@rohany
Copy link
Contributor Author

rohany commented May 28, 2020

Sorry, I completely misunderstood your point. Yes, this whole test does not pass on master. What i meant was that the small addition I made (to test that stats were cleared) was still going to work on master.

@knz
Copy link
Contributor

knz commented May 28, 2020

yay!

carry on

@rohany
Copy link
Contributor Author

rohany commented May 28, 2020

TFTR!

bors r=knz

@craig
Copy link
Contributor

craig bot commented May 28, 2020

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants