sql: recent regression in stopper quiescence time #47011

Closed
andreimatei opened this issue Apr 3, 2020 · 2 comments
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@andreimatei
Contributor

#46832 seems to have somehow caused a regression for tests shutting down. Every now and then, shutting down a server seems to take ~15s, and the logs show:

W200403 19:36:34.837125 3038 sql/temporary_schema.go:430  [n1] error during schema cleanup, retrying: node unavailable; try another peer
W200403 19:36:36.577356 3038 sql/temporary_schema.go:430  [n1] error during schema cleanup, retrying: node unavailable; try another peer
W200403 19:36:41.068366 3038 sql/temporary_schema.go:430  [n1] error during schema cleanup, retrying: node unavailable; try another peer
W200403 19:36:49.157471 3038 sql/temporary_schema.go:430  [n1] error during schema cleanup, retrying: node unavailable; try another peer
W200403 19:36:49.157505 3038 sql/temporary_schema.go:551  [n1] failed to clean temp objects: node unavailable; try another peer
I200403 19:36:49.157515 3038 sql/temporary_schema.go:562  [n1] temporary object cleaner next scheduled to run at 2020-04-03 20:06:33.717927 +0000 UTC

I've bisected it pretty conclusively to that PR. I'm looking superficially at the PR though, and I can't tell what's wrong.
It seems to affect tests at random. The effect is big enough to cause the tests for the sql package to take a lot longer than they used to.

To repro, you can run, for example:

make testshort PKG='./pkg/sql' TESTFLAGS="-v --count=50" TESTS=TestSavepointMetric

One out of every 10 runs or so will be very slow.

The PR in question was backported to 20.1 too. I'm gonna mark it as a release blocker cause I find it pretty scary, but it might turn out to not be too bad.

@andreimatei andreimatei added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Apr 3, 2020
@knz
Contributor

knz commented Apr 3, 2020

oh yeah I had noticed that but so far I've found it only affects TestServers. I did not see it with cockroach start nodes. I'll investigate more.

@RaduBerinde
Member

RaduBerinde commented Apr 6, 2020

This is the same issue as #47047, which has been fixed in #47048.
