sql: temporary schema code causes slowdown in tests #47047

Closed
RaduBerinde opened this issue Apr 5, 2020 · 2 comments · Fixed by #47048
Labels
S-3-productivity Severe issues that impede the productivity of CockroachDB developers.

Comments

@RaduBerinde
Member

I noticed various simple tests (like TestChartCatalogGen) occasionally taking ~15 seconds. The logs suggest that we are waiting to stop the server and the temporary schema code keeps retrying:

I200405 00:49:55.632477 3931 util/stop/stopper.go:539  quiescing
W200405 00:49:55.632530 6061 kv/kvserver/intentresolver/intent_resolver.go:745  failed to gc transaction record: could not GC completed transaction anchored at /Table/SystemConfigSpan/Start: node unavailable; try another peer
E200405 00:49:55.632533 5034 kv/kvserver/queue.go:1089  [n1,raftlog,s1,r3/1:/System/{NodeLive…-tsd}] node unavailable; try another peer
W200405 00:49:55.632553 5404 sql/temporary_schema.go:430  [n1] error during schema cleanup, retrying: failed to send RPC: sending to all 1 replicas failed; last error: (err: [NotLeaseHolderError] r26: replica (n1,s1):1 not lease holder; lease holder unknown) <nil>
W200405 00:49:56.604126 5404 sql/temporary_schema.go:430  [n1] error during schema cleanup, retrying: node unavailable; try another peer
W200405 00:49:58.822833 5404 sql/temporary_schema.go:430  [n1] error during schema cleanup, retrying: node unavailable; try another peer
W200405 00:50:02.607569 5404 sql/temporary_schema.go:430  [n1] error during schema cleanup, retrying: node unavailable; try another peer
W200405 00:50:10.180665 5404 sql/temporary_schema.go:430  [n1] error during schema cleanup, retrying: node unavailable; try another peer
W200405 00:50:10.180735 5404 sql/temporary_schema.go:551  [n1] failed to clean temp objects: node unavailable; try another peer
I200405 00:50:10.180765 5404 sql/temporary_schema.go:562  [n1] temporary object cleaner next scheduled to run at 2020-04-05 01:19:55.595077005 +0000 UTC

This happens frequently enough that when running tests for a big package, a handful of tests hit it, with a big impact on the overall test time.

Perhaps we should be checking stopper.ShouldQuiesce() before retrying.

@RaduBerinde RaduBerinde added the S-3-productivity Severe issues that impede the productivity of CockroachDB developers. label Apr 5, 2020
otan added a commit to otan-cockroach/cockroach that referenced this issue Apr 5, 2020
…aner

Resolves cockroachdb#47047.

This fixes a bug where the temporary object cleaner can hang during
object shutdown by not obeying stopper.ShouldQuiesce(). I've run
`TestChartCatalogGen` 10 times and confirmed it all takes the same
amount of time.

Release note: None
@otan
Contributor

otan commented Apr 5, 2020

though i wonder if it's #47011

@RaduBerinde
Member Author

Ah, yeah, looks like the same thing.

craig bot pushed a commit that referenced this issue Apr 5, 2020
47048: sql: pass stopper.ShouldQuiesce() to retryFunc for TemporaryObjectCleaner r=RaduBerinde a=otan

Resolves #47047.

This fixes a bug where the temporary object cleaner can hang during
object shutdown by not obeying stopper.ShouldQuiesce(). I've run
`TestChartCatalogGen` 10 times and confirmed it all takes the same
amount of time.

Release note: None

Co-authored-by: Oliver Tan <[email protected]>
@craig craig bot closed this as completed in 6dd1562 Apr 5, 2020
otan added a commit to otan-cockroach/cockroach that referenced this issue Apr 6, 2020