SCRUB index check hangs when run concurrently with TPCC #33173
Comments
the flow for the SCRUB query that got stuck:
there are some
|
Thanks for filing @lucy-zhang! @asubiotto could you take a look at this? Download the goroutine dump and have a look at some of the blocked threads. It looks suspiciously similar to some of the things you've been investigating recently - for example, some of the blocked threads are waiting on getting network quota from GRPC... |
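A quick way to get an overview of a dump like this is to tally goroutines by wait state before reading individual stacks. The sketch below assumes the standard Go text dump format, in which every goroutine starts with a header line such as `goroutine 42 [select, 5 minutes]:`. The function name `countStates` and the sample input are made up for illustration and are not taken from the dump attached to this issue.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// headerRE matches goroutine header lines such as
// "goroutine 42 [select, 5 minutes]:" and captures the wait state
// (the text after "[" up to the first "," or "]").
var headerRE = regexp.MustCompile(`^goroutine \d+ \[([^\],]+)`)

// countStates tallies goroutines by wait state in a text goroutine dump.
// Lines that are not goroutine headers (stack frames, blank lines) are skipped.
func countStates(dump string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(dump))
	for sc.Scan() {
		if m := headerRE.FindStringSubmatch(sc.Text()); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	// Hypothetical sample input, not the dump attached to this issue.
	sample := "goroutine 1 [select, 12 minutes]:\nmain.wait()\n\n" +
		"goroutine 7 [chan receive, 12 minutes]:\nmain.recv()\n\n" +
		"goroutine 9 [select]:\nmain.poll()\n"

	counts := countStates(sample)

	// Sort the states so the summary is printed in a stable order.
	states := make([]string, 0, len(counts))
	for s := range counts {
		states = append(states, s)
	}
	sort.Strings(states)
	for _, s := range states {
		fmt.Printf("%-14s %d\n", s, counts[s])
	}
}
```

A large number of goroutines parked in the same state for roughly the same duration is usually a good place to start looking.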
I took a quick look at this and it seems that the @lucy-zhang, what is stopping us from planning a scrub in a distributed manner? |
@asubiotto the answer to your latter question is "because it was never implemented so far". |
This will get done automatically once the delete-local PR is in. |
34383: sql: delete local implementations of planNodes r=jordanlewis a=jordanlewis
This PR deletes the remaining users of the planNode execution engine and deletes the duplicate implementations for those planNodes that have DistSQL equivalents.
Closes #33173.
Co-authored-by: Jordan Lewis
The new SCRUB roachtests scrub/{all-checks,index-only}/tpcc-1000, which run a series of SCRUB checks on a cluster running TPCC at the same time, have been timing out because the SCRUB query hangs. See #33151, #33149.
The TPCC queries themselves run successfully during this test, and the cluster is able to execute other queries when I ssh into one of the nodes. SCRUB also runs fine when run on the cluster with TPCC data with no other queries running, so it seems like the deadlock occurs when there's contention between TPCC queries and the SCRUB queries that need to do a full table scan. (I also tried running the roachtest with AS OF SYSTEM TIME '-5s' in the SCRUB query to reduce contention, which was successful. See #33152.)
Goroutine dump: goroutines.zip
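For reference, the workaround mentioned above can be sketched as a small helper that builds the index-check statement with an optional historical timestamp. This is only an illustration: the helper name `scrubIndexStmt` and the table name `tpcc.warehouse` are hypothetical, and the clause order (AS OF SYSTEM TIME before WITH OPTIONS) is an assumption about the EXPERIMENTAL SCRUB grammar, not something copied from the roachtest.

```go
package main

import (
	"fmt"
	"strings"
)

// scrubIndexStmt builds an index-only SCRUB statement for the given table.
// asOf is an optional AS OF SYSTEM TIME expression such as "-5s"; when it is
// empty, the statement reads at the current timestamp.
//
// Assumption: the AS OF SYSTEM TIME clause goes between the table name and
// WITH OPTIONS. The table name is inserted as-is, so it must be trusted input.
func scrubIndexStmt(table, asOf string) string {
	var b strings.Builder
	b.WriteString("EXPERIMENTAL SCRUB TABLE " + table)
	if asOf != "" {
		// Double any single quotes so the expression stays a valid SQL string literal.
		fmt.Fprintf(&b, " AS OF SYSTEM TIME '%s'", strings.ReplaceAll(asOf, "'", "''"))
	}
	b.WriteString(" WITH OPTIONS INDEX ALL")
	return b.String()
}

func main() {
	// Current-timestamp read: the variant that hangs under TPCC load in this report.
	fmt.Println(scrubIndexStmt("tpcc.warehouse", ""))
	// Historical read five seconds in the past: the variant that completed.
	fmt.Println(scrubIndexStmt("tpcc.warehouse", "-5s"))
}
```

Reading slightly in the past keeps the full table scan away from rows that the TPCC workload is actively writing, which is why it reduces contention.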