roachperf: regression in tpcc on 2023/04/11 #101494
Comments
cc @cockroachdb/test-eng
Below are the results of an automated bisection of Average throughput.

Bisection log: …
The culprit appears to be bdf3f62, and the suspect is the mutex [1] added to fix the race condition in …
The mutex which was added to fix the race condition is not on a hot path; it accounts for only ~1% of CPU. The actual culprit turned out to be another mutex, which happens to be on a hot path.
Additionally, compare with the runtime before the change: …
Note, there are observable runtime differences between pgx v4 and v5. The runtime plots below correspond to the above two runs; the left side is pgx v4, the right side is pgx v5. We can see that v5 uses slightly more memory. Also, the number of goroutines is more stable in v4, where it's nearly a flat line; cf. v5, where the number of goroutines oscillates roughly between 360 and 390. NOTE: owing to the concurrency setting (3 nodes x 64), there are …
With respect to the number of goroutines being unstable in v5, I did some profiling by sampling all goroutines at different times.
The root cause is …
sync.RWMutex RLock() became a hot path for an essentially read-only value: the execution mode of a pool and query doesn't change after program initialization.

Release note: None
Fixes: cockroachdb#101494
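The pattern behind the fix is to stop taking a lock on every read of a value that is written once at initialization and never mutated afterwards. A hypothetical before/after sketch (names are illustrative, not the pgx API):

```go
package main

import (
	"fmt"
	"sync"
)

// Before: every query takes an RLock just to read the exec mode,
// which becomes a contention point on the hot path.
type lockedPool struct {
	mu       sync.RWMutex
	execMode string
}

func (p *lockedPool) ExecMode() string {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.execMode
}

// After: the mode is fixed at construction time, so reads need no
// synchronization at all. This is safe only because no goroutine
// writes the field after the constructor returns.
type frozenPool struct {
	execMode string // written once, before any concurrent reader
}

func newFrozenPool(mode string) *frozenPool {
	return &frozenPool{execMode: mode}
}

func (p *frozenPool) ExecMode() string { return p.execMode }

func main() {
	p := newFrozenPool("simple_protocol")
	fmt.Println(p.ExecMode()) // prints "simple_protocol"
}
```

If the value had to be set after concurrent readers start, `atomic.Value` (or a generic `atomic.Pointer`) would be the usual middle ground; here the value is fixed at init, so a plain field suffices.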
Our performance dropped on
tpccbench/nodes=3/cpu=16
across AWS and GCP on 2023/04/11. We should understand why and resolve the regression.

Jira issue: CRDB-26971