stmtdiagnostics: seemingly a deadlock in spanlatch.(*Manager).wait #119593
yuzefovich added the C-test-failure (broken test, automatically or manually discovered) and T-sql-queries (SQL Queries Team) labels on Feb 23, 2024
We've seen this previously: #118524 (comment).
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue on Mar 5, 2024

Informs cockroachdb#119593.
Closes cockroachdb#38055.

This commit removes the pooling of timeutil.Timer structs and, in doing so, permits the structs to be stack allocated so that no pooling is necessary. This superfluous (and in hindsight, harmful) memory pooling was introduced in f11ec1c, which also added very necessary pooling for the internal time.Timer structs.

The pooling was harmful because it mandated a contract where Timer structs could not be used after their Stop method was called. This was surprising (time.Timer has no such limitation) and led to subtle use-after-free bugs over time (cockroachdb#61373 and cockroachdb#119595).

It was also unnecessary because the outer Timer structs can be stack allocated. Ironically, the only thing that causes them to escape to the heap was the pooling mechanism itself. Removing pooling solves the issue.

```
name      old time/op    new time/op    delta
Timer-10     153µs ± 1%     152µs ± 1%   ~     (p=0.589 n=10+9)

name      old alloc/op   new alloc/op   delta
Timer-10      200B ± 0%      200B ± 0%   ~     (all equal)

name      old allocs/op  new allocs/op  delta
Timer-10      3.00 ± 0%      3.00 ± 0%   ~     (all equal)
```

Epic: None
Release note: None
craig bot pushed a commit that referenced this issue on Mar 6, 2024

119901: timeutil: stack-allocate Timer, remove pooling r=nvanbenschoten a=nvanbenschoten

Informs #119593.
Closes #38055.

This PR removes the pooling of `timeutil.Timer` structs and, in doing so, permits the structs to be stack allocated so that no pooling is necessary. This superfluous (and in hindsight, harmful) memory pooling was introduced in f11ec1c, which also added very necessary pooling for the internal time.Timer structs.

The pooling was harmful because it mandated a contract where Timer structs could not be used after their Stop method was called. This was surprising (time.Timer has no such limitation) and led to subtle use-after-free bugs over time (#61373 and #119595).

It was also unnecessary because the outer Timer structs can be stack allocated. Ironically, the only thing that causes them to escape to the heap was the pooling mechanism itself. Removing pooling solves the issue.

```
name      old time/op    new time/op    delta
Timer-10     153µs ± 1%     152µs ± 1%   ~     (p=0.589 n=10+9)

name      old alloc/op   new alloc/op   delta
Timer-10      200B ± 0%      200B ± 0%   ~     (all equal)

name      old allocs/op  new allocs/op  delta
Timer-10      3.00 ± 0%      3.00 ± 0%   ~     (all equal)
```

----

The PR then improves the memory pooling of the inner `time.Timer` so that it is always recycled. This was originally identified by `@andreimatei` in #13466 (review).

Doing so has a positive impact on the microbenchmark introduced in the first commit, demonstrating that timers can be stack-allocated and require zero heap allocations:

```
name      old time/op    new time/op    delta
Timer-10     152µs ± 1%     153µs ± 1%      ~     (p=0.133 n=9+10)

name      old alloc/op   new alloc/op   delta
Timer-10      200B ± 0%        0B       -100.00%  (p=0.000 n=10+10)

name      old allocs/op  new allocs/op  delta
Timer-10      3.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
```

----

cc. `@andreimatei` `@ajwerner`

Epic: None
Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
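To make the shape of that change concrete, here is a minimal sketch of a timer wrapper whose outer struct is a plain value while only the inner `time.Timer` goes through a `sync.Pool`. It is not the actual `timeutil` code: the `Timer` type, `innerPool`, `stopAndDrain`, and `reuseAfterStop` below are hypothetical names chosen for illustration, and the real implementation may differ in its fields and semantics.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// innerPool recycles only the inner *time.Timer values (hypothetical name).
var innerPool sync.Pool

// Timer is a hypothetical wrapper, not the real timeutil.Timer. Its zero
// value is ready to use, and the wrapper itself is never put into a pool,
// so it stays valid after Stop.
type Timer struct {
	C     <-chan time.Time
	inner *time.Timer
}

// Reset arms the timer, taking an inner timer from the pool if needed.
func (t *Timer) Reset(d time.Duration) {
	if t.inner == nil {
		if v := innerPool.Get(); v != nil {
			t.inner = v.(*time.Timer)
			t.inner.Reset(d)
		} else {
			t.inner = time.NewTimer(d)
		}
		t.C = t.inner.C
		return
	}
	stopAndDrain(t.inner)
	t.inner.Reset(d)
}

// Stop releases the inner timer to the pool and zeroes the wrapper.
// The wrapper can be Reset again afterwards.
func (t *Timer) Stop() {
	if t.inner == nil {
		return
	}
	stopAndDrain(t.inner)
	innerPool.Put(t.inner)
	*t = Timer{}
}

// stopAndDrain stops the inner timer and discards a pending tick, if any,
// so that a recycled timer never delivers a stale value.
func stopAndDrain(inner *time.Timer) {
	if !inner.Stop() {
		select {
		case <-inner.C:
		default:
		}
	}
}

// reuseAfterStop reports whether the same wrapper fires after Stop + Reset.
func reuseAfterStop() bool {
	var t Timer
	t.Reset(time.Hour)
	t.Stop() // wrapper is still ours; only the inner timer was recycled
	t.Reset(time.Millisecond)
	defer t.Stop()
	select {
	case <-t.C:
		return true
	case <-time.After(5 * time.Second):
		return false
	}
}

func main() {
	fmt.Println(reuseAfterStop())
}
```

Because `Stop` only gives away the inner timer and resets the wrapper to its zero value, no other caller can ever obtain a pointer to the wrapper, which is what removes the "do not touch after Stop" contract in this sketch.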
Extracted from this failure of `TestDiagnosticsRequest`:

The test failed during the server shutdown. To me it seems like a deadlock - the timer in question uses 15 seconds.

I think we have a problematic pattern of using `timeutil.Timer`, at least in `stmtdiagnostics.Registry.poll` - namely, we allocate the timer locally, but then calling `Stop` on it puts it into the `sync.Pool`, yet we later keep on using the same timer, which can result in concurrent access to the same timer object, which is invalid.

Jira issue: CRDB-36234
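The hazard described above can be reproduced in miniature. The sketch below is an assumption-laden illustration, not the `timeutil` or `stmtdiagnostics` code: `pooledTimer`, `freeList`, and `aliasAfterStop` are hypothetical names, and a simple deterministic free list stands in for `sync.Pool` so the aliasing always shows up. It runs on a single goroutine; with real concurrency the same aliasing becomes a data race.

```go
package main

import "fmt"

// pooledTimer is a hypothetical stand-in for a timer wrapper that is
// returned to a pool when stopped.
type pooledTimer struct {
	owner string
}

// freeList is a deterministic stand-in for sync.Pool (assumption made so
// the example behaves the same on every run).
type freeList struct {
	items []*pooledTimer
}

// get hands out a recycled struct if one is available.
func (f *freeList) get() *pooledTimer {
	if n := len(f.items); n > 0 {
		t := f.items[n-1]
		f.items = f.items[:n-1]
		return t
	}
	return &pooledTimer{}
}

// put clears the struct and makes it available to the next caller.
func (f *freeList) put(t *pooledTimer) {
	*t = pooledTimer{}
	f.items = append(f.items, t)
}

// stop mimics the old contract: stopping gives the struct back to the pool.
func (t *pooledTimer) stop(f *freeList) { f.put(t) }

// aliasAfterStop reports whether a timer that is used after stop ends up
// sharing memory with a timer acquired by a different caller.
func aliasAfterStop() bool {
	var pool freeList

	pollTimer := pool.get()
	pollTimer.owner = "poll"
	pollTimer.stop(&pool) // the poll loop gives the struct away here...

	other := pool.get() // ...and an unrelated caller receives it
	other.owner = "other"

	pollTimer.owner = "poll" // the poll loop keeps using its stale pointer
	return pollTimer == other && other.owner == "poll"
}

func main() {
	fmt.Println(aliasAfterStop()) // prints "true"
}
```

Both callers end up holding the same pointer, so whichever writes last silently overwrites the other's state.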