-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
timeutil: prevent illegal sharing of Timers #119595
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice debugging! The fix in the second commit here . I left a question about the first patch though.
Reviewed 2 of 2 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)
pkg/util/timeutil/timer.go
line 110 at r1 (raw file):
} } if !t.fromPool {
I don't entirely follow why this provides protection from misuse if the contract is still that "a Timer must never be used again after calls to Stop". Is the idea that stack allocated timers (which I assume still escape to the heap and would always benefit from just using NewTimer
) can now be re-used after Stop is called? In other words, is this allowing us to change the contract? Or just making a common misuse of the contract less impactful, with the assumption that stack allocation is often paired with violating this contract?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @nvanbenschoten)
pkg/util/timeutil/timer.go
line 110 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I don't entirely follow why this provides protection from misuse if the contract is still that "a Timer must never be used again after calls to Stop". Is the idea that stack allocated timers (which I assume still escape to the heap and would always benefit from just using
NewTimer
) can now be re-used after Stop is called? In other words, is this allowing us to change the contract? Or just making a common misuse of the contract less impactful, with the assumption that stack allocation is often paired with violating this contract?
This is a defensive mechanism that prevents problems from one class of misuse (like the one the second commit fixes). In particular, now with the "stack-allocated" (that, indeed, should actually escape) timer it won't ever be possible to share the timer by mistake. It is still possible for this problem to occur if the timer is created via NewTimer
and the contract of Stop
not being honored.
I'm somewhat on the fence about the first commit, but I'm leaning towards being defensive and merging it due to:
- if the timer originally didn't come from the pool, then it seems somewhat strange to put it into the pool on
Stop
- this mechanism does protect from misuse - with low overhead - at least in some cases, and even very experienced engineers can make mistakes like this (the bug fixed in the second commit was introduced by Andrew W in 4ef67a0).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @nvanbenschoten and @yuzefovich)
-- commits
line 15 at r1:
nit: remove one of the two "only"'s
pkg/util/timeutil/timer.go
line 110 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
This is a defensive mechanism that prevents problems from one class of misuse (like the one the second commit fixes). In particular, now with the "stack-allocated" (that, indeed, should actually escape) timer it won't ever be possible to share the timer by mistake. It is still possible for this problem to occur if the timer is created via
NewTimer
and the contract ofStop
not being honored.I'm somewhat on the fence about the first commit, but I'm leaning towards being defensive and merging it due to:
- if the timer originally didn't come from the pool, then it seems somewhat strange to put it into the pool on
Stop
- this mechanism does protect from misuse - with low overhead - at least in some cases, and even very experienced engineers can make mistakes like this (the bug fixed in the second commit was introduced by Andrew W in 4ef67a0).
Is there any reasonable use case for a stack allocated timer? If not, why not prevent stack allocated timers by making Timer
a point to an internal type struct:
type Timer = *t
type t struct {
timer *time.Timer
// C is a local "copy" of timer.C that can be used in a select case before
// the timer has been initialized (via Reset).
C <-chan time.Time
Read bool
}
Then trying to use an uninitialized var t Timer
will result in a NPE, which should force all timers to be allocated via the pool.
Currently, whenever `Timer.Stop` is called, we unconditionally put that timer into `timerPool`. This works well assuming that the contract of `Stop` is satisfied - namely that the stopped timer can no longer be used. However, we have at least one case where this contract is violated (fixed in the following commit), and in such a scenario it is possible for multiple users to access the same timer object which can lead to an undefined behavior (we've seen a deadlock on a test server shutdown). This commit hardens the timer code to prevent a class of such issues by putting the timer into the pool on `Stop` only if the timer originally came from the pool. Release note: None
The contract of `timeutil.Timer.Stop` is such that the timer can no longer be used, and it was violated in `Registry.poll`. This is now fixed by allocating a fresh timer after each `Stop` call. Release note: None
a2245e2
to
196dfd5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner and @nvanbenschoten)
pkg/util/timeutil/timer.go
line 110 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Is there any reasonable use case for a stack allocated timer? If not, why not prevent stack allocated timers by making
Timer
a point to an internal type struct:type Timer = *t type t struct { timer *time.Timer // C is a local "copy" of timer.C that can be used in a select case before // the timer has been initialized (via Reset). C <-chan time.Time Read bool }Then trying to use an uninitialized
var t Timer
will result in a NPE, which should force all timers to be allocated via the pool.
On a quick search there are like 20 places where the timers are allocated on the "stack" and don't use the pool. For example, replicaScanner
from server/scanner.go
- it seems reasonable to me that we embed the timer into the parent struct to avoid accessing timerPool
(although, later, on Reset
we'll still access the shared timeTimerPool
). To me this seems like a reasonable use case since we avoid an extra pointer to track / extra access to timerPool
- WDYT?
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I see, makes sense. No need to disallow stack allocation here then. |
TFTRs! bors r+ |
Build succeeded: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale)
pkg/util/timeutil/timer.go
line 110 at r1 (raw file):
Is the idea that stack allocated timers (which I assume still escape to the heap and would always benefit from just using NewTimer) can now be re-used after Stop is called? In other words, is this allowing us to change the contract?
So can the contact be relaxed now for timers that didn't come from the pool? Seems like code actually does want to reuse timers after Stop
.
Informs cockroachdb#119593. Closes cockroachdb#38055. This commit removes the pooling of timeutil.Timer structs and, in doing so, permits the structs to be stack allocated so that no pooling is necessary. This superfluous (and in hindsight, harmful) memory pooling was introduced in f11ec1c, which also added very necessary pooling for the internal time.Timer structs. The pooling was harmful because it mandated a contract where Timer structs could not be used after their Stop method was called. This was surprising (time.Timer has no such limitation) and led to subtle use-after-free bugs over time (cockroachdb#61373 and cockroachdb#119595). It was also unnecessary because the outer Timer structs can be stack allocated. Ironically, the only thing that causes them to escape to the heap was the pooling mechanism itself. Removing pooling solves the issue. ``` name old time/op new time/op delta Timer-10 153µs ± 1% 152µs ± 1% ~ (p=0.589 n=10+9) name old alloc/op new alloc/op delta Timer-10 200B ± 0% 200B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Timer-10 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` Epic: None Release note: None
Informs cockroachdb#119593. Closes cockroachdb#38055. This commit removes the pooling of timeutil.Timer structs and, in doing so, permits the structs to be stack allocated so that no pooling is necessary. This superfluous (and in hindsight, harmful) memory pooling was introduced in f11ec1c, which also added very necessary pooling for the internal time.Timer structs. The pooling was harmful because it mandated a contract where Timer structs could not be used after their Stop method was called. This was surprising (time.Timer has no such limitation) and led to subtle use-after-free bugs over time (cockroachdb#61373 and cockroachdb#119595). It was also unnecessary because the outer Timer structs can be stack allocated. Ironically, the only thing that causes them to escape to the heap was the pooling mechanism itself. Removing pooling solves the issue. ``` name old time/op new time/op delta Timer-10 153µs ± 1% 152µs ± 1% ~ (p=0.589 n=10+9) name old alloc/op new alloc/op delta Timer-10 200B ± 0% 200B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Timer-10 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` Epic: None Release note: None
Informs cockroachdb#119593. Closes cockroachdb#38055. This commit removes the pooling of timeutil.Timer structs and, in doing so, permits the structs to be stack allocated so that no pooling is necessary. This superfluous (and in hindsight, harmful) memory pooling was introduced in f11ec1c, which also added very necessary pooling for the internal time.Timer structs. The pooling was harmful because it mandated a contract where Timer structs could not be used after their Stop method was called. This was surprising (time.Timer has no such limitation) and led to subtle use-after-free bugs over time (cockroachdb#61373 and cockroachdb#119595). It was also unnecessary because the outer Timer structs can be stack allocated. Ironically, the only thing that causes them to escape to the heap was the pooling mechanism itself. Removing pooling solves the issue. ``` name old time/op new time/op delta Timer-10 153µs ± 1% 152µs ± 1% ~ (p=0.589 n=10+9) name old alloc/op new alloc/op delta Timer-10 200B ± 0% 200B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Timer-10 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` Epic: None Release note: None
119901: timeutil: stack-allocate Timer, remove pooling r=nvanbenschoten a=nvanbenschoten Informs #119593. Closes #38055. This PR removes the pooling of `timeutil.Timer` structs and, in doing so, permits the structs to be stack allocated so that no pooling is necessary. This superfluous (and in hindsight, harmful) memory pooling was introduced in f11ec1c, which also added very necessary pooling for the internal time.Timer structs. The pooling was harmful because it mandated a contract where Timer structs could not be used after their Stop method was called. This was surprising (time.Timer has no such limitation) and led to subtle use-after-free bugs over time (#61373 and #119595). It was also unnecessary because the outer Timer structs can be stack allocated. Ironically, the only thing that causes them to escape to the heap was the pooling mechanism itself. Removing pooling solves the issue. ``` name old time/op new time/op delta Timer-10 153µs ± 1% 152µs ± 1% ~ (p=0.589 n=10+9) name old alloc/op new alloc/op delta Timer-10 200B ± 0% 200B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Timer-10 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` ---- The PR then improves the memory pooling of the inner `time.Timer` so that it is always recycled. This was originally identified by `@andreimatei` in #13466 (review). Doing so has a positive impact on the microbenchmark introduced in the first commit, demonstrating that timers can be stack-allocated and require zero heap allocations: ``` name old time/op new time/op delta Timer-10 152µs ± 1% 153µs ± 1% ~ (p=0.133 n=9+10) name old alloc/op new alloc/op delta Timer-10 200B ± 0% 0B -100.00% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Timer-10 3.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) ``` ---- cc. `@andreimatei` `@ajwerner` Epic: None Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
timeutil: prevent illegal sharing of Timers
Currently, whenever
Timer.Stop
is called, we unconditionally put that timer intotimerPool
. This works well assuming that the contract ofStop
is satisfied - namely that the stopped timer can no longer be used.However, we have at least one case where this contract is violated (fixed in the following commit), and in such a scenario it is possible for multiple users to access the same timer object which can lead to an undefined behavior (we've seen a deadlock on a test server shutdown).
This commit hardens the timer code to prevent a class of such issues by putting the timer into the pool on
Stop
only if the timer originally came from the pool.Release note: None
stmtdiagnostics: fix incorrect usage of timeutil.Timer
The contract of
timeutil.Timer.Stop
is such that the timer can no longer be used, and it was violated inRegistry.poll
. This now fixed by allocating a fresh timer after eachStop
call.Fixes: #119593.
Release note: None