time,runtime: too many concurrent timer firings for short time.Ticker
#69880
@gopherbot Please open a backport issue for Go 1.23. This is a serious issue that affects users with no reasonable workaround. This issue does not affect Go 1.22.
Backport issue(s) opened: #69881 (for 1.22), #69882 (for 1.23). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

Fixed on tip, we think, by https://go.dev/cl/620136. Will reopen if it continues to occur.

Change https://go.dev/cl/620137 mentions this issue.
Hi, my team has also observed occasional "too many concurrent timer firings" panics after upgrading to Go 1.23.2. In our case, the panics occur in a service that runs a gRPC server, and my suspicion is that it may be because the gRPC server creates three new timers for each new connection:

```go
idleTimer := time.NewTimer(t.kp.MaxConnectionIdle)
ageTimer := time.NewTimer(t.kp.MaxConnectionAge)
kpTimer := time.NewTimer(t.kp.Time)
```

In one case I saw that a replica that panicked had about 1000 active connections at the time it panicked. We initially suspected that this was a thundering-herd problem, in which too many clients try to connect to the service at once and thereby cause too many concurrent timers to be created, so we tried increasing the jitter the clients use when connecting to the service. Even with that change, though, we continued to observe panics in the service (I suspect this might be because the server resets the timers it creates, so just due to random timing effects one might end up with many timers expiring at about the same time). I'm still working on a reproducible example that I can share, but I was curious whether the linked CL would also affect timers, or if it is only relevant to tickers?
CL 620136 will only affect Tickers, not Timers. Note that this crash is not the result of having many concurrent timers. It is the result of a single timer being run multiple times concurrently. Creating lots of timers can't cause this problem. What can cause this problem is having a single timer that is reset multiple times very quickly to times that are in the past or the very near future. Each such timer reset can potentially lead to an immediate run of the timer. If those timer runs somehow happen 8 times concurrently, this error can occur. That said, I haven't been able to recreate this problem using timers. A test case would certainly be helpful.
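As an illustration of the failure mode described above, a minimal sketch of one timer being reset rapidly to deadlines in the past might look like this. This is not a confirmed reproducer; the durations, iteration count, and drain goroutine are assumptions.

```go
package main

import "time"

func main() {
	// One timer, reset repeatedly to a deadline that has already
	// passed. Each Reset makes the timer eligible to fire
	// immediately, which is the path that could, in principle, run
	// the same timer concurrently.
	t := time.NewTimer(time.Hour)
	done := make(chan struct{})
	go func() {
		for {
			select {
			case <-t.C: // drain the firings
			case <-done:
				return
			}
		}
	}()
	for i := 0; i < 1_000_000; i++ {
		t.Reset(-time.Nanosecond) // a deadline in the past
	}
	close(done)
}
```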
Thank you, that's incredibly helpful context. I had misunderstood the panic message, but this gives me a lead I can use to find the problematic goroutine in the stack traces emitted after the panics. Once I find the issue I'll see if I can turn it into a simple reproducible example.
So yesterday I also encountered this panic. It happened after 2-3 hours, and I found the responsible code pattern in my code. I know it is a bad pattern; I wanted to test the new Go 1.23 feature that automatically GCs unreferenced, unstopped tickers. I have since refactored the code to use only one Ticker.
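The original snippets are not shown here, but a minimal hypothetical sketch of the two patterns being described might look like the following; the interval, loop shape, and doWork are assumptions.

```go
package main

import "time"

func doWork() { /* placeholder for the loop body */ }

// badPattern creates a fresh Ticker on every iteration and never
// stops it, leaning on Go 1.23's garbage collection of
// unreferenced, unstopped tickers.
func badPattern() {
	for {
		t := time.NewTicker(10 * time.Millisecond)
		<-t.C
		doWork()
		// t goes out of scope without Stop; before Go 1.23 this
		// would leak the ticker.
	}
}

// refactored uses a single Ticker for the life of the loop.
func refactored() {
	t := time.NewTicker(10 * time.Millisecond)
	defer t.Stop()
	for {
		<-t.C
		doWork()
	}
}

func main() {
	refactored()
}
```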
@matbits Thanks. I don't see how the original pattern could cause this panic, though.
Good point. I thought the bug happened in my code, and this is the only place I am using the time package with a ticker in my code. But maybe the panic happened inside a third-party library. I need to investigate this further.
Change https://go.dev/cl/621616 mentions this issue.
This change switches isSending to be an atomic.Int32 instead of an atomic.Uint8. The Int32 version is managed as a counter, which is something that we couldn't do with Uint8 without adding a new intrinsic that may not be available on all architectures. That is, instead of only being able to support 8 concurrent timer firings on the same timer, because we only have 8 independent bits to set for each concurrent timer firing, we can now have 2^31-1 concurrent timer firings before running into any issues. Just as each bit-set was matched with a clear, here we match increments with decrements to indicate that we're in the "sending on a channel" critical section in the timer code, so we can report the correct result back on Stop or Reset.

We choose an Int32 instead of a Uint32 because it's easier to check for obviously bad values (negative values are always bad), and 2^31-1 concurrent timer firings should be enough for anyone.

Previously, we avoided anything bigger than a Uint8 because we could pack it into some padding in the runtime.timer struct. But it turns out that the type that actually matters, runtime.timeTimer, is exactly 96 bytes in size. Because of the allocation header, it lands in the next size class up, the 112-byte size class, so we have some free space to work with. This change increases the size of this struct from 96 bytes to 104 bytes. (I'm not sure if runtime.timer is often allocated directly, but if it is, we get lucky in the same way: it's exactly 80 bytes in size, which puts it in the 96-byte size class, leaving us with some space to work with.)

Fixes #69969.
Related to #69880 and #69312.

Change-Id: I9fd59cb6a69365c62971d1f225490a65c58f3e77
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/621616
Reviewed-by: Ian Lance Taylor <[email protected]>
Auto-Submit: Michael Knyszek <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
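To illustrate the counter scheme this commit message describes, here is a standalone sketch, not the actual runtime code; the type and method names are invented. Increments and decrements bracket the channel send, and a negative value is always a bug:

```go
package main

import (
	"sync/atomic"
	"time"
)

// sendTracker mirrors the idea of the runtime's isSending field:
// an atomic counter of in-flight "sending on the channel" critical
// sections. The names here are for illustration only.
type sendTracker struct {
	isSending atomic.Int32
}

func (s *sendTracker) send(c chan<- time.Time, v time.Time) {
	s.isSending.Add(1) // enter the critical section
	defer func() {
		if s.isSending.Add(-1) < 0 { // matched decrement
			panic("negative isSending: unbalanced increment/decrement")
		}
	}()
	c <- v
}

// interrupted reports whether a send is in flight, which is the
// kind of question Stop/Reset need answered to report the right
// result.
func (s *sendTracker) interrupted() bool {
	return s.isSending.Load() > 0
}

func main() {
	var s sendTracker
	c := make(chan time.Time, 1)
	s.send(c, time.Now())
	_ = s.interrupted()
}
```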
The Ticker Stop and Reset methods don't report a value, so we don't need to track whether they are interrupting a send. This includes a test that used to fail about 2% of the time on my laptop when run under x/tools/cmd/stress.

For #69880
Fixes #69882

Change-Id: Ic6d14b344594149dd3c24b37bbe4e42e83f9a9ad
Reviewed-on: https://go-review.googlesource.com/c/go/+/620136
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
Auto-Submit: Michael Knyszek <[email protected]>
Auto-Submit: Ian Lance Taylor <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
(cherry picked from commit 48849e0)
Reviewed-on: https://go-review.googlesource.com/c/go/+/620137
Reviewed-by: Dmitri Shuralyov <[email protected]>
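The API asymmetry this change relies on is visible in the standard library's public signatures: Timer.Stop and Timer.Reset report whether the timer was still active, while Ticker.Stop and Ticker.Reset return nothing, so there is no result for a concurrent send to corrupt. A minimal illustration:

```go
package main

import "time"

func main() {
	timer := time.NewTimer(time.Second)
	wasActive := timer.Stop() // Timer.Stop reports whether it stopped the timer
	_ = wasActive

	ticker := time.NewTicker(time.Second)
	ticker.Stop()                 // Ticker.Stop reports nothing
	ticker.Reset(2 * time.Second) // neither does Ticker.Reset
	ticker.Stop()
}
```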
Change https://go.dev/cl/621856 mentions this issue.
This change switches isSending to be an atomic.Int32 instead of an atomic.Uint8. The Int32 version is managed as a counter, which is something that we couldn't do with Uint8 without adding a new intrinsic that may not be available on all architectures. That is, instead of only being able to support 8 concurrent timer firings on the same timer, because we only have 8 independent bits to set for each concurrent timer firing, we can now have 2^31-1 concurrent timer firings before running into any issues. Just as each bit-set was matched with a clear, here we match increments with decrements to indicate that we're in the "sending on a channel" critical section in the timer code, so we can report the correct result back on Stop or Reset.

We choose an Int32 instead of a Uint32 because it's easier to check for obviously bad values (negative values are always bad), and 2^31-1 concurrent timer firings should be enough for anyone.

Previously, we avoided anything bigger than a Uint8 because we could pack it into some padding in the runtime.timer struct. But it turns out that the type that actually matters, runtime.timeTimer, is exactly 96 bytes in size. Because of the allocation header, it lands in the next size class up, the 112-byte size class, so we have some free space to work with. This change increases the size of this struct from 96 bytes to 104 bytes. (I'm not sure if runtime.timer is often allocated directly, but if it is, we get lucky in the same way: it's exactly 80 bytes in size, which puts it in the 96-byte size class, leaving us with some space to work with.)

Fixes #69978
For #69969.
Related to #69880 and #69312 and #69882.

Change-Id: I9fd59cb6a69365c62971d1f225490a65c58f3e77
Cq-Include-Trybots: luci.golang.try:go1.23-linux-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/621616
Reviewed-by: Ian Lance Taylor <[email protected]>
Auto-Submit: Michael Knyszek <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
(cherry picked from commit 6a49f81)
Reviewed-on: https://go-review.googlesource.com/c/go/+/621856
Auto-Submit: Ian Lance Taylor <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
We've observed the "too many concurrent timer firings" panic, introduced in go.dev/cl/611496 for #69312, in production. We suspect these panics are due to relatively short time.Tickers. This bug tracks figuring out why it happens and mitigating it. We expected this to be rare, and it still is relatively rare, but it does happen. go.dev/cl/620136 is a mitigation for time.Ticker that has already landed on tip.

CC @ianlancetaylor
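As a rough sketch of the suspected trigger (not a confirmed reproducer; the interval and iteration count are arbitrary), a Ticker with a very short period that is also reset frequently exercises the same immediate-firing path described earlier in the thread:

```go
package main

import "time"

func main() {
	t := time.NewTicker(time.Microsecond) // a very short period
	defer t.Stop()
	for i := 0; i < 100000; i++ {
		t.Reset(time.Microsecond) // frequent resets to a near-immediate deadline
		select {
		case <-t.C: // consume a tick if one is ready
		default:
		}
	}
}
```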