Fix: CTS.TryReset() concurrency issue #60182 #60224
Tagging subscribers to this area: @mangod9
Issue Details: PR contains a fix for #60182
Tests for both cases, which fail with the current net6.0, would protect this fix from regressing.
@kasperk81, I agree with you. However, I see no way to write stable tests for those cases. During such tests, we must have a guarantee that |
Thanks, @sakno. I appreciate your putting a fix together. I'm not sure it's what we should do, though. Beyond adding complexity to the synchronization scheme, it's making IsCancellationRequested a bit more expensive, and while not a lot more, IsCancellationRequested is on a ton of hot paths; it's very likely that adding even a tad more expense there will be noticed in some benchmarks and potentially even real-world use. In your bug report, you said there's no workaround. But, isn't the only effect of this race condition that, in rare conditions, TryReset throws an ObjectDisposedException instead of returning false? Assuming yes, this points to both a workaround (someone can catch the exception) and a fix (we can either catch or avoid the exception). Note that CancellationTokenSource already wraps timer.Change in another place with a try/catch(ObjectDisposedException): runtime/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs Lines 355 to 365 in 4b84451
so while I don't love first-chance exceptions like this, I think it'd be fine for the entirety of this fix to simply be wrapping this additional location with a similar try/catch. That would also seem to be the least-risky fix that would enable us to port it back to release/6.0 (you opened this PR against release/6.0-rc2, but our standard practice is to first get it fixed in main and then decide if/what/when to backport). We could subsequently look at avoiding the exception altogether; we're already depending on the internal TimerQueueTimer, and we could for example update its Change method to take an additional bool argument indicating whether to throw or return false in the face of cancellation/disposal; that, however, should be something we consider separately from fixing this.
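The two options mentioned above (a caller-side workaround, and a catch at the call site) can both be sketched with public APIs only. This is a minimal illustration and not the code in CancellationTokenSource.cs: `TryResetSafely` and `TryChange` are hypothetical helper names, and the public `System.Threading.Timer` stands in for the internal TimerQueueTimer.

```csharp
using System;
using System.Threading;

static class ResetWorkaroundSketch
{
    // Hypothetical caller-side workaround: treat an ObjectDisposedException
    // coming out of TryReset as "could not reset".
    // Caveat: this also returns false when the CancellationTokenSource itself
    // has been disposed, a case TryReset normally reports by throwing.
    public static bool TryResetSafely(CancellationTokenSource cts)
    {
        try
        {
            return cts.TryReset();
        }
        catch (ObjectDisposedException)
        {
            return false;
        }
    }

    // Hypothetical stand-in for the call-site fix: wrap the timer's Change call
    // so that "timer already closed" becomes a false result, not an exception.
    public static bool TryChange(Timer timer, TimeSpan dueTime, TimeSpan period)
    {
        try
        {
            return timer.Change(dueTime, period);
        }
        catch (ObjectDisposedException)
        {
            return false;
        }
    }
}
```

The trade-off is the one raised in the comment: the exception is still thrown and caught as a first-chance exception, but the change stays local to a single call site.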
@BrzVlad, @EgorBo, @eiriktsarpalis, @imhameed, @lambdageek, @lateralusX, @layomia, @marek-safar, @naricc, @SamMonoRT, @steveharter, @thaystg, @vargaz I'm very sorry, folks, my bad: you were all added because I changed the base for this PR. Your participation is not needed.
Thanks @stephentoub for your feedback. From my point of view,
Yes, with one extra instruction: and operator. I fixed that with carefully selected value of
Overhead for all existing methods remains the same except for With proposed changes, we can use the happens-before relationship to analyze concurrency situations. I think this approach is less risky than leaving everything as-is. We have three sources of state transitions: scheduled cancellation, manual cancellation (via
Additionally, this schema naturally resolves the undefined behavior caused by concurrent calls of |
The original intent here was to use the timer's lock as the coordination mechanism. That's still valid; the only thing that was missed was that the timer might explicitly throw if it's been closed. We're about to ship the release, and the proposed lock-free and non-trivial changes to synchronization are inherently risky, as is the potential perf impact to IsCancellationRequested I mentioned. We don't have time to fully vet such changes and any downstream impact. It's a lot less risky to simply add the catch block to deal with the one missed aspect initially. Even when we're not constrained by time, I'd prefer to follow through with the original design and continue using the existing lock for the coordination and just avoid the throw rather than adding any expense at all to IsCancellationRequested.
All expenses related to However, I agree with you about the timeline, so I see the following consensus here:
What do you think? |
You keep trying to label it that way 😄, but I disagree with the categorization. TimerQueueTimer is signaling a particular state used for coordination with an exception rather than a return value, and the bug fundamentally is that the call site isn't properly handling that signal. The true fix for the bug is thus adding that catch block. Separately I dislike that this is the mechanism by which the internal TimerQueueTimer.Change is signaling that condition, and we can subsequently look at addressing that such that the state is optionally signaled via return value rather than exception, and both this catch block and the other in the file could then be changed to listen for that new signal (which this call site already is, and the other one doesn't actually need to pay attention to). But, yes, please either change this PR to be that fix or open a new one for it. Either way is fine, though I still don't think what's in this PR is the proper long-term change, as it's adding complexity for little reason. It's also a breaking change: note that with how it's in this PR now, exceptions thrown from cancellation callbacks may be thrown out of TryReset. |
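The follow-up idea described here, signaling the closed state through a return value instead of an exception, might look roughly like the toy model below. `SketchTimer` and its `throwIfDisposed` parameter are assumptions made for illustration only; this is not the runtime's TimerQueueTimer, and no real scheduling happens.

```csharp
using System;

// Toy model (hypothetical): a timer whose Change method can report
// "already closed" either by throwing or by returning false.
sealed class SketchTimer
{
    private readonly object _lock = new object();
    private bool _closed;

    // Marks the timer as closed; later Change calls can no longer succeed.
    public void Close()
    {
        lock (_lock)
        {
            _closed = true;
        }
    }

    // Returns true if the timer was still open and the change was accepted.
    // When the timer is closed, the caller chooses how that is signaled.
    public bool Change(uint dueTimeMs, bool throwIfDisposed = true)
    {
        lock (_lock)
        {
            if (_closed)
            {
                if (throwIfDisposed)
                {
                    throw new ObjectDisposedException(nameof(SketchTimer));
                }
                return false;
            }

            // A real timer would reschedule itself for dueTimeMs here.
            return true;
        }
    }
}
```

With a shape like this, a call site that already treats `false` as "could not reset" would pass `throwIfDisposed: false` and no longer need a catch block.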
Good catch. I'll change this PR with a quick fix (let's use this term instead of dirty hack 😄 ) using |
src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs
Thanks.
/backport to release/6.0 |
Started backporting to release/6.0: https://github.com/dotnet/runtime/actions/runs/1335487553 |
PR contains a fix for #60182
Also, the fix resolves potential concurrency between Cancel and TryReset as follows:
- If Cancel happens before TryReset, then CTS will be canceled normally
- If TryReset happens before Cancel, then Cancel will have no effect
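For the uncontended case, the ordering can be observed with the public API alone. The sketch below covers only sequential calls, with a scheduled cancellation (`CancelAfter`) standing in for the second bullet; it does not reproduce the race itself, and the class and method names are hypothetical.

```csharp
using System;
using System.Threading;

static class ResetOrderingSketch
{
    // Cancel first, then TryReset: the source is already canceled,
    // so TryReset is expected to return false.
    public static bool ResetAfterCancel()
    {
        using var cts = new CancellationTokenSource();
        cts.Cancel();
        return cts.TryReset();
    }

    // Schedule a far-off cancellation, then TryReset before it fires:
    // the reset is expected to succeed and leave the source not canceled.
    public static bool ResetBeforeScheduledCancel()
    {
        using var cts = new CancellationTokenSource();
        cts.CancelAfter(TimeSpan.FromHours(1));
        bool reset = cts.TryReset();
        return reset && !cts.IsCancellationRequested;
    }
}
```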