-
Notifications
You must be signed in to change notification settings - Fork 712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix TSAN s2n_shutdown failures #4055
Conversation
utils/s2n_atomic.h
Outdated
* rely on setting / clearing a small value generally being atomic in practice. | ||
*/ | ||
S2N_RESULT s2n_atomic_init(); | ||
void s2n_atomic_set(s2n_atomic *var); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idk, store
implies that you can specify a value, instead of it always storing true.
And how about "test" instead of "check"? That'd be inline with the builtin __atomic_test_and_set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After our offline conversation, I replaced set/clear with load. But I've reverted that-- I think set/clear is safer. See the KeyUpdate bug I use as an example in the description: because the reader blindly sets conn->key_update_pending
to key_update_request
, it could be setting key_update_pending back to 0. But only the writer is allowed to set key_update_pending back to 0, or else we lose key updates. Developers need to be very clear on whether they're setting or clearing the flag.
set/clear/test doesn't match atomic operations, but it does match our bitflag operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice 👍
TSAN isn't going to catch errors unless we have threading tests right? Would we actually be able to catch the key_update issue since our thread tests don't send enough data to do a keyupdate? |
I have a KeyUpdate thread test written and blocked on this PR. That will trigger TSAN. That's also why this PR doesn't actually fix the KeyUpdate problems yet ;) |
Resolved issues:
Related to #4026
Description of changes:
s2n-tls has always used "informal" atomic variables, since C99 technically has no concept of concurrency. However, if we want to use TSAN to test for concurrency issues without worrying about filtering out false positives, we need to formalize our atomic operations.
Even if we stick to the C99 standard, GCC provides builtins that implement atomic operations. For small variables like
int
in sane environments, those operations are non-locking. That's why s2n-tls's "informal" atomics work in the first place ;)I added an "s2n_atomic" type to s2n-tls that uses the GCC builtins if supported and enabled. This type has a couple benefits:
Call-outs:
For now, I'm erring on the side of caution and only using the new behavior for TSAN testing. We could probably enable atomics outside of testing if available, but I'm not sure how we'd handle odd environments where the atomic operations on s2n_atomic require locking. Right now, if you enable atomic, s2n_init will fail if locking is required.
Testing:
ThreadSanitizer passes for all tests now: https://us-west-2.console.aws.amazon.com/codesuite/codebuild/024603541914/projects/s2nGeneralBatch/batch/s2nGeneralBatch%3A0ecc09d3-bd8d-4fe7-9380-cba0b9e823ca?region=us-west-2 Previously it failed for s2n_examples_test due to the shutdown issues.
I will add a test for KeyUpdates in another PR, once I've officially enabled the ThreadSanitizer CI job.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.