Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TSAN s2n_shutdown failures #4055

Merged
merged 7 commits into from
Jun 15, 2023
Merged

Fix TSAN s2n_shutdown failures #4055

merged 7 commits into from
Jun 15, 2023

Conversation

lrstewart
Copy link
Contributor

@lrstewart lrstewart commented Jun 14, 2023

Resolved issues:

Related to #4026

Description of changes:

s2n-tls has always used "informal" atomic variables, since C99 technically has no concept of concurrency. However, if we want to use TSAN to test for concurrency issues without worrying about filtering out false positives, we need to formalize our atomic operations.

Even if we stick to the C99 standard, GCC provides builtins that implement atomic operations. For small variables like int in sane environments, those operations are non-locking. That's why s2n-tls's "informal" atomics work in the first place ;)

I added an "s2n_atomic" type to s2n-tls that uses the GCC builtins if supported and enabled. This type has a couple benefits:

  • Explicitly marks in our code which variables are touched by both the reader and the writer.
  • Provides a limited set of operations that should keep even our "informal" atomics safe. For example, using s2n_atomic_set would force us to fix this thread-safety bug.
  • Allows us to use TSAN to detect problems.

Call-outs:

For now, I'm erring on the side of caution and only using the new behavior for TSAN testing. We could probably enable atomics outside of testing if available, but I'm not sure how we'd handle odd environments where the atomic operations on s2n_atomic require locking. Right now, if you enable atomic, s2n_init will fail if locking is required.

Testing:

ThreadSanitizer passes for all tests now: https://us-west-2.console.aws.amazon.com/codesuite/codebuild/024603541914/projects/s2nGeneralBatch/batch/s2nGeneralBatch%3A0ecc09d3-bd8d-4fe7-9380-cba0b9e823ca?region=us-west-2 Previously it failed for s2n_examples_test due to the shutdown issues.
I will add a test for KeyUpdates in another PR, once I've officially enabled the ThreadSanitizer CI job.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@github-actions github-actions bot added the s2n-core team label Jun 14, 2023
@lrstewart lrstewart requested a review from camshaft June 14, 2023 22:00
@lrstewart lrstewart marked this pull request as ready for review June 14, 2023 22:00
@lrstewart lrstewart requested a review from maddeleine June 14, 2023 22:00
utils/s2n_atomic.h Outdated Show resolved Hide resolved
* rely on setting / clearing a small value generally being atomic in practice.
*/
S2N_RESULT s2n_atomic_init();
void s2n_atomic_set(s2n_atomic *var);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth copying the naming conventions from rust or cpp. That would be store instead of set, load instead of check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idk, store implies that you can specify a value, instead of it always storing true.
And how about "test" instead of "check"? That'd be inline with the builtin __atomic_test_and_set

Copy link
Contributor Author

@lrstewart lrstewart Jun 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After our offline conversation, I replaced set/clear with load. But I've reverted that-- I think set/clear is safer. See the KeyUpdate bug I use as an example in the description: because the reader blindly sets conn->key_update_pending to key_update_request, it could be setting key_update_pending back to 0. But only the writer is allowed to set key_update_pending back to 0, or else we lose key updates. Developers need to be very clear on whether they're setting or clearing the flag.

set/clear/test doesn't match atomic operations, but it does match our bitflag operations.

CMakeLists.txt Outdated Show resolved Hide resolved
@lrstewart lrstewart requested a review from camshaft June 15, 2023 01:08
Copy link
Contributor

@camshaft camshaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice 👍

@maddeleine
Copy link
Contributor

maddeleine commented Jun 15, 2023

For now, I'm erring on the side of caution and only using the new behavior for TSAN testing.

TSAN isn't going to catch errors unless we have threading tests right? Would we actually be able to catch the key_update issue since our thread tests don't send enough data to do a keyupdate?

@lrstewart
Copy link
Contributor Author

lrstewart commented Jun 15, 2023

For now, I'm erring on the side of caution and only using the new behavior for TSAN testing.

TSAN isn't going to catch errors unless we have threading tests right? Would we actually be able to catch the key_update issue since our thread tests don't send enough data to do a keyupdate?

I have a KeyUpdate thread test written and blocked on this PR. That will trigger TSAN.

That's also why this PR doesn't actually fix the KeyUpdate problems yet ;)

@lrstewart lrstewart enabled auto-merge (squash) June 15, 2023 19:15
@lrstewart lrstewart merged commit c9dd66e into aws:main Jun 15, 2023
@lrstewart lrstewart deleted the threads_close branch June 15, 2023 20:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants