Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race conditions in CurrentValueRelay #3447

Merged
merged 9 commits into from
Oct 22, 2024

Conversation

kabiroberai
Copy link
Contributor

As it's currently written, CurrentValueRelay seems to have several race conditions. These are infrequent enough to make them rather elusive in typical usage, but we believe that these are the root cause behind 0.X% of Ramp iOS sessions crashing.

This PR includes a test case that fails on main, as well as a fix that solves it, but is slow. It's probably worth you folks writing a fix yourself (without needing the lock to be recursive) to resolve the issue, but I hope that the test case helps. FWIW you should also be able to trip up TSAN when running the failing test case pre-fix; it catches a number of the ways in which the race can be triggered.

@iampatbrown
Copy link
Contributor

Hey @kabiroberai :) awesome work! Thanks for fixing this.
Your changes look good and worked well on my end. I think they can be merged as is.
One question related to locking access to the upstream's current value, do you have an example where this is required or possibly a failing test? I'm not saying it's not needed, I'm just trying to better understand for my own curiosity 😅
Also, is the _downstream property mainly so we can lock access to it? It could be worth locking above the guard statements to avoid locking/unlocking twice.

@kabiroberai
Copy link
Contributor Author

appreciate the vote of confidence @iampatbrown!

re locking upstream.currentValue, the reason behind this is that, as far as I can tell, it's otherwise possible for a Subscription.request(_:) call to read the value in parallel with a call to CurrentValueRelay.send which can be writing to it at the same time. This breaks the invariant that there can only be one writer xor multiple readers of shared mutable state.

As for the _downstream property, indeed the indirection is mainly a convenience. Moving the lock() invocations up is a good call, I'll give that a shot.

Copy link
Member

@stephencelis stephencelis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@stephencelis stephencelis merged commit 5614943 into pointfreeco:main Oct 22, 2024
13 checks passed
jpsim added a commit to jpsim/swift-composable-architecture that referenced this pull request Nov 29, 2024
We're still seeing crashes due to what look like data races in
`DemandBuffer`. Here, I'm making fixes based on what appears to have
helped in
pointfreeco#3447:

In `buffer(value:)`:

* Check if the demand is unlimited and, if so, unlock before calling
  `subscriber.receive(value)`.
* If the demand is not unlimited, append the value to the buffer and
  call `flush()` after unlocking.

In `flush(adding:)`:

* Process values in a loop, acquiring and releasing the lock around
  shared state access.
* Unlock before calling `subscriber.receive(value)` to avoid holding
  the lock during subscriber calls.
* After sending a value, lock again to update `demandState.requested`
  with any additional demand returned by the subscriber.
* Ensure that `subscriber.receive(completion:)` is called outside the
  lock.

Also add a new `DemandBufferTests` test file. I added some tests that
I was hoping would fail without the above changes, but they all pass
both before and after, even with ThreadSanitizer enabled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants