
Fix race condition in OutboxState #250

Merged: 2 commits merged into dev on Apr 20, 2023


sebastianburckhardt (Member):

Fixes a problem discovered during testing.

While Send iterates over the batch's collection of outgoing messages (even if it contains just a single message), the send can complete, and so can the send confirmation, at which point the message may be removed from the collection. That modification conflicts with the loop that is still iterating, as we see here:

System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
   at DurableTask.Netherite.OutboxState.Send(Batch batch) in D:\git\durabletask-netherite\src\DurableTask.Netherite\PartitionState\OutboxState.cs:line 112
   at DurableTask.Netherite.DurabilityListeners.ConfirmDurable(Event evt) in D:\git\durabletask-netherite\src\DurableTask.Netherite\Util\DurabilityListeners.cs:line 79
   at DurableTask.Netherite.Faster.LogWorker.Process(IList`1 batch) in D:\git\durabletask-netherite\src\DurableTask.Netherite\StorageLayer\Faster\LogWorker.cs:line 168
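A minimal single-threaded model of this failure, for illustration only. The names `OutboxRepro` and `SendAndConfirm`, and the use of a `List<string>` for the pending messages, are hypothetical stand-ins, not Netherite's actual types. The sketch assumes the confirmation runs synchronously (re-entrantly) inside the send call and removes the message from the list that `Send` is still enumerating. That assumption keeps the example deterministic.

```csharp
using System.Collections.Generic;

public static class OutboxRepro
{
    // Hypothetical send: the confirmation handler runs synchronously and
    // removes the message from the pending list before Send's loop resumes.
    static void SendAndConfirm(List<string> pending, string message)
    {
        // ... transmit the message (omitted) ...
        pending.Remove(message); // confirmation: message is no longer pending
    }

    // Buggy pattern: enumerates the live list while it can still be modified.
    public static void Send(List<string> pending)
    {
        foreach (var message in pending)
        {
            SendAndConfirm(pending, message);
        }
    }
}
```

Even with a single message, `Send` throws `InvalidOperationException`. Removing the element changes the list's internal version, and the enumerator's next `MoveNext` call detects the change.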

This PR fixes the problem by copying the collection into an array before iterating over it.
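The following sketch applies the snapshot approach to a small hypothetical model. Iterating over a copy (`List<T>.ToArray()`) means that removals from the live list during the loop no longer invalidate the enumerator. `OutboxFixed` and `SendAndConfirm` are illustrative names, and the confirmation is assumed to run synchronously inside the send.

```csharp
using System.Collections.Generic;

public static class OutboxFixed
{
    // Hypothetical send whose confirmation handler runs synchronously and
    // removes the message from the live pending list.
    static void SendAndConfirm(List<string> pending, List<string> sent, string message)
    {
        sent.Add(message);       // stands in for the actual transmission
        pending.Remove(message); // confirmation mutates the live list
    }

    // Fixed pattern: enumerate a snapshot, so removals from `pending`
    // during the loop cannot invalidate the enumerator.
    public static List<string> Send(List<string> pending)
    {
        var sent = new List<string>();
        foreach (var message in pending.ToArray())
        {
            SendAndConfirm(pending, sent, message);
        }
        return sent;
    }
}
```

With `pending` holding three messages, `Send` returns all three in their original order and leaves `pending` empty. The sketch covers only the re-entrant case. If the confirmation could run on another thread, copying the list would not by itself make `List<T>` safe for concurrent access. In that case the copy and the removal would also need synchronization.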

davidmrdavid (Member) left a comment:

Some nits and questions.

Comment on lines -102 to -103
var outMessages = batch.OutgoingMessages.Count < 2 ? batch.OutgoingMessages : batch.OutgoingMessages.ToList();// prevent concurrent mod
batch.TotalAcksExpected = batch.OutgoingResponses.Count + outMessages.Count;
davidmrdavid (Member):

Hmm, what was this "concurrent mod" about? I'm not sure how this count check relates to concurrent modifications.

sebastianburckhardt (Member, Author):

This logic assumed that concurrent modification is only a problem if the list has more than one element. That assumption was wrong, so we now always copy the list.
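The following sketch shows why the count-based heuristic in the removed line does not help. With fewer than two messages, `Count < 2 ? list : list.ToList()` returns the live list itself, so the loop is exposed to the same modification. The hypothetical `SnapshotChoice` helper below, which uses `List<string>` as a stand-in for the message collection, contrasts that expression with always copying.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotChoice
{
    // Old heuristic (from the removed line): copy only when there are 2+ messages.
    public static IReadOnlyList<string> OldSnapshot(List<string> messages) =>
        messages.Count < 2 ? messages : messages.ToList();

    // Current approach: always iterate over a copy.
    public static IReadOnlyList<string> NewSnapshot(List<string> messages) =>
        messages.ToList();

    // Enumerates `view` while removing each item from `live`, mimicking a
    // confirmation that fires before the send loop finishes.
    // Returns true if the enumeration was invalidated.
    public static bool LoopIsInvalidated(IReadOnlyList<string> view, List<string> live)
    {
        try
        {
            foreach (var message in view)
            {
                live.Remove(message);
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

For a one-element list, `LoopIsInvalidated(OldSnapshot(list), list)` returns true because the "snapshot" is the same object being modified. `LoopIsInvalidated(NewSnapshot(list), list)` returns false and empties the list.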

Two review threads on src/DurableTask.Netherite/PartitionState/OutboxState.cs (outdated, resolved)
sebastianburckhardt (Member, Author):

Thanks for the review. I have committed your suggestions.

sebastianburckhardt merged commit 67255a9 into dev on Apr 20, 2023.