Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore duplicate client requests #379

Merged
merged 4 commits into from
Apr 16, 2024
Merged

Conversation

sebastianburckhardt
Copy link
Member

During stress testing we observed to our surprise that event hubs sometimes duplicates events internally. Specifically, it is possible for an event to be successfully sent by a partition client only once, but then this event appears twice in the delivery sequence to the partition, with two different sequence numbers. Specifically we saw the same event being delivered at positions 2115 and 2121. The events in between (positions 2116 to 2120) corresponded to events that were sent from a different source.

Thankfully the practical impact of this discovery is relatively minor since we are already deduplicating almost all events. The only events not being duplicated are client requests that are sent from a client to a partition.

Netherite was designed assuming this would not happen, so we detected this because an assertion violation triggered.
In this PR we modify the code so that if a duplicate is detected, we ignore it instead of creating an assertion violation error.

Note that this is not a reliable client request deduplication, because if the duplicate arrives much later than the original, the duplicate will be processed a second time. However I think this is an o.k. approximation since any clients that are sensitive to duplication problems should already use idempotent versions of the API calls (such as including deduplication status when creating an orchestration) which would not be affected by internal event duplication.

@sebastianburckhardt sebastianburckhardt added the bug Something isn't working label Apr 12, 2024
@sebastianburckhardt sebastianburckhardt added this to the 1.5.0 milestone Apr 12, 2024
Copy link
Member

@davidmrdavid davidmrdavid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts

src/DurableTask.Netherite/PartitionState/PrefetchState.cs Outdated Show resolved Hide resolved
Comment on lines 86 to 89
else
{
return; // this is a duplicate. Ignore it.
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: given that this is rare - should we log it? Might be helpful to know if EH is acting up and duplicating events a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants