Change Feed Processor: Fixes LeaseLostException on Notifications API for Renewer #4276

Merged on Jan 25, 2024 (3 commits)

@ealsur ealsur commented Jan 24, 2024

#3401 changed every scenario where a LeaseLostException is generated from a logical condition (rather than a network request) so that the exception contains a CosmosException that can be reported to the Notifications API.

The PartitionSupervisor was modified in that same PR to only report the inner CosmosException to the Notifications API.

However, there was a missing case:

After acquiring a lease, the PartitionController starts a PartitionSupervisor, which spawns a PartitionProcessor and a Renewer.

The Renewer can hit a LeaseLostException while the PartitionProcessor is not processing (an idle partition) but the lease is still being renewed. In that case, the LeaseLostException propagates up, stops the PartitionSupervisor as expected, and bubbles up to the PartitionController, which releases the lease.

The problem was that the PartitionController called the Notifications API to report errors without checking that, for a LeaseLostException, this should happen only when the exception has a linked inner exception (CosmosException). This handling was previously added to the other flows (lease acquisition, lease release) but not to renewal.
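
The renewal case can be illustrated with a small, language-neutral model. This is only a sketch, not the SDK's code: the classes are stand-ins named after the SDK types, and `exception_to_report`, `run_supervisor`, and their parameters are hypothetical helpers. It also assumes that a LeaseLostException with no inner exception is not reported at all.

```python
class CosmosException(Exception):
    """Stand-in for the SDK's CosmosException (assumption, not the real type)."""


class LeaseLostException(Exception):
    """Stand-in for the SDK's LeaseLostException, optionally wrapping an inner error."""

    def __init__(self, message, inner=None):
        super().__init__(message)
        self.inner = inner  # a CosmosException when the loss is reportable


def exception_to_report(error):
    """Pick what the controller should hand to the notification callback.

    A LeaseLostException is never reported itself: only its inner
    CosmosException is, and nothing is reported when there is none
    (assumed behavior for this sketch).
    """
    if isinstance(error, LeaseLostException):
        return error.inner if isinstance(error.inner, CosmosException) else None
    return error


def run_supervisor(supervisor_run, notify_error, release_lease):
    """Hypothetical controller loop: run the supervisor, report, release the lease."""
    try:
        supervisor_run()  # the Renewer may raise LeaseLostException here
    except Exception as error:
        reportable = exception_to_report(error)
        if reportable is not None:
            notify_error(reportable)
    finally:
        release_lease()  # the controller releases the lease either way
```

With this shape, a LeaseLostException raised by the Renewer on an idle partition reaches the notification callback as its inner CosmosException, matching what the lease acquisition and release flows already did.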

Symptoms

This was reported in Azure Functions, where the Functions Extension has a switch statement to log CosmosExceptions:

https://github.com/Azure/azure-webjobs-sdk-extensions/blob/cf0fc8022b230aa1e19325638881a56210474189/src/WebJobs.Extensions.CosmosDB/Trigger/CosmosDBTriggerHealthMonitor.cs#L23-L37

However, it was receiving LeaseLostException as the OuterType:

image
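
To see why the outer type matters to a handler that switches on exception type, consider the hypothetical sketch below. The severity mapping is assumed for illustration and is not copied from the linked Functions file; the classes are stand-ins, and `severity_for` is a made-up name.

```python
import logging


class CosmosException(Exception):
    """Stand-in for the SDK's CosmosException (assumption)."""


class LeaseLostException(Exception):
    """Stand-in for the SDK's LeaseLostException (assumption)."""


def severity_for(error):
    """Hypothetical type-based switch: recognized types get a lower severity."""
    if isinstance(error, CosmosException):
        return logging.INFO
    # Any type the handler does not recognize falls through to the default.
    return logging.ERROR
```

Under this assumed mapping, an unwrapped LeaseLostException falls through to the default branch, while the inner CosmosException is matched and logged at the lower level.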

Verification

With this fix, the emitted notifications and logs have the proper severity level (Information) and the proper type (CosmosException):

image

@FabianMeiswinkel FabianMeiswinkel left a comment

LGTM - Thanks

@ealsur ealsur added the auto-merge Enables automation to merge PRs label Jan 25, 2024
@microsoft-github-policy-service microsoft-github-policy-service bot merged commit 5ab5a7e into master Jan 25, 2024
20 checks passed
@microsoft-github-policy-service microsoft-github-policy-service bot deleted the users/ealsur/leaselostrenew branch January 25, 2024 15:50