
DistributedLock.Azure - Locks not being released as of 7/1/2024 #226

Open
wnbittle opened this issue Aug 6, 2024 · 9 comments

wnbittle commented Aug 6, 2024

We've been using this library successfully for a few years now. We recently ran into an issue with it on all of our function apps, starting around the 7/1/2024 timeframe.

The errors we're getting are:

The soft deleted blob was not found.
RequestId:b3710d9f-801e-002a-0f49-e7e276000000
Time:2024-08-05T15:06:33.1727180Z
Status: 404 (The soft deleted blob was not found.)
ErrorCode: SoftDeletedBlobNotFound
 
Headers:
Server: Windows-Azure-Blob/1.0,Microsoft-HTTPAPI/2.0
x-ms-request-id: b3710d9f-801e-002a-0f49-e7e276000000
x-ms-client-request-id: 583cc707-3348-4d8c-9002-c7e2250824c2
x-ms-version: 2020-02-10
x-ms-error-code: SoftDeletedBlobNotFound
Date: Mon, 05 Aug 2024 15:06:33 GMT
Content-Length: 228
Content-Type: application/xml

The behavior we're seeing (we created a new storage container to test) is as follows:

  1. The function receives a message from a source system
  2. The function obtains a lock on that message's unique id inside a using statement
  3. The function successfully performs the critical section work
  4. The using never exits
  5. The next request for that same message id waits for the allotted time, but fails to acquire the lock

When we look at the storage account, the blob is gone (so it was successfully cleaned up). Here's the code we're using (sorry for the screenshot).

[screenshot of the lock acquisition code]
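Since the code was only shared as a screenshot, here is a minimal sketch of the pattern described in steps 1-5, assuming the same AzureBlobLeaseDistributedLock API shown later in this thread; connectionString, containerName, messageId, and DoCriticalSectionWorkAsync are placeholder names, not the actual code:

using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Medallion.Threading.Azure;

namespace LockUsageSketch
{
    public static class MessageHandler
    {
        // Sketch only: acquire a blob-lease lock on the message's unique id
        // inside an await using block (step 2), run the critical-section work
        // (step 3), and rely on disposing the handle to release the lease.
        public static async Task HandleMessageAsync(
            string connectionString, string containerName, string messageId)
        {
            var container = new BlobContainerClient(connectionString, containerName);
            var @lock = new AzureBlobLeaseDistributedLock(container, messageId);

            await using (var handle = await @lock.TryAcquireAsync(TimeSpan.FromSeconds(30)))
            {
                if (handle is null)
                {
                    throw new Exception($"Failed to acquire lock for {messageId}");
                }

                await DoCriticalSectionWorkAsync(messageId); // placeholder for the real work
            } // step 4: the await using reportedly never exits here
        }

        private static Task DoCriticalSectionWorkAsync(string messageId) => Task.Delay(1000);
    }
}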

The crazy thing is that I can't find any information on the error above (SoftDeletedBlobNotFound) either.

What am I missing?


madelson (Owner) commented Aug 6, 2024

When you say the using never exits, do you mean that DisposeAsync() is being called but it never returns?

Is this something you can reproduce reliably? What is the full repro?

Did you try playing around with the lock options? Just curious if changing any of them makes a difference.
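(For reference, the options are passed via the constructor's builder callback; a sketch, assuming the Duration, RenewalCadence, and BusyWaitSleepTime builder methods documented in the library README, with purely illustrative values and a placeholder connection string:)

using System;
using Azure.Storage.Blobs;
using Medallion.Threading.Azure;

// Sketch: constructing the lock with explicit options. Option names are taken
// from the library README; connection string and values are placeholders.
var container = new BlobContainerClient("<connection string>", "locking-test");
var @lock = new AzureBlobLeaseDistributedLock(
    container,
    "lock-name",
    options =>
    {
        options.Duration(TimeSpan.FromSeconds(30));       // lease length (Azure blob leases allow 15-60s or infinite)
        options.RenewalCadence(TimeSpan.FromSeconds(10));  // how often a held lease is renewed
        options.BusyWaitSleepTime(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2)); // polling while waiting to acquire
    });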


wnbittle (Author) commented Aug 6, 2024

The repo is private so I can't share much more than what I have.

Yes, we can reproduce reliably. For example, here's a test we ran:

  • We left the code as-is (last deploy was in Jan)
  • We created a new container in the same storage account and configured the function app to use that new container. This restarted the function app as well
  • We triggered the integration again
  • It processed through all the logic and reached the last logger statement before hanging, never hitting the finally block (we waited at least 30 minutes while discussing other possible causes)

By other locking, do you mean the other providers? We have not tried them. This code has been in maintenance mode for at least 6 months, so we're just trying to restore normal operation.

Are there any options in the library (for blob) to specify how long to retain the lock before giving up and releasing it anyway?


wnbittle (Author) commented Aug 6, 2024

Update:

I was able to replicate the issue in an isolated environment: a new storage account with the code below running locally. Here's what I discovered:

  • Running against a new storage account with default settings, everything appears to work properly.
  • Running against a new storage account with the settings shown in the screenshot below disabled reproduces this problem every time (a sketch for checking the soft-delete configuration follows).

[screenshot of the storage account settings that were disabled]
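(Presumably these are the soft delete / delete retention settings, given the error code. To make the configuration difference concrete, here is a sketch for checking whether blob soft delete is enabled on the account via the service properties API in Azure.Storage.Blobs; the connection string is a placeholder:)

using System;
using Azure.Storage.Blobs;

// Sketch: report the blob soft delete (delete retention) settings for the
// storage account that holds the locking container.
var service = new BlobServiceClient("<connection string>");
var properties = (await service.GetPropertiesAsync()).Value;
var retention = properties.DeleteRetentionPolicy;
Console.WriteLine($"Blob soft delete enabled: {retention?.Enabled}, days: {retention?.Days}");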

The Payload class:

namespace WB.Test.Locking
{
    public class Payload
    {
        public string Id { get; set; }
        public string Name { get; set; }

        public string SubId { get; set; }
    }
}

The Azure Function:

using Azure.Storage.Blobs;
using Medallion.Threading.Azure;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Azure.Functions.Worker.Http;
using Microsoft.Extensions.Logging;

namespace WB.Test.Locking
{
    public class Function1
    {
        private readonly ILogger<Function1> logger;

        public Function1(ILogger<Function1> logger)
        {
            this.logger = logger;
        }

        [Function("Function1")]
        public async Task<IActionResult> Run([HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req, [Microsoft.Azure.Functions.Worker.Http.FromBody] Payload payload)
        {
            logger.LogInformation("C# HTTP trigger function processed a request.");

            //var container = new BlobContainerClient("...", "locking-test");
            var container = new BlobContainerClient("...", "locking-test");

            await ExecuteWithDistributedLock(payload.Id, container, async () => {
                await DoWork(payload, container);
            });
            
            logger.LogInformation($"Lock for {payload.Id} released");

            return new OkObjectResult("Welcome to Azure Functions!");
        }

        private async Task DoWork(Payload payload, BlobContainerClient container)
        {
            // wait 10 seconds
            logger.LogInformation($"Waiting 10 seconds...");
            await Task.Delay(10000);
            logger.LogInformation($"Wait complete...");

            try
            {
                await ExecuteWithDistributedLock(payload.SubId, container, async () =>
                {
                    logger.LogInformation($"Waiting 10 seconds...");
                    await Task.Delay(10000);
                    logger.LogInformation($"Wait complete...");
                });
            }
            catch (Exception ex)
            {
                // include the exception so the failure details aren't lost
                logger.LogError(ex, "sub lock failed");
            }
        }

        private async Task ExecuteWithDistributedLock(string id, BlobContainerClient container, Func<Task> function)
        {
            var @lock = new AzureBlobLeaseDistributedLock(container, id);

            logger.LogInformation($"Trying to acquire lock for {id}...");
            await using (var handle = await @lock.TryAcquireAsync(new TimeSpan(0, 0, 11)))
            {
                if (handle != null)
                {
                    logger.LogInformation($"Lock acquired for {id}...");

                    await function.Invoke();

                    logger.LogInformation($"Wait complete for {id}...");
                }
                else
                {
                    throw new Exception($"Failed to get lock for {id}");
                }
            }
        }
    }
}

Here's the function output:
[screenshot of the function console output; text form below]

The testing process and behavior:

  1. I execute the first request from Postman
  2. It goes through the logging above and completes with a 200
  3. I execute the second request from Postman
  4. I get the error above when it attempts to get the lock
  5. EVERY request with the same lock id now fails FOREVER
  6. No blobs are visible in the blob container (see the sketch below for checking soft-deleted blobs)
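To check whether the lock blob lingers in a soft-deleted state after step 6, here is a sketch that enumerates the container's blobs including deleted ones via BlobStates.Deleted; the connection string and container name are placeholders:

using System;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

// Sketch: list blobs in the locking container, including soft-deleted ones,
// to see whether the lease blob still exists in a deleted state.
var container = new BlobContainerClient("<connection string>", "locking-test");
await foreach (var blob in container.GetBlobsAsync(states: BlobStates.Deleted))
{
    Console.WriteLine($"{blob.Name} deleted={blob.Deleted}");
}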

Output in text form:

[2024-08-06T15:20:24.589Z] Host lock lease acquired by instance ID '000000000000000000000000DB8C9F86'.
[2024-08-06T15:20:36.131Z] Executing 'Functions.Function1' (Reason='This function was programmatically called via the host APIs.', Id=4d1112ce-b6ce-42b1-a7ca-c603f7f69af6)
[2024-08-06T15:20:36.336Z] C# HTTP trigger function processed a request.
[2024-08-06T15:20:36.381Z] Trying to acquire lock for 2...
[2024-08-06T15:20:36.793Z] Lock acquired for 2...
[2024-08-06T15:20:36.797Z] Waiting 10 seconds...
[2024-08-06T15:20:46.797Z] Wait complete...
[2024-08-06T15:20:46.798Z] Trying to acquire lock for 2.1...
[2024-08-06T15:20:46.962Z] Lock acquired for 2.1...
[2024-08-06T15:20:46.963Z] Waiting 10 seconds...
[2024-08-06T15:20:56.963Z] Wait complete...
[2024-08-06T15:20:56.964Z] Wait complete for 2.1...
[2024-08-06T15:20:57.036Z] Wait complete for 2...
[2024-08-06T15:20:57.076Z] Lock for 2 released
[2024-08-06T15:20:57.082Z] Executing OkObjectResult, writing value of type 'System.String'.
[2024-08-06T15:20:57.136Z] Executed 'Functions.Function1' (Succeeded, Id=4d1112ce-b6ce-42b1-a7ca-c603f7f69af6, Duration=21026ms)
[2024-08-06T15:21:01.453Z] Executing 'Functions.Function1' (Reason='This function was programmatically called via the host APIs.', Id=7a1904d9-73eb-40a2-afef-df3935313037)
[2024-08-06T15:21:01.463Z] C# HTTP trigger function processed a request.
[2024-08-06T15:21:01.464Z] Trying to acquire lock for 2...
[2024-08-06T15:21:01.612Z] Function 'Function1', Invocation id '7a1904d9-73eb-40a2-afef-df3935313037': An exception was thrown by the invocation.
[2024-08-06T15:21:01.613Z] Result: Function 'Function1', Invocation id '7a1904d9-73eb-40a2-afef-df3935313037': An exception was thrown by the invocation.
Exception: Azure.RequestFailedException: The soft deleted blob was not found.
RequestId:a8f767bc-801e-0033-6414-e8cf50000000
Time:2024-08-06T15:21:01.6095988Z
[2024-08-06T15:21:01.614Z] Status: 404 (The soft deleted blob was not found.)
[2024-08-06T15:21:01.614Z] ErrorCode: SoftDeletedBlobNotFound
[2024-08-06T15:21:01.615Z]
[2024-08-06T15:21:01.615Z] Content:
[2024-08-06T15:21:01.616Z] <?xml version="1.0" encoding="utf-8"?><Error><Code>SoftDeletedBlobNotFound</Code><Message>The soft deleted blob was not found.
RequestId:a8f767bc-801e-0033-6414-e8cf50000000
Time:2024-08-06T15:21:01.6095988Z</Message></Error>
[2024-08-06T15:21:01.616Z]
[2024-08-06T15:21:01.616Z] Headers:
[2024-08-06T15:21:01.617Z] Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
[2024-08-06T15:21:01.617Z] x-ms-request-id: a8f767bc-801e-0033-6414-e8cf50000000
[2024-08-06T15:21:01.618Z] x-ms-client-request-id: 3e50a73d-953a-46e8-bab9-b60ff750cb7d
[2024-08-06T15:21:01.618Z] x-ms-version: 2023-11-03
[2024-08-06T15:21:01.619Z] x-ms-error-code: SoftDeletedBlobNotFound
[2024-08-06T15:21:01.620Z] Date: Tue, 06 Aug 2024 15:21:01 GMT
[2024-08-06T15:21:01.620Z] Content-Length: 228
[2024-08-06T15:21:01.621Z] Content-Type: application/xml
[2024-08-06T15:21:01.621Z]
[2024-08-06T15:21:01.625Z]    at Azure.Core.HttpPipelineExtensions.ProcessMessageAsync(HttpPipeline pipeline, HttpMessage message, RequestContext requestContext, CancellationToken cancellationToken)
[2024-08-06T15:21:01.626Z]    at Azure.Storage.Blobs.BlobRestClient.AcquireLeaseAsync(Nullable`1 timeout, Nullable`1 duration, String proposedLeaseId, String ifTags, RequestConditions requestConditions, RequestContext context)
[2024-08-06T15:21:01.627Z]    at Azure.Storage.Blobs.Specialized.BlobLeaseClient.AcquireInternal(TimeSpan duration, RequestConditions conditions, Boolean async, RequestContext context)
[2024-08-06T15:21:01.628Z]    at Azure.Storage.Blobs.Specialized.BlobLeaseClient.AcquireAsync(TimeSpan duration, RequestConditions conditions, CancellationToken cancellationToken)
[2024-08-06T15:21:01.628Z]    at Medallion.Threading.Azure.AzureBlobLeaseDistributedLock.TryAcquireAsync(BlobLeaseClientWrapper leaseClient, CancellationToken cancellationToken, Boolean isRetryAfterCreate) in C:\Users\mikea\Documents\Interests\CS\DistributedLock\src\DistributedLock.Azure\AzureBlobLeaseDistributedLock.cs:line 112
[2024-08-06T15:21:01.629Z]    at Medallion.Threading.Azure.AzureBlobLeaseDistributedLock.TryAcquireAsync(BlobLeaseClientWrapper leaseClient, CancellationToken cancellationToken, Boolean isRetryAfterCreate) in C:\Users\mikea\Documents\Interests\CS\DistributedLock\src\DistributedLock.Azure\AzureBlobLeaseDistributedLock.cs:line 147
[2024-08-06T15:21:01.631Z]    at Medallion.Threading.Internal.BusyWaitHelper.WaitAsync[TState,TResult](TState state, Func`3 tryGetValue, TimeoutValue timeout, TimeoutValue minSleepTime, TimeoutValue maxSleepTime, CancellationToken cancellationToken) in /_/src/DistributedLock.Core/Internal/BusyWaitHelper.cs:line 22
[2024-08-06T15:21:01.632Z]    at WB.Test.Locking.Function1.ExecuteWithDistributedLock(String id, BlobContainerClient container, Func`1 function) in C:\Projects\Unversioned\WB.Test.Locking\WB.Test.Locking\Function1.cs:line 65
[2024-08-06T15:21:01.632Z]    at WB.Test.Locking.Function1.Run(HttpRequest req, Payload payload) in C:\Projects\Unversioned\WB.Test.Locking\WB.Test.Locking\Function1.cs:line 29
[2024-08-06T15:21:01.637Z]    at WB.Test.Locking.DirectFunctionExecutor.ExecuteAsync(FunctionContext context) in C:\Projects\Unversioned\WB.Test.Locking\WB.Test.Locking\obj\Debug\net8.0\Microsoft.Azure.Functions.Worker.Sdk.Generators\Microsoft.Azure.Functions.Worker.Sdk.Generators.FunctionExecutorGenerator\GeneratedFunctionExecutor.g.cs:line 38
[2024-08-06T15:21:01.638Z]    at Microsoft.Azure.Functions.Worker.OutputBindings.OutputBindingsMiddleware.Invoke(FunctionContext context, FunctionExecutionDelegate next) in D:\a\_work\1\s\src\DotNetWorker.Core\OutputBindings\OutputBindingsMiddleware.cs:line 13
[2024-08-06T15:21:01.639Z]    at Microsoft.Azure.Functions.Worker.Extensions.Http.AspNetCore.FunctionsHttpProxyingMiddleware.Invoke(FunctionContext context, FunctionExecutionDelegate next) in D:\a\_work\1\s\extensions\Worker.Extensions.Http.AspNetCore\src\FunctionsMiddleware\FunctionsHttpProxyingMiddleware.cs:line 48
[2024-08-06T15:21:01.640Z]    at Microsoft.Azure.Functions.Worker.FunctionsApplication.InvokeFunctionAsync(FunctionContext context) in D:\a\_work\1\s\src\DotNetWorker.Core\FunctionsApplication.cs:line 77
Stack:    at Azure.Core.HttpPipelineExtensions.ProcessMessageAsync(HttpPipeline pipeline, HttpMessage message, RequestContext requestContext, CancellationToken cancellationToken)
[2024-08-06T15:21:01.642Z]    at Azure.Storage.Blobs.BlobRestClient.AcquireLeaseAsync(Nullable`1 timeout, Nullable`1 duration, String proposedLeaseId, String ifTags, RequestConditions requestConditions, RequestContext context)
[2024-08-06T15:21:01.645Z]    at Azure.Storage.Blobs.Specialized.BlobLeaseClient.AcquireInternal(TimeSpan duration, RequestConditions conditions, Boolean async, RequestContext context)
[2024-08-06T15:21:01.646Z]    at Azure.Storage.Blobs.Specialized.BlobLeaseClient.AcquireAsync(TimeSpan duration, RequestConditions conditions, CancellationToken cancellationToken)
[2024-08-06T15:21:01.647Z]    at Medallion.Threading.Azure.AzureBlobLeaseDistributedLock.TryAcquireAsync(BlobLeaseClientWrapper leaseClient, CancellationToken cancellationToken, Boolean isRetryAfterCreate) in C:\Users\mikea\Documents\Interests\CS\DistributedLock\src\DistributedLock.Azure\AzureBlobLeaseDistributedLock.cs:line 112
[2024-08-06T15:21:01.648Z]    at Medallion.Threading.Azure.AzureBlobLeaseDistributedLock.TryAcquireAsync(BlobLeaseClientWrapper leaseClient, CancellationToken cancellationToken, Boolean isRetryAfterCreate) in C:\Users\mikea\Documents\Interests\CS\DistributedLock\src\DistributedLock.Azure\AzureBlobLeaseDistributedLock.cs:line 147
[2024-08-06T15:21:01.649Z]    at Medallion.Threading.Internal.BusyWaitHelper.WaitAsync[TState,TResult](TState state, Func`3 tryGetValue, TimeoutValue timeout, TimeoutValue minSleepTime, TimeoutValue maxSleepTime, CancellationToken cancellationToken) in /_/src/DistributedLock.Core/Internal/BusyWaitHelper.cs:line 22
[2024-08-06T15:21:01.650Z]    at WB.Test.Locking.Function1.ExecuteWithDistributedLock(String id, BlobContainerClient container, Func`1 function) in C:\Projects\Unversioned\WB.Test.Locking\WB.Test.Locking\Function1.cs:line 65
[2024-08-06T15:21:01.651Z]    at WB.Test.Locking.Function1.Run(HttpRequest req, Payload payload) in C:\Projects\Unversioned\WB.Test.Locking\WB.Test.Locking\Function1.cs:line 29
[2024-08-06T15:21:01.653Z]    at WB.Test.Locking.DirectFunctionExecutor.ExecuteAsync(FunctionContext context) in C:\Projects\Unversioned\WB.Test.Locking\WB.Test.Locking\obj\Debug\net8.0\Microsoft.Azure.Functions.Worker.Sdk.Generators\Microsoft.Azure.Functions.Worker.Sdk.Generators.FunctionExecutorGenerator\GeneratedFunctionExecutor.g.cs:line 38
[2024-08-06T15:21:01.656Z]    at Microsoft.Azure.Functions.Worker.OutputBindings.OutputBindingsMiddleware.Invoke(FunctionContext context, FunctionExecutionDelegate next) in D:\a\_work\1\s\src\DotNetWorker.Core\OutputBindings\OutputBindingsMiddleware.cs:line 13
[2024-08-06T15:21:01.657Z]    at Microsoft.Azure.Functions.Worker.Extensions.Http.AspNetCore.FunctionsHttpProxyingMiddleware.Invoke(FunctionContext context, FunctionExecutionDelegate next) in D:\a\_work\1\s\extensions\Worker.Extensions.Http.AspNetCore\src\FunctionsMiddleware\FunctionsHttpProxyingMiddleware.cs:line 48
[2024-08-06T15:21:01.658Z]    at Microsoft.Azure.Functions.Worker.FunctionsApplication.InvokeFunctionAsync(FunctionContext context) in D:\a\_work\1\s\src\DotNetWorker.Core\FunctionsApplication.cs:line 77.
[2024-08-06T15:25:01.483Z] Executed 'Functions.Function1' (Failed, Id=7a1904d9-73eb-40a2-afef-df3935313037, Duration=240029ms)
[2024-08-06T15:25:01.487Z] System.Private.CoreLib: Exception while executing function: Functions.Function1. Microsoft.Azure.WebJobs.Script.Grpc: Failed to proxy request with ForwarderError: RequestTimedOut. System.Net.Http: The operation was canceled. System.Net.Sockets: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request.. The I/O operation has been aborted because of either a thread exit or an application request.

@algreat2

This issue also reproduces for me. Changing the "Soft Delete Options" didn't help, but I found another workaround: I manually created an empty blob with the lock's resource name. That blob is then no longer deleted after the lease is released (as it was before); the blob is now permanent and only its lease status changes. (A sketch of this workaround follows below.)
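A sketch of that workaround, assuming the lock name maps directly to the blob name in the container (which appears to hold for the simple ids used in this thread); connection string, container, and lock name are placeholders:

using System;
using Azure.Storage.Blobs;

// Sketch: pre-create an empty blob for the lock so the library leases an
// existing blob rather than creating one itself and deleting it afterwards.
var container = new BlobContainerClient("<connection string>", "locking-test");
var blob = container.GetBlobClient("lock-name");
if (!(await blob.ExistsAsync()).Value)
{
    await blob.UploadAsync(BinaryData.FromString(string.Empty));
}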

@wnbittle (Author)

Oh, that's a nice workaround! Thanks for sharing!

@madelson (Owner)

@wnbittle based on your findings, do you have a theory of why the library is hanging? Are we making some assumption that's invalid under certain configuration scenarios?

@algreat2

@wnbittle based on your findings, do you have a theory of why the library is hanging? Are we making some assumption that's invalid under certain configuration scenarios?

I tried to dig deeper, but the stack trace doesn't reveal the steps. Since this error happens when trying to delete the blob, it would be worth checking all the places that do that. We also know that a blob which existed beforehand doesn't cause any problems. Probably ownsBlob is calculated wrongly, or the library is trying to delete the blob before it is created.

@algreat2

My suggestion was wrong. I was debugging and saw that it was the Microsoft client which raised this exception when trying to acquire a lease.

[debugger screenshots showing the exception being raised inside the Azure Storage client while acquiring the lease]

@madelson (Owner)

@algreat2 interesting. What I don’t understand is why it hangs without releasing the lock. The code you’re showing looks like an Acquire failure.
