
RedisDistributedLock Releases Lock Prematurely Before ParallelMethod Completes #190

Open
BroderQi opened this issue Mar 2, 2024 · 7 comments


BroderQi commented Mar 2, 2024

Using the open-source DistributedLock framework, the method keeps executing, but the lock is released once the lease period elapses, before the method finishes:

```csharp
public void UseLock()
{
    var redisDistributedLock = new RedisDistributedLock(
        "myKey",
        ConnectionMultiplexer.Connect("xxx.xxx.xxx.xxx:6379").GetDatabase());

    using (var handle = redisDistributedLock.TryAcquire())
    {
        ParallelMethod();
    }
}

private void ParallelMethod()
{
    List<int> numbers = Enumerable.Range(0, 30001).ToList();

    int processedCount = 0;
    Parallel.ForEach(numbers, item =>
    {
        Interlocked.Increment(ref processedCount);
        Thread.Sleep(500);

        Console.WriteLine($"There are still {numbers.Count - processedCount} items pending processing");
    });
}
```
madelson (Owner) commented Mar 2, 2024

@BroderQi I don't understand what your repro is trying to show. You have just one lock call wrapping the parallel execution. What happens? What do you expect to happen?

Also, note that you are calling TryAcquire but not checking the handle for null. See the examples in the docs for how to use TryAcquire, or just use Acquire.
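A minimal sketch of the null-checked TryAcquire pattern, reusing the key and placeholder endpoint from the repro above (this mirrors the documented usage; it requires a live Redis server to actually run):

```csharp
using Medallion.Threading.Redis;
using StackExchange.Redis;

var database = ConnectionMultiplexer.Connect("xxx.xxx.xxx.xxx:6379").GetDatabase();
var redisDistributedLock = new RedisDistributedLock("myKey", database);

// TryAcquire returns null when the lock is already held elsewhere,
// so the handle must be checked before doing the protected work.
using (var handle = redisDistributedLock.TryAcquire())
{
    if (handle != null)
    {
        // Lock held: safe to run the protected work here.
    }
    else
    {
        // Lock not acquired: back off, retry, or fail fast.
    }
}
```

Acquire, by contrast, blocks until the lock is obtained (or a timeout elapses), so no null check is needed.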

BroderQi (Author) commented Mar 4, 2024

[screenshot: LeaseMonitor source, showing the if statement where OnHandleLost is invoked]

The lock was successfully acquired and ParallelMethod started executing, but the LeaseMonitor released the lock before ParallelMethod completed: at the if statement shown in the screenshot, the OnHandleLost method was executed.

I expect the lock to be released only after ParallelMethod completes.

madelson (Owner) commented Mar 5, 2024

Does the issue only repro with Parallel? What if you just have a very long sleep instead, with no parallelism? If it's Parallel-specific, you could be hitting thread pool starvation, where the renewal process can't get in to do its work.

Also, does increasing the duration in the options affect it? I would assume that setting a duration longer than you need to hold the lock would fix the issue.

BroderQi (Author) commented Mar 6, 2024

Thank you for your response. The above-mentioned issue only arises in multi-threaded scenarios, so I will increase the TTL. Additionally, could we adjust the priority of the renewal tasks, or employ another scheduling strategy, to ensure the renewal thread runs reliably?

madelson (Owner) commented Mar 6, 2024

Does increasing the TTL actually fix it for you? I just want to confirm before we assume renewal scheduling is the issue.

Currently there are 2 configurable parameters: the base TTL (Expiry) and the renewal cadence (ExtensionCadence). Right now we default to 30s and 10s, respectively.

In your scenario, I’m curious what timings you observe for when the lease is observed to be lost. If the problem is that extension comes too late, I’d expect this to happen some time after the 30s mark.

As for what we can do to fix this, if the problem is truly thread starvation I’m not sure there’s anything that can fix it 100%, since ultimately we need to get in there and run some tasks to do the renewal (multiple tasks since this is all async). I’m open to ideas though!

However, we could probably make it much less likely by shifting the default ratio so that the extension cadence is smaller relative to the expiry. Users can of course set the TTL arbitrarily high; the only downside is that if the process holding the lock crashes, the high TTL is how long everyone else has to wait before the lock can be acquired again.
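For reference, a sketch of configuring these two knobs through the options builder when constructing the lock (the Expiry and ExtensionCadence builder methods follow the library's options API; the endpoint is the same placeholder as in the repro, and both durations are illustrative values, not recommendations):

```csharp
using Medallion.Threading.Redis;
using StackExchange.Redis;

var database = ConnectionMultiplexer.Connect("xxx.xxx.xxx.xxx:6379").GetDatabase();

// Raising the base TTL gives delayed renewals more slack before the lease
// expires; the extension cadence controls how often renewal is attempted.
var redisDistributedLock = new RedisDistributedLock(
    "myKey",
    database,
    options => options
        .Expiry(TimeSpan.FromMinutes(5))            // base TTL (default 30s)
        .ExtensionCadence(TimeSpan.FromSeconds(10)) // renewal interval (default 10s)
);
```

The trade-off noted above applies: the higher the Expiry, the longer a crashed holder blocks other acquirers.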

hoerup commented Mar 7, 2024

Imho this could be solved with a documentation note on DistributedLock.Redis: if you run a large number of tasks, e.g. via Parallel.For / Parallel.ForEach, there is a risk of losing a renewal due to thread pool starvation.

The note should also mention that if you want to use DistributedLock.Redis together with Parallel.*, you should either set ParallelOptions.MaxDegreeOfParallelism or supply a dedicated TaskScheduler via ParallelOptions.TaskScheduler, so the parallel invocation does not monopolize the default thread pool.
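A sketch of the first mitigation applied to the repro above (the cap of Environment.ProcessorCount is an illustrative choice, not a value from the library):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Cap the degree of parallelism so the thread pool keeps headroom for the
// lock's background renewal tasks instead of queuing work behind 30001 items.
var parallelOptions = new ParallelOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

List<int> numbers = Enumerable.Range(0, 30001).ToList();
Parallel.ForEach(numbers, parallelOptions, item =>
{
    Thread.Sleep(500); // simulated work from the repro
});
```

Without the cap, Parallel.ForEach may inject additional workers while items block in Thread.Sleep, which is what starves the pool in the first place.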

madelson (Owner) commented

@hoerup I think the guidance is a bit broader: if you are going to exhaust the thread pool, then any lease-based lock needs a higher TTL to avoid spurious expiry, since the renewal mechanism relies on being able to run background tasks. Alternatively, as you say, you can move your workload off the default scheduler to prevent this.
