RedisDistributedLock Releases Lock Prematurely Before ParallelMethod Completes #190
Comments
@BroderQi I don’t understand what your repro is trying to show. You have just one lock call wrapping the parallel execution. What happens? What do you expect to happen? Also, notice that you are calling TryAcquire but you are not checking the handle for null. See the examples in the docs for how to use TryAcquire, or just use Acquire.
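For reference, a minimal sketch of the null-check pattern described above, assuming DistributedLock.Redis with StackExchange.Redis; the connection string, lock name, and `DoProtectedWork` method are illustrative, not from the report:

```csharp
using Medallion.Threading.Redis;
using StackExchange.Redis;

var connection = await ConnectionMultiplexer.ConnectAsync("localhost:6379");
var @lock = new RedisDistributedLock("my-lock-name", connection.GetDatabase());

// TryAcquire/TryAcquireAsync returns null when the lock is already held elsewhere,
// so the handle must be checked before doing the protected work.
await using (var handle = await @lock.TryAcquireAsync())
{
    if (handle != null)
    {
        DoProtectedWork(); // hypothetical work method
    }
}

// Alternatively, Acquire/AcquireAsync waits for the lock (or throws on timeout),
// so no null check is needed.
await using (await @lock.AcquireAsync())
{
    DoProtectedWork();
}

void DoProtectedWork() { /* work done while holding the lock */ }
```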
The ParallelMethod successfully acquired the lock and started execution, but the LeaseMonitor released the lock before the ParallelMethod completed: at that if statement, the OnHandleLost method was executed. I expect the lock to be released only after the ParallelMethod completes.
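As a rough sketch of the scenario being described (assuming the OnHandleLost callback in the repro is wired to the handle's HandleLostToken; the lock setup and ParallelMethod are placeholders):

```csharp
await using (var handle = await @lock.TryAcquireAsync())
{
    if (handle == null) return;

    // HandleLostToken fires when the library can no longer confirm the lease,
    // e.g. because renewal did not happen in time.
    using var registration = handle.HandleLostToken.Register(
        () => Console.WriteLine("OnHandleLost: lease lost before the work finished")); // stand-in for OnHandleLost

    ParallelMethod(); // the long-running parallel workload from the repro
}
```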
Does the issue only repro with Parallel? What if you just have a very long sleep instead, with no parallelism? If it’s parallel-specific, you could be hitting thread starvation, where the renewal process can’t get in and do its work. Also, does increasing the duration in the options affect it? I would assume that setting a duration longer than you need to hold the lock would fix the issue.
Thank you for your response. The issue only arises in multi-threaded scenarios, so I will increase the TTL. Additionally, could we adjust the priority of the renewal tasks or employ another scheduling strategy to ensure the stability of the renewal thread?
Does increasing the TTL actually fix it for you? I just want to confirm before we assume renewal scheduling is the issue. Currently there are two configurable parameters: the base TTL (expiry) and the renewal cadence (extension cadence); the defaults are 30s and 10s. In your scenario, I’m curious what timings you observe for when the lease is seen to be lost: if the problem is that extension comes too late, I’d expect this to happen some time after the 30s mark. As for what we can do to fix this, if the problem is truly thread starvation I’m not sure there’s anything that can fix it 100%, since ultimately we need to run some tasks to do the renewal (multiple tasks, since this is all async). I’m open to ideas, though! However, we could probably make it much less likely by shifting the default ratio so that the extension cadence is smaller relative to the expiry. Users can of course set the TTL arbitrarily high; the only downside is that if the process holding the lock crashes, the TTL is how long everyone else has to wait before the lock can be acquired again.
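As an illustration of adjusting those two parameters, a minimal sketch assuming DistributedLock.Redis exposes them through its options builder (the exact member names and the chosen values here are illustrative):

```csharp
var @lock = new RedisDistributedLock(
    "my-lock-name",
    connection.GetDatabase(),
    options => options
        .Expiry(TimeSpan.FromMinutes(2))            // raise the base TTL well above the expected hold time
        .ExtensionCadence(TimeSpan.FromSeconds(10)) // keep renewals frequent relative to that TTL
);
```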
Imho this could be solved with a documentation note on DistributedLock.Redis that if you run a large number of tasks, e.g. via Parallel.For / Parallel.ForEach, there is a risk of losing a renewal due to thread starvation. The note should also mention that, if you want to use DistributedLock.Redis together with Parallel.x, you should either set ParallelOptions.MaxDegreeOfParallelism or assign a new TaskScheduler to ParallelOptions.TaskScheduler so the parallel invocation has its own thread pool (see the sketch below).
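A sketch of the two mitigations suggested above; `workItems` and `Process` are stand-ins for the real workload, and ConcurrentExclusiveSchedulerPair is used only as an example of a scheduler with a bounded concurrency level:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var workItems = Enumerable.Range(0, 1_000); // stand-in for the real workload
void Process(int item) => Thread.Sleep(10); // stand-in for per-item work

// Option 1: cap parallelism so the thread pool is not exhausted and
// lock-renewal tasks can still be scheduled.
var capped = new ParallelOptions
{
    MaxDegreeOfParallelism = Math.Max(1, Environment.ProcessorCount / 2)
};
Parallel.ForEach(workItems, capped, Process);

// Option 2: run the parallel work on its own bounded scheduler so the
// default thread pool stays available for renewal.
var pair = new ConcurrentExclusiveSchedulerPair(
    TaskScheduler.Default, maxConcurrencyLevel: Math.Max(1, Environment.ProcessorCount / 2));
var onOwnScheduler = new ParallelOptions { TaskScheduler = pair.ConcurrentScheduler };
Parallel.ForEach(workItems, onOwnScheduler, Process);
```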
@hoerup I think the guidance is a bit broader: if you are going to exhaust the thread pool, then any lease-based lock needs a higher TTL to avoid spurious expiry, since the renewal mechanism relies on being able to run background tasks. Alternatively, as you say, you can move your workload off the default scheduler to prevent this.