[BUG] Module Fork Dead Lock #766
Comments
I managed to trigger another deadlock, this time with logging and a stack trace:
It looks like it is coming from this line when the fork happens:
I'm not sure how to work around this. If anyone has any input, it would be much appreciated.
OK, I finally tracked it down. My test module uses RedisModule_CreateTimer() for periodic callbacks, and when the main thread in KeyDB invokes the callback, the module uses another thread to call RedisModule_Fork(). To ensure there is no deadlock in KeyDB, I had to do 2 things:
I believe some of the standard modules in Redis use the same pattern as the test module, so it can be a gotcha for anyone using Redis modules within KeyDB. If you experience a complete freeze in KeyDB, you have likely hit this bug. It took over 35,000 calls to aeAcquireForkLock (10-12 days) to trigger the deadlock for me, so it is not something that is triggered quickly. I will keep testing and post any updates here.
Although the change above fixes the deadlock for that scenario, there is another case where the same deadlock happens: Note that the fix proposed there might also fix this issue.
I had more time to review the code recently, and I have finally tracked this down (with the help of strace and gdb). The freeze is due to a deadlock condition between the server thread and module thread:
I have a workaround in my local copy:
Hope this helps anyone using modules and having KeyDB occasionally lock up.
I wasn't too happy with the try lock above for g_forkLock, since it can cause the module to miss fork events. I am now experimenting with using aeAcquireLock() instead of aeAcquireForkLock(). So far, no deadlocks. Looking at the history of the fork lock, it was introduced in 6.3.0 to address rdb background saving issues: That is a significant amount of code just to do a fork! I'm not sure why the global lock was not used, since fork() is a cheap and fast operation. So for now, I am just going to use aeAcquireLock() for my module, and other forks keep using the existing code.
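For anyone weighing the same trade-off, here is a minimal sketch of why a try lock can miss events. It is not KeyDB code: `ForkStats`, `handle_fork_event` and `demo_missed_event` are hypothetical names, and a plain `std::mutex` stands in for g_forkLock.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical counters, not part of KeyDB or the module API.
struct ForkStats {
    int ran = 0;      // events handled while holding the lock
    int skipped = 0;  // events dropped because the lock was busy
};

// Try-lock handler: it never blocks on `m`, so it cannot deadlock on it,
// but it gives up on the event when another thread holds `m`.
bool handle_fork_event(std::mutex &m, ForkStats &stats) {
    std::unique_lock<std::mutex> guard(m, std::try_to_lock);
    if (!guard.owns_lock()) {
        ++stats.skipped;
        return false;
    }
    ++stats.ran;  // the fork-related work would go here
    return true;
}

// Demo: a second thread holds the lock while one event arrives.
ForkStats demo_missed_event() {
    std::mutex m;
    ForkStats stats;
    std::atomic<bool> held{false};
    std::atomic<bool> release{false};
    std::thread holder([&] {
        std::lock_guard<std::mutex> g(m);
        held = true;
        while (!release) std::this_thread::yield();
    });
    while (!held) std::this_thread::yield();
    handle_fork_event(m, stats);             // lock is busy: event is skipped
    release = true;
    holder.join();
    while (!handle_fork_event(m, stats)) {}  // lock is free: event runs
    return stats;
}
```

Calling `demo_missed_event()` leaves at least one skipped event in the returned stats, which is the cost of the try lock: no deadlock, but no guarantee that the handler runs either.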
Working on v6.3.4, I have a test module which calls RedisModule_Fork() periodically. I encountered a condition where KeyDB completely freezes and logging stops. After adding some logging, I tracked it down to redisFork() in server.cpp:
When it freezes, "aeAcquireForkLock start" is printed but not the corresponding finish. Looking into this method, I can see a comment in the code:
The comment says it should release the internal lock and try again, but it looks like it is only trying again without any release?
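To make the question concrete, here is a rough sketch of what "release the internal lock and try again" usually means. This is not the KeyDB implementation: `acquire_with_release`, `server_lock` and `fork_lock` are hypothetical names, and plain `std::mutex` objects stand in for the real locks. The point is that the lock already held is dropped between attempts, so the thread that owns the wanted lock can make progress.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// The caller must already hold `held`. Returns with both `held` and
// `wanted` locked, and reports how many times it had to back off.
int acquire_with_release(std::mutex &held, std::mutex &wanted) {
    int retries = 0;
    while (!wanted.try_lock()) {
        held.unlock();  // the release step; without it both threads wait forever
        std::this_thread::sleep_for(std::chrono::microseconds(100));
        held.lock();
        ++retries;
    }
    return retries;
}

// Demo: this thread holds server_lock and wants fork_lock, while the other
// thread holds fork_lock and wants server_lock (the classic lock inversion).
int demo_release_and_retry() {
    std::mutex server_lock;
    std::mutex fork_lock;
    std::atomic<bool> ready{false};

    server_lock.lock();
    std::thread other([&] {
        fork_lock.lock();
        ready = true;
        server_lock.lock();  // blocks until the first thread backs off
        server_lock.unlock();
        fork_lock.unlock();
    });
    while (!ready) std::this_thread::yield();

    int retries = acquire_with_release(server_lock, fork_lock);
    fork_lock.unlock();
    server_lock.unlock();
    other.join();
    return retries;
}
```

If the `held.unlock()` / `held.lock()` pair is removed, the loop only retries `try_lock()` while still holding the first lock, and the demo hangs. That matches the symptom of a "start" log line with no matching "finish".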
I am still tracking this down, but if anyone has experienced the same issue, or has any advice on this, happy to take any suggestions to try out.
Note: I have Save "" in the config, so there is no background saving happening. From the logs, I cannot see any other place that is trying to call the fork, so this deadlock scenario is a bit strange.