reset internal state on fork to prevent deadlocks in worker threads #139
Conversation
self.create_log_stream = create_log_stream
self.log_group_retention_days = log_group_retention_days
self._init_state()
This will be called with the parent's `__init__`, which creates its lock. Since it's OK to init the state twice, I made it more obvious that it's being called by calling it directly. I can add a comment on `super().__init__` if that seems better (I went with this way in case the stdlib changes, but that seems unlikely).
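The pattern described above can be sketched as follows. This is a minimal illustration, not watchtower's actual code; the class name `ForkSafeHandler` and its attributes are hypothetical:

```python
import logging
import queue


class ForkSafeHandler(logging.Handler):
    """Hypothetical sketch of the initialization pattern discussed above."""

    def __init__(self):
        # logging.Handler.__init__ calls createLock(), so the handler's
        # lock exists before we touch our own state.
        super().__init__()
        # Called explicitly even though the parent's __init__ path could
        # reach it, so the (re)initialization is obvious to readers.
        self._init_state()

    def _init_state(self):
        # Fresh queue and no worker thread: safe to call a second time,
        # e.g. in a forked child where the old worker no longer exists.
        self.queue = queue.Queue()
        self.worker = None
```

Because `_init_state` only replaces the queue and worker reference, calling it twice is harmless, which is what makes the explicit call safe.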
I agree with this change in principle for reinitializing state in …, though reinitializing state in …
Sorry, I got pulled away from this, but it looks like we're actually having issues with this on shutdown of uWSGI, so my attention is back on it.
That would only allow this code to support Python 3.9 and above, as mentioned. In other scenarios it would cause a deadlock. If there is something else you'd like to see in …
I'm not sure I follow the issue exactly. Yes, I agree there may be data loss. However, after forking there will not be a thread listening to the queue, and it will need to be "restarted". This code resets the internal state so that the child can restart its queue and threads.

Alternatively, we could store a weakref, or just check whether the thread is alive and try to restart it if not. The reason I hadn't gone with that approach is that it might send too much data. The only time I really see a scenario for data loss is if the parent forks and then execs itself into something else. Otherwise the thread in the parent process will eventually send the messages that are expected to be sent, and the child should be able to continue as expected. The only other scenario I can see messing things up is if someone else is calling the …
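The "check if the thread is alive and restart it" alternative mentioned above could look roughly like this. It's a sketch under stated assumptions; `Sender`, `_ensure_worker`, and `_drain` are hypothetical names, not watchtower's API:

```python
import queue
import threading


class Sender:
    """Hypothetical sketch: lazily (re)start the worker instead of
    resetting all state on fork."""

    def __init__(self):
        self.queue = queue.Queue()
        self._thread = None

    def _ensure_worker(self):
        # In a forked child the old worker thread does not exist, so
        # is_alive() is False and a fresh thread is started. Note the
        # trade-off discussed above: the child's inherited queue may
        # still contain the parent's messages, which would be resent.
        if self._thread is None or not self._thread.is_alive():
            self._thread = threading.Thread(target=self._drain, daemon=True)
            self._thread.start()

    def _drain(self):
        # Worker loop: consume until a None sentinel arrives.
        while True:
            item = self.queue.get()
            if item is None:
                break

    def send(self, msg):
        self._ensure_worker()
        self.queue.put(msg)
```

This avoids losing queued messages on fork but, as noted above, can deliver duplicates, since both parent and child inherit the same queued items.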
OK, thanks. I'll have to think more about this and what kinds of warnings to add in the docs, but on the face of it this seems correct. The safest thing to do is always to share nothing between threads/processes. That's the gist of what I was trying to get at, and what I'll try to advise in the docs.
I'm opening this PR up to start a discussion on behavior while forking. I believe this may be related to #89 and #31. It looks like there has been recent activity on #31 (comment) and this may possibly need some adjustment.
This change allows subprocesses not to deadlock when calling `os.fork()`, because it resets the logger's internal state on forking (threads don't survive a fork, but `self.queue[s].put` may be locked while forking). It does seem like boto3 sessions could also end up causing issues, but I'm not sure if the thread safety comment is actually an issue for this library.

This is still not "perfect", because the documentation on `os._exit` suggests that it should be called when exiting a fork, which bypasses the logging module's default shutdown routine, including flushing the logs. If the documentation suggests that watchtower should be flushed before exiting a fork, then I believe this change, plus making sure `flush` is called, will ensure that the logs are delivered.

Even without documentation suggesting to flush the logs, this change will at least make sure new threads are created to handle the reset queues. Otherwise forked copies of this library believe the threads are still alive even though they are not. An alternative implementation could check whether the thread for the queue is still alive, but that case would still need to handle what to do with the queued logs, and in the case of forking those logs are duplicates.
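One standard way to hook the reset described above is `os.register_at_fork` (available on POSIX since Python 3.7). The sketch below is illustrative only, not this PR's implementation; the class `State` and its attributes are hypothetical, and real code would register the hook once rather than per instance:

```python
import os
import queue
import threading


class State:
    """Hypothetical sketch: re-create queues and locks in a forked child."""

    def __init__(self):
        self._init_state()
        # After fork, only the forking thread survives; a lock held
        # mid-fork by a now-dead worker would otherwise deadlock the
        # child on the next put(). register_at_fork lets the child
        # rebuild its state before it runs any more code.
        if hasattr(os, "register_at_fork"):
            os.register_at_fork(after_in_child=self._init_state)

    def _init_state(self):
        # Fresh, unheld lock and an empty queue for this process.
        self.queue = queue.Queue()
        self.lock = threading.Lock()
```

Registering a bound method keeps the instance alive for the life of the process, which is one reason a library might prefer a module-level hook over a per-instance one.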