
bpo-6721: Sanitize logging locks while forking #4071

Merged — gpshead merged 7 commits into python:master from the logging_locks_at_fork branch on Sep 14, 2018

Conversation

@gpshead (Member) commented Oct 21, 2017

This part is simple: acquire and release the logging lock and each logging handler's lock around fork.
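
For illustration only, here is a rough sketch of that idea in terms of the public os.register_at_fork() API. The helper names are made up; the real patch works inside the logging module itself and covers the module-level lock as well as every handler's lock, not just the root logger's handlers.

```python
import logging
import os

def _iter_handlers():
    # Illustrative stand-in: the real patch tracks every Handler the
    # logging module knows about, not only those on the root logger.
    return list(logging.getLogger().handlers)

def _acquire_logging_locks():
    for handler in _iter_handlers():
        handler.acquire()        # blocks until no thread is mid-emit

def _release_logging_locks():
    for handler in _iter_handlers():
        handler.release()

if hasattr(os, 'register_at_fork'):   # POSIX only; Python 3.7+
    os.register_at_fork(before=_acquire_logging_locks,
                        after_in_parent=_release_logging_locks,
                        after_in_child=_release_logging_locks)
```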

A unittest is still needed (relatively trivial to create).

The logging.handlers module has a QueueHandler and a QueueListener, which use the queue module... and that module is full of locks. I'm not addressing those, as I don't see an immediately obvious correct thing to do with such a queue. A QueueListener thread won't be running in the forked child anyway, so the right thing for anyone using one is likely to remove the QueueHandler from their logging config entirely, unless they go on to re-establish their complicated queue-based logging handler setup in the child. Suggestion: do nothing. Let people using QueueHandler deal with their own problems as their application sees fit.
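
One way an application using QueueHandler could handle this on its own, in line with the "do nothing in logging" suggestion. The handler choice and names below are illustrative, not part of this patch:

```python
import logging
import logging.handlers
import os
import queue

log_queue = queue.Queue()
root = logging.getLogger()
queue_handler = logging.handlers.QueueHandler(log_queue)
root.addHandler(queue_handler)
listener = logging.handlers.QueueListener(log_queue, logging.StreamHandler())
listener.start()

def _drop_queue_handler_in_child():
    # The QueueListener thread does not survive fork(), so the child
    # swaps the queue-based handler for something self-contained.
    root.removeHandler(queue_handler)
    root.addHandler(logging.StreamHandler())

if hasattr(os, 'register_at_fork'):
    os.register_at_fork(after_in_child=_drop_queue_handler_in_child)
```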

https://bugs.python.org/issue6721

@@ -793,6 +798,9 @@ def createLock(self):
Acquire a thread lock for serializing access to the underlying I/O.
"""
self.lock = threading.RLock()
os.register_at_fork(before=self.acquire,
A Member reviewed this diff and commented:

This may register an arbitrary number of callbacks across an application's lifetime, and it also makes the handlers eternal by keeping a reference to all of them. Instead, we could have a global WeakSet of handlers and a single set of callbacks that iterate over the live handlers.

@gpshead (Member, Author) replied:

Another option would be `before=lambda acquire=weakref.WeakMethod(self.acquire): acquire()()`... I'm used to applications that don't create tons of logging.Handler instances and throw them away all over the place. :)

I think I'll go with the WeakSet approach, to avoid an ever-growing list of callables in the odd case of an application that discards Handlers...
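
A rough sketch of the shape that WeakSet approach could take. The names here are illustrative; see the merged diff for the actual implementation:

```python
import os
import weakref

# One global registry instead of one os.register_at_fork() entry per
# handler; the WeakSet lets discarded Handler instances be collected.
_at_fork_handlers = weakref.WeakSet()

def _register_handler_for_fork(handler):
    _at_fork_handlers.add(handler)     # e.g. called from Handler.createLock()

def _acquire_handler_locks():
    for handler in list(_at_fork_handlers):
        handler.acquire()

def _release_handler_locks():
    # A real implementation also has to cope with handlers that disappear
    # between the before and after calls.
    for handler in list(_at_fork_handlers):
        handler.release()

if hasattr(os, 'register_at_fork'):
    # Registered exactly once, at logging import time.
    os.register_at_fork(before=_acquire_handler_locks,
                        after_in_parent=_release_handler_locks,
                        after_in_child=_release_handler_locks)
```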

@ambv (Contributor) commented Aug 31, 2018

@gpshead Please come back to this patch during the sprint!

@gpshead gpshead self-assigned this Sep 12, 2018
@gpshead gpshead force-pushed the logging_locks_at_fork branch 2 times, most recently from 8d51698 to caa3e6f on September 13, 2018 08:41
@gpshead gpshead force-pushed the logging_locks_at_fork branch from caa3e6f to a0c2cfb on September 13, 2018 17:03
@gpshead gpshead requested a review from ambv September 13, 2018 17:10
The test now fails on the previous code and is fixed by the logging changes. It uses a thread to hold the locks during the fork in a synchronized manner.

Due to mixing fork and a thread, there is potential for deadlock in the child process. Buildbots and time will tell whether this actually manifests in this test or not. :/
@gpshead (Member, Author) commented Sep 13, 2018

A regression test that actually catches the bug, and that is fixed by this change, has been added.

Due to mixing fork and a thread, there is potential for deadlock in the child process. Buildbots and time will tell whether this actually manifests in this test or not. :/
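
A sketch of roughly what such a test looks like. The real test lives in Lib/test/test_logging.py and is more careful, so treat the names and timing below as illustrative:

```python
import logging
import os
import sys
import threading

handler = logging.StreamHandler(sys.stderr)
logger = logging.getLogger('fork-test')
logger.addHandler(handler)
logger.setLevel(logging.INFO)

locks_held = threading.Event()
fork_done = threading.Event()

def lock_holder():
    handler.acquire()            # pretend another thread is mid-emit
    try:
        locks_held.set()
        # With the fix, the forking thread blocks in the at-fork "before"
        # callback trying to take this lock, so wait with a timeout and
        # then let go; without the fix, fork() happens while we hold it.
        fork_done.wait(timeout=0.5)
    finally:
        handler.release()

t = threading.Thread(target=lock_holder)
t.start()
locks_held.wait()

pid = os.fork()
if pid == 0:
    # Before the fix this deadlocks: the child inherits a locked handler
    # lock that no thread in the child will ever release.
    logger.info('child did not deadlock')
    os._exit(0)
else:
    fork_done.set()
    t.join()
    os.waitpid(pid, 0)   # hangs forever if the child deadlocked
```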

@gpshead gpshead merged commit 1900384 into python:master Sep 14, 2018
@miss-islington (Contributor) commented:
Thanks @gpshead for the PR 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒⛏🤖

@gpshead gpshead deleted the logging_locks_at_fork branch September 14, 2018 05:08
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 14, 2018
bpo-6721: When os.fork() is called while another thread holds a logging lock, the child process may deadlock when it tries to log. This fixes that by acquiring all logging locks before fork and releasing them afterwards.

A regression test that fails before this change is included.

Within the new unittest itself: there is a small _potential_ for deadlock, due to the mixing of fork and a thread in the child process, if the parent's thread happened to hold a non-reentrant library-call lock (malloc?) when the os.fork() happens. Buildbots and time will tell whether this actually manifests in this test or not. :/ A functionality test that avoids that would be a challenge.

An alternate test that doesn't try to produce the deadlock itself, but just checks that the release and acquire calls are made, would be the next best alternative if so.
(cherry picked from commit 1900384)

Co-authored-by: Gregory P. Smith <[email protected]>
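
The commit message above mentions an alternate, non-deadlocking test that only checks the acquire and release calls. A hypothetical sketch of that idea, assuming (as this change implements) that the at-fork hooks go through Handler.acquire() and Handler.release(); it is not part of the PR:

```python
import logging
import os
import unittest


class RecordingHandler(logging.Handler):
    """Records acquire/release calls so a test can assert on them."""

    def __init__(self):
        self.calls = []
        super().__init__()

    def acquire(self):
        self.calls.append('acquire')
        super().acquire()

    def release(self):
        self.calls.append('release')
        super().release()


class ForkLockCallsTest(unittest.TestCase):
    @unittest.skipUnless(hasattr(os, 'fork'), 'requires os.fork()')
    def test_acquire_release_called_around_fork(self):
        handler = RecordingHandler()
        handler.calls.clear()          # ignore anything from setup
        pid = os.fork()
        if pid == 0:
            os._exit(0)                # child does nothing
        os.waitpid(pid, 0)
        # The "before" hook should have acquired the handler's lock and
        # the "after_in_parent" hook released it.
        self.assertEqual(handler.calls, ['acquire', 'release'])


if __name__ == '__main__':
    unittest.main()
```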
@bedevere-bot commented:
GH-9291 is a backport of this pull request to the 3.7 branch.

gpshead pushed a commit that referenced this pull request Oct 7, 2018
bpo-6721: When os.fork() is called while another thread holds a logging lock, the child process may deadlock when it tries to log. This fixes that by acquiring all logging locks before fork and releasing them afterwards.

A regression test that fails before this change is included.

Within the new unittest itself: there is a small _potential_ for deadlock, due to the mixing of fork and a thread in the child process, if the parent's thread happened to hold a non-reentrant library-call lock (malloc?) when the os.fork() happens. Buildbots and time will tell whether this actually manifests in this test or not. :/ A functionality test that avoids that would be a challenge.

An alternate test that doesn't try to produce the deadlock itself, but just checks that the release and acquire calls are made, would be the next best alternative if so.
(cherry picked from commit 1900384)

Co-authored-by: Gregory P. Smith <[email protected]> [Google]
@hroncok (Contributor) commented Nov 6, 2018

Just a heads up: we have reverted this commit in Fedora's Python package (version 3.7.1) because it rendered our graphical installers unbootable. This may have merely exposed a bug in the installer, or it may have introduced a regression. We are currently investigating the problem with Fedora QA and the installer developers, and we will open a BPO issue if we find a simple reproducer.

Details: https://bugzilla.redhat.com/show_bug.cgi?id=1644936

Labels: sprint, type-bug (an unexpected behavior, bug, or error)