Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-106238: Handle KeyboardInterrupt during logging._acquireLock() #106239

Merged
merged 3 commits into from
Jul 6, 2023

Conversation

arieleiz
Copy link
Contributor

@arieleiz arieleiz commented Jun 29, 2023

We've come across a concurrency bug in logging/init.py which involves the handling of asynchronous exceptions, such as KeyboardInterrupt, during the execution of logging._acquireLock().

In the current implementation, when threading.RLock.acquire() is executed, there is a possibility for an asynchronous exception to occur during the transition back from native code, even if the lock acquisition is successful.

The typical use of _acquireLock() in the logging library is as follows:

def _loggingMethod(handler):
    """
    Add a handler to the internal cleanup list using a weak reference.
    """
    _acquireLock()
    try:
        # doSomething
    finally:
        _releaseLock()

In this pattern, if a KeyboardInterrupt is raised during the lock acquisition, the lock ends up getting abandoned.

When can this happen? One example is during forks. logging/init.py registers an at-fork hook, with

os.register_at_fork(before=_acquireLock,
                    after_in_child=_after_at_fork_child_reinit_locks,
                    after_in_parent=_releaseLock)

A scenario occurring in our production environment is during a slow fork operation (when the server is under heavy load and performing a multitude of forks). The lock could be held for up to a minute. If this is happening in a secondary thread, and a SIGINT signal is received in the main thread while is waiting to acquire the lock for logging, the lock will be abandoned. This will causes the process to hang during the next _acquireLock() call.

To address this issue, we provide a simple pull request to add a try-except block within _acquireLock(), e.g.:

def _acquireLock():
    if _lock:
        try:
            _lock.acquire()
        except BaseException:
            _lock.release()
            raise

This way, if an exception arises during the lock acquisition, the lock will be released, preventing the lock from being abandoned and the process from potentially hanging.

We've come across a concurrency bug in logging/__init__.py which involves the handling of asynchronous exceptions, such as KeyboardInterrupt, during the execution of logging._acquireLock().

In the current implementation, when threading.RLock.acquire() is executed, there is a possibility for an asynchronous exception to occur during the transition back from native code, even if the lock acquisition is successful.

The typical use of _acquireLock() in the logging library is as follows:

def _loggingMethod(handler):
    """
    Add a handler to the internal cleanup list using a weak reference.
    """
    _acquireLock()
    try:
        # doSomething
    finally:
        _releaseLock()
In this pattern, if a KeyboardInterrupt is raised during the lock acquisition, the lock ends up getting abandoned.

When can this happen? One example is during forks. logging/__init__.py registers an at-fork hook, with

os.register_at_fork(before=_acquireLock,
                    after_in_child=_after_at_fork_child_reinit_locks,
                    after_in_parent=_releaseLock)
A scenario occurring in our production environment is during a slow fork operation (when the server is under heavy load and performing a multitude of forks). The lock could be held for up to a minute. If this is happening in a secondary thread, and a SIGINT signal is received in the main thread while is waiting to acquire the lock for logging, the lock will be abandoned. This will causes the process to hang during the next _acquireLock() call.

To address this issue, we provide a simple pull request to add a try-except block within _acquireLock(), e.g.:

def _acquireLock():
    if _lock:
        try:
            _lock.acquire()
        except BaseException:
            _lock.release()
            raise
This way, if an exception arises during the lock acquisition, the lock will be released, preventing the lock from being abandoned and the process from potentially hanging.
@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@cpython-cla-bot
Copy link

cpython-cla-bot bot commented Jun 29, 2023

All commit authors signed the Contributor License Agreement.
CLA signed

@AlexWaygood AlexWaygood changed the title issue-106238: Handle KeyboardInterrupt during logging._acquireLock() gh-106238: Handle KeyboardInterrupt during logging._acquireLock() Jun 29, 2023
@vsajip
Copy link
Member

vsajip commented Jun 30, 2023

I'm not sure this is the right place to fix this. See the long discussion on #50970. Anyway, wouldn't your proposed fix be better arranged to move the lock.acquire() to inside the try:? That way, the finally: would release the lock.

@arieleiz
Copy link
Contributor Author

Thank you very much for pointing me to #50970. It is not the same issue but similar, as we encounter the issue in the original process, not the forked process, and the issue is not the state of the lock but the lock being abandoned due to an async exception.
Another possibility for the fix is directly within the lock acquire native code, detecting that an exception has occured and unlocking. However this will have a much greater scope as it will affect all locks.
Regarding your suggestion to move _acquireLock() inside the try: except: clauses, this will not fix the lock acquire in atfork(), and also might be more prone to errors in future changes to this code.
With our proposed fix we maintain the invariant and underlying assumption of the library that when an exception is thrown from within _acquireLock() the lock is never abandoned.

@vsajip
Copy link
Member

vsajip commented Jul 4, 2023

@arieleiz You need to sign the CLA before this PR can be progressed.

@arieleiz
Copy link
Contributor Author

arieleiz commented Jul 4, 2023

Hi @vsajip, I've tried signing the CLA multiple times, I keep getting a "Internal Server Error" from CLAbot after "Sign here with github to agree". Any idea what I am doing wrong?

@vsajip
Copy link
Member

vsajip commented Jul 5, 2023

I'm afraid not. Might be worth asking on https://discuss.python.org/ to see if anyone there can help.

@vsajip vsajip added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 5, 2023
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @vsajip for commit 99b651d 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 5, 2023
@vsajip
Copy link
Member

vsajip commented Jul 5, 2023

CLA signing status appears to be stuck on "not signed". Will convert to draft and back to see if it unsticks.

@vsajip vsajip marked this pull request as draft July 5, 2023 21:55
@vsajip vsajip marked this pull request as ready for review July 5, 2023 21:56
@vsajip
Copy link
Member

vsajip commented Jul 6, 2023

CLA signing status appears to be stuck on "not signed". Will convert to draft and back to see if it unsticks.

That didn't work - closing and reopening to see if that does the trick.

@vsajip vsajip closed this Jul 6, 2023
@vsajip vsajip reopened this Jul 6, 2023
@arieleiz
Copy link
Contributor Author

arieleiz commented Jul 6, 2023

Thanks @vsajip! shows up as CLA signed for me

@vsajip vsajip merged commit 99b00ef into python:main Jul 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants