Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deadlocked child process after forking on pystate.c's head_mutex #74580

Closed
lbrandy mannequin opened this issue May 17, 2017 · 10 comments
Closed

deadlocked child process after forking on pystate.c's head_mutex #74580

lbrandy mannequin opened this issue May 17, 2017 · 10 comments
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@lbrandy
Copy link
Mannequin

lbrandy mannequin commented May 17, 2017

BPO 30395
Nosy @gpshead, @vstinner, @ambv, @cooperlees, @lbrandy, @phantez
PRs
  • bpo-30395: _PyGILState_Reinit deadlock fix #1734
  • [3.6] bpo-30395 _PyGILState_Reinit deadlock fix (GH-1734) #1740
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-06-09.21:23:13.988>
    created_at = <Date 2017-05-17.23:25:58.616>
    labels = ['interpreter-core', 'type-feature', '3.7']
    title = "deadlocked child process after forking on pystate.c's head_mutex"
    updated_at = <Date 2017-06-09.21:23:13.986>
    user = 'https://github.com/lbrandy'

    bugs.python.org fields:

    activity = <Date 2017-06-09.21:23:13.986>
    actor = 'lukasz.langa'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-06-09.21:23:13.988>
    closer = 'lukasz.langa'
    components = ['Interpreter Core']
    creation = <Date 2017-05-17.23:25:58.616>
    creator = 'Louis Brandy'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 30395
    keywords = []
    message_count = 10.0
    messages = ['293906', '294165', '294205', '294222', '294256', '294316', '294326', '294331', '294366', '295582']
    nosy_count = 9.0
    nosy_names = ['gregory.p.smith', 'vstinner', 'pixelbeat', 'lukasz.langa', 'cooperlees', 'Louis Brandy', 'drothera', 'phantez', 'Piotr Wysocki']
    pr_nums = ['1734', '1740']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue30395'
    versions = ['Python 3.6', 'Python 3.7']

    @lbrandy
    Copy link
    Mannequin Author

    lbrandy mannequin commented May 17, 2017

    A forked process (via os.fork) can inherit a locked head_mutex from its parent process and will promptly deadlock in this stack (on my linux box):

    Child Process (deadlocked):

    #0 0x00007f1a4da82e3c in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x7f1a4c2964e0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
    #1 0x00007f1a4da82e3c in do_futex_wait (sem=sem@entry=0x7f1a4c2964e0, abstime=0x0) at sem_waitcommon.c:111
    #2 0x00007f1a4da82efc in __new_sem_wait_slow (sem=0x7f1a4c2964e0, abstime=0x0) at sem_waitcommon.c:181
    #3 0x00007f1a4da82fb9 in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
    #4 0x00007f1a4de4c605 in PyThread_acquire_lock_timed (lock=0x7f1a4c2964e0, microseconds=-1, intr_flag=0) at Python/thread_pthread.h:352
    #5 0x00007f1a4de4c4b4 in PyThread_acquire_lock (lock=<optimized out>, waitflag=waitflag@entry=1) at Python/thread_pthread.h:556
    #6 0x00007f1a4dd59e08 in _PyThreadState_DeleteExcept (tstate=0x7f19f2301800) at Python/pystate.c:483
    #7 0x00007f1a4dd46af4 in PyEval_ReInitThreads () at Python/ceval.c:326
    #8 0x00007f1a4dd78b0b in PyOS_AfterFork () at ./Modules/signalmodule.c:1608

    The parent process has a race between one thread calling os.fork (and holding the GIL) and another (in our case C++) thread trying to use PyGILState_Ensure. PyGILState_Ensure will grab the head_mutex before it tries to get the GIL. So if a fork happens in this critical section, the child process will get the locked head_mutex.

    The path from PyGILState_Ensure -> head_mutex looks like this:

    #0 new_threadstate (interp=0x7fb5fd483d80, init=init@entry=1) at Python/pystate.c:183
    #1 0x00007fb5ff149027 in PyThreadState_New (interp=<optimized out>) at Python/pystate.c:250
    #2 0x00007fb5ff006ac7 in PyGILState_Ensure () at Python/pystate.c:838

    ----

    Possible fix?

    A simple fix would be to, inside PyOS_AfterFork, reset/unlock pystate.c's head_mutex if it's already locked.

    Unclear if this is related to: https://bugs.python.org/issue28812

    @lbrandy lbrandy mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label May 17, 2017
    @gpshead
    Copy link
    Member

    gpshead commented May 22, 2017

    You cannot safely use Python's os.fork() in a process that has threads. Because of POSIX.

    The CPython interpreter is by definition not async signal safe so no Python code may safely be executed after the fork() system call if the process contained _any_ threads at fork() system call time.

    The only reliable recommendation: *Never use os.fork()* (if your process _might ever_ contain any threads now or in the future started by your or any of your dependencies).

    The closest thing to a fix to this is a bunch of band-aids to setup atfork functions to clear the state of known locks in the forked child. -- But even that can't guarantee things because that can't know about all locks.

    The C standard library can contain locks. malloc() for example. Any C/C++ extension modules can contain locks of their own. The problem isn't solvable within CPython. Adding a clear of pystate.c's head_mutex after forking makes sense. That may even get you further. But you will encounter other problems of the same nature in the future.

    related: https://bugs.python.org/issue6721 and https://bugs.python.org/issue16500

    @gpshead gpshead added 3.7 (EOL) end of life invalid type-feature A feature request or enhancement labels May 22, 2017
    @ambv
    Copy link
    Contributor

    ambv commented May 22, 2017

    New changeset f82c951 by Łukasz Langa (Jason Fried) in branch 'master':
    bpo-30395 _PyGILState_Reinit deadlock fix (bpo-1734)
    f82c951

    @ambv
    Copy link
    Contributor

    ambv commented May 23, 2017

    New changeset d29fecc by Łukasz Langa in branch '3.6':
    [3.6] bpo-30395 _PyGILState_Reinit deadlock fix (GH-1734) (bpo-1740)
    d29fecc

    @lbrandy
    Copy link
    Mannequin Author

    lbrandy mannequin commented May 23, 2017

    Thanks to everyone jumping in.

    I need no convincing that mixing forks and threads isn't just a problem but a problem factory. Given that the rest of this code seems to try to avoid similar deadlocks with similar mutexes, I figured we'd want to include this mutex to make a best-effort at being safe here. That made it worth reporting. To be sure, I still believe that the application code that led us here needs deeper fixes to address the fork/thread problems.

    @vstinner
    Copy link
    Member

    + head_mutex = NULL;

    Shouldn't we free memory here?

    @gpshead
    Copy link
    Member

    gpshead commented May 24, 2017

    Would PyThread_free_lock (effectively sem_destroy()) work without (additional) problems?

    @vstinner
    Copy link
    Member

    Gregory P. Smith added the comment:

    Would PyThread_free_lock (effectively sem_destroy()) work without
    (additional) problems?

    If I recall correctly, no, you can get issues if the lock is still
    acquired... But I should check.

    @gpshead
    Copy link
    Member

    gpshead commented May 24, 2017

    Another alternative *might* be to check if the lock is locked (non-blocking acquire?) and release it if so. Under the normal assumption that we are the only thread running immediately post-fork().

    I'm not sure that can be guaranteed reliable given that other C code could've used pthread_atfork to register an "after" fork handler that spawns threads. But that should be rare, and nothing here can really fix the underlying "programmer has mixed fork and threads" issue. Only ameliorate it.

    But i'm not sure this post fork memory leak is really a problem other than potentially showing up in memory sanatizer runs involving forked children. The scenario where it would be an actual leak is if a process does serial fork() calls with the parent(s) dying rather than forking new children from a common parent. That would grow the leak as each child would have an additional lock allocated (a few bytes). I don't _believe_ that kind of process pattern is common (but people do everything).

    @ambv
    Copy link
    Contributor

    ambv commented Jun 9, 2017

    If you'd like to fix the miniscule leak this introduces, feel free but I don't think it's worth the additional complexity. Closing this for now as it solved an issue for us internally and we haven't observed any memory-related issues due to it.

    @ambv ambv closed this as completed Jun 9, 2017
    @ambv ambv removed the invalid label Jun 9, 2017
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants