Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock in jemalloc when a signal occurs in malloc and hhvm bt_handler() tries to malloc again #4533

Closed
wjywbs opened this issue Dec 26, 2014 · 2 comments
Labels

Comments

@wjywbs
Copy link
Contributor

wjywbs commented Dec 26, 2014

I added a fopen() in Logger::log() if the output file is closed. (I'm not sure if this is related to this issue.) This doesn't happen very often, but all threads deadlock in jemalloc.

#0  0x00007fcc372ea264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fcc372e5508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007fcc372e53d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fcc3a8488dd in je_malloc_mutex_lock (mutex=0x7fcc32800bf8)
    at include/jemalloc/internal/mutex.h:77
#4  je_arena_tcache_fill_small (arena=0x7fcc328000c0, 
    tbin=tbin@entry=0x7fcc10228248, binind=binind@entry=17, 
    prof_accumbytes=prof_accumbytes@entry=0) at src/arena.c:1475
#5  0x00007fcc3a86f8cf in je_tcache_alloc_small_hard (
    tcache=tcache@entry=0x7fcc10228000, tbin=tbin@entry=0x7fcc10228248, 
    binind=binind@entry=17) at src/tcache.c:72
#6  0x00007fcc3a83e5c5 in je_tcache_alloc_small (zero=false, size=640, 
    tcache=0x7fcc10228000) at include/jemalloc/internal/tcache.h:303
#7  je_arena_malloc (try_tcache=true, zero=false, size=<optimized out>, 
    arena=0x0) at include/jemalloc/internal/arena.h:957
#8  je_imalloct (arena=0x0, try_tcache=true, size=<optimized out>)
    at include/jemalloc/internal/jemalloc_internal.h:771
#9  je_imalloc (size=<optimized out>)
    at include/jemalloc/internal/jemalloc_internal.h:780
#10 malloc (size=<optimized out>) at src/jemalloc.c:929
#11 0x00007fcc365cbecb in __fopen_internal () from /lib64/libc.so.6
#12 0x0000000001a1130b in real_fopen (
    filename=filename@entry=0x7fff4455f773 "hhvm", modes=<optimized out>)
    at bfdio.c:120
#13 0x000000000199a708 in bfd_fopen (filename=0x7fff4455f773 "hhvm", 
    target=0x0, mode=0x20f18c5 "r", fd=-1) at opncls.c:219
#14 0x00000000016af7a5 in HPHP::fill_bfd_cache(char const*, HPHP::bfd_cache*)
    ()
#15 0x00000000016b14f2 in HPHP::StackTraceBase::Addr2line(char const*, char const*, HPHP::StackTraceBase::Frame*, void*, void*, unsigned int) ()
#16 0x00000000016b21dc in HPHP::StackTraceBase::Translate(void*, HPHP::StackTraceBase::Frame*, Dl_info&, void*, void*, unsigned int) ()
#17 0x00000000016b22a5 in HPHP::StackTraceNoHeap::Translate(int, void*, int, void*, unsigned int) ()
#18 0x00000000016b2516 in HPHP::StackTraceNoHeap::printStackTrace(int) const ()
#19 0x00000000016b27a2 in HPHP::StackTraceNoHeap::log(char const*, char const*, char const*, char const*, int) const ()
#20 0x0000000000a36e72 in HPHP::bt_handler(int) ()
#21 <signal handler called>
#22 je_bitmap_set (bit=18446744073709551615, 
    binfo=0x7fcc3aa7e9d0 <je_arena_bin_info+1808>, bitmap=0x7fcbc125b010)
    at include/jemalloc/internal/bitmap.h:105
#23 je_bitmap_sfu (binfo=0x7fcc3aa7e9d0 <je_arena_bin_info+1808>, 
    bitmap=0x7fcbc125b010) at include/jemalloc/internal/bitmap.h:140
#24 arena_run_reg_alloc (bin_info=0x7fcc3aa7e9a8 <je_arena_bin_info+1768>, 
    run=0x7fcbc125b000) at src/arena.c:291
#25 je_arena_tcache_fill_small (arena=0x7fcc328000c0, 
    tbin=tbin@entry=0x7fcc10228248, binind=binind@entry=17, 
    prof_accumbytes=prof_accumbytes@entry=0) at src/arena.c:1479
#26 0x00007fcc3a86f8cf in je_tcache_alloc_small_hard (
    tcache=tcache@entry=0x7fcc10228000, tbin=tbin@entry=0x7fcc10228248, 
    binind=binind@entry=17) at src/tcache.c:72
#27 0x00007fcc3a83e5c5 in je_tcache_alloc_small (zero=false, size=640, 
    tcache=0x7fcc10228000) at include/jemalloc/internal/tcache.h:303
#28 je_arena_malloc (try_tcache=true, zero=false, size=<optimized out>, 
    arena=0x0) at include/jemalloc/internal/arena.h:957
#29 je_imalloct (arena=0x0, try_tcache=true, size=<optimized out>)
    at include/jemalloc/internal/jemalloc_internal.h:771
#30 je_imalloc (size=<optimized out>)
    at include/jemalloc/internal/jemalloc_internal.h:780
#31 malloc (size=<optimized out>) at src/jemalloc.c:929
#32 0x00007fcc365cbecb in __fopen_internal () from /lib64/libc.so.6
#33 0x00000000016c9ef2 in HPHP::Logger::Error(std::string const&) ()
#34 0x000000000127cff7 in HPHP::f_error_log(HPHP::String const&, int, HPHP::String const&, HPHP::String const&) ()

So a signal occurred in malloc() with its lock locked, and hhvm tried to print the stack trace. fill_bfd_cache() called bfd_openr(), which tried to fopen() again and malloc tried to acquire lock again.

// In hphp/util/stack-trace.cpp, fill_bfd_cache()
bfd *abfd = bfd_openr(filename, nullptr); // hard to avoid heap here!

I have several questions:

  • Should I add a VMRegAnchor _; in HHVM_FUNCTION(error_log)? I see that it does not use an anchor when writing to a destination file.
  • Should the stack trace handler preload some symbols when starting hhvm, at least hhvm itself? Or can it detect if there is a memory deadlock?
  • I am tracing that the main log file may be closed when ini_set('log_errors', 0) in php, or ini_set('error_log', 'file') (handled by hphp/runtime/base/request-injection-data.cpp). Should this only affect the current script, instead of closing/replacing the main log file?

Thanks.

@paulbiss
Copy link
Contributor

paulbiss commented Jan 5, 2015

Looks like it's hanging while trying to generate the stacktrace, I'm pretty sure StackTraceNoHeap calls malloc so that's not surprising.

@paulbiss paulbiss added the crash label Jan 5, 2015
wjywbs added a commit to wjywbs/hhvm that referenced this issue Jan 26, 2015
…oid deadlock.

jemalloc is most likely crashed and acquired its lock if it's found in the
stacktrace. Be conservative and avoid malloc again and deadlock.

Part of facebook#4533 and facebook#2408.
wjywbs added a commit to wjywbs/hhvm that referenced this issue Feb 12, 2015
…andler.

This specifies the maximum number of seconds spent for generating a stack trace
when hhvm is crashed. The default is 0 which means no timeout. This can be set
to prevent from deadlocks in the backtrace handler.

Part of facebook#4533 and facebook#2408.
hhvm-bot pushed a commit that referenced this issue Feb 14, 2015
…andler.

Summary: This specifies the maximum number of seconds spent for generating a stack trace
when hhvm is crashed. The default is 0 which means no timeout. This can be set
to prevent from deadlocks in the backtrace handler.

Part of #4533 and #2408.
Closes #4828

Reviewed By: @markw65

Differential Revision: D1844818

Pulled By: @fredemmott
@wjywbs
Copy link
Contributor Author

wjywbs commented Aug 26, 2015

Fixed by d43f26b.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants