Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Live task backtraces from a running system #47141

Closed
kpamnany opened this issue Oct 12, 2022 · 11 comments
Closed

Live task backtraces from a running system #47141

kpamnany opened this issue Oct 12, 2022 · 11 comments
Labels
feature Indicates new feature / enhancement requests

Comments

@kpamnany
Copy link
Contributor

#46845 introduces jl_print_task_backtraces() which is only safe to use from GDB, when all threads are stopped. We should enhance this to support getting backtraces from live threads as suggested by @vilterp.

@vilterp
Copy link
Contributor

vilterp commented Oct 19, 2022

Another nice touch from Go: when a thread is 'parked', i.e. taken off the run queue, the reason why it's waiting is stored, and this shows up in the 'backtrace dump'.

The "wait reason" is one of these: https://github.com/golang/go/blob/1f068f0dc7bc997446a7aac44cfc70746ad918e0/src/runtime/runtime2.go#L1049-L1082

i.e. waiting for a lock, waiting for GC to stop the world, etc. Not sure what it would take for us to do this (perhaps worth filing another issue), but it'd be nice.

@brenhinkeller brenhinkeller added the feature Indicates new feature / enhancement requests label Nov 21, 2022
@NHDaly NHDaly changed the title Live task backtraces Live task backtraces from a running system Dec 8, 2022
@NHDaly
Copy link
Member

NHDaly commented Jan 5, 2023

Could we do this now entirely in julia, via a finalizer?

This seems to work pretty well for me!:

julia> function print_task_backtraces()
           r = Ref(nothing)
           finalizer(r) do _
               @ccall jl_print_task_backtraces()::Cvoid
           end
           r = nothing
           GC.gc()
       end
print_task_backtraces (generic function with 1 method)

julia> print_task_backtraces()
==== Thread 1 created 3 live tasks
     ---- Root task (0x10aea4010)
          (sticky: 1, started: 1, state: 0, tid: 1)
jl_rec_backtrace at /Users/nathandaly/builds/julia-RAI/src/stackwalk.c:863
jlbacktracet at /Users/nathandaly/builds/julia-RAI/src/stackwalk.c:1099 [inlined]
jl_print_task_backtraces at /Users/nathandaly/builds/julia-RAI/src/stackwalk.c:1128
#11 at ./REPL[16]:4
unknown function (ip: 0x10ab62b92)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
jl_apply at /Users/nathandaly/builds/julia-RAI/src/./julia.h:1842 [inlined]
run_finalizer at /Users/nathandaly/builds/julia-RAI/src/gc.c:283
jl_gc_run_finalizers_in_list at /Users/nathandaly/builds/julia-RAI/src/gc.c:372
run_finalizers at /Users/nathandaly/builds/julia-RAI/src/gc.c:413
ijl_gc_collect at /Users/nathandaly/builds/julia-RAI/src/gc.c:3416
gc at ./gcutils.jl:93 [inlined]
gc at ./gcutils.jl:93 [inlined]
print_task_backtraces at ./REPL[18]:7
unknown function (ip: 0x10ab62c8f)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
jl_apply at /Users/nathandaly/builds/julia-RAI/src/./julia.h:1842 [inlined]
do_call at /Users/nathandaly/builds/julia-RAI/src/interpreter.c:126
eval_stmt_value at /Users/nathandaly/builds/julia-RAI/src/interpreter.c:166 [inlined]
eval_body at /Users/nathandaly/builds/julia-RAI/src/interpreter.c:594
jl_interpret_toplevel_thunk at /Users/nathandaly/builds/julia-RAI/src/interpreter.c:750
top-level scope at REPL[19]:1
jl_toplevel_eval_flex at /Users/nathandaly/builds/julia-RAI/src/toplevel.c:906
jl_toplevel_eval_flex at /Users/nathandaly/builds/julia-RAI/src/toplevel.c:850
jl_toplevel_eval_flex at /Users/nathandaly/builds/julia-RAI/src/toplevel.c:850
ijl_toplevel_eval at /Users/nathandaly/builds/julia-RAI/src/toplevel.c:915 [inlined]
ijl_toplevel_eval_in at /Users/nathandaly/builds/julia-RAI/src/toplevel.c:965
eval at ./boot.jl:368 [inlined]
eval_user_input at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:151
repl_backend_loop at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:247
start_repl_backend at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:232
#run_repl#47 at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:369
run_repl at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:355
jfptr_run_repl_65547 at /Users/nathandaly/builds/julia-RAI/usr/lib/julia/sys.dylib (unknown line)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
#967 at ./client.jl:419
jfptr_YY.967_59223 at /Users/nathandaly/builds/julia-RAI/usr/lib/julia/sys.dylib (unknown line)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
jl_apply at /Users/nathandaly/builds/julia-RAI/src/./julia.h:1842 [inlined]
jl_f__call_latest at /Users/nathandaly/builds/julia-RAI/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:729 [inlined]
invokelatest at ./essentials.jl:726 [inlined]
run_main_repl at ./client.jl:404
exec_options at ./client.jl:318
_start at ./client.jl:518
jfptr__start_25805 at /Users/nathandaly/builds/julia-RAI/usr/lib/julia/sys.dylib (unknown line)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
jl_apply at /Users/nathandaly/builds/julia-RAI/src/./julia.h:1842 [inlined]
true_main at /Users/nathandaly/builds/julia-RAI/src/jlapi.c:575
jl_repl_entrypoint at /Users/nathandaly/builds/julia-RAI/src/jlapi.c:719
     ---- Task 1 (0x14b0c06a0)
          (sticky: 1, started: 1, state: 0, tid: 1)
jl_swap_fiber at /Users/nathandaly/builds/julia-RAI/src/task.c:1040
ctx_switch at /Users/nathandaly/builds/julia-RAI/src/task.c:477
ijl_switch at /Users/nathandaly/builds/julia-RAI/src/task.c:551
try_yieldto at ./task.jl:865
wait at ./task.jl:939
wait at ./condition.jl:124
wait at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/FileWatching/src/FileWatching.jl:634
watch_file at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/FileWatching/src/FileWatching.jl:748
watch_file at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/FileWatching/src/FileWatching.jl:740 [inlined]
wait_changed at /Users/nathandaly/work/jl_depots/raicode2/packages/Revise/do2nH/src/packagedef.jl:34
watch_manifest at /Users/nathandaly/work/jl_depots/raicode2/packages/Revise/do2nH/src/pkgs.jl:413
unknown function (ip: 0x10ab55142)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
jl_apply at /Users/nathandaly/builds/julia-RAI/src/./julia.h:1842 [inlined]
do_apply at /Users/nathandaly/builds/julia-RAI/src/builtins.c:730
TaskThunk at /Users/nathandaly/work/jl_depots/raicode2/packages/Revise/do2nH/src/types.jl:268
unknown function (ip: 0x10ab50b5f)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
jl_apply at /Users/nathandaly/builds/julia-RAI/src/./julia.h:1842 [inlined]
start_task at /Users/nathandaly/builds/julia-RAI/src/task.c:932
     ---- End task 1
     ---- Task 2 (0x14cd43730)
          (sticky: 1, started: 1, state: 0, tid: 1)
jl_swap_fiber at /Users/nathandaly/builds/julia-RAI/src/task.c:1040
ctx_switch at /Users/nathandaly/builds/julia-RAI/src/task.c:477
ijl_switch at /Users/nathandaly/builds/julia-RAI/src/task.c:551
try_yieldto at ./task.jl:865
wait at ./task.jl:939
wait at ./condition.jl:124
take_buffered at ./channels.jl:416
take! at ./channels.jl:410 [inlined]
eval_with_backend at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:838
do_respond at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:852
jfptr_do_respond_65416 at /Users/nathandaly/builds/julia-RAI/usr/lib/julia/sys.dylib (unknown line)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
jl_apply at /Users/nathandaly/builds/julia-RAI/src/./julia.h:1842 [inlined]
jl_f__call_latest at /Users/nathandaly/builds/julia-RAI/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:729 [inlined]
invokelatest at ./essentials.jl:726 [inlined]
run_interface at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/REPL/src/LineEdit.jl:2510
run_frontend at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:1248
#49 at ./task.jl:488
jfptr_YY.49_65139 at /Users/nathandaly/builds/julia-RAI/usr/lib/julia/sys.dylib (unknown line)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
jl_apply at /Users/nathandaly/builds/julia-RAI/src/./julia.h:1842 [inlined]
start_task at /Users/nathandaly/builds/julia-RAI/src/task.c:932
     ---- End task 2
==== End thread 1
==== Thread 2 created 3 live tasks
     ---- Root task (0x10af90010)
          (sticky: 1, started: 1, state: 1, tid: 2)
     ---- Task 1 (0x10aea42b0)
          (sticky: 0, started: 1, state: 0, tid: 0)
jl_start_fiber_swap at /Users/nathandaly/builds/julia-RAI/src/task.c:1127
ctx_switch at /Users/nathandaly/builds/julia-RAI/src/task.c:512
ijl_switch at /Users/nathandaly/builds/julia-RAI/src/task.c:551
try_yieldto at ./task.jl:865
wait at ./task.jl:939
wait at ./condition.jl:124
#138 at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:281 [inlined]
lock at ./lock.jl:185
lock at ./condition.jl:78 [inlined]
macro expansion at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:279 [inlined]
#137 at ./threadingconstructs.jl:321
jfptr_YY.137_61891 at /Users/nathandaly/builds/julia-RAI/usr/lib/julia/sys.dylib (unknown line)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
jl_apply at /Users/nathandaly/builds/julia-RAI/src/./julia.h:1842 [inlined]
start_task at /Users/nathandaly/builds/julia-RAI/src/task.c:932
     ---- End task 1
     ---- Task 2 (0x10aea4550)
          (sticky: 0, started: 1, state: 0, tid: 2)
     ---- End task 2
==== End thread 2
==== Thread 3 created 1 live tasks
     ---- Root task (0x10af98010)
          (sticky: 1, started: 1, state: 1, tid: 3)
==== End thread 3
==== Thread 4 created 1 live tasks
     ---- Root task (0x10af94010)
          (sticky: 1, started: 1, state: 1, tid: 4)
==== End thread 4
==== Done

julia>

@NHDaly
Copy link
Member

NHDaly commented Jan 5, 2023

CC: @nickrobinson251 @msagarpatel

@vtjnash
Copy link
Member

vtjnash commented Jan 5, 2023

finalizers do not run as part of GC, they run after

@vtjnash
Copy link
Member

vtjnash commented Jan 5, 2023

most memory corruption is silent, so your example would work well for you for most of the time

@NHDaly
Copy link
Member

NHDaly commented Jan 5, 2023

So do finalizers run during the safepoint? Or do they run after GC, when the threads have started again?

@gbaraldi
Copy link
Member

gbaraldi commented Jan 5, 2023

From what I understood from

julia/src/gc.c

Lines 3507 to 3516 in 321c5f5

jl_safepoint_end_gc();
jl_gc_state_set(ptls, old_state, JL_GC_STATE_WAITING);
JL_PROBE_GC_END();
// Only disable finalizers on current thread
// Doing this on all threads is racy (it's impossible to check
// or wait for finalizers on other threads without dead lock).
if (!ptls->finalizers_inhibited && ptls->locks.len == 0) {
run_finalizers(ct);
}
we run after leaving the safepoints

@NHDaly
Copy link
Member

NHDaly commented Jan 5, 2023

Thanks @gbaraldi - i just found the same thing and was about to post it. Looks right, yeah. Too bad. thanks!

@NHDaly
Copy link
Member

NHDaly commented Jan 5, 2023

Okay so then that won't work. Thanks. So doing this in C, would it look something like this?:

JL_DLLEXPORT void jl_safe_print_tasks()
{
    jl_task_t *ct = jl_current_task;
    jl_ptls_t ptls = ct->ptls;

    int8_t old_state = jl_atomic_load_relaxed(&ptls->gc_state);
    jl_atomic_store_release(&ptls->gc_state, JL_GC_STATE_WAITING);

    // TODO: How do we check the return value here?
    //   We need to be the ones "owning" this GC safepoint, since we don't want to do
    //   the task dump while another thread is running GC on one of the tasks.
    jl_safepoint_start_gc()

    jl_fence();
    gc_n_threads = jl_atomic_load_acquire(&jl_n_threads);
    gc_all_tls_states = jl_atomic_load_relaxed(&jl_all_tls_states);
    jl_gc_wait_for_the_world(gc_all_tls_states, gc_n_threads);

    jl_print_task_backtraces()

    gc_n_threads = 0;
    gc_all_tls_states = NULL;
    jl_safepoint_end_gc();
    jl_gc_state_set(ptls, old_state, JL_GC_STATE_WAITING);
    JL_PROBE_GC_END();
}

It seems like it would be nice for us to expose something like the stopTheWorld("stack trace") and startTheWorld() functions that golang provides. Here's their implementation of this feature, which is entirely written in go:
https://github.com/golang/go/blob/6d8ec893039a39f495c8139012e47754e4518b70/src/runtime/mprof.go#L1193-L1196

@gbaraldi
Copy link
Member

gbaraldi commented Jan 5, 2023

You can see here

    uint64_t t0 = jl_hrtime();
    if (!jl_safepoint_start_gc()) {
        // Multithread only. See assertion in `safepoint.c`
        jl_gc_state_set(ptls, old_state, JL_GC_STATE_WAITING);
        return;
    }

Something like this, but we probably want to try again if we weren't the first thread to ask for a gc.

@NHDaly
Copy link
Member

NHDaly commented Jan 18, 2023

^ @kpamnany

@vtjnash vtjnash closed this as completed Oct 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Indicates new feature / enhancement requests
Projects
None yet
Development

No branches or pull requests

6 participants