-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Live task backtraces from a running system #47141
Comments
Another nice touch from Go: when a thread is 'parked', i.e. taken off the run queue, the reason why it's waiting is stored, and this shows up in the 'backtrace dump'. The "wait reason" is one of these: https://github.com/golang/go/blob/1f068f0dc7bc997446a7aac44cfc70746ad918e0/src/runtime/runtime2.go#L1049-L1082 i.e. waiting for a lock, waiting for GC to stop the world, etc. Not sure what it would take for us to do this (perhaps worth filing another issue), but it'd be nice. |
Could we do this now entirely in julia, via a finalizer? This seems to work pretty well for me!: julia> function print_task_backtraces()
r = Ref(nothing)
finalizer(r) do _
@ccall jl_print_task_backtraces()::Cvoid
end
r = nothing
GC.gc()
end
print_task_backtraces (generic function with 1 method)
julia> print_task_backtraces()
==== Thread 1 created 3 live tasks
---- Root task (0x10aea4010)
(sticky: 1, started: 1, state: 0, tid: 1)
jl_rec_backtrace at /Users/nathandaly/builds/julia-RAI/src/stackwalk.c:863
jlbacktracet at /Users/nathandaly/builds/julia-RAI/src/stackwalk.c:1099 [inlined]
jl_print_task_backtraces at /Users/nathandaly/builds/julia-RAI/src/stackwalk.c:1128
#11 at ./REPL[16]:4
unknown function (ip: 0x10ab62b92)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
jl_apply at /Users/nathandaly/builds/julia-RAI/src/./julia.h:1842 [inlined]
run_finalizer at /Users/nathandaly/builds/julia-RAI/src/gc.c:283
jl_gc_run_finalizers_in_list at /Users/nathandaly/builds/julia-RAI/src/gc.c:372
run_finalizers at /Users/nathandaly/builds/julia-RAI/src/gc.c:413
ijl_gc_collect at /Users/nathandaly/builds/julia-RAI/src/gc.c:3416
gc at ./gcutils.jl:93 [inlined]
gc at ./gcutils.jl:93 [inlined]
print_task_backtraces at ./REPL[18]:7
unknown function (ip: 0x10ab62c8f)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
jl_apply at /Users/nathandaly/builds/julia-RAI/src/./julia.h:1842 [inlined]
do_call at /Users/nathandaly/builds/julia-RAI/src/interpreter.c:126
eval_stmt_value at /Users/nathandaly/builds/julia-RAI/src/interpreter.c:166 [inlined]
eval_body at /Users/nathandaly/builds/julia-RAI/src/interpreter.c:594
jl_interpret_toplevel_thunk at /Users/nathandaly/builds/julia-RAI/src/interpreter.c:750
top-level scope at REPL[19]:1
jl_toplevel_eval_flex at /Users/nathandaly/builds/julia-RAI/src/toplevel.c:906
jl_toplevel_eval_flex at /Users/nathandaly/builds/julia-RAI/src/toplevel.c:850
jl_toplevel_eval_flex at /Users/nathandaly/builds/julia-RAI/src/toplevel.c:850
ijl_toplevel_eval at /Users/nathandaly/builds/julia-RAI/src/toplevel.c:915 [inlined]
ijl_toplevel_eval_in at /Users/nathandaly/builds/julia-RAI/src/toplevel.c:965
eval at ./boot.jl:368 [inlined]
eval_user_input at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:151
repl_backend_loop at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:247
start_repl_backend at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:232
#run_repl#47 at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:369
run_repl at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:355
jfptr_run_repl_65547 at /Users/nathandaly/builds/julia-RAI/usr/lib/julia/sys.dylib (unknown line)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
#967 at ./client.jl:419
jfptr_YY.967_59223 at /Users/nathandaly/builds/julia-RAI/usr/lib/julia/sys.dylib (unknown line)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
jl_apply at /Users/nathandaly/builds/julia-RAI/src/./julia.h:1842 [inlined]
jl_f__call_latest at /Users/nathandaly/builds/julia-RAI/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:729 [inlined]
invokelatest at ./essentials.jl:726 [inlined]
run_main_repl at ./client.jl:404
exec_options at ./client.jl:318
_start at ./client.jl:518
jfptr__start_25805 at /Users/nathandaly/builds/julia-RAI/usr/lib/julia/sys.dylib (unknown line)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
jl_apply at /Users/nathandaly/builds/julia-RAI/src/./julia.h:1842 [inlined]
true_main at /Users/nathandaly/builds/julia-RAI/src/jlapi.c:575
jl_repl_entrypoint at /Users/nathandaly/builds/julia-RAI/src/jlapi.c:719
---- Task 1 (0x14b0c06a0)
(sticky: 1, started: 1, state: 0, tid: 1)
jl_swap_fiber at /Users/nathandaly/builds/julia-RAI/src/task.c:1040
ctx_switch at /Users/nathandaly/builds/julia-RAI/src/task.c:477
ijl_switch at /Users/nathandaly/builds/julia-RAI/src/task.c:551
try_yieldto at ./task.jl:865
wait at ./task.jl:939
wait at ./condition.jl:124
wait at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/FileWatching/src/FileWatching.jl:634
watch_file at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/FileWatching/src/FileWatching.jl:748
watch_file at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/FileWatching/src/FileWatching.jl:740 [inlined]
wait_changed at /Users/nathandaly/work/jl_depots/raicode2/packages/Revise/do2nH/src/packagedef.jl:34
watch_manifest at /Users/nathandaly/work/jl_depots/raicode2/packages/Revise/do2nH/src/pkgs.jl:413
unknown function (ip: 0x10ab55142)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
jl_apply at /Users/nathandaly/builds/julia-RAI/src/./julia.h:1842 [inlined]
do_apply at /Users/nathandaly/builds/julia-RAI/src/builtins.c:730
TaskThunk at /Users/nathandaly/work/jl_depots/raicode2/packages/Revise/do2nH/src/types.jl:268
unknown function (ip: 0x10ab50b5f)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
jl_apply at /Users/nathandaly/builds/julia-RAI/src/./julia.h:1842 [inlined]
start_task at /Users/nathandaly/builds/julia-RAI/src/task.c:932
---- End task 1
---- Task 2 (0x14cd43730)
(sticky: 1, started: 1, state: 0, tid: 1)
jl_swap_fiber at /Users/nathandaly/builds/julia-RAI/src/task.c:1040
ctx_switch at /Users/nathandaly/builds/julia-RAI/src/task.c:477
ijl_switch at /Users/nathandaly/builds/julia-RAI/src/task.c:551
try_yieldto at ./task.jl:865
wait at ./task.jl:939
wait at ./condition.jl:124
take_buffered at ./channels.jl:416
take! at ./channels.jl:410 [inlined]
eval_with_backend at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:838
do_respond at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:852
jfptr_do_respond_65416 at /Users/nathandaly/builds/julia-RAI/usr/lib/julia/sys.dylib (unknown line)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
jl_apply at /Users/nathandaly/builds/julia-RAI/src/./julia.h:1842 [inlined]
jl_f__call_latest at /Users/nathandaly/builds/julia-RAI/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:729 [inlined]
invokelatest at ./essentials.jl:726 [inlined]
run_interface at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/REPL/src/LineEdit.jl:2510
run_frontend at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:1248
#49 at ./task.jl:488
jfptr_YY.49_65139 at /Users/nathandaly/builds/julia-RAI/usr/lib/julia/sys.dylib (unknown line)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
jl_apply at /Users/nathandaly/builds/julia-RAI/src/./julia.h:1842 [inlined]
start_task at /Users/nathandaly/builds/julia-RAI/src/task.c:932
---- End task 2
==== End thread 1
==== Thread 2 created 3 live tasks
---- Root task (0x10af90010)
(sticky: 1, started: 1, state: 1, tid: 2)
---- Task 1 (0x10aea42b0)
(sticky: 0, started: 1, state: 0, tid: 0)
jl_start_fiber_swap at /Users/nathandaly/builds/julia-RAI/src/task.c:1127
ctx_switch at /Users/nathandaly/builds/julia-RAI/src/task.c:512
ijl_switch at /Users/nathandaly/builds/julia-RAI/src/task.c:551
try_yieldto at ./task.jl:865
wait at ./task.jl:939
wait at ./condition.jl:124
#138 at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:281 [inlined]
lock at ./lock.jl:185
lock at ./condition.jl:78 [inlined]
macro expansion at /Users/nathandaly/builds/julia-RAI/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:279 [inlined]
#137 at ./threadingconstructs.jl:321
jfptr_YY.137_61891 at /Users/nathandaly/builds/julia-RAI/usr/lib/julia/sys.dylib (unknown line)
_jl_invoke at /Users/nathandaly/builds/julia-RAI/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:2549
jl_apply at /Users/nathandaly/builds/julia-RAI/src/./julia.h:1842 [inlined]
start_task at /Users/nathandaly/builds/julia-RAI/src/task.c:932
---- End task 1
---- Task 2 (0x10aea4550)
(sticky: 0, started: 1, state: 0, tid: 2)
---- End task 2
==== End thread 2
==== Thread 3 created 1 live tasks
---- Root task (0x10af98010)
(sticky: 1, started: 1, state: 1, tid: 3)
==== End thread 3
==== Thread 4 created 1 live tasks
---- Root task (0x10af94010)
(sticky: 1, started: 1, state: 1, tid: 4)
==== End thread 4
==== Done
julia> |
finalizers do not run as part of GC, they run after |
most memory corruption is silent, so your example would work well for you for most of the time |
So do finalizers run during the safepoint? Or do they run after GC, when the threads have started again? |
From what I understood from Lines 3507 to 3516 in 321c5f5
|
Thanks @gbaraldi - i just found the same thing and was about to post it. Looks right, yeah. Too bad. thanks! |
Okay so then that won't work. Thanks. So doing this in C, would it look something like this?: JL_DLLEXPORT void jl_safe_print_tasks()
{
jl_task_t *ct = jl_current_task;
jl_ptls_t ptls = ct->ptls;
int8_t old_state = jl_atomic_load_relaxed(&ptls->gc_state);
jl_atomic_store_release(&ptls->gc_state, JL_GC_STATE_WAITING);
// TODO: How do we check the return value here?
// We need to be the ones "owning" this GC safepoint, since we don't want to do
// the task dump while another thread is running GC on one of the tasks.
jl_safepoint_start_gc()
jl_fence();
gc_n_threads = jl_atomic_load_acquire(&jl_n_threads);
gc_all_tls_states = jl_atomic_load_relaxed(&jl_all_tls_states);
jl_gc_wait_for_the_world(gc_all_tls_states, gc_n_threads);
jl_print_task_backtraces()
gc_n_threads = 0;
gc_all_tls_states = NULL;
jl_safepoint_end_gc();
jl_gc_state_set(ptls, old_state, JL_GC_STATE_WAITING);
JL_PROBE_GC_END();
} It seems like it would be nice for us to expose something like the |
You can see here uint64_t t0 = jl_hrtime();
if (!jl_safepoint_start_gc()) {
// Multithread only. See assertion in `safepoint.c`
jl_gc_state_set(ptls, old_state, JL_GC_STATE_WAITING);
return;
} Something like this, but we probably want to try again if we weren't the first thread to ask for a gc. |
#46845 introduces
jl_print_task_backtraces()
which is only safe to use from GDB, when all threads are stopped. We should enhance this to support getting backtraces from live threads as suggested by @vilterp.The text was updated successfully, but these errors were encountered: