Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

possible intermittent memleak #1641

Closed
chu11 opened this issue Aug 31, 2018 · 4 comments
Closed

possible intermittent memleak #1641

chu11 opened this issue Aug 31, 2018 · 4 comments

Comments

@chu11
Copy link
Member

chu11 commented Aug 31, 2018

In #1639, hit this valgrind failure. Hit a similar one in #1564 as well (#1564 has different paths due to code changes).

What's interesting is that the leaks listed here are components of a data structure. So there should be way more leaks than what is shown here if those data structures aren't being destroyed properly.

AFAICT, the "destroy" functions of these data structures is correct and I can't find a error-path corner case that could lead to these leaking.

�[31mFound ./t/t5000-valgrind.output�[39m
expecting success: 
	libtool e flux ${VALGRIND} \
		--tool=memcheck \
		--leak-check=full \
		--gen-suppressions=all \
		--trace-children=no \
		--child-silent-after-fork=yes \
		--num-callers=30 \
		--leak-resolution=med \
		--error-exitcode=1 \
		--suppressions=$VALGRIND_SUPPRESSIONS \
		${BROKER} --shutdown-grace=4 ${VALGRIND_WORKLOAD} 10

==12893== Memcheck, a memory error detector
==12893== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==12893== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==12893== Command: /home/travis/build/flux-framework/flux-core/src/broker/.libs/lt-flux-broker --shutdown-grace=4 /home/travis/build/flux-framework/flux-core/t/valgrind/valgrind-workload.sh 10
==12893== 
FLUX_URI=local:///tmp/flux-MivVCS
2018-08-31T19:35:47.995037Z connector-local.err[0]: send kvs.watch response to client 442F0: Broken pipe
==12893== 
==12893== HEAP SUMMARY:
==12893==     in use at exit: 130,944 bytes in 155 blocks
==12893==   total heap usage: 940,255 allocs, 940,100 frees, 79,491,791 bytes allocated
==12893== 
==12893== 8,620 (64 direct, 8,556 indirect) bytes in 1 blocks are definitely lost in loss record 71 of 74
==12893==    at 0x4C2CC70: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12893==    by 0x4E5D92C: then_context_create (future.c:165)
==12893==    by 0x4E5D92C: flux_future_then (future.c:537)
==12893==    by 0x8D026E7: spawn_exec_handler (job.c:712)
==12893==    by 0x8D026E7: runevent_continuation (job.c:762)
==12893==    by 0x4E7B2F4: ev_invoke_pending (ev.c:3314)
==12893==    by 0x4E7E4B6: ev_run (ev.c:3717)
==12893==    by 0x4E523A2: flux_reactor_run (reactor.c:132)
==12893==    by 0x8D02B66: mod_main (job.c:943)
==12893==    by 0x40A978: module_thread (module.c:158)
==12893==    by 0x59D6183: start_thread (pthread_create.c:312)
==12893==    by 0x5EEE03C: clone (clone.S:111)
==12893== 
{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: definite
   fun:calloc
   fun:then_context_create
   fun:flux_future_then
   fun:spawn_exec_handler
   fun:runevent_continuation
   fun:ev_invoke_pending
   fun:ev_run
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==12893== 10,964 bytes in 1 blocks are definitely lost in loss record 72 of 74
==12893==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12893==    by 0x5583A10: zmq::msg_t::init_size(unsigned long) (msg.cpp:77)
==12893==    by 0x52DA75C: zframe_new (zframe.c:52)
==12893==    by 0x52E4A85: zmsg_addmem (zmsg.c:293)
==12893==    by 0x4E57D21: flux_msg_copy (message.c:1324)
==12893==    by 0x8D01846: job_create_cb (job.c:423)
==12893==    by 0x4E53193: call_handler (msg_handler.c:302)
==12893==    by 0x4E5330E: dispatch_message (msg_handler.c:326)
==12893==    by 0x4E5330E: handle_cb (msg_handler.c:392)
==12893==    by 0x4E7B2F4: ev_invoke_pending (ev.c:3314)
==12893==    by 0x4E7E4B6: ev_run (ev.c:3717)
==12893==    by 0x4E523A2: flux_reactor_run (reactor.c:132)
==12893==    by 0x8D02B66: mod_main (job.c:943)
==12893==    by 0x40A978: module_thread (module.c:158)
==12893==    by 0x59D6183: start_thread (pthread_create.c:312)
==12893==    by 0x5EEE03C: clone (clone.S:111)
==12893== 
{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: definite
   fun:malloc
   fun:_ZN3zmq5msg_t9init_sizeEm
   fun:zframe_new
   fun:zmsg_addmem
   fun:flux_msg_copy
   fun:job_create_cb
   fun:call_handler
   fun:dispatch_message
   fun:handle_cb
   fun:ev_invoke_pending
   fun:ev_run
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==12893== LEAK SUMMARY:
==12893==    definitely lost: 11,028 bytes in 2 blocks
==12893==    indirectly lost: 8,548 bytes in 43 blocks
==12893==      possibly lost: 0 bytes in 0 blocks
==12893==    still reachable: 110,848 bytes in 108 blocks
==12893==         suppressed: 520 bytes in 2 blocks
==12893== Reachable blocks (those to which a pointer was found) are not shown.
==12893== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==12893== 
==12893== For counts of detected and suppressed errors, rerun with: -v
==12893== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 1 from 1)
not ok 1 - valgrind reports no new errors on single broker run

# failed 1 among 1 test(s)
1..1
@chu11
Copy link
Member Author

chu11 commented Sep 4, 2018

Similar memleak seen in #1645, maybe will provide a clue. The flux_msg_copy() leak seems the same, but the flux_future_then() leak has been replaced with a flux_rpc_pack() leak. But both originate through the same function spawn_exec_handler()

�[31mFound ./t/t5000-valgrind.output�[39m
expecting success: 
	libtool e flux ${VALGRIND} \
		--tool=memcheck \
		--leak-check=full \
		--gen-suppressions=all \
		--trace-children=no \
		--child-silent-after-fork=yes \
		--num-callers=30 \
		--leak-resolution=med \
		--error-exitcode=1 \
		--suppressions=$VALGRIND_SUPPRESSIONS \
		${BROKER} --shutdown-grace=4 ${VALGRIND_WORKLOAD} 10

==12319== Memcheck, a memory error detector
==12319== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==12319== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==12319== Command: /home/travis/build/flux-framework/flux-core/src/broker/.libs/lt-flux-broker --shutdown-grace=4 /home/travis/build/flux-framework/flux-core/t/valgrind/valgrind-workload.sh 10
==12319== 
FLUX_URI=local:///tmp/flux-jR0sGh
==12319== 
==12319== HEAP SUMMARY:
==12319==     in use at exit: 130,947 bytes in 155 blocks
==12319==   total heap usage: 936,250 allocs, 936,095 frees, 79,304,238 bytes allocated
==12319== 
==12319== 8,620 (88 direct, 8,532 indirect) bytes in 1 blocks are definitely lost in loss record 70 of 73
==12319==    at 0x4C2CC70: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12319==    by 0x52FC750: safe_malloc (czmq_prelude.h:536)
==12319==    by 0x52FC750: zhash_new (zhash.c:75)
==12319==    by 0x4E5DB76: flux_future_aux_set (future.c:587)
==12319==    by 0x4E5982E: flux_rpc_msg (rpc.c:221)
==12319==    by 0x4E59CB8: flux_rpc_vpack (rpc.c:309)
==12319==    by 0x4E59CB8: flux_rpc_pack (rpc.c:322)
==12319==    by 0x8D02711: spawn_exec_handler (job.c:707)
==12319==    by 0x8D02711: runevent_continuation (job.c:762)
==12319==    by 0x4E7B334: ev_invoke_pending (ev.c:3314)
==12319==    by 0x4E7E4F6: ev_run (ev.c:3717)
==12319==    by 0x4E523E2: flux_reactor_run (reactor.c:132)
==12319==    by 0x8D02BB6: mod_main (job.c:943)
==12319==    by 0x40A9C8: module_thread (module.c:158)
==12319==    by 0x59D6183: start_thread (pthread_create.c:312)
==12319==    by 0x5EEE03C: clone (clone.S:111)
==12319== 
{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: definite
   fun:calloc
   fun:safe_malloc
   fun:zhash_new
   fun:flux_future_aux_set
   fun:flux_rpc_msg
   fun:flux_rpc_vpack
   fun:flux_rpc_pack
   fun:spawn_exec_handler
   fun:runevent_continuation
   fun:ev_invoke_pending
   fun:ev_run
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==12319== 10,967 bytes in 1 blocks are definitely lost in loss record 71 of 73
==12319==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12319==    by 0x5583A10: zmq::msg_t::init_size(unsigned long) (msg.cpp:77)
==12319==    by 0x52DA75C: zframe_new (zframe.c:52)
==12319==    by 0x52E4A85: zmsg_addmem (zmsg.c:293)
==12319==    by 0x4E57D61: flux_msg_copy (message.c:1324)
==12319==    by 0x8D01896: job_create_cb (job.c:423)
==12319==    by 0x4E531D3: call_handler (msg_handler.c:302)
==12319==    by 0x4E5334E: dispatch_message (msg_handler.c:326)
==12319==    by 0x4E5334E: handle_cb (msg_handler.c:392)
==12319==    by 0x4E7B334: ev_invoke_pending (ev.c:3314)
==12319==    by 0x4E7E4F6: ev_run (ev.c:3717)
==12319==    by 0x4E523E2: flux_reactor_run (reactor.c:132)
==12319==    by 0x8D02BB6: mod_main (job.c:943)
==12319==    by 0x40A9C8: module_thread (module.c:158)
==12319==    by 0x59D6183: start_thread (pthread_create.c:312)
==12319==    by 0x5EEE03C: clone (clone.S:111)
==12319== 
{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: definite
   fun:malloc
   fun:_ZN3zmq5msg_t9init_sizeEm
   fun:zframe_new
   fun:zmsg_addmem
   fun:flux_msg_copy
   fun:job_create_cb
   fun:call_handler
   fun:dispatch_message
   fun:handle_cb
   fun:ev_invoke_pending
   fun:ev_run
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==12319== LEAK SUMMARY:
==12319==    definitely lost: 11,055 bytes in 2 blocks
==12319==    indirectly lost: 8,524 bytes in 43 blocks
==12319==      possibly lost: 0 bytes in 0 blocks
==12319==    still reachable: 110,848 bytes in 108 blocks
==12319==         suppressed: 520 bytes in 2 blocks
==12319== Reachable blocks (those to which a pointer was found) are not shown.
==12319== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==12319== 
==12319== For counts of detected and suppressed errors, rerun with: -v
==12319== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 1 from 1)
not ok 1 - valgrind reports no new errors on single broker run

@grondo
Copy link
Contributor

grondo commented Sep 4, 2018

Hit in #1642 as well. My intuition is that we're racing with the job module on shutdown in the valgrind test, but I haven't started looking into it yet. It is getting annoyingly frequent though...

@chu11
Copy link
Member Author

chu11 commented Sep 12, 2018

another one, with a slightly different trace found in #1657

travis_time:end:0f8d4f08:start=1536769837445899510,finish=1536769837482477564,duration=36578054
�[0Ktravis_fold:end:after_failure.1
�[0Ktravis_fold:start:after_failure.2
�[0Ktravis_time:start:0a320124
�[0K$ find . -name t[0-9]*.output | xargs -i sh -c 'printf "\033[31mFound {}\033[39m\n";cat {}'
�[31mFound ./t/t5000-valgrind.output�[39m
expecting success: 
	libtool e flux ${VALGRIND} \
		--tool=memcheck \
		--leak-check=full \
		--gen-suppressions=all \
		--trace-children=no \
		--child-silent-after-fork=yes \
		--num-callers=30 \
		--leak-resolution=med \
		--error-exitcode=1 \
		--suppressions=$VALGRIND_SUPPRESSIONS \
		${BROKER} --shutdown-grace=4 ${VALGRIND_WORKLOAD} 10

==14454== Memcheck, a memory error detector
==14454== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==14454== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==14454== Command: /home/travis/build/flux-framework/flux-core/src/broker/.libs/lt-flux-broker --shutdown-grace=4 /home/travis/build/flux-framework/flux-core/t/valgrind/valgrind-workload.sh 10
==14454== 
FLUX_URI=local:///tmp/flux-EOZaat
==14454== 
==14454== HEAP SUMMARY:
==14454==     in use at exit: 6,432,510 bytes in 210 blocks
==14454==   total heap usage: 943,429 allocs, 943,219 frees, 225,326,076 bytes allocated
==14454== 
==14454== 11,659 bytes in 1 blocks are definitely lost in loss record 118 of 127
==14454==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14454==    by 0x558CA10: zmq::msg_t::init_size(unsigned long) (msg.cpp:77)
==14454==    by 0x52E375C: zframe_new (zframe.c:52)
==14454==    by 0x52EDA85: zmsg_addmem (zmsg.c:293)
==14454==    by 0x4E59041: flux_msg_copy (message.c:1324)
==14454==    by 0x8D0AC26: job_create_cb (job.c:423)
==14454==    by 0x4E544B3: call_handler (msg_handler.c:302)
==14454==    by 0x4E5462E: dispatch_message (msg_handler.c:326)
==14454==    by 0x4E5462E: handle_cb (msg_handler.c:392)
==14454==    by 0x4E82474: ev_invoke_pending (ev.c:3314)
==14454==    by 0x4E85636: ev_run (ev.c:3717)
==14454==    by 0x4E536C2: flux_reactor_run (reactor.c:132)
==14454==    by 0x8D0BD86: mod_main (job.c:938)
==14454==    by 0x40A538: module_thread (module.c:158)
==14454==    by 0x59DF183: start_thread (pthread_create.c:312)
==14454==    by 0x5EF703C: clone (clone.S:111)
==14454== 
{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: definite
   fun:malloc
   fun:_ZN3zmq5msg_t9init_sizeEm
   fun:zframe_new
   fun:zmsg_addmem
   fun:flux_msg_copy
   fun:job_create_cb
   fun:call_handler
   fun:dispatch_message
   fun:handle_cb
   fun:ev_invoke_pending
   fun:ev_run
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==14454== 6,309,491 (224 direct, 6,309,267 indirect) bytes in 1 blocks are definitely lost in loss record 127 of 127
==14454==    at 0x4C2CE8E: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14454==    by 0x4E82619: ev_realloc (ev.c:1741)
==14454==    by 0x4E82B5A: ev_feed_event (ev.c:2005)
==14454==    by 0x4E85CD0: queue_events (ev.c:2034)
==14454==    by 0x4E85CD0: ev_run (ev.c:3714)
==14454==    by 0x4E536C2: flux_reactor_run (reactor.c:132)
==14454==    by 0x8D0BD86: mod_main (job.c:938)
==14454==    by 0x40A538: module_thread (module.c:158)
==14454==    by 0x59DF183: start_thread (pthread_create.c:312)
==14454==    by 0x5EF703C: clone (clone.S:111)
==14454== 
{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: definite
   fun:realloc
   fun:ev_realloc
   fun:ev_feed_event
   fun:queue_events
   fun:ev_run
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==14454== LEAK SUMMARY:
==14454==    definitely lost: 11,883 bytes in 2 blocks
==14454==    indirectly lost: 6,309,259 bytes in 98 blocks
==14454==      possibly lost: 0 bytes in 0 blocks
==14454==    still reachable: 110,848 bytes in 108 blocks
==14454==         suppressed: 520 bytes in 2 blocks
==14454== Reachable blocks (those to which a pointer was found) are not shown.
==14454== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==14454== 
==14454== For counts of detected and suppressed errors, rerun with: -v
==14454== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 1 from 1)
not ok 1 - valgrind reports no new errors on single broker run

# failed 1 among 1 test(s)
1..1

@garlick
Copy link
Member

garlick commented Aug 17, 2022

Not updated in a while so presumably no longer an issue.

@garlick garlick closed this as completed Aug 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants