Valgrind test fails on rzansel (CORAL testbed) #3093

Open
dongahn opened this issue Jul 29, 2020 · 6 comments

@dongahn
Member

dongahn commented Jul 29, 2020

The error originates in system software (epoll_ctl) on this platform, so we probably need to add it to our suppression list.

rzansel61{dahn}61: ./t5000-valgrind.t -d -v
sharness: loading extensions from /collab/usr/global/tools/flux/blueos_3_ppc64le_ib/build/2020-07-29-c0.18.0-s0.10.0/flux-core-0.18.0/t/sharness.d/01-setup.sh
sharness: loading extensions from /collab/usr/global/tools/flux/blueos_3_ppc64le_ib/build/2020-07-29-c0.18.0-s0.10.0/flux-core-0.18.0/t/sharness.d/flux-sharness.sh
expecting success:
	run_timeout 300 \
	flux start -s ${VALGRIND_NBROKERS} \
		--killer-timeout=120 \
		--wrap=libtool,e,${VALGRIND} \
		--wrap=--tool=memcheck \
		--wrap=--leak-check=full \
		--wrap=--gen-suppressions=all \
		--wrap=--trace-children=no \
		--wrap=--child-silent-after-fork=yes \
		--wrap=--num-callers=30 \
		--wrap=--leak-resolution=med \
		--wrap=--error-exitcode=1 \
		--wrap=--suppressions=$VALGRIND_SUPPRESSIONS \
		 ${VALGRIND_WORKLOAD}

==35642== Memcheck, a memory error detector
==35642== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==35642== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==35642== Command: /collab/usr/global/tools/flux/blueos_3_ppc64le_ib/build/2020-07-29-c0.18.0-s0.10.0/flux-core-0.18.0/src/broker/.libs/lt-flux-broker --setattr=rundir=/var/tmp/flux-35614-x1mvBr --setattr=tbon.endpoint=ipc://%B/req /collab/usr/global/tools/flux/blueos_3_ppc64le_ib/build/2020-07-29-c0.18.0-s0.10.0/flux-core-0.18.0/t/valgrind/valgrind-workload.sh
==35642==
==35643== Memcheck, a memory error detector
==35643== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==35643== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==35643== Command: /collab/usr/global/tools/flux/blueos_3_ppc64le_ib/build/2020-07-29-c0.18.0-s0.10.0/flux-core-0.18.0/src/broker/.libs/lt-flux-broker --setattr=rundir=/var/tmp/flux-35614-x1mvBr --setattr=tbon.endpoint=ipc://%B/req
==35643==
==35643== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35643==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35643==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35643==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35643==    by 0x100098B7: main (broker.c:594)
==35643==  Address 0x1fff00bb14 is on thread 1's stack
==35643==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35643==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:main
}
==35642== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35642==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35642==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35642==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35642==    by 0x100098B7: main (broker.c:594)
==35642==  Address 0x1fff00ba84 is on thread 1's stack
==35642==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35642==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:main
}
==35643== Thread 5:
==35643== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35643==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35643==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35643==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35643==    by 0x40EDECB: flux_future_wait_for (future.c:488)
==35643==    by 0x40EE09F: flux_future_get (future.c:515)
==35643==    by 0x78C26FF: op_event_subscribe (shmem.c:106)
==35643==    by 0x40DE283: flux_event_subscribe (handle.c:778)
==35643==    by 0x10010EC3: register_event (modservice.c:204)
==35643==    by 0x10010EC3: modservice_register (modservice.c:277)
==35643==    by 0x1000E06B: module_thread (module.c:190)
==35643==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35643==    by 0x46D7E13: clone (clone.S:104)
==35643==  Address 0x78adfd4 is on thread 5's stack
==35643==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35643==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:flux_future_wait_for
   fun:flux_future_get
   fun:op_event_subscribe
   fun:flux_event_subscribe
   fun:register_event
   fun:modservice_register
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35643== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35643==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35643==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35643==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35643==    by 0x7042DFF: mod_main (local.c:327)
==35643==    by 0x1000E11F: module_thread (module.c:214)
==35643==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35643==    by 0x46D7E13: clone (clone.S:104)
==35643==  Address 0x78adfd4 is on thread 5's stack
==35643==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35643==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35642== Thread 5:
==35642== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35642==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35642==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35642==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35642==    by 0x40EDECB: flux_future_wait_for (future.c:488)
==35642==    by 0x40EE09F: flux_future_get (future.c:515)
==35642==    by 0x78C26FF: op_event_subscribe (shmem.c:106)
==35642==    by 0x40DE283: flux_event_subscribe (handle.c:778)
==35642==    by 0x10010EC3: register_event (modservice.c:204)
==35642==    by 0x10010EC3: modservice_register (modservice.c:277)
==35642==    by 0x1000E06B: module_thread (module.c:190)
==35642==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35642==    by 0x46D7E13: clone (clone.S:104)
==35642==  Address 0x78adfd4 is on thread 5's stack
==35642==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35642==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:flux_future_wait_for
   fun:flux_future_get
   fun:op_event_subscribe
   fun:flux_event_subscribe
   fun:register_event
   fun:modservice_register
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35642== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35642==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35642==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35642==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35642==    by 0x7042DFF: mod_main (local.c:327)
==35642==    by 0x1000E11F: module_thread (module.c:214)
==35642==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35642==    by 0x46D7E13: clone (clone.S:104)
==35642==  Address 0x78adfd4 is on thread 5's stack
==35642==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35642==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35643== Thread 6:
==35643== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35643==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35643==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35643==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35643==    by 0x6C14337: mod_main (aggregator.c:683)
==35643==    by 0x1000E11F: module_thread (module.c:214)
==35643==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35643==    by 0x46D7E13: clone (clone.S:104)
==35643==  Address 0x852e0f4 is on thread 6's stack
==35643==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35643==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35643== Thread 7:
==35643== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35643==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35643==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35643==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35643==    by 0x6A2358B: mod_main (barrier.c:499)
==35643==    by 0x1000E11F: module_thread (module.c:214)
==35643==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35643==    by 0x46D7E13: clone (clone.S:104)
==35643==  Address 0x995e0f4 is on thread 7's stack
==35643==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35643==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35642== Thread 6:
==35642== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35642==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35642==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35642==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35642==    by 0x7D3428B: mod_main (content-sqlite.c:632)
==35642==    by 0x1000E11F: module_thread (module.c:214)
==35642==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35642==    by 0x46D7E13: clone (clone.S:104)
==35642==  Address 0x85de104 is on thread 6's stack
==35642==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35642==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35642== Thread 7:
==35642== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35642==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35642==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35642==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35642==    by 0x6C14337: mod_main (aggregator.c:683)
==35642==    by 0x1000E11F: module_thread (module.c:214)
==35642==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35642==    by 0x46D7E13: clone (clone.S:104)
==35642==  Address 0x91ee0f4 is on thread 7's stack
==35642==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35642==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35642== Thread 8:
==35642== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35642==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35642==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35642==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35642==    by 0x6A2358B: mod_main (barrier.c:499)
==35642==    by 0x1000E11F: module_thread (module.c:214)
==35642==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35642==    by 0x46D7E13: clone (clone.S:104)
==35642==  Address 0xa61e0f4 is on thread 8's stack
==35642==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35642==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35642== Thread 9:
==35642== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35642==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35642==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35642==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35642==    by 0x6D0ABFB: mod_main (kvs.c:3018)
==35642==    by 0x1000E11F: module_thread (module.c:214)
==35642==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35642==    by 0x46D7E13: clone (clone.S:104)
==35642==  Address 0x99fe064 is on thread 9's stack
==35642==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35642==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35642== Thread 10:
==35642== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35642==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35642==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35642==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35642==    by 0x6C85827: mod_main (kvs-watch.c:1165)
==35642==    by 0x1000E11F: module_thread (module.c:214)
==35642==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35642==    by 0x46D7E13: clone (clone.S:104)
==35642==  Address 0xb64e114 is on thread 10's stack
==35642==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35642==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35643== Thread 8:
==35643== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35643==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35643==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35643==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35643==    by 0x6C85827: mod_main (kvs-watch.c:1165)
==35643==    by 0x1000E11F: module_thread (module.c:214)
==35643==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35643==    by 0x46D7E13: clone (clone.S:104)
==35643==  Address 0xa98e114 is on thread 8's stack
==35643==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35643==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35643== Thread 9:
==35643== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35643==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35643==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35643==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35643==    by 0x6D0ABFB: mod_main (kvs.c:3018)
==35643==    by 0x1000E11F: module_thread (module.c:214)
==35643==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35643==    by 0x46D7E13: clone (clone.S:104)
==35643==  Address 0xb19e064 is on thread 9's stack
==35643==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35643==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35642== Thread 11:
==35642== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35642==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35642==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35642==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35642==    by 0x6E13B77: mod_main (resource.c:299)
==35642==    by 0x1000E11F: module_thread (module.c:214)
==35642==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35642==    by 0x46D7E13: clone (clone.S:104)
==35642==  Address 0xbe5dfc4 is on thread 11's stack
==35642==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35642==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35642== Thread 12:
==35642== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35642==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35642==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35642==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35642==    by 0x6A9420B: mod_main (job-manager.c:136)
==35642==    by 0x1000E11F: module_thread (module.c:214)
==35642==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35642==    by 0x46D7E13: clone (clone.S:104)
==35642==  Address 0xc66e094 is on thread 12's stack
==35642==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35642==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35642== Thread 14:
==35642== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35642==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35642==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35642==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35642==    by 0xCE97863: mod_main (cron.c:906)
==35642==    by 0x1000E11F: module_thread (module.c:214)
==35642==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35642==    by 0x46D7E13: clone (clone.S:104)
==35642==  Address 0xd6fe0c4 is on thread 14's stack
==35642==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35642==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35642== Thread 13:
==35642== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35642==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35642==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35642==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35642==    by 0x6B9433F: mod_main (job-info.c:249)
==35642==    by 0x1000E11F: module_thread (module.c:214)
==35642==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35642==    by 0x46D7E13: clone (clone.S:104)
==35642==  Address 0xce7e114 is on thread 13's stack
==35642==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35642==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35642== Thread 15:
==35642== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35642==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35642==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35642==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35642==    by 0x6D954C3: mod_main (job-ingest.c:824)
==35642==    by 0x1000E11F: module_thread (module.c:214)
==35642==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35642==    by 0x46D7E13: clone (clone.S:104)
==35642==  Address 0xe30e084 is on thread 15's stack
==35642==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35642==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35642== Thread 17:
==35642== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35642==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35642==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35642==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35642==    by 0xEB36797: mod_main (job-exec.c:1174)
==35642==    by 0x1000E11F: module_thread (module.c:214)
==35642==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35642==    by 0x46D7E13: clone (clone.S:104)
==35642==  Address 0xf3adfd4 is on thread 17's stack
==35642==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35642==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35643== Thread 10:
==35643== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35643==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35643==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35643==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35643==    by 0x6D954C3: mod_main (job-ingest.c:824)
==35643==    by 0x1000E11F: module_thread (module.c:214)
==35643==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35643==    by 0x46D7E13: clone (clone.S:104)
==35643==  Address 0xb9ae084 is on thread 10's stack
==35643==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35643==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
==35642== Thread 16:
==35642== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==35642==    at 0x46D88E8: epoll_ctl (syscall-template.S:82)
==35642==    by 0x412602B: epoll_modify (ev_epoll.c:96)
==35642==    by 0x40DFE27: flux_reactor_run (reactor.c:126)
==35642==    by 0x6B14AC3: mod_main (sched.c:652)
==35642==    by 0x1000E11F: module_thread (module.c:214)
==35642==    by 0x43B8CD3: start_thread (pthread_create.c:309)
==35642==    by 0x46D7E13: clone (clone.S:104)
==35642==  Address 0xeb1e0a4 is on thread 16's stack
==35642==  in frame #0, created by epoll_ctl (syscall-template.S:81)
==35642==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}
FLUX_URI=local:///var/tmp/flux-35614-x1mvBr/0/local
Running job
38604374016 submitted
51120177152 submitted
68887248896 submitted
97223966720 submitted
121399934976 submitted
144602824704 submitted
175355461632 submitted
208557572096 submitted
239729639424 submitted
275850985472 submitted
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmn
"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmno
38604374016 complete
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmn
"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmno
51120177152 complete
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmn
"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmno
68887248896 complete
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmn
"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmno
97223966720 complete
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmn
"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmno
121399934976 complete
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmn
"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmno
144602824704 complete
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmn
"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmno
175355461632 complete
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmn
"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmno
208557572096 complete
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmn
"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmno
239729639424 complete
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmn
"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmno
275850985472 complete
==35643==
==35643== HEAP SUMMARY:
==35643==     in use at exit: 23,949 bytes in 60 blocks
==35643==   total heap usage: 93,577 allocs, 93,517 frees, 170,711,846 bytes allocated
==35643==
==35643== LEAK SUMMARY:
==35643==    definitely lost: 0 bytes in 0 blocks
==35643==    indirectly lost: 0 bytes in 0 blocks
==35643==      possibly lost: 0 bytes in 0 blocks
==35643==    still reachable: 23,949 bytes in 60 blocks
==35643==         suppressed: 0 bytes in 0 blocks
==35643== Reachable blocks (those to which a pointer was found) are not shown.
==35643== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==35643==
==35643== Use --track-origins=yes to see where uninitialised values come from
==35643== For lists of detected and suppressed errors, rerun with: -s
==35643== ERROR SUMMARY: 666 errors from 8 contexts (suppressed: 12 from 1)
flux-start: 1 (pid 35643) exited with rc=1
==35642==
==35642== HEAP SUMMARY:
==35642==     in use at exit: 30,478 bytes in 77 blocks
==35642==   total heap usage: 957,469 allocs, 957,392 frees, 284,631,831 bytes allocated
==35642==
==35642== LEAK SUMMARY:
==35642==    definitely lost: 0 bytes in 0 blocks
==35642==    indirectly lost: 0 bytes in 0 blocks
==35642==      possibly lost: 0 bytes in 0 blocks
==35642==    still reachable: 30,478 bytes in 77 blocks
==35642==         suppressed: 0 bytes in 0 blocks
==35642== Reachable blocks (those to which a pointer was found) are not shown.
==35642== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==35642==
==35642== Use --track-origins=yes to see where uninitialised values come from
==35642== For lists of detected and suppressed errors, rerun with: -s
==35642== ERROR SUMMARY: 2736 errors from 15 contexts (suppressed: 28 from 2)
flux-start: 0 (pid 35642) exited with rc=1
not ok 1 - valgrind reports no new errors on 2 broker run
#
#		run_timeout 300 \
#		flux start -s ${VALGRIND_NBROKERS} \
#			--killer-timeout=120 \
#			--wrap=libtool,e,${VALGRIND} \
#			--wrap=--tool=memcheck \
#			--wrap=--leak-check=full \
#			--wrap=--gen-suppressions=all \
#			--wrap=--trace-children=no \
#			--wrap=--child-silent-after-fork=yes \
#			--wrap=--num-callers=30 \
#			--wrap=--leak-resolution=med \
#			--wrap=--error-exitcode=1 \
#			--wrap=--suppressions=$VALGRIND_SUPPRESSIONS \
#			 ${VALGRIND_WORKLOAD}
#

# failed 1 among 1 test(s)
1..1
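
All of the generated suppression blocks in the log above begin with the same two frames, fun:epoll_ctl followed by fun:epoll_modify, and differ only in the callers below them. Valgrind only requires the frames listed in an entry to match the top of the stack, so one short entry along these lines might cover every context reported here. The name epoll_ctl_uninit_event is a made-up placeholder, and this sketch has not been tested on rzansel:

```
{
   epoll_ctl_uninit_event
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
}
```

Stopping at fun:epoll_modify keeps the entry independent of which module's mod_main happened to start the reactor, at the cost of also hiding any other epoll_ctl(event) report that comes through epoll_modify.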
@dongahn
Member Author

dongahn commented Jul 30, 2020

Also from flux-sched's valgrind tests, I got:

2020-07-30T01:13:52.174507Z sched-simple.err[0]: service_unregister: Invalid argument
flux-broker: src/zlistx.c:220: zlistx_first: Assertion `self' failed.
==29257==
==29257== Process terminating with default action of signal 6 (SIGABRT): dumping core
==29257==    at 0x45AFCB0: raise (raise.c:55)
==29257==    by 0x45B200B: abort (abort.c:90)
==29257==    by 0x45A58C3: __assert_fail (assert.c:101)
==29257==    by 0x41E1FD3: zlistx_first (in /usr/lib64/libczmq.so.3.0.0)
==29257==    by 0xFAE3913: simple_sched_destroy (sched.c:112)
==29257==    by 0xFAE3913: mod_main (sched.c:488)
==29257==    by 0x1000C953: module_thread (module.c:203)
==29257==    by 0x4378CD3: start_thread (pthread_create.c:309)
==29257==    by 0x4697E13: clone (clone.S:104)


==29257== 367,360 bytes in 287 blocks are possibly lost in loss record 1,407 of 1,409
==29257==    at 0x40842DC: malloc (vg_replace_malloc.c:309)
==29257==    by 0x580B7ACF: ??? (in /usr/libexec/valgrind/memcheck-ppc64le-linux)
==29257==    by 0x92C16CB: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==29257==    by 0x92CE64B: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==29257==    by 0x92D2ECF: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==29257==    by 0x92D319F: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==29257==    by 0x92CA1C3: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==29257==    by 0x9306E53: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==29257==    by 0x93071B7: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==29257==    by 0x930B4EB: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==29257==    by 0x9311237: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==29257==    by 0x933B39B: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==29257==    by 0x933FEE3: sqlite3_step (in /usr/lib64/libsqlite3.so.0.8.6)
==29257==    by 0x9243137: store_cb (content-sqlite.c:362)
==29257==    by 0x40E1487: call_handler (msg_handler.c:231)
==29257==    by 0x40E1F63: dispatch_message (msg_handler.c:267)
==29257==    by 0x40E1F63: handle_cb (msg_handler.c:367)
==29257==    by 0x40DE1FB: handle_cb (reactor.c:299)
==29257==    by 0x40EB94F: check_cb (ev_flux.c:64)
==29257==    by 0x4125B1B: ev_invoke_pending (ev.c:3372)
==29257==    by 0x412AC83: ev_run (ev.c:3775)
==29257==    by 0x40DF8F7: flux_reactor_run (reactor.c:126)
==29257==    by 0x9243413: mod_main (content-sqlite.c:440)
==29257==    by 0x1000C953: module_thread (module.c:203)
==29257==    by 0x4378CD3: start_thread (pthread_create.c:309)
==29257==    by 0x4697E13: clone (clone.S:104)
==29257==
{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: possible
   fun:malloc
   obj:/usr/libexec/valgrind/memcheck-ppc64le-linux
   obj:/usr/lib64/libsqlite3.so.0.8.6
   obj:/usr/lib64/libsqlite3.so.0.8.6
   obj:/usr/lib64/libsqlite3.so.0.8.6
   obj:/usr/lib64/libsqlite3.so.0.8.6
   obj:/usr/lib64/libsqlite3.so.0.8.6
   obj:/usr/lib64/libsqlite3.so.0.8.6
   obj:/usr/lib64/libsqlite3.so.0.8.6
   obj:/usr/lib64/libsqlite3.so.0.8.6
   obj:/usr/lib64/libsqlite3.so.0.8.6
   obj:/usr/lib64/libsqlite3.so.0.8.6
   fun:sqlite3_step
   fun:store_cb
   fun:call_handler
   fun:dispatch_message
   fun:handle_cb
   fun:handle_cb
   fun:check_cb
   fun:ev_invoke_pending
   fun:ev_run
   fun:flux_reactor_run
   fun:mod_main
   fun:module_thread
   fun:start_thread
   fun:clone
}

I don't know whether the simple_sched_destroy crash is a real problem. If someone wants to look at it:

cd /collab/usr/global/tools/flux/blueos_3_ppc64le_ib/build/2020-07-29-c0.18.0-s0.10.0/flux-sched-0.10.0/t
./t5000-valgrind.t -d -v

The leak looks like it comes from the sqlite library, so we may want to suppress it.

@grondo
Contributor

grondo commented Jul 30, 2020

The leak probably occurs because the process aborted and didn't go down the normal shutdown path.

However, this does look like a bug: simple_sched_destroy() doesn't check the validity of the ss->queue pointer before calling zlistx_destroy. (This will be even more important when we have asynchronous initialization.)
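
A minimal sketch of the kind of guard described above, assuming a destroy function that can run after initialization failed before the queue existed. Every type and function name here (list_sketch, sched_sketch, and so on) is a hypothetical stand-in written for illustration; none of it is the actual flux-core or czmq code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a czmq-style list; not the real zlistx_t. */
struct list_sketch {
    int count;
};

static struct list_sketch *list_sketch_new (void)
{
    return calloc (1, sizeof (struct list_sketch));
}

/* Like the zlistx_first() call in the trace, this asserts on a NULL list. */
static int list_sketch_first (struct list_sketch *self)
{
    assert (self);
    return self->count;
}

static void list_sketch_destroy (struct list_sketch **self_p)
{
    if (self_p && *self_p) {
        free (*self_p);
        *self_p = NULL;
    }
}

/* Hypothetical scheduler context; only the queue member matters here. */
struct sched_sketch {
    struct list_sketch *queue;
};

/* with_queue == 0 simulates an init path that failed before the
 * queue was allocated. */
static struct sched_sketch *sched_sketch_create (int with_queue)
{
    struct sched_sketch *ss = calloc (1, sizeof (*ss));
    if (ss && with_queue)
        ss->queue = list_sketch_new ();
    return ss;
}

static void sched_sketch_destroy (struct sched_sketch *ss)
{
    if (!ss)
        return;
    if (ss->queue) {
        /* Only walk and free the queue if it was actually created. */
        (void) list_sketch_first (ss->queue);
        list_sketch_destroy (&ss->queue);
    }
    free (ss);
}
```

With the guard in place, calling sched_sketch_destroy() on a context created with with_queue set to 0 returns normally instead of tripping the assertion inside list_sketch_first().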

@grondo
Contributor

grondo commented Jul 30, 2020

Though I wonder what version of flux-core you are using: simple_sched_destroy is on line 113, not 112, in my version.

@dongahn
Member Author

dongahn commented Jul 30, 2020

Ah... maybe directly invoking a test for flux-sched (./t5000-valgrind.t -d -v) isn't supported.

@dongahn
Member Author

dongahn commented Jul 30, 2020

I can try make TESTS=./t5000-valgrind.t check. How do you get verbose output in this case?

@grondo
Contributor

grondo commented Jul 30, 2020

Sometimes it works to run the flux-sched tests under flux, e.g.

/path/to/correct/flux ./t5000-valgrind.t -d -v

If that doesn't work, then try

$ verbose=t FLUX_TESTS_LOGFILE=t make TESTS=t5000-valgrind.t check

Then check for the *.output file.
