t5000-valgrind test fails on Jetson Nano #855
Comments
That makes sense. Want to submit a PR against flux-sched? Eyeballing it, it seems like the same patch ought to apply (same path, and same patch context AFAICT). |
Let me try again... but pretty sure it failed even after trying to patch with the same patch file as |
Do you have the valgrind outputs from your failure to post? |
Figuring out how to run the |
Should be the same as flux-core: change to the |
The patch just adds a new stanza to the |
$ ./t5000-valgrind.t -d -v
sharness: loading extensions from /home/user/flux-sched/t/sharness.d/flux-sharness.sh
sharness: loading extensions from /home/user/flux-sched/t/sharness.d/sched-sharness.sh
expecting success:
run_timeout 400 \
flux start -s ${VALGRIND_NBROKERS} \
--killer-timeout=120 \
--wrap=libtool,e,${VALGRIND} \
--wrap=--tool=memcheck \
--wrap=--leak-check=full \
--wrap=--gen-suppressions=all \
--wrap=--trace-children=no \
--wrap=--child-silent-after-fork=yes \
--wrap=--num-callers=30 \
--wrap=--leak-resolution=med \
--wrap=--error-exitcode=1 \
--wrap=--suppressions=$VALGRIND_SUPPRESSIONS \
${VALGRIND_WORKLOAD}
==2548309== Memcheck, a memory error detector
==2548309== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2548309== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==2548309== Command: /usr/libexec/flux/cmd/flux-broker --setattr=rundir=/tmp/flux-RJUk2d
==2548309==
==2548308== Memcheck, a memory error detector
==2548308== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2548308== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==2548308== Command: /usr/libexec/flux/cmd/flux-broker --setattr=rundir=/tmp/flux-RJUk2d /home/user/flux-sched/t/valgrind/valgrind-workload.sh
==2548308==
FLUX_URI=local:///tmp/flux-RJUk2d/local-0
not ok 1 - valgrind reports no new errors on 2 broker run
#
# run_timeout 400 \
# flux start -s ${VALGRIND_NBROKERS} \
# --killer-timeout=120 \
# --wrap=libtool,e,${VALGRIND} \
# --wrap=--tool=memcheck \
# --wrap=--leak-check=full \
# --wrap=--gen-suppressions=all \
# --wrap=--trace-children=no \
# --wrap=--child-silent-after-fork=yes \
# --wrap=--num-callers=30 \
# --wrap=--leak-resolution=med \
# --wrap=--error-exitcode=1 \
# --wrap=--suppressions=$VALGRIND_SUPPRESSIONS \
# ${VALGRIND_WORKLOAD}
#
# failed 1 among 1 test(s)
1..1
Valgrind output; this is after applying the same patch from |
Going to test manually adding those lines to the supp file w/ this issue number and if it works will make a PR. |
I've added the following lines to the end of t/valgrind/valgrind.supp and it still fails in the
|
It is odd that we don't see the stack traces of memory errors. One possibility is that this test fails not because of a memory error but because of a timeout. Could you increase run_timeout in the test to something like 1200 and see if the test is happier? |
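One way to check the timeout hypothesis is to look at the exit status rather than the missing stack traces. Below is a minimal sketch, assuming GNU coreutils `timeout` as a stand-in for the test suite's `run_timeout` helper (whose exact exit codes are not shown in this thread). GNU `timeout` exits with 124 when the limit expires, while Valgrind run with `--error-exitcode=1` exits with 1 when it reports errors. `classify_status` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: map an exit status to a likely cause.
# Assumes GNU coreutils `timeout` (exit 124 on expiry) and
# valgrind's --error-exitcode=1, as used by the test above.
classify_status() {
    case "$1" in
        0)   echo "passed" ;;
        1)   echo "possible-valgrind-error" ;;
        124) echo "timed-out" ;;
        *)   echo "other-failure" ;;
    esac
}

# Demonstration: a command that outlives its 1-second limit.
timeout 1 sleep 3
classify_status $?
```

A status of 1 can also come from other failures, so this only narrows things down. A 124 with no Valgrind error report would point at the time limit.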
So it passes when I manually set the run timeout to 1200.
$ ./t5000-valgrind.t -d -v
sharness: loading extensions from /home/user/flux-sched/t/sharness.d/flux-sharness.sh
sharness: loading extensions from /home/user/flux-sched/t/sharness.d/sched-sharness.sh
expecting success:
run_timeout 1200 \
flux start -s ${VALGRIND_NBROKERS} \
--killer-timeout=120 \
--wrap=libtool,e,${VALGRIND} \
--wrap=--tool=memcheck \
--wrap=--leak-check=full \
--wrap=--gen-suppressions=all \
--wrap=--trace-children=no \
--wrap=--child-silent-after-fork=yes \
--wrap=--num-callers=30 \
--wrap=--leak-resolution=med \
--wrap=--error-exitcode=1 \
--wrap=--suppressions=$VALGRIND_SUPPRESSIONS \
${VALGRIND_WORKLOAD}
==47386== Memcheck, a memory error detector
==47386== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==47386== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==47386== Command: /usr/libexec/flux/cmd/flux-broker --setattr=rundir=/tmp/flux-ZYssQb
==47386==
==47385== Memcheck, a memory error detector
==47385== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==47385== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==47385== Command: /usr/libexec/flux/cmd/flux-broker --setattr=rundir=/tmp/flux-ZYssQb /home/user/flux-sched/t/valgrind/valgrind-workload.sh
==47385==
FLUX_URI=local:///tmp/flux-ZYssQb/local-0
Running 00-job
Submitting 10 jobs
f56dfGEfR
f57MNn2aB
f57nCL1D1
f58Rcmsmq
f58tUvrxF
f59a9qeKR
f5AZ6x4Bh
f5BU9EMwD
f5CCSqsKu
f5CsdatLX
Waiting jobs to complete
Completed
2021-08-02T22:47:25.187652Z sched-fluxion-qmanager.err[0]: update_on_resource_response: exiting due to sched-fluxion-resource.notify failure: Operation canceled
==47386==
==47386== HEAP SUMMARY:
==47386== in use at exit: 268,007 bytes in 3,353 blocks
==47386== total heap usage: 107,968 allocs, 104,615 frees, 2,907,408,583 bytes allocated
==47386==
==47386== LEAK SUMMARY:
==47386== definitely lost: 0 bytes in 0 blocks
==47386== indirectly lost: 0 bytes in 0 blocks
==47386== possibly lost: 0 bytes in 0 blocks
==47386== still reachable: 267,831 bytes in 3,351 blocks
==47386== suppressed: 176 bytes in 2 blocks
==47386== Reachable blocks (those to which a pointer was found) are not shown.
==47386== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==47386==
==47386== For lists of detected and suppressed errors, rerun with: -s
==47386== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 524 from 2)
==47385==
==47385== HEAP SUMMARY:
==47385== in use at exit: 275,674 bytes in 3,375 blocks
==47385== total heap usage: 583,004 allocs, 579,629 frees, 136,495,400,551 bytes allocated
==47385==
==47385== LEAK SUMMARY:
==47385== definitely lost: 0 bytes in 0 blocks
==47385== indirectly lost: 0 bytes in 0 blocks
==47385== possibly lost: 0 bytes in 0 blocks
==47385== still reachable: 275,322 bytes in 3,371 blocks
==47385== suppressed: 352 bytes in 4 blocks
==47385== Reachable blocks (those to which a pointer was found) are not shown.
==47385== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==47385==
==47385== For lists of detected and suppressed errors, rerun with: -s
==47385== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1646 from 2)
ok 1 - valgrind reports no new errors on 2 broker run
# passed all 1 test(s)
1..1 |
Is there anything else that needs to be done before closing it? |
No, since extending the timeout solved it, I guess that's it. I'm curious whether that would solve the flux-core issue too. |
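If a longer timeout ends up being the fix, one option is to make the limit adjustable instead of hard-coding 1200. This is a rough sketch only: `VALGRIND_TIMEOUT_SCALE` and `scaled_timeout` are hypothetical names, not existing flux-sched settings, and the base of 400 is taken from the original test invocation.

```shell
#!/bin/sh
# Hypothetical: scale the valgrind test timeout on slow hardware.
# VALGRIND_TIMEOUT_SCALE is an illustrative name, not an existing setting.
scaled_timeout() {
    base=$1
    scale=${2:-1}          # default: no scaling
    echo $((base * scale))
}

# Base of 400 seconds comes from the original test; scale is opt-in.
VALGRIND_TIMEOUT=$(scaled_timeout 400 "${VALGRIND_TIMEOUT_SCALE:-1}")
echo "run_timeout ${VALGRIND_TIMEOUT}"
```

With `VALGRIND_TIMEOUT_SCALE=3` this yields the 1200 seconds that passed on the Jetson Nano, and faster machines keep the default.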
Pretty sure this is the same issue as flux-framework/flux-core#3808.
The patch for that issue, I'm guessing, needs to be tweaked for flux-sched.
Let me know if you want more info from the flux-sched instance of the test.