testsuite: add issue reproducer for flux-framework#5518
Problem: In issue flux-framework#5518 the job validator worker process becomes hung
because a write failure terminates the process without cleaning up
pending futures.

Add a test that reproduces the error and ensures that failed job
submissions do not hang and that subsequent submissions work after
the errors have cleared.
grondo committed Nov 9, 2023
1 parent 0ff37c3 commit 3931ee3
Showing 2 changed files with 31 additions and 0 deletions.
1 change: 1 addition & 0 deletions t/Makefile.am
@@ -354,6 +354,7 @@ dist_check_SCRIPTS = \
issues/t5105-signal-propagation.sh \
issues/t5308-kvsdir-initial-path.py \
issues/t5368-kvs-commit-clear.py \
issues/t5518-job-validator-hang.sh \
python/__init__.py \
python/subflux.py \
python/tap \
30 changes: 30 additions & 0 deletions t/issues/t5518-job-validator-hang.sh
@@ -0,0 +1,30 @@
#!/bin/sh
#
# Run an instance with a very small job-ingest working buffer size
# and ensure the worker does not hang after errors are returned
#
export FLUX_URI_RESOLVE_LOCAL=t

# Check if we need to start parent job, if so, reexec under flux-start
if test "$VALIDATOR_HANG_TEST_ACTIVE" != "t"; then
    export VALIDATOR_HANG_TEST_ACTIVE=t
    printf "Re-launching test script under flux-start\n"
    exec flux start -s1 $0
fi

id=$(flux alloc -n1 --bg --conf=ingest.buffer-size=8k)
printf "Launched single core alloc job $id\n"

# Submission of more than 1 job should have some failures, but should not
# hang:
flux proxy $id flux submit --cc=1-10 --watch hostname
rc=$?
printf "submission of multiple jobs got rc=$rc\n"
test $rc -ne 0 || exit 1

# Small job to clear errors
flux proxy $id flux run --env=-* --env=PATH hostname

# Another small job should succeed:
flux proxy $id flux run --env=-* --env=PATH hostname || exit 1
printf "submission of single job still works\n"
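
The script above sets no time limit of its own on the submission step, so a hang would only be caught by whatever runs the test. As an illustration only (not part of this commit), the "must fail, but must not hang" check could be sketched with a small helper. The function name `expect_failure_no_hang` is hypothetical, and the sketch assumes the GNU coreutils `timeout` command, which exits with status 124 when the time limit expires.

```sh
#!/bin/sh
# Hypothetical helper, not part of the commit: run a command under a
# time limit and succeed only if it fails on its own before the limit.
# Assumes GNU coreutils `timeout` (exit status 124 means "timed out").
#
# Returns: 0 if the command exited nonzero within the limit,
#          1 if the command unexpectedly succeeded,
#          2 if the command timed out (a possible hang).
expect_failure_no_hang() {
    _limit=$1
    shift
    _status=0
    timeout "$_limit" "$@" || _status=$?
    if test "$_status" -eq 124; then
        printf "command timed out after %ss (possible hang)\n" "$_limit" >&2
        return 2
    fi
    test "$_status" -ne 0
}

# Possible use in a test like the one above (the 60s limit is an
# arbitrary choice for illustration):
#   expect_failure_no_hang 60 \
#       flux proxy "$id" flux submit --cc=1-10 --watch hostname || exit 1
```

Returning a distinct status for the timeout case keeps an expected submission failure distinguishable from a hang, which is the condition the reproducer is meant to catch.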
