testsuite: add issue reproducer for flux-framework#5518
Problem: In issue flux-framework#5518, the job validator worker process
hangs because a write failure terminates the process without cleaning up
pending futures.

Add a test that reproduces the error and ensures that failed job
submissions do not hang and that future submissions work after the
errors have cleared.
grondo committed Nov 9, 2023
1 parent 0ff37c3 commit aee4586
Showing 1 changed file with 18 additions and 0 deletions.
18 changes: 18 additions & 0 deletions t/issues/t5518-job-validator-hang.sh
@@ -0,0 +1,18 @@
#!/bin/sh
#
# Run an instance with a very small job-ingest working buffer size
# and ensure the worker does not hang after errors are returned
#
id=$(flux alloc -n1 --bg --conf=ingest.buffer-size=4k)
printf "Launched single core alloc job %s\n" "$id"

# Submission of more than 1 job should have some failures, but should not
# hang:
flux proxy "$id" flux submit --cc=1-5 --watch hostname
rc=$?
printf "submission of multiple jobs got rc=%s\n" "$rc"
test "$rc" -ne 0 || exit 1

# Another small job should succeed:
flux proxy "$id" flux run --env='-*' --env=PATH hostname || exit 1
printf "submission of single job still works\n"
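
The script above has no time limit of its own, so if the hang regressed it would presumably block until the surrounding test harness stepped in. As a minimal sketch of one way to bound a command's runtime, the helper below wraps coreutils `timeout`. The name `run_bounded` and its variables are hypothetical and not part of this commit, and it assumes coreutils `timeout` is installed on the test host.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): run a command under a time
# limit so that a hang surfaces as a distinct exit status instead of
# blocking forever. Assumes coreutils `timeout` is available.
run_bounded() {
    rb_limit=$1
    shift
    timeout "$rb_limit" "$@"
    rb_rc=$?
    # coreutils timeout exits with status 124 when the limit is reached
    if test "$rb_rc" -eq 124; then
        printf "timed out after %ss: %s\n" "$rb_limit" "$*" >&2
    fi
    return "$rb_rc"
}
```

If something like this were used around the multi-job submission, the check on `rc` would need to treat 124 separately, since a timeout is also a nonzero status and would otherwise be mistaken for the expected submission failures.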
