testsuite: adjust expectations of recovery in rv1_nosched mode #1000

garlick · 2023-01-28T17:31:55Z

This alters testing a bit in preparation for a workaround and ultimately a proper fix for #991.

The t1008-recovery-none.t sharness test is dropped because it contains only one recovery test, and that test expects things to keep failing the way they currently do. Replace it with a new expected-failure test in t1007-recovery-full.t that is a slightly modified version of the one proposed by @grondo in #991. This test should fail as expected both before and after the workaround proposed in flux-framework/flux-core#4894, and once we have a proper fix for #991, it can be flipped to expect success.

After this is merged, the failing sched CI test in flux-framework/flux-core#4894 should start working.

Problem: t1008-recovery-none.t expects the job manager to abort the scheduler if a job fails to re-allocate resources during the hello handshake, but this behavior will change soon. Drop this test. The behavior it is looking for will be addressed either by a true fix for flux-framework#991 or by the workaround proposed in flux-framework/flux-core#4894.

Problem: t1007-recovery-full.t cancels each test job then runs the cleanup_active_jobs helper function, which does the same thing. Drop the explicit cleanup from the test.

@grondo

Problem: there is no test coverage for module reload with running jobs and rv1_nosched.

Add the test proposed by @grondo in flux-framework#991, expecting failure for now. The test fails both before and after the work-around proposed in flux-framework/flux-core#4894 because it checks for both of the following:
- qmanager reload succeeds (fails before the work-around)
- job resources remain allocated (fails after the work-around)

Increase the broker stderr log verbosity so that the fatal job exceptions generated by the work-around at LOG_INFO level are visible when the test is run with -v.
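Why the new test is expected to fail both before and after the work-around can be modeled as a short-circuiting pair of checks. The sketch below is a hypothetical Python model, not the actual sharness test in t1007-recovery-full.t; the function name `first_failing_check` and the scenario table are illustrative assumptions based only on the description above.

```python
from typing import Optional


def first_failing_check(reload_ok: bool, still_allocated: Optional[bool]) -> Optional[str]:
    """Hypothetical model: return the first check that fails, or None if the test passes.

    still_allocated is None when the second check is never reached,
    because the test stops at the first failing check.
    """
    if not reload_ok:
        return "qmanager reload"
    if not still_allocated:
        return "job resources remain allocated"
    return None


# (scenario, reload_ok, still_allocated), as described in the PR text.
# The last row is the hoped-for outcome of a proper fix, not an observed result.
SCENARIOS = [
    ("before the work-around", False, None),
    ("after the work-around", True, False),
    ("with a proper fix for #991", True, True),
]

if __name__ == "__main__":
    for name, reload_ok, still_allocated in SCENARIOS:
        failed = first_failing_check(reload_ok, still_allocated)
        status = "fails at: " + failed if failed else "passes (flip to expect success)"
        print(f"{name}: {status}")
```

Under sharness's `test_expect_failure`, the first two rows are reported as known breakage; once a proper fix makes the third row real, the test can be switched to `test_expect_success`.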

grondo

This looks reasonable to me.

codecov · 2024-08-29T01:00:45Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.1%. Comparing base (b7a6d35) to head (5d00092).
Report is 583 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff            @@
##           master   #1000     +/-   ##
========================================
- Coverage    74.1%   74.1%   -0.1%     
========================================
  Files          78      78             
  Lines        9327    9327             
========================================
- Hits         6915    6912      -3     
- Misses       2412    2415      +3

see 1 file with indirect coverage changes

garlick added 3 commits January 28, 2023 06:55

testsuite: drop unnecessary cleanup

c93fa44


garlick mentioned this pull request Jan 28, 2023

work around fluxion inbability to recover running jobs flux-framework/flux-core#4894

Merged

grondo approved these changes Jan 28, 2023


garlick added the merge-when-passing label (mergify.io: merge PR automatically once CI passes) Jan 28, 2023

mergify bot merged commit 0f46a1d into flux-framework:master Jan 28, 2023

garlick deleted the issue#991 branch February 8, 2023 15:05
