Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testsuite: adjust expectations of recovery in rv1_nosched mode #1000

Merged
merged 3 commits into from
Jan 28, 2023

Conversation

garlick
Copy link
Member

@garlick garlick commented Jan 28, 2023

This alters testing a bit in preparation for a workaround and ultimately a proper fix for #991.

The t1008-recovery-none.t sharness test is dropped since it really just contains one recovery test that expects things to fail as they have been. Replace that with a new expected-failure test in t1007-recovery-full that is a slightly modified version of the one proposed by @grondo in #991. This test should fail as expected before and after the workaround proposed in flux-framework/flux-core#4894, and when we have a proper fix for #991, it can be flipped to expect success.

After this is merged, the failing sched CI test in flux-framework/flux-core#4894 should start working.

Problem: t1008-recovery-none.t expects the job manager to abort the
scheduler if a job fails to re-allocate resources during the hello
handshake, but this behavior will change soon.

Drop this test.  The behavior it is looking for will either be
addressed by a true fix to flux-framework#991 or the workaround proposed in
flux-framework/flux-core#4894.
Problem: t1007-recovery-full.t cancels each test job then runs
the cleanup_active_jobs helper function, which does the same thing.

Drop the explicit cleanup from the test.
Problem: there is no test coverage for module reload with running
jobs and rv1_nosched.

Add test proposed by @grondo in flux-framework#991, expecting failure for now.

The test fails before and after the work-around proposed in
flux-framework/flux-core#4894 because it checks for both:
- qmanager reload fails (fails before the work-around)
- job resources remain allocated (fails after the work-around)

Increase the broker stderr log verbosity so the fatal job exceptions
generated by the work-around at LOG_INFO level are visible when
the test is run with -v.
Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me.

@garlick garlick added the merge-when-passing mergify.io - merge PR automatically once CI passes label Jan 28, 2023
@mergify mergify bot merged commit 0f46a1d into flux-framework:master Jan 28, 2023
@garlick garlick deleted the issue#991 branch February 8, 2023 15:05
Copy link

codecov bot commented Aug 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.1%. Comparing base (b7a6d35) to head (5d00092).
Report is 583 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master   #1000     +/-   ##
========================================
- Coverage    74.1%   74.1%   -0.1%     
========================================
  Files          78      78             
  Lines        9327    9327             
========================================
- Hits         6915    6912      -3     
- Misses       2412    2415      +3     

see 1 file with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-when-passing mergify.io - merge PR automatically once CI passes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants