Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testsuite: fix a couple race conditions in t4011-match-duration.t #1109

Merged
merged 3 commits into from
Nov 14, 2023

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Nov 14, 2023

This PR fixes a couple race conditions in t/t4011-match-duration.t which cause the test to fail occasionally.

Problem: It may be useful to log when an instance expiration update
occurs, but the resource module does not do that.

Add a log message on expiration update.
Problem: Sometimes it is useful during testing to wait for a given
pattern to appear in the Flux dmesg logs, but waiting for a matching
line to appear in `flux-dmesg(1)` output using the shell is brittle
and inconvenient.

Copy the dmesg-grep.py script from flux-core for this purpose.
Problem: There are two potential race conditions in the final test in
t4011-match-duration.t which ensures the internal expiration of the
scheduler resource is adjusted after an expiration update.

 1. A check for the default duration of a submitted job assumes the
   duration will be strictly less than the instance duration, but if
   the job starts within the same second as the instance starttime
   then the duration could match exactly the instance duration since
   Fluxion deals in whole seconds.

 2. The scheduler is notified of the expiration update asynchronously
   with respect to the instance update, which is detected in the test
   via the resource-update event in the resource.eventlog. This could
   result in the second submitted job receiving an unexpected default
   duration.

Fix case 1 above by allowing the job duration to match (but not
exceed) the instance duration. Fix case 2 by blocking the test until
the scheduler emits a log message for the duration update.
Copy link

codecov bot commented Nov 14, 2023

Codecov Report

Merging #1109 (1600ffb) into master (2208821) will increase coverage by 0.0%.
The diff coverage is 100.0%.

❗ Current head 1600ffb differs from pull request most recent head 68fd13e. Consider uploading reports for the commit 68fd13e to get more accurate results

Additional details and impacted files
@@          Coverage Diff           @@
##           master   #1109   +/-   ##
======================================
  Coverage    71.8%   71.8%           
======================================
  Files          88      88           
  Lines       11530   11531    +1     
======================================
+ Hits         8286    8287    +1     
  Misses       3244    3244           
Files Coverage Δ
resource/modules/resource_match.cpp 67.7% <100.0%> (+<0.1%) ⬆️

Copy link
Member

@jameshcorbett jameshcorbett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @grondo !

@grondo
Copy link
Contributor Author

grondo commented Nov 14, 2023

Thanks @jameshcorbett! I had to restart the spelling and pr-validator checks because they failed for unknown reason. I'll set MWP now.

@grondo grondo added the merge-when-passing mergify.io - merge PR automatically once CI passes label Nov 14, 2023
@mergify mergify bot merged commit 0b70113 into flux-framework:master Nov 14, 2023
20 checks passed
@grondo grondo deleted the t4011-improvements branch November 14, 2023 22:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-when-passing mergify.io - merge PR automatically once CI passes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants