Add support for arbitrary prerequisites to case.submit #1753
Merged
6 commits
6b3e997 Change case.st_archive dependency string to not include strings indic… (mfdeakin-sandia)
737d747 Implements the prereq argument for case.submit, allowing the user to … (mfdeakin-sandia)
8098551 Revert change from afterok to afterany; this should only be done if t… (mfdeakin-sandia)
2bd4fca Partial implementation of the prereq test. Still need to find out a g… (mfdeakin-sandia)
5ef21d2 Implement a much simpler script_regression_test tests (mfdeakin-sandia)
41beb58 Make the prereq test more flexible and correct (mfdeakin-sandia)
It seems to me that you should be using the existing system test framework in CIME/SystemTests.
You could create a new test based on the SMS test in which job 1 is an initial run that writes a restart at the final time, and job 2 is started with the prereq flag and is a CONTINUE_RUN that reads the restart from job 1. This would test that job 2 does not start before job 1 is complete and that job 2 completes successfully.
I'm somewhat concerned that this test could pass by chance: the queue system might happen to schedule the jobs in the right order regardless of whether the prerequisite was actually applied.
The other issue is that it's not clear to me how to get job 1's queue ID. run_indv calls case_run, which assumes the job is already on the queue.
You could include a negative test: make the first run fail and ensure that the second doesn't start. As for the job ID, env_batch.py has a subroutine get_job_id. Since you will need to submit two separate jobs for this test, you should look at the ERR test for an example.
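To make the expectation behind the negative test concrete, here is a minimal sketch of when a dependent job becomes eligible to run. It assumes the batch system follows the usual `afterok`/`afterany` convention (as in Slurm and PBS); the function name `dependent_may_start` is hypothetical and is not part of CIME.

```python
def dependent_may_start(dependency_type, prerequisite_exit_code):
    """Model when a dependent job becomes eligible to run.

    prerequisite_exit_code is None while the prerequisite is still queued
    or running, and its integer exit status once it has terminated.
    This is an illustrative model, not CIME or batch-system code.
    """
    if prerequisite_exit_code is None:
        return False  # prerequisite has not terminated yet
    if dependency_type == "afterok":
        # Eligible only if the prerequisite exited successfully.
        return prerequisite_exit_code == 0
    if dependency_type == "afterany":
        # Eligible once the prerequisite has terminated, whatever its status.
        return True
    raise ValueError("unknown dependency type: " + dependency_type)
```

Under this model the negative test is only meaningful with `afterok`: with `afterany`, a failed first job would still release the second one.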
Sorry for the silence on this; I've been focusing on other work recently.
I'm not certain what the best way to implement a negative test for this is. Querying the batch system to verify the job will never be run seems just as much work as checking that the batch system added the dependency, and neither seems doable with CIME currently.
I'd rather not add a timeout which might not be met even with the prerequisite running successfully, so querying the batch system seems like something I'd have to do.
@jedwards4b @jgfouca Do you have any suggestions for a robust method of testing this without querying the queue? Waiting for some threshold and verifying the dependent job didn't run seems dangerous, especially with the recent changes on skybridge. AFAICT, there isn't a robust method, so just doing the simple positive test seems reasonable.
@mfdeakin-sandia, can you please explain what the options are in pseudo-code steps?
The test I'm envisioning is as follows:
1. Submit a job. (done)
2. While that job is running, submit a second job (the dependent job) with the prerequisite argument set to the currently running job. (done)
3. Ensure the dependent job does not run if the other job has not yet finished successfully. (not done)
I can add a check to the dependent job to verify the first job has finished, but if the dependent job was not submitted to the queue correctly due to a bug in the prerequisite logic, this test may still pass if it makes it onto a node after the original finished.
@jedwards4b suggested a negative test: force the original job to fail, and then verify the dependent job is never run. I could add a timeout and verify the job hasn't run by that time, but this depends on the queue not starting the job after the timeout.
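The timeout variant of the negative test could look roughly like the sketch below. Everything here is an assumption for illustration: `never_started_within` is a hypothetical helper, and `has_started` stands in for whatever check the test would use to detect that the dependent job ran (for example, looking for a file the job writes).

```python
import time


def never_started_within(has_started, timeout_s, poll_interval_s=1.0):
    """Poll has_started() until timeout_s seconds have elapsed.

    Returns True if the dependent job was never observed to start inside
    the window, False as soon as it is observed to have started.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if has_started():
            return False
        time.sleep(poll_interval_s)
    # One last look at the deadline itself.
    return not has_started()
```

A True result only covers the polling window. If the prerequisite logic is broken and the queue starts the dependent job after the timeout, the test still passes, which is the weakness described above.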
Another option would be to write a fake batch system, as @billsacks suggested; this would be a separate Python script that writes its arguments to a file so we can verify they're correct. Given all the XML needed to support this fake batch system, I'm not certain this is the way we want to go, but if it's helpful for other parts of the code, that could justify it.
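As a rough idea of what the fake batch system might do, here is a minimal sketch that records each submission's arguments to a file and then checks them. All names (`record_submission`, `read_submissions`, `has_dependency`) and the `--depend afterok:<id>` argument form are placeholders, not CIME's or any real batch system's interface, and the XML needed to hook this into CIME is not shown.

```python
import json
import os
import tempfile


def record_submission(args, log_path):
    """Fake 'submit': append the argument list to a JSON-lines log.

    Returns a made-up job id (the 1-based position of this entry in the log).
    """
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(list(args)) + "\n")
    return str(len(read_submissions(log_path)))


def read_submissions(log_path):
    """Return every recorded argument list, oldest first."""
    with open(log_path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]


def has_dependency(args, flag, job_id):
    """True if `flag` is immediately followed by a value naming `job_id`."""
    for name, value in zip(args, args[1:]):
        if name == flag and job_id in value.split(":"):
            return True
    return False


# Example: two submissions, the second depending on the first.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "fake_batch.log")
    first_id = record_submission(["case.run"], log_path)
    # "--depend" and "afterok:<id>" are placeholders for whatever the real
    # submit logic would emit for the configured batch system.
    record_submission(["--depend", "afterok:" + first_id, "case.run"], log_path)
    recorded = read_submissions(log_path)

dependent_ok = has_dependency(recorded[1], "--depend", first_id)
independent_ok = not has_dependency(recorded[0], "--depend", first_id)
```

Because the check only inspects recorded arguments, it does not depend on queue timing, at the cost of not exercising a real scheduler.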