-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for the new wreck.state.submitted event #295
Conversation
@SteVwonder: this has a change to your emulator code. Please review if you can. Thanks. |
sched/sched.c
Outdated
@@ -462,11 +453,11 @@ static int update_state (flux_t *h, uint64_t jid, job_state_t os, job_state_t ns | |||
return rc; | |||
} | |||
|
|||
static inline bool is_newjob (json_t *jcb) | |||
static inline bool is_newjob (flux_t *h, json_t *jcb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the flux_t *h
argument be removed? It doesn't appear to be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can. It was there to print some debug msgs.
simulator/submitsrv.c
Outdated
@@ -239,7 +238,7 @@ int schedule_next_job (flux_t *h, sim_state_t *sim_state) | |||
return -1; | |||
} | |||
|
|||
f = flux_rpc_pack (h, "job.create", FLUX_NODEID_ANY, 0, | |||
f = flux_rpc_pack (h, "job.submit", FLUX_NODEID_ANY, 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will cause a race in the KVS for simulator-specific job information. Previously, the simulator submit module worked by creating the job, then populating the KVS directory with simulator specific information (via the put_job_in_kvs
function), and then setting the job to submitted
. At that point, the scheduler processed the job, scheduled it, and the sim_execsrv
module would simulate the execution. In the process of simulating the job execution, sim_execsrv
would pull the simulator specific information out of the kvs. So in the previous version, the simulator specific information was guaranteed to exist in the KVS before the sim_execsrv
module tried to pull it, because the submit
module wouldn't permit the scheduling of the job until the custom simulator information was placed into the KVS.
After a quick glance at flux-framework/flux-core#1389, it seems the new wreck still supports job.create
. Is there a way we can keep that flow? Maybe when the submit
module sends the wreck.state.submitted
event, it can include the new ntasks
, nnodes
, and walltime
data in the json payload?
I'm not married to that control flow though. I'm sure there is a cleaner way to solve this race-condition that I'm not seeing....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a quick glance at flux-framework/flux-core#1389, it seems the new wreck still supports job.create. Is there a way we can keep that flow?
Sorry we didn't check with you first about these changes @SteVwonder!
Since the scheduler should now ignore the reserved
state, you could still use job.create
from the simulator to "reserve" a job entry in the kvs, fill in values specific to the scheduler, and once those have been committed to the KVS issue the job.sumbit
event yourself as you suggested. I don't think the scheduler could tell the difference between the simulator and the flux-submit
program. (This is exactly the purpose for which the "reserved" state was designed)
That is my $0.02
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for apologies. I saw the discussion about the state changes in the github issue, but I forgot this is how the simulator submit
module works. So I'll take the blame for not catching this earlier.
you could still use job.create from the simulator to "reserve" a job entry in the kvs, fill in values specific to the scheduler, and once those have been committed to the KVS issue the job.sumbit event yourself as you suggested.
That works for me. @dongahn, are you ok with the simulator still using job.create
and the reserved
state for now? For the splash target, I think we get this PR in sooner rather than later. In the future, we could refactor to include the simulator specific information in the job.submit
message (when in sim mode) to avoid the race condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it won't work, at least without changes to flux-core#1398.
Right now, when the reserved
state is seen, jsc puts the new job into the active job hash. So when submitted
state of this job is seen afterwards, jsc won't be able to insert this job into the active job hash, leading to an error.
The way the scheduler ignores the state change of the job starting withreserved
state is that, it just doesn't recognize null->reserved as the new job event. So the scheduler won't put the job into its job index at all. Thus, any subsequent event of that job will be ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to make a minor modification to the JSC code in flux-core#1398 to allow your original tool flow. Stay tuned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dongahn. IMO the "reserved" state should be ignored by the scheduler -- it was originally just a state that marked a kvs directory as reserved for a writer. In the wreck changes I considered not emitting an event for it, but I knew it might break the scheduler. (Of course, we can revisit this once we start work on job-ingest module there might not be a use for a reserved state)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@grondo: effectively the scheduler is ignoring it. The reason JSC needs to keep track of this event is to allow the scheduler to ignore all of the state transitions that started from reserved
state which I initially thought was just for the wreckrun use case. (If it makes sense). But now the emulator does the same, and I think I know how to accommodate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SteVwonder or @grondo: wreck.state.submitted
will have to change something like the following to emulate the new wreck behavior. What kvs_path should I use with the wreck.state.submitted
event emitted from the submit emulator module?
if (!(msg = flux_event_pack ("wreck.state.submitted",
"{ s:I s:s s:i s:i s:i}",
"lwj", new_jobid,
"kvs_path", path,
"nnodes", job->nnodes,
"ntasks", job->ncpus,
"walltime", (int64_t)job->time_limit))
|| flux_send (h, msg, 0) < 0) {
return -1;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongahn, I will have to look at the simulator, but it should be the same path returned in the response to the job.create
call(?)
be67009
to
4c6749b
Compare
Codecov Report
@@ Coverage Diff @@
## master #295 +/- ##
==========================================
- Coverage 74.39% 67.74% -6.66%
==========================================
Files 49 46 -3
Lines 9279 8692 -587
==========================================
- Hits 6903 5888 -1015
- Misses 2376 2804 +428
Continue to review full report at Codecov.
|
@SteVwonder: I added this support for both in this PR and flux-framework/flux-core#1389. But somehow the emulator-based tests all failed with "no event remaining message" after scheduling the first job.
I'm not familiar enough w/ the emulator code that I feel I am not effectively debugging this. Could you please take a look at the broker log and see if you can spot anything? Log is here: |
@dongahn, for some reason GitHub also thinks this is out-of-date with master, you may need to rebase and push again |
I restarted the build, though I guess we know it will fail. This does need to be rebased on the current master apparently. |
Add support within the scheduler and emulator for the new wreck.state.submitted event with which job request info such as nnodes and walltime is piggybacked. Reduce KVS accesses to fetch job request information as well as a legacy event: null to null job state transition event. Use job.submit within the emulator because this optimized eventing is implemented only in submit event, not create event. job.create event still emits wreck.state.reserved event.
@dongahn, I would merge this branch but it does still need to be rebased on master for some reason. Also quickly disable the failing tests so future PRs don't appear to fail known failing tests. |
Disable known broken simulator driven tests for now. Renable temporarily by setting FLUX_SCHED_ENABLE_SIM_TESTS
4c6749b
to
0e099cb
Compare
@dongahn, I went ahead and rebased this branch on current master, and disabled all the simulator driven tests. Just a warning that I force-pushed over your branch, so if you have local work be sure to save it on a different branch or at least don't reset against your upstream branch. All simulator tests are disabled for now, to re-enable temporarily set |
@grondo: thank you! Sorry for the late response. Had a dentist appointment. So if Traverse turns green, this should go in? |
Yes. I've tested it against my pending flux-core branch and seems to work well. (Note: that PR adds a new |
@grondo's change looks great. @garlick, @trws or @SteVwonder, could you push the button? It seems not having this in the master is confusing many at this point. |
Agreed! |
This requires the pending flux-core PR #1389.
Add support within the scheduler and emulator for the new
wreck.state.submitted
event with which job request info such as nnodes and walltime is piggybacked.Use the event's payload and other changes within jsc to optimize job submission handling performance -- i.e., reducing KVS accesses and the number of state transitions.
This should improve the job throughput when the submission rate is super high although I didn't measure the improvements.