Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for the new wreck.state.submitted event #295

Merged
merged 3 commits into from
Mar 30, 2018

Conversation

dongahn
Copy link
Member

@dongahn dongahn commented Mar 27, 2018

This requires the pending flux-core PR #1389.

Add support within the scheduler and emulator for the new wreck.state.submitted event with which job request info such as nnodes and walltime is piggybacked.

Use the event's payload and other changes within jsc to optimize job submission handling performance -- i.e., reducing KVS accesses and the number of state transitions.

This should improve the job throughput when the submission rate is super high although I didn't measure the improvements.

@dongahn
Copy link
Member Author

dongahn commented Mar 27, 2018

@SteVwonder: this has a change to your emulator code. Please review if you can. Thanks.

@coveralls
Copy link

coveralls commented Mar 27, 2018

Coverage Status

Coverage decreased (-6.9%) to 68.968% when pulling 0e099cb on dongahn:aug_submit_event into 09ad395 on flux-framework:master.

sched/sched.c Outdated
@@ -462,11 +453,11 @@ static int update_state (flux_t *h, uint64_t jid, job_state_t os, job_state_t ns
return rc;
}

static inline bool is_newjob (json_t *jcb)
static inline bool is_newjob (flux_t *h, json_t *jcb)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the flux_t *h argument be removed? It doesn't appear to be used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can. It was there to print some debug msgs.

@@ -239,7 +238,7 @@ int schedule_next_job (flux_t *h, sim_state_t *sim_state)
return -1;
}

f = flux_rpc_pack (h, "job.create", FLUX_NODEID_ANY, 0,
f = flux_rpc_pack (h, "job.submit", FLUX_NODEID_ANY, 0,
Copy link
Member

@SteVwonder SteVwonder Mar 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will cause a race in the KVS for simulator-specific job information. Previously, the simulator submit module worked by creating the job, then populating the KVS directory with simulator specific information (via the put_job_in_kvs function), and then setting the job to submitted. At that point, the scheduler processed the job, scheduled it, and the sim_execsrv module would simulate the execution. In the process of simulating the job execution, sim_execsrv would pull the simulator specific information out of the kvs. So in the previous version, the simulator specific information was guaranteed to exist in the KVS before the sim_execsrv module tried to pull it, because the submit module wouldn't permit the scheduling of the job until the custom simulator information was placed into the KVS.

After a quick glance at flux-framework/flux-core#1389, it seems the new wreck still supports job.create. Is there a way we can keep that flow? Maybe when the submit module sends the wreck.state.submitted event, it can include the new ntasks, nnodes, and walltime data in the json payload?

I'm not married to that control flow though. I'm sure there is a cleaner way to solve this race-condition that I'm not seeing....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a quick glance at flux-framework/flux-core#1389, it seems the new wreck still supports job.create. Is there a way we can keep that flow?

Sorry we didn't check with you first about these changes @SteVwonder!

Since the scheduler should now ignore the reserved state, you could still use job.create from the simulator to "reserve" a job entry in the kvs, fill in values specific to the scheduler, and once those have been committed to the KVS issue the job.sumbit event yourself as you suggested. I don't think the scheduler could tell the difference between the simulator and the flux-submit program. (This is exactly the purpose for which the "reserved" state was designed)

That is my $0.02

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for apologies. I saw the discussion about the state changes in the github issue, but I forgot this is how the simulator submit module works. So I'll take the blame for not catching this earlier.

you could still use job.create from the simulator to "reserve" a job entry in the kvs, fill in values specific to the scheduler, and once those have been committed to the KVS issue the job.sumbit event yourself as you suggested.

That works for me. @dongahn, are you ok with the simulator still using job.create and the reserved state for now? For the splash target, I think we get this PR in sooner rather than later. In the future, we could refactor to include the simulator specific information in the job.submit message (when in sim mode) to avoid the race condition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it won't work, at least without changes to flux-core#1398.

Right now, when the reserved state is seen, jsc puts the new job into the active job hash. So when submitted state of this job is seen afterwards, jsc won't be able to insert this job into the active job hash, leading to an error.

The way the scheduler ignores the state change of the job starting withreserved state is that, it just doesn't recognize null->reserved as the new job event. So the scheduler won't put the job into its job index at all. Thus, any subsequent event of that job will be ignored.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to make a minor modification to the JSC code in flux-core#1398 to allow your original tool flow. Stay tuned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dongahn. IMO the "reserved" state should be ignored by the scheduler -- it was originally just a state that marked a kvs directory as reserved for a writer. In the wreck changes I considered not emitting an event for it, but I knew it might break the scheduler. (Of course, we can revisit this once we start work on job-ingest module there might not be a use for a reserved state)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grondo: effectively the scheduler is ignoring it. The reason JSC needs to keep track of this event is to allow the scheduler to ignore all of the state transitions that started from reserved state which I initially thought was just for the wreckrun use case. (If it makes sense). But now the emulator does the same, and I think I know how to accommodate it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SteVwonder or @grondo: wreck.state.submitted will have to change something like the following to emulate the new wreck behavior. What kvs_path should I use with the wreck.state.submitted event emitted from the submit emulator module?

    if (!(msg = flux_event_pack ("wreck.state.submitted",
                                 "{ s:I s:s s:i s:i s:i}",
                                 "lwj", new_jobid,
                                 "kvs_path", path,
                                 "nnodes", job->nnodes,
                                 "ntasks", job->ncpus,
                                 "walltime", (int64_t)job->time_limit))
        || flux_send (h, msg, 0) < 0) {
        return -1;
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongahn, I will have to look at the simulator, but it should be the same path returned in the response to the job.create call(?)

@codecov-io
Copy link

codecov-io commented Mar 28, 2018

Codecov Report

Merging #295 into master will decrease coverage by 6.65%.
The diff coverage is 55.55%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #295      +/-   ##
==========================================
- Coverage   74.39%   67.74%   -6.66%     
==========================================
  Files          49       46       -3     
  Lines        9279     8692     -587     
==========================================
- Hits         6903     5888    -1015     
- Misses       2376     2804     +428
Impacted Files Coverage Δ
sched/sched.c 61.69% <55.55%> (-12.39%) ⬇️
simulator/simulator.c 0.84% <0%> (-89.92%) ⬇️
sched/sched_backfill.c 13.37% <0%> (-77.08%) ⬇️
src/common/libutil/log.c 28.88% <0%> (-33.34%) ⬇️
sched/rsreader.c 76.38% <0%> (-20.14%) ⬇️
sched/rs2rank.c 80.17% <0%> (-14.66%) ⬇️
src/common/libutil/shortjansson.h 77.96% <0%> (-10.37%) ⬇️
sched/sched_fcfs.c 88.63% <0%> (-5.69%) ⬇️
sched/flux-waitjob.c 85.12% <0%> (-3.31%) ⬇️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 09ad395...0e099cb. Read the comment docs.

@dongahn
Copy link
Member Author

dongahn commented Mar 28, 2018

@SteVwonder: I added this support for both in this PR and flux-framework/flux-core#1389. But somehow the emulator-based tests all failed with "no event remaining message" after scheduling the first job.

2018-03-27T23:44:11.925585Z submit.debug[0]: sent a reply request: {"sim_time": 19643.0, "event_timers": {"sched": -1.0, "sim_exec": -1.0, "submit": -1.0}, "mod_name": "submit"}
2018-03-27T23:44:11.925709Z sim.debug[0]: No events remaining

I'm not familiar enough w/ the emulator code that I feel I am not effectively debugging this. Could you please take a look at the broker log and see if you can spot anything? Log is here:
t2000-fcfs.broker.log.

@grondo
Copy link
Contributor

grondo commented Mar 28, 2018

@dongahn, for some reason GitHub also thinks this is out-of-date with master, you may need to rebase and push again

@grondo
Copy link
Contributor

grondo commented Mar 28, 2018

I restarted the build, though I guess we know it will fail. This does need to be rebased on the current master apparently.

dongahn added 2 commits March 29, 2018 14:50
Add support within the scheduler and emulator for
the new wreck.state.submitted event with which
job request info such as nnodes and walltime is piggybacked.

Reduce KVS accesses to fetch job request information
as well as a legacy event: null to null job state
transition event.

Use job.submit within the emulator because this optimized
eventing is implemented only in submit event, not create
event. job.create event still emits wreck.state.reserved
event.
@grondo
Copy link
Contributor

grondo commented Mar 29, 2018

@dongahn, I would merge this branch but it does still need to be rebased on master for some reason. Also quickly disable the failing tests so future PRs don't appear to fail known failing tests.

Disable known broken simulator driven tests for now.
Renable temporarily by setting FLUX_SCHED_ENABLE_SIM_TESTS
@grondo grondo force-pushed the aug_submit_event branch from 4c6749b to 0e099cb Compare March 29, 2018 23:57
@grondo
Copy link
Contributor

grondo commented Mar 30, 2018

@dongahn, I went ahead and rebased this branch on current master, and disabled all the simulator driven tests. Just a warning that I force-pushed over your branch, so if you have local work be sure to save it on a different branch or at least don't reset against your upstream branch.

All simulator tests are disabled for now, to re-enable temporarily set FLUX_SCHED_ENABLE_SIM_TESTS in the environment

@dongahn
Copy link
Member Author

dongahn commented Mar 30, 2018

@grondo: thank you! Sorry for the late response. Had a dentist appointment.

So if Traverse turns green, this should go in?

@grondo
Copy link
Contributor

grondo commented Mar 30, 2018

Yes. I've tested it against my pending flux-core branch and seems to work well. (Note: that PR adds a new ncores attribute to job submission, if you have a chance you might want to take a look)

@dongahn
Copy link
Member Author

dongahn commented Mar 30, 2018

@grondo's change looks great. @garlick, @trws or @SteVwonder, could you push the button? It seems not having this in the master is confusing many at this point.

@garlick
Copy link
Member

garlick commented Mar 30, 2018

Agreed!

@garlick garlick merged commit 92fc364 into flux-framework:master Mar 30, 2018
@grondo grondo mentioned this pull request May 11, 2018
@dongahn dongahn deleted the aug_submit_event branch July 13, 2019 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants