
create program execution service (wreck replacement) #333

Closed
garlick opened this issue Aug 18, 2015 · 13 comments

Comments

@garlick
Member

garlick commented Aug 18, 2015

Implement a new service that executes distributed "programs" on top of cmb.exec. It depends on the KVS and creates bare-bones KVS entries for programs (both historical and current).

Tools such as 'ps' and 'top' would show currently executing programs.

Possibly flux-exec could be modified to (optionally) talk to this service and launch stuff in parallel?

The "program context" would be an opaque area in the program KVS space that would be left up to the "shell" (see #334 ) or the application to fill in.

This service can be (mostly) resource agnostic though it would need to know which broker ranks are to execute tasks. For the most part the mapping of resources to tasks could be a function of the "shell".

This service could assign a monotonic Flux program id (fpid?) to programs as they are executed. Since this number is only unique within the local Flux instance, job submissions should be assigned a uuid so they can be uniquely referenced in system wide logs or a global provenance database. Then each time the submission is executed or re-executed, it is associated with the fpid.

Programs would be submitted using the program specification introduced in RFC 8.
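
To make the fpid/uuid idea above a bit more concrete, here is a hypothetical sketch in Lua (the language the existing wreck prototype already uses for plugins) of what one program's entry might hold. None of the field names, key layout, or values are defined anywhere; they are assumptions for illustration only.

```lua
-- Hypothetical sketch only (not an actual Flux KVS schema): one way a
-- program entry under this service might be organized. Field names and
-- values are illustrative assumptions.
local program = {
    fpid    = 3,                                      -- monotonic id, unique within this instance
    uuid    = "c0ffee00-1234-4abc-9def-000000000001", -- stable across re-executions of the submission
    spec    = { command = { "hostname" }, ntasks = 4 }, -- RFC 8 style program specification
    state   = "running",                              -- e.g. submitted, running, complete
    ranks   = { 0, 1 },                               -- broker ranks executing tasks
    context = {},                                     -- opaque area left to the "shell" or application
}

-- A 'ps'-like tool would render one line per current program from such entries:
print (string.format ("fpid=%d uuid=%s state=%s", program.fpid, program.uuid, program.state))
```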

@lipari
Contributor

lipari commented Aug 19, 2015

This new tool seems similar to flux-exec. For posterity, would you mind contrasting this new tool with flux-exec? Would flux-exec eventually go away? What would this tool provide that we don't already (or will never) have with flux-exec?

@grondo
Contributor

grondo commented Aug 19, 2015

flux-exec uses the point-to-point cmb.exec protocol. This issue describes a distributed service built on top of the low-level cmb.exec service. flux-exec doesn't have any real support for launching parallel work, tracking distributed processes in groups, signalling these processes as groups, etc.

@garlick
Member Author

garlick commented Aug 19, 2015

I guess you could say this is analogous to the wrexec service, except implemented atop cmb.exec.

The three related issues that were entered here were the result of brainstorming a wreck redesign yesterday. The ideas (and descriptions) are a little rough still.

@grondo
Contributor

grondo commented Aug 19, 2015

This service could assign a monotonic Flux program id (fpid?) to programs as they are executed. Since this number is only unique within the local Flux instance, job _submissions_ should be assigned a uuid so they can be uniquely referenced in system wide logs or a global provenance database.

I feel like we're getting close to a nice abstraction for job/program delineation here. Should we (perhaps in another issue or RFC) define something like: a job submission/program specification creates a program description, which is saved with a uuid in the provenance database. Once a program has started, it is associated with an fpid, which is a container for a running instance of a program and has associated with it all the typical data for running programs (task placement, io, current state, exit statuses if exited, etc.). The fpid data would point to the program specification that started it, and the program specification could have a list of all fpids it started...

In this model the program specification is like an executable and the fpid is the task_struct data (extended because data for fpids is saved for provenance). At times new programs could be relaunched from the same program specification? (Maybe this is getting a bit too pie in the sky; you'd have to run with the same resource request, working directory, etc.)
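
A small sketch of that executable/task_struct analogy, using plain Lua tables; the field names and the launch helper below are invented for illustration, not a proposed schema.

```lua
-- Illustrative sketch only: one program description, stored once under a
-- uuid, fanning out to several fpids, each a container for one running
-- instance. All names here are assumptions, not a defined schema.
local description = {
    uuid  = "de1e7ab1-0000-4000-8000-000000000042",
    spec  = { command = { "./simulate" }, ntasks = 128 },
    fpids = {},                                  -- every instance launched from this description
}

local function launch (desc, fpid)
    local instance = {
        fpid      = fpid,
        uuid      = desc.uuid,                   -- back-pointer to the description that started it
        placement = {},                          -- task placement, io, current state, exit statuses, ...
        state     = "starting",
    }
    table.insert (desc.fpids, fpid)
    return instance
end

-- Re-launching from the same description yields a new fpid each time:
local run1 = launch (description, 1)
local run2 = launch (description, 2)
print (#description.fpids)                       --> 2
```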

@dongahn
Member

dongahn commented Aug 20, 2015

@grondo @garlick are you guys going to consider elasticity support as part of this? With Suraj joining the team and his project, I suspect we can have really good synergy and be able to knock this mechanism/requirement out.

@garlick
Member Author

garlick commented Aug 20, 2015

I think @grondo and I were focused mainly on incremental improvement of the existing static execution here, but if we can support elasticity at some level without sacrificing our near-term milestones then maybe! It would be good, at least for me, to develop more of an understanding of what that should look like so our design can accommodate it.

At our last meeting @surajpkn briefly touched on emerging "standard" runtime interfaces for this, e.g. via a PMI-like capability. It would be good to get an issue open referencing any descriptions of these interfaces so we can be thinking about how to provide them to programs?

@dongahn
Member

dongahn commented Aug 20, 2015

Great. I will turn to @surajpkn to provide some information. His dynamic scheduling runtime interface to Torque is also something to look at.

@surajpkn, @SteVwonder and I had a nice discussion today, and we now have a reasonably good idea about what aspects of dynamic scheduling under hierarchical scheduling we will work on. I've asked him to give us a short presentation about this next Monday so that we can give him support when he needs it.

My guess is his work can mostly be done by adding support to flux-sched under emulation, but later we will want to hook this into the actual flux-core runtime in accordance with RFC 8. So it's better to start discussing this early.

@surajpkn

The type of adaptive application and the scheduling method we want to use will have some impact on the job submission API. For example: if a job is predictably evolving, meaning that it knows when it needs extra nodes and when it will release some (like a workflow application), then we might want to include that information in the job submission. Similarly, malleable jobs may require a (min, max) range, and moldable jobs a list of alternatives like ((nodecount1, walltime1), (nodecount2, walltime2), ...). It would be nice if there is some "room", with some forethought, in the job submission API for easy addition of such information later, so that it doesn't delay the current milestones. Adding elasticity-related parameters to a job submission API that was designed purely for static use could turn out to be very ugly (and TORQUE is a perfect example; I can show that on Monday).

The standard API effort is trying to address the runtime API for expand/shrink requests and orders, and will kick off only in November at the BoF. So some initial thoughts on a Flux-local API for job submission and elasticity can actually influence the standard API. We need more clarity on our requirements (from motivating applications) and on how we will do adaptive scheduling (application level). In the meantime, I will try to get hold of PMIx so that we can have a look.
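
One way to picture the "room" being asked for, as a hedged sketch only: an optional, clearly delimited sub-table in the submission that a purely static scheduler can ignore. The "elastic" block and all of its field names are invented for this example and are not part of RFC 8.

```lua
-- Sketch of what "leaving room" in the submission format could look like;
-- the "elastic" sub-table and its field names are hypothetical.
-- A static submission would simply omit it.
local submission = {
    command = { "./app" },
    ntasks  = 64,
    -- optional block for adaptive jobs; a purely static scheduler ignores it
    elastic = {
        kind    = "malleable",                     -- or "evolving", "moldable"
        min     = 32,                              -- malleable: acceptable node range
        max     = 128,
        choices = { { 64, 3600 }, { 128, 1800 } }, -- moldable: (nodecount, walltime) alternatives
    },
}

print (submission.elastic and submission.elastic.kind or "static")
```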

@dongahn
Member

dongahn commented Aug 20, 2015

FYI -- @lipari recently had to deal with PMIx as part of his CORAL involvement as well.

@trws
Member

trws commented Aug 23, 2015

Notes on things I'm finding would be really useful features in our distributed execution system:

  • pre/post-run commands: this is partly so users can be explicit about their per-task setup and teardown code (context: a discussion with Dave Richards); semi-automated restart on node failure becomes much easier with this
  • "wrapper" commands or hooks for containment/binding setup
  • hooks for state detection, to support dead-daemon restart, failover, etc.; not directly, but as a hook to override the action on task/program exit
  • IO indirection support: this isn't a solid thing right now, but I have a sneaking suspicion we'll want to be able to make IO handling independent of the KVS at some point, so maybe implement to an abstract "open/append" interface that uses the KVS on the back end for the initial implementation? (See the sketch after this list.)
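
A rough sketch of that abstract "open/append" idea: program code sees only open() and append(), and the backend can be swapped. The backend below is an in-memory Lua table standing in for an initial KVS-backed implementation; nothing here is an existing Flux API, and the key naming is invented.

```lua
-- Abstract open/append interface with a pluggable backend (in-memory here,
-- KVS-backed in a real initial implementation). Illustrative only.
local iostore = { data = {} }

function iostore:open (key)
    self.data[key] = self.data[key] or {}
    return {
        append = function (_, chunk)
            table.insert (self.data[key], chunk) -- a KVS put/append would go here
        end,
    }
end

-- Per-task stdout stream, keyed the way a KVS-backed version might key it:
local stream = iostore:open ("program.3.task.0.stdout")
stream:append ("hello from task 0\n")
print (table.concat (iostore.data["program.3.task.0.stdout"]))
```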

@grondo
Contributor

grondo commented Aug 24, 2015

pre/post-run commands: this is partly so users can be explicit about their per-task setup and teardown code (context: a discussion with Dave Richards); semi-automated restart on node failure becomes much easier with this

Are you talking about the equivalent of a job prolog/epilog? I'm actually not following how this could impact per-task setup, so I need more detail, sorry. For "teardown code", shouldn't that be implemented more flexibly as dependent program(s)?

"wrapper" commands or hooks for containment/binding setup

I'd like to avoid wrappers as much as possible. Otherwise you have wrappers calling wrappers and the issues become intractable. As we've said, we plan on doing something like spank plugins.

IO indirection support: this isn't a solid thing right now, but I have a sneaking suspicion we'll want to be able to make IO handling independent of the KVS at some point.

cmb.exec doesn't use the KVS. I think we planned for the distributed program framework described in this issue to use reduction for IO by default, possibly with stdio optionally collected in the KVS.

@grondo
Contributor

grondo commented Aug 24, 2015

Hooks already exist in the wrexec prototype if you feel the need to do binding now (we can't do real containment until we have support for privileged operations). In fact, environment propagation uses these hooks, though I admit they aren't very well designed or implemented.

Right now the hooks take the form of a set of Lua plugins that can be dropped into the wreck/lua.d/ directory. The hooks are:

  • rexecd_init -- called as wrexecd starts up
  • rexecd_task_init -- called per task before exec
  • rexecd_task_exit -- called as each task exits

See lua.d/01-env.lua for an example.
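
For orientation, a minimal structural sketch of such a plugin, assuming the hooks are plain global functions discovered by wrexecd; the wreck-provided context objects that the real plugins receive are deliberately omitted here, so refer to the actual lua.d/01-env.lua for a working example.

```lua
-- Minimal structural sketch of a wreck/lua.d plugin (hook bodies omitted).
function rexecd_init ()
    -- runs once as wrexecd starts up, before any tasks are created
end

function rexecd_task_init ()
    -- runs once per task, just before exec; e.g. adjust that task's environment or binding
end

function rexecd_task_exit ()
    -- runs as each task exits; e.g. record exit status or clean up per-task state
end
```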

@grondo
Contributor

grondo commented Feb 25, 2020

Closing outdated "wreck replacement" issues.

@grondo grondo closed this as completed Feb 25, 2020