create program execution service (wreck replacement) #333
Comments
This new tool seems similar to flux-exec. For posterity, would you mind contrasting this new tool against flux-exec? Would flux-exec eventually go away? What would this tool provide that we don't already (or will never) have with flux-exec?
flux-exec uses the point-to-point cmb.exec protocol. This issue describes a distributed service built on top of the low-level cmb.exec service. flux-exec doesn't have any real support for launching parallel work, tracking distributed processes in groups, signalling these processes as groups, etc.
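To make that contrast concrete, here is a rough sketch of the kinds of requests involved. Nothing below is an existing API; the field names and values are invented purely to illustrate the difference between per-rank execution and a group-aware program service.

```lua
-- Hypothetical request payloads, for illustration only.

-- Point-to-point style (roughly what cmb.exec / flux-exec do today):
-- one request per target rank, with no notion of a "program".
local exec_req = {
   rank    = 4,                         -- single broker rank
   cmdline = { "hostname" },
}

-- Program-level style (what the proposed service would add): the service
-- fans work out to ranks, tracks the resulting tasks as a group, and can
-- signal or query the whole program by a single id.
local prog_run_req  = { ranks = "0-63", ntasks = 64, cmdline = { "hostname" } }
local prog_kill_req = { fpid = 42, signum = 15 }  -- signal the whole group
local prog_list_req = {}                          -- enumerate running programs
```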
I guess you could say this is analogous to the wrexec service, except implemented atop cmb.exec. The three related issues that were entered here were the result of brainstorming a wreck redesign yesterday. The ideas (and descriptions) are a little rough still.
I feel like we're getting close to a nice abstraction for job/program delineation here. Should we (perhaps in another issue or RFC) define something like: a job submission/program specification creates a program description, which is saved with a uuid in the provenance database. Once a program has started it is associated with an fpid. In this model the program specification is like an executable and the fpid is the task_struct data (extended, because data for fpids is saved for provenance). At times new programs could be relaunched from the same program specification? (Maybe this is getting a bit too pie in the sky.)
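A rough sketch of that model, with made-up field names just to show the relationships (program specification as the "executable", fpid records as the per-execution "task_struct" data):

```lua
-- Illustrative data model only; none of these names are defined anywhere yet.

-- Submitted once and saved in the provenance database under a uuid.
local program_description = {
   uuid = "3f2a9c1e-...",            -- globally unique provenance key
   spec = {                          -- the program specification ("executable")
      cmdline = { "./a.out" },
      ntasks  = 128,
   },
}

-- Created each time the description is executed (or re-executed) in an instance.
local execution_record = {
   fpid = 42,                        -- monotonic id, unique only in this instance
   uuid = program_description.uuid,  -- ties the execution back to its description
   -- extended runtime data saved for provenance would hang off here
}
```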
I think @grondo and I were focused mainly on incremental improvement of the existing static execution here, but if we can support elasticity at some level without sacrificing our near-term milestones then maybe! It would be good, at least for me, to develop more of an understanding of what that should look like so our design can accommodate it. At our last meeting @surajpkn briefly touched on emerging "standard" runtime interfaces for this, e.g. via a PMI-like capability. It would be good to get an issue open referencing any descriptions of these interfaces so we can be thinking about how to provide them to programs.
Great. I will ask @surajpkn to provide some information. His dynamic scheduling runtime interface to Torque is also something to look at. @surajpkn, @SteVwonder and I had a nice discussion today, and we now have a reasonably good idea about what aspects of dynamic scheduling under hierarchical scheduling we will work on. I've asked him to give us a short presentation about this next Monday so that we can give him support when he needs it. My guess is his work can mostly be done by adding support to flux-sched under emulation, but later we will want to hook this to the actual flux-core runtime in accordance with RFC 8, so it's better to start discussing this early.
The type of adaptive application and the scheduling method we want to use will have some impact on the job submission API. For example: if a job is predictably evolving - meaning it knows when it needs extra nodes and when it will release some (like a workflow application) - then we might want to include that information in the job submission. Similarly, malleable jobs may require a (min, max) range, and moldable jobs a list of alternatives ((nodecount1, walltime1), (nodecount2, walltime2), ...). It would be nice if some "room" were left, with some forethought, in the job submission API for easy addition of such information later, so that it doesn't delay the current milestones. Adding elasticity-related parameters to a job submission API that was designed purely for static use can turn out to be very ugly (TORQUE is a perfect example; I can show that on Monday). The standard API effort is trying to address the runtime API for expand/shrink requests and orders, and will kick off only in November at the BoF, so some initial thoughts on a Flux-local API for job submission and elasticity could actually influence the standard API. We need more clarity on our requirements (from motivating applications) and on how we will do adaptive scheduling (application level). In the meantime, I will try to get hold of PMIx so that we can have a look.
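One way to leave that room might be an optional, initially ignored section of the submission. The field names below are invented for illustration and do not reflect any agreed-on format:

```lua
-- Hypothetical elasticity hints attached to an otherwise static submission.
local submission = {
   cmdline  = { "./a.out" },
   nnodes   = 16,
   walltime = 3600,

   -- Optional block; a static-only scheduler would simply ignore it.
   elasticity = {
      -- predictably evolving: application-declared grow/shrink points
      evolving  = { { at = 600, nnodes = 32 }, { at = 1800, nnodes = 8 } },
      -- malleable: the scheduler may resize anywhere within this range
      malleable = { min_nodes = 8, max_nodes = 64 },
      -- moldable: pick one (nodecount, walltime) alternative at start time
      moldable  = { { 16, 3600 }, { 32, 2000 } },
   },
}
```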
FYI -- @lipari recently had to deal with PMIx as part of his CORAL involvement as well.
Notes on features I'm finding would be really useful in our distributed execution system:
Are you talking about the equivalent of a job prolog/epilog? I'm not following how this could impact per-task setup, so I need more detail, sorry. For "teardown code", shouldn't that be implemented more flexibly as dependent program(s)?
I'd like to avoid wrappers as much as possible. Otherwise you have wrappers calling wrappers and the issues become intractable. As we've said, we plan on doing something like spank plugins.
Actually, hooks already exist in the current wreck implementation. Right now the hooks take the form of a set of lua plugins that can be dropped into the wreck/lua.d/ directory.
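For reference, a minimal lua.d plugin has roughly this shape; the rexecd_* hook names follow the convention of the existing wreck plugins but should be treated as approximate rather than a specification:

```lua
-- Minimal sketch of a wreck/lua.d plugin; hook names are approximate.

-- Runs once per node before any local tasks start (prolog-like setup).
function rexecd_init ()
    -- e.g. create a per-job scratch directory, export common environment
end

-- Runs once per task, just before exec (per-task setup).
function rexecd_task_init ()
    -- e.g. bind the task to cores or adjust its environment
end

-- Runs after each task exits.
function rexecd_task_exit ()
    -- e.g. record the exit status or per-task statistics
end

-- Runs once per node after all local tasks have exited (epilog-like teardown).
function rexecd_exit ()
    -- e.g. remove the scratch directory
end
```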
Closing outdated "wreck replacement" issues.
Implement a new service that executes distributed "programs" in terms of cmb.exec. It depends on the KVS and creates bare bones KVS entries for programs (historical and current).
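As a strawman, the bare bones per-program KVS entries might look something like this; every key name below is a placeholder, not a proposal for the actual namespace:

```lua
-- Hypothetical per-program KVS layout; all key names are placeholders.
local kvs_entries = {
   ["prog.42.state"]     = "running",        -- e.g. starting / running / complete
   ["prog.42.uuid"]      = "3f2a9c1e-...",   -- submission uuid for global reference
   ["prog.42.ranks"]     = "0-63",           -- broker ranks executing tasks
   ["prog.42.cmdline"]   = { "./a.out" },
   ["prog.42.starttime"] = 1442500000,
   ["prog.42.context"]   = {},               -- opaque area filled in by the shell/app
}
```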
Tools such as 'ps' and 'top' would show currently executing programs.
Possibly flux-exec could be modified to (optionally) talk to this service and launch stuff in parallel?
The "program context" would be an opaque area in the program KVS space that would be left up to the "shell" (see #334 ) or the application to fill in.
This service can be (mostly) resource-agnostic, though it would need to know which broker ranks are to execute tasks. For the most part, the mapping of resources to tasks could be a function of the "shell".
This service could assign a monotonic Flux program id (fpid?) to programs as they are executed. Since this number is only unique within the local Flux instance, job submissions should be assigned a uuid so they can be uniquely referenced in system wide logs or a global provenance database. Then each time the submission is executed or re-executed, it is associated with the fpid.
Programs would be submitted using the program specification introduced in RFC 8.