User Requirements
Please answer the following questions in as much detail as possible:
- How will saga-pilot be used in your research?
- Which features of saga-pilot will be most critical for your research?
- Which types of jobs are you planning to run with saga-pilot?
- What performance expectations do you have?
- How critical is support for data handling capabilities?
- How will saga-pilot be used in your research?
Primary vehicle to put application workload on infrastructure.
- Which features of saga-pilot will be most critical for your research?
Semantic equivalence to P*. Support for all current CI, although hopefully that will come through saga-python. Support for stderr/stdout at the CU level.
- Which types of jobs are you planning to run with saga-pilot?
All kinds. Since I'm not an application owner, application workloads come and go.
- What performance expectations do you have?
In general, the pilot abstraction should not be the bottleneck at the scale of the infrastructure we work with and for the typical application workloads we support.
- How critical is support for data handling capabilities?
Essential.
Saga-pilot will be used by a workload manager within AIMES and TROY, and as the pilot layer for F*. The workload manager will offer capabilities to automatically define a pilot framework tailored to run the tasks of a given workload. The requirements for the pilot framework will be derived mainly by inspecting the characteristics of the tasks. Tasks will be grouped in 'stages' and, where applicable, will be related temporally and spatially. More information about the workload manager can be found in the AIMES and TROY wikis:
- https://bitbucket.org/shantenujha/aimes/wiki/scenarios
- https://github.com/saga-project/troy/wiki/Design
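As a rough illustration of deriving pilot requirements by inspecting task characteristics, the sketch below groups tasks by stage and sizes a single pilot. All names (`Task`, `PilotRequirements`, `derive_requirements`) and fields are hypothetical and are not part of saga-pilot, AIMES, or TROY. It assumes that tasks within a stage run concurrently and that stages run one after another.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task description; field names are illustrative only."""
    name: str
    stage: int        # tasks sharing a stage are assumed to run concurrently
    cores: int        # cores the task needs
    runtime_min: int  # estimated runtime in minutes


@dataclass(frozen=True)
class PilotRequirements:
    cores: int         # cores needed to run the widest stage at once
    walltime_min: int  # total time if stages run one after another


def derive_requirements(tasks):
    """Inspect the tasks, group them by stage, and size a single pilot."""
    if not tasks:
        raise ValueError("workload is empty")
    stages = defaultdict(list)
    for task in tasks:
        stages[task.stage].append(task)
    # The widest stage determines how many cores the pilot must hold.
    cores = max(sum(t.cores for t in group) for group in stages.values())
    # Each stage lasts as long as its slowest task; stages are sequential.
    walltime = sum(max(t.runtime_min for t in group) for group in stages.values())
    return PilotRequirements(cores=cores, walltime_min=walltime)


workload = [
    Task("sim-0", stage=0, cores=4, runtime_min=30),
    Task("sim-1", stage=0, cores=4, runtime_min=45),
    Task("analysis", stage=1, cores=2, runtime_min=10),
]
requirements = derive_requirements(workload)
```

For this workload, stage 0 needs 8 cores for 45 minutes and stage 1 needs 2 cores for 10 minutes, so the sketch asks for 8 cores and 55 minutes. A real workload manager would presumably also account for spatial relations, queue waiting time, and data movement.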
Clean separation and free composition of the following functionalities:
- Framework:
- Describe pilot;
- describe *unit;
- bind *unit;
- instantiate pilot;
- submit *unit;
- execute *unit.
- *Unit control:
- suspend *unit execution;
- restart *unit execution.
- *Unit inspection:
- retrieve *unit status;
- retrieve partial *unit output;
- retrieve *unit output.
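The separation listed above could look roughly like the following sketch, where framework, control, and inspection operations are independent functions over plain descriptions. Every name here (`Pilot`, `Unit`, `State`, `bind`, `simulate_progress`, and so on) is an assumption made for illustration and does not reflect the saga-pilot API. Execution is simulated in memory.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class State(Enum):
    DESCRIBED = "described"
    BOUND = "bound"
    SUBMITTED = "submitted"
    RUNNING = "running"
    SUSPENDED = "suspended"
    DONE = "done"


@dataclass
class Pilot:
    """Hypothetical pilot description."""
    resource: str
    cores: int
    active: bool = False


@dataclass
class Unit:
    """Hypothetical *unit description."""
    executable: str
    state: State = State.DESCRIBED
    pilot: Optional[Pilot] = None
    output: List[str] = field(default_factory=list)


def _move(unit, expected, target):
    """Allow a state transition only from the expected state."""
    if unit.state is not expected:
        raise RuntimeError(f"{unit.state.value} -> {target.value} not allowed")
    unit.state = target


# Framework operations
def instantiate(pilot):
    pilot.active = True

def bind(unit, pilot):
    _move(unit, State.DESCRIBED, State.BOUND)
    unit.pilot = pilot

def submit(unit):
    if unit.pilot is None or not unit.pilot.active:
        raise RuntimeError("unit needs an instantiated pilot")
    _move(unit, State.BOUND, State.SUBMITTED)

def execute(unit):
    _move(unit, State.SUBMITTED, State.RUNNING)

# *Unit control
def suspend(unit):
    _move(unit, State.RUNNING, State.SUSPENDED)

def restart(unit):
    _move(unit, State.SUSPENDED, State.RUNNING)

# *Unit inspection
def status(unit):
    return unit.state

def partial_output(unit):
    return list(unit.output)

def final_output(unit):
    if unit.state is not State.DONE:
        raise RuntimeError("unit has not finished")
    return "\n".join(unit.output)

def simulate_progress(unit, line, last=False):
    """Stand-in for real execution: record output, optionally finish."""
    if unit.state is not State.RUNNING:
        raise RuntimeError("unit is not running")
    unit.output.append(line)
    if last:
        unit.state = State.DONE
```

Because description and binding are separate steps in this sketch, a unit can be bound before its pilot is instantiated, while submission is rejected until the pilot is active. Partial output can be read in any state, whereas final output is only available once the unit is done.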
Possibly relevant, even though, as discussed, we might want to implement the queue system as a 'service' separate from saga-pilot:
- Describe a queue;
- bind a queue to a pilot;
- 'typical' queue operations (add, delete, list, suspend, etc.).
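A minimal sketch of such a queue, kept independent of any pilot implementation, might look as follows. The class `UnitQueue` and its method names are hypothetical. The pilot is referenced only by an opaque identifier, in line with the idea of running the queue as a separate service.

```python
from collections import deque


class UnitQueue:
    """Hypothetical FIFO queue of unit identifiers bound to one pilot."""

    def __init__(self, name):
        self.name = name          # the queue "description" is just a name here
        self.pilot_id = None      # opaque pilot identifier, set by bind()
        self.suspended = False
        self._units = deque()

    def bind(self, pilot_id):
        self.pilot_id = pilot_id

    def add(self, unit_id):
        self._units.append(unit_id)

    def delete(self, unit_id):
        # Raises ValueError if the unit is not queued.
        self._units.remove(unit_id)

    def list_units(self):
        return list(self._units)

    def suspend(self):
        self.suspended = True

    def resume(self):
        self.suspended = False

    def next_unit(self):
        """Hand out the next unit, or None if unbound, suspended, or empty."""
        if self.pilot_id is None or self.suspended or not self._units:
            return None
        return self._units.popleft()


queue = UnitQueue("short-tasks")
for unit_id in ("u1", "u2", "u3"):
    queue.add(unit_id)
queue.delete("u2")
queue.bind("pilot-1")
```

In this sketch a suspended or unbound queue still accepts add, delete, and list operations but hands out no units, which is one possible reading of 'suspend'.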
Multiple types of interface:
- REST;
- python API;
- command line.
Both synthetic and real-life workloads of the following type:
- Bag of tasks;
- replicas;
- chained/coupled ensemble;
- workflows.
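One way to generate synthetic workloads of these types is to describe each as a dependency graph, mapping every task to the set of tasks it depends on. The sketch below is an assumption about representation, not an existing saga-pilot feature. The helper names are hypothetical, and replicas are not modeled separately (they could arguably be treated as a bag of identical tasks). It uses the standard library `graphlib` module, available from Python 3.9.

```python
from graphlib import TopologicalSorter


def bag_of_tasks(n):
    """n independent tasks: no task depends on any other."""
    return {f"task-{i}": set() for i in range(n)}


def chain(n):
    """Chained ensemble: each step depends on the previous one."""
    return {f"step-{i}": ({f"step-{i - 1}"} if i else set()) for i in range(n)}


def waves(graph):
    """Group tasks into waves whose members can run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on cyclic input
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


# A small workflow: "b" and "c" both need "a"; "d" needs both of them.
workflow = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}

bag_waves = waves(bag_of_tasks(3))   # one wave holding all three tasks
chain_waves = waves(chain(3))        # three waves of one task each
workflow_waves = waves(workflow)     # [["a"], ["b", "c"], ["d"]]
```

With this representation, a bag of tasks collapses into a single wave, a chain yields one task per wave, and a workflow falls in between, so the same driver code could exercise all of them.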
Too soon to say from an AIMES/TROY/F* point of view? We already know that scalability will be a big deal (100K tasks?).
Not critical at the moment, but basic data transfer capabilities are needed.