
User Requirements

Ole Weidner edited this page Aug 6, 2013 · 19 revisions

Please answer the following questions in as much detail as possible:

  • How will saga-pilot be used in your research?
  • Which features of saga-pilot will be most critical for your research?
  • Which types of jobs are you planning to run with saga-pilot?
  • What performance expectations do you have?
  • How critical is support for data handling capabilities?
  • (NEW) If you had to design and implement saga-pilot yourself, guided only by your specific requirements, how would you do it?

Ashley

  • How will saga-pilot be used in your research?

For now, mostly for working out scheduling algorithms; later on, possibly at the application level for SCIHM, rock physics, etc.

  • Which features of saga-pilot will be most critical for your research?

Anything related to scheduling. In particular, the ability to refine scheduling decisions iteratively in response to an external information service and to internal state changes would be nice.

  • Which types of jobs are you planning to run with saga-pilot?

At this point, I am mostly interested in theoretical scheduling. I may end up running SCIHM, rock physics, or similar applications, but to start with, mostly fairly short-running "heterogeneous" jobs of varying length with data dependencies.

  • What performance expectations do you have?

Whatever is needed to schedule a "reasonable" application workload in terms of CUs (compute units). I am still identifying target applications for this.

  • How critical is support for data handling capabilities?

Extremely critical, at least with regard to scheduling data.

Mark

  • How will saga-pilot be used in your research?

As the primary vehicle for putting application workloads onto the infrastructure.

  • Which features of saga-pilot will be most critical for your research?

Semantic equivalence to P*. Support for all current CI, which will hopefully come through saga-python. Support for stderr/stdout at the CU level.

  • Which types of jobs are you planning to run with saga-pilot?

All kinds: since I'm not an application owner, application workloads come and go.

  • What performance expectations do you have?

In general, the pilot abstraction should not be the bottleneck, either at the scale of the infrastructure we work with or for the typical application workloads we support.

  • How critical is support for data handling capabilities?

Essential.

Melissa

Matteo

  • How will saga-pilot be used in your research?

Saga-pilot will be used by a workload manager within AIMES and TROY, and as the pilot layer for F*. The workload manager will offer the capability to automatically define a pilot framework, tailored to run the tasks of a given workload. The requirements for the pilot framework will be derived mainly from inspecting the characteristics of the tasks. Tasks will be grouped into 'stages' and, where applicable, will be related temporally and spatially. More information about the workload manager can be found in the AIMES and TROY wikis.

  • Which features of saga-pilot will be most critical for your research?

Clean separation and free composition of the following functionalities:

  • Framework:
    • Describe pilot;
    • describe *unit;
    • bind *unit;
    • instantiate pilot;
    • submit *unit;
    • execute *unit.
  • *Unit control:
    • suspend *unit execution;
    • restart *unit execution.
  • *Unit inspection:
    • retrieve *unit status;
    • retrieve partial *unit output;
    • retrieve *unit output.

Possibly relevant, even though, as discussed, we might want to implement a queue system as a 'service' separate from saga-pilot:

  • Describe a queue;
  • bind a queue to a pilot;
  • 'typical' queue operations (add, delete, list, suspend, etc).

Multiple types of interface:

  • REST;
  • python API;
  • command line.
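To make the requested "clean separation and free composition" concrete, here is a purely illustrative sketch in Python. It is not saga-pilot's actual API: every class and method name (`Framework`, `Control`, `Inspection`, `complete`, `UnitState`, etc.) is hypothetical, and "execution" is a toy in-memory model that only records fake output lines. The point is that framework operations, unit control, and unit inspection live in separate components that all act on the same unit objects.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional

# HYPOTHETICAL sketch: none of these names come from saga-pilot.


class UnitState(Enum):
    BOUND = "Bound"
    SUBMITTED = "Submitted"
    RUNNING = "Running"
    SUSPENDED = "Suspended"
    DONE = "Done"


@dataclass
class PilotDescription:  # "describe pilot"
    resource: str
    cores: int


@dataclass
class UnitDescription:  # "describe *unit"
    executable: str
    arguments: List[str] = field(default_factory=list)


@dataclass
class Unit:
    uid: str
    description: UnitDescription
    pilot: str
    state: UnitState = UnitState.BOUND
    output: List[str] = field(default_factory=list)


class Framework:
    """Instantiate pilots; bind, submit, and (toy-)execute units."""

    def __init__(self) -> None:
        self.pilots: Dict[str, PilotDescription] = {}
        self.units: Dict[str, Unit] = {}

    def instantiate_pilot(self, pd: PilotDescription) -> str:
        pid = f"pilot.{len(self.pilots)}"
        self.pilots[pid] = pd
        return pid

    def bind(self, ud: UnitDescription, pid: str) -> Unit:
        if pid not in self.pilots:
            raise KeyError(pid)  # cannot bind to an unknown pilot
        unit = Unit(uid=f"unit.{len(self.units)}", description=ud, pilot=pid)
        self.units[unit.uid] = unit
        return unit

    def submit(self, unit: Unit) -> None:
        unit.state = UnitState.SUBMITTED

    def execute(self, unit: Unit, steps: int = 1) -> None:
        # Toy execution: no process runs; each step appends one fake output line.
        if unit.state not in (UnitState.SUBMITTED, UnitState.RUNNING):
            return  # unsubmitted, suspended, or finished units make no progress
        unit.state = UnitState.RUNNING
        for _ in range(steps):
            unit.output.append(f"{unit.description.executable}: step {len(unit.output)}")

    def complete(self, unit: Unit) -> None:
        if unit.state is UnitState.RUNNING:
            unit.state = UnitState.DONE


class Control:
    """Unit control, independent of how units were created."""

    @staticmethod
    def suspend(unit: Unit) -> None:
        if unit.state is UnitState.RUNNING:
            unit.state = UnitState.SUSPENDED

    @staticmethod
    def restart(unit: Unit) -> None:
        if unit.state is UnitState.SUSPENDED:
            unit.state = UnitState.RUNNING


class Inspection:
    """Read-only access to unit status and output."""

    @staticmethod
    def status(unit: Unit) -> UnitState:
        return unit.state

    @staticmethod
    def partial_output(unit: Unit) -> List[str]:
        return list(unit.output)

    @staticmethod
    def output(unit: Unit) -> Optional[List[str]]:
        # Full output is only available once the unit is done.
        return list(unit.output) if unit.state is UnitState.DONE else None


if __name__ == "__main__":
    fw = Framework()
    pid = fw.instantiate_pilot(PilotDescription(resource="localhost", cores=4))
    unit = fw.bind(UnitDescription(executable="/bin/date"), pid)
    fw.submit(unit)
    fw.execute(unit, steps=2)
    Control.suspend(unit)
    print(Inspection.status(unit).value, Inspection.partial_output(unit))
```

Because `Control` and `Inspection` only depend on the `Unit` record, they could be swapped out or exposed through other interfaces (REST, command line) without touching `Framework`; whether saga-pilot should be structured this way is an open design question, not something this page decides.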

  • Which types of jobs are you planning to run with saga-pilot?

Both synthetic and real-life workloads of the following type:

  • Bag of tasks;
  • replicas;
  • chained/coupled ensemble;
  • workflows.

  • What performance expectations do you have?

It may be too soon to say from an AIMES/TROY/F* point of view. We already know that scalability will be a big deal (on the order of 100K tasks?).

  • How critical is support for data handling capabilities?

Not critical at the moment, but basic data transfer capabilities are needed.

Antons
