Pilots 2.0: generic, configurable pilots

This is a proposal for refactoring the "pilots" code. Its primary goal is to make today's pilots easy to configure and easy to extend.

We start with a definition. A pilot is what makes it possible to run jobs on a worker node. A pilot can be sent as a script to be run, or it can be fetched. A pilot can run on every type of computing resource, e.g. a CREAM Computing Element, a DIRAC Computing Element, or a Virtual Machine (in the form of a contextualization script).

A pilot has, at a minimum, to:

- install DIRAC
- configure DIRAC
- run the JobAgent

A pilot has to run on each and every computing resource type.

Limitations of current solution

The current solution lacks extensibility and maintainability. Its building block is the script dirac-pilot.py, which uses the script dirac-install.py for the installation. Communities can define their own pilot scripts, but it is currently impossible to:

- add capabilities to the current pilot while inheriting from the base one
- define different pilots based on the type of computing resource.

Proposal

We propose a solution where:

- The pilot script can be generated at runtime.
- The pilot script can be generated server side, for every resource embracing the Grid model or the IaaS (Infrastructure as a Service) model (e.g. clouds). In DIRAC terminology, this is true for every resource for which a "Director" is needed.
- The pilot script can be generated client side. This is necessary for every resource embracing the IaaC (Infrastructure as a Client) model.
- A toolbox of pilot capabilities (that we will call "commands") is available for generating the pilot script.
- Each command implements a single, atomic function, e.g.:
  - run tests
  - install DIRAC
  - configure DIRAC
  - run JobAgent
  - run monitoring agent
  - report usage
  - ... and whatever else is needed
- VOs can easily extend the content of the toolbox by adding more commands.
- Different computing resource types can run different pilots.
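The toolbox idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the proposed implementation: the `TOOLBOX` dictionary, the `runPilot` helper, and the `MyVOConfigureDIRAC` class are hypothetical names, and each `do()` merely records that it ran instead of doing real work.

```python
# Hypothetical sketch: a toolbox of commands, a VO extension, and a pilot
# that runs the chosen commands in order. None of these names come from DIRAC.

executed = []  # trace of what ran, for illustration only


class InstallDIRAC(object):
    def do(self):
        executed.append('InstallDIRAC')


class ConfigureDIRAC(object):
    def do(self):
        executed.append('ConfigureDIRAC')


class RunJobAgent(object):
    def do(self):
        executed.append('RunJobAgent')


# Base toolbox: command name -> command class
TOOLBOX = {
    'InstallDIRAC': InstallDIRAC,
    'ConfigureDIRAC': ConfigureDIRAC,
    'RunJobAgent': RunJobAgent,
}


class MyVOConfigureDIRAC(ConfigureDIRAC):
    """ Hypothetical VO command: reuse the base behaviour, then add to it """
    def do(self):
        super(MyVOConfigureDIRAC, self).do()
        executed.append('MyVOConfigureDIRAC')


def runPilot(commandsList, toolbox):
    """ Instantiate and run each command, in the order given """
    for name in commandsList:
        toolbox[name]().do()


# A VO extends a copy of the toolbox, leaving the base one untouched
voToolbox = dict(TOOLBOX)
voToolbox['MyVOConfigureDIRAC'] = MyVOConfigureDIRAC

runPilot(['InstallDIRAC', 'MyVOConfigureDIRAC', 'RunJobAgent'], voToolbox)
print(executed)
```

Copying the base toolbox before extending it keeps the VO additions separate from the base set, which is one possible way to address both limitations listed earlier: extension by inheritance, and a different command list per pilot.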

The proposed solution requires that each command follows a coding convention, and can be found in a specific part of the code, e.g. "DIRAC.WorkloadManagementSystem.Command.MyCommand":

class MyCommand(object):
    def do(self):
        """ Here, we specify what the command does
        """

For each computing resource, VOs can specify a pilot type by simply modifying a description field (here represented as key-value pairs in a dictionary), e.g.:

{ 
  'CREAM': ['RunEnvironmentTest', 'InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent'],
  'CLOUD': ['RunEnvironmentTest', 'InstallDIRAC', 'ConfigureDIRAC', 'RunMonitoringAgent', 'RunJobAgent', 'UploadPilotOutput'],
  'BOINC': ['InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent', 'UploadPilotOutput'],
}
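A possible way to consume such a description is sketched below. The `PILOT_TYPES` name, the `getCommandsList` helper, and its `available` argument are assumptions made for this example; the check against the available commands is an addition that the proposal does not require.

```python
# Hypothetical sketch: picking the command list for a resource type.
# PILOT_TYPES mirrors the dictionary above; the helper is illustrative only.

PILOT_TYPES = {
    'CREAM': ['RunEnvironmentTest', 'InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent'],
    'CLOUD': ['RunEnvironmentTest', 'InstallDIRAC', 'ConfigureDIRAC',
              'RunMonitoringAgent', 'RunJobAgent', 'UploadPilotOutput'],
    'BOINC': ['InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent', 'UploadPilotOutput'],
}


def getCommandsList(resourceType, available):
    """ Return the commands for a resource type, checking that each one exists.

        Raises KeyError for an unknown resource type, and ValueError when a
        listed command is not among the available ones.
    """
    commandsList = PILOT_TYPES[resourceType]
    missing = [c for c in commandsList if c not in available]
    if missing:
        raise ValueError("Unknown commands for %s: %s" % (resourceType, ', '.join(missing)))
    return list(commandsList)


# Commands assumed to exist in the toolbox for this example
available = set(['RunEnvironmentTest', 'InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent'])
print(getCommandsList('CREAM', available))
```

Failing early on an unknown command name means a typo in the description field is caught when the pilot is generated rather than on the worker node.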

Server side, or client side, a pilot script is generated, e.g. (nota bene: this code is just a stub):

def generatePilot( commandsList, vo = '' ):
    """ Build the text of the pilot script: one import and one call per command.
        'vo' is the prefix of the VO extension, if any (empty for plain DIRAC).
    """
    scriptLines = []
    for command in commandsList:
        # Prefer the VO extension of the command, fall back to the base DIRAC one
        module = "%sDIRAC.WorkloadManagementSystem.Command.%s" % ( vo, command )
        try:
            __import__( module )
        except ImportError:
            module = "DIRAC.WorkloadManagementSystem.Command.%s" % command
            __import__( module )

        scriptLines.append( "from %s import %s" % ( module, command ) )
        scriptLines.append( "%s().do()" % command )

    return "\n".join( scriptLines )

For Grid and IaaS types of resources, once the pilot script is generated, it is added by the TaskQueue/Site/Cloud director to the list of files that compose the pilot tarball.
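As an illustration of this packaging step, the sketch below adds a generated script to a gzipped tarball held in memory, using Python's standard `tarfile` module. It does not describe how the directors actually build the tarball: the file name `pilot.py` and the helper `addScriptToTarball` are assumptions made for this example.

```python
# Hypothetical sketch: adding a generated pilot script to a tarball, in memory.
# The file name 'pilot.py' and the helper name are assumptions for this example.
import io
import tarfile


def addScriptToTarball(scriptText, fileName='pilot.py'):
    """ Return the bytes of a gzipped tarball holding the script as one file """
    data = scriptText.encode('utf-8')
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w:gz') as tar:
        info = tarfile.TarInfo(name=fileName)
        info.size = len(data)
        info.mode = 0o755  # mark the script as executable
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


script = "print('pilot running')\n"
tarball = addScriptToTarball(script)

# Read the tarball back to check that the script survived the round trip
with tarfile.open(fileobj=io.BytesIO(tarball), mode='r:gz') as tar:
    names = tar.getnames()
    content = tar.extractfile('pilot.py').read().decode('utf-8')
print(names)
```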
