Pilots 2.0: generic, configurable pilots

fstagni edited this page May 2, 2014 · 7 revisions

This is a proposal for refactoring the "pilots" code. Its primary goal is to make today's pilots easy to configure and easy to extend.

We start with a definition of pilot. A pilot is what makes it possible to run jobs on a worker node. It can be sent as a script to be run, or it can be fetched. A pilot can run on any computing resource, e.g. a CREAM Computing Element, a DIRAC Computing Element, or a Virtual Machine (in the form of a contextualization script).

A pilot has, at a minimum, to:

  • install DIRAC
  • configure DIRAC
  • run the JobAgent

A pilot has to run on each and every computing resource type.

Limitations of current solution

The current solution lacks extensibility and maintainability. Its building block is the script dirac-pilot.py, which uses the script dirac-install.py for the installation. Communities can define their own pilot scripts, but it is currently impossible to:

  • add capabilities to the current pilot while inheriting from the base one
  • define different pilots based on the type of the computing resource.

Proposal

We propose a solution where:

  • The pilot script can be generated at runtime.
  • The pilot script can be generated server side, for every resource that follows the Grid model or the IaaS (Infrastructure as a Service) model (e.g. clouds). In DIRAC terminology, this covers every resource for which a "Director" is needed.
  • The pilot script can be generated client side. This is necessary for every resource embracing the IaaC (Infrastructure as a Client) model.
  • A toolbox of pilot capabilities (that we will call "commands") is available for generating the pilot script.
  • Each command implements a single, atomic function, e.g.:
    • run tests
    • install DIRAC
    • configure DIRAC
    • run JobAgent
    • run monitoring agent
    • report usage
    • ... and whatever else is needed
  • VOs can easily extend the content of the toolbox by adding more commands.
  • Different computing resource types can run different pilots.

The proposed solution requires that each command follow a coding convention and be located in a specific part of the code, e.g. "DIRAC.WorkloadManagementSystem.Command.MyCommand":

class MyCommand(object):
    def do(self):
        """ Here, we specify what the command does
        """
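
As a sketch of how this convention could let a VO add capabilities while inheriting from a base command, consider the following. The names CommandBase and MyVOInstallDIRAC are assumptions made for illustration, and the method bodies are placeholders rather than real installation logic.

```python
class CommandBase(object):
    """Hypothetical common base: every command exposes a single do() method."""

    def do(self):
        raise NotImplementedError("commands must implement do()")


class InstallDIRAC(CommandBase):
    def do(self):
        # Placeholder: a real command would invoke the installer here.
        return "DIRAC installed"


class MyVOInstallDIRAC(InstallDIRAC):
    """Hypothetical VO extension: reuse the base step, then add to it."""

    def do(self):
        result = InstallDIRAC.do(self)
        return result + ", VO software installed"


print(MyVOInstallDIRAC().do())
```

Because every command has the same do() entry point, whatever generates the pilot script does not need to know whether a command is a base one or a VO extension.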

For each computing resource, VOs can specify a pilot type simply by modifying a description field (represented here as key-value pairs in a dictionary), e.g.:

{ 
  'CREAM': ['RunEnvironmentTest', 'InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent'],
  'CLOUD': ['RunEnvironmentTest', 'InstallDIRAC', 'ConfigureDIRAC', 'RunMonitoringAgent', 'RunJobAgent', 'UploadPilotOutput'],
  'BOINC': ['InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent', 'UploadPilotOutput'],
}
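
A minimal sketch of how such a description could be consulted is shown below. The names PILOT_TYPES, DEFAULT_COMMANDS and getCommandsList are hypothetical, and the fallback to the three minimal steps (install, configure, run the JobAgent) for unlisted resource types is an assumption, not part of the proposal.

```python
PILOT_TYPES = {
    'CREAM': ['RunEnvironmentTest', 'InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent'],
    'CLOUD': ['RunEnvironmentTest', 'InstallDIRAC', 'ConfigureDIRAC',
              'RunMonitoringAgent', 'RunJobAgent', 'UploadPilotOutput'],
    'BOINC': ['InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent', 'UploadPilotOutput'],
}

# Assumed fallback for resource types that have no entry of their own.
DEFAULT_COMMANDS = ['InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent']


def getCommandsList(resourceType):
    """Return a copy of the ordered commands list for a resource type."""
    return list(PILOT_TYPES.get(resourceType, DEFAULT_COMMANDS))


print(getCommandsList('BOINC'))
```

Returning a copy keeps callers from accidentally modifying the shared description when they adjust the list for a single pilot.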

Server side, or client side, a pilot script is generated, e.g. (nota bene: this code is just a stub):

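def generatePilot( commandsList, vo = '' ):
    """ Build the text of a pilot script that runs the given commands in order.
        vo is the prefix of the VO extension package (empty for plain DIRAC).
    """
    scriptLines = []
    for command in commandsList:
        try:
            # prefer the VO extension's version of the command, if it exists
            module = "%sDIRAC.WorkloadManagementSystem.Command.%s" % ( vo, command )
            __import__( module )
        except ImportError:
            # otherwise fall back to the base DIRAC command
            module = "DIRAC.WorkloadManagementSystem.Command.%s" % command
            __import__( module )
        scriptLines.append( "from %s import %s" % ( module, command ) )
        scriptLines.append( "%s().do()" % command )
    return "\n".join( scriptLines )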
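
The lookup order used by the stub (VO extension first, base DIRAC second) can be illustrated without a DIRAC installation. In the sketch below, two plain dictionaries stand in for the two packages; the names BASE_TOOLBOX, VO_TOOLBOX, resolveCommand and runPilot are hypothetical, and the commands only return strings.

```python
class InstallDIRAC(object):
    def do(self):
        return "base InstallDIRAC"


class RunJobAgent(object):
    def do(self):
        return "base RunJobAgent"


class VOInstallDIRAC(InstallDIRAC):
    """Hypothetical VO override of the base command."""

    def do(self):
        return "VO InstallDIRAC"


# Stand-ins for the base package and the VO extension package.
BASE_TOOLBOX = {"InstallDIRAC": InstallDIRAC, "RunJobAgent": RunJobAgent}
VO_TOOLBOX = {"InstallDIRAC": VOInstallDIRAC}


def resolveCommand(name):
    """Return the VO version of a command if present, else the base one."""
    try:
        return VO_TOOLBOX[name]
    except KeyError:
        return BASE_TOOLBOX[name]


def runPilot(commandsList):
    """Instantiate and run each command in order, collecting the results."""
    return [resolveCommand(name)().do() for name in commandsList]


print(runPilot(["InstallDIRAC", "RunJobAgent"]))
```

Here the VO's InstallDIRAC replaces the base one, while RunJobAgent, which the VO does not override, is taken from the base toolbox.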

For Grid and IaaS types of resources, once the pilot script is generated, the TaskQueue/Site/Cloud director adds it to the list of files that compose the pilot tarball.