Pilots 2.0: generic, configurable pilots
This is a proposal for refactoring the "pilots" code. Its primary goal is to make today's pilots easy to configure and easy to extend.
We start with a definition. A pilot is what makes it possible to run jobs on a worker node. It can be sent to the resource as a script to be run, or it can be fetched. A pilot can run on every type of computing resource, e.g. a CREAM Computing Element, a DIRAC Computing Element, or a Virtual Machine (in the form of a contextualization script).
A pilot has, at a minimum, to:
- install DIRAC
- configure DIRAC
- run the JobAgent
A pilot has to run on each and every computing resource type.
The current solution lacks extensibility and maintainability. Its building block is the script dirac-pilot.py, which uses the script dirac-install.py for the installation. Communities can define their own pilot scripts, but it is currently impossible to:
- add capabilities to the current pilot while inheriting from the base one
- define different pilots based on the type of the computing resource.
We propose a solution where:
- The pilot script can be generated at runtime.
- The pilot script can be generated server side, for every resource embracing the Grid model, and the IaaS (Infrastructure as a Service) model (e.g. clouds). In DIRAC terminology this is true for every resource for which a "Director" is needed.
- The pilot script can be generated client side. This is necessary for every resource embracing the IaaC (Infrastructure as a Client) model.
- A toolbox of pilot capabilities (that we will call "commands") is available for generating the pilot script
- each command implements a single, atomic function, e.g.:
  - run tests
  - install DIRAC
  - configure DIRAC
  - run JobAgent
  - run monitoring agent
  - report usage
  - ... and whatever else is needed
- VOs can easily extend the content of the toolbox, adding more commands
- different computing resource types can run different pilots
The proposed solution requires that each command follows a coding convention, and can be found in a specific part of the code, e.g. "DIRAC.WorkloadManagementSystem.Command.MyCommand":
    class MyCommand(object):

        def do(self):
            """ Here, we specify what the command does
            """
For each computing resource, VOs can specify a pilot type simply by modifying a description field (here represented as key-value pairs in a dictionary), e.g.:
    {
        'CREAM': ['RunEnvironmentTest', 'InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent'],
        'CLOUD': ['RunEnvironmentTest', 'InstallDIRAC', 'ConfigureDIRAC', 'RunMonitoringAgent', 'RunJobAgent', 'UploadPilotOutput'],
        'BOINC': ['InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent', 'UploadPilotOutput'],
    }
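Such a description can be looked up and sanity-checked before a pilot is generated. The following is only a sketch: the helper names `getCommandsList` and `hasMinimumCommands` are hypothetical, and mapping the three minimum pilot steps onto the command names `InstallDIRAC`, `ConfigureDIRAC` and `RunJobAgent` is an assumption based on the example dictionary.

```python
PILOT_TYPES = {
    'CREAM': ['RunEnvironmentTest', 'InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent'],
    'CLOUD': ['RunEnvironmentTest', 'InstallDIRAC', 'ConfigureDIRAC',
              'RunMonitoringAgent', 'RunJobAgent', 'UploadPilotOutput'],
    'BOINC': ['InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent', 'UploadPilotOutput'],
}

# Assumed command names for "install DIRAC, configure DIRAC, run the JobAgent"
MINIMUM_COMMANDS = ('InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent')


def getCommandsList(resourceType, pilotTypes):
    """Return the ordered command names for a resource type (hypothetical helper)."""
    try:
        return list(pilotTypes[resourceType])
    except KeyError:
        raise ValueError("No pilot type defined for resource type %r" % resourceType)


def hasMinimumCommands(commandsList):
    """True if all minimum commands are present and appear in the expected order."""
    positions = [commandsList.index(c) for c in MINIMUM_COMMANDS if c in commandsList]
    return len(positions) == len(MINIMUM_COMMANDS) and positions == sorted(positions)


for resourceType in sorted(PILOT_TYPES):
    commands = getCommandsList(resourceType, PILOT_TYPES)
    print(resourceType, hasMinimumCommands(commands))
```

Raising an error for an unknown resource type, rather than silently falling back to a default pilot, is a design choice of this sketch and not something the proposal prescribes.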
Server side, or client side, a pilot script is generated, e.g. (nota bene: this code is just a stub):
    def generatePilot(commandsList, vo=''):
        """ Return the text of the pilot script for the given command names (stub)
        """
        scriptLines = []
        for command in commandsList:
            # Prefer the VO extension of the command, fall back to the DIRAC one
            module = "%sDIRAC.WorkloadManagementSystem.Command.%s" % (vo, command)
            try:
                __import__(module)
            except ImportError:
                module = "DIRAC.WorkloadManagementSystem.Command.%s" % command
                __import__(module)
            # Each command contributes one import line and one call to do()
            scriptLines.append("from %s import %s" % (module, command))
            scriptLines.append("%s().do()" % command)
        return "\n".join(scriptLines) + "\n"
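Since the stub imports from DIRAC, it cannot be tried in isolation. The sketch below shows the same generation idea with the module lookup factored out into a callable, so that it runs without a DIRAC installation. The names `generatePilotScript` and `findModule`, the VO name `MyVO`, and the pretence that only `InstallDIRAC` has a VO extension are all illustrative assumptions.

```python
def generatePilotScript(commandsList, findModule):
    """Build the pilot script text.

    findModule(command) returns the dotted module path providing the command
    class (hypothetical hook that would hide the VO-then-DIRAC lookup).
    """
    lines = []
    # First all the imports, then one do() call per command, in order
    for command in commandsList:
        lines.append("from %s import %s" % (findModule(command), command))
    lines.append("")
    for command in commandsList:
        lines.append("%s().do()" % command)
    return "\n".join(lines) + "\n"


def findModule(command, voCommands=('InstallDIRAC',), vo='MyVO'):
    # Illustrative lookup: pretend only some commands have a VO extension
    base = "DIRAC.WorkloadManagementSystem.Command.%s" % command
    return vo + base if command in voCommands else base


script = generatePilotScript(['InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent'],
                             findModule)
print(script)
```

The result is plain Python source: one import line per command, followed by one `do()` call per command in the order given by the pilot type.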
For Grid and IaaS types of resources, once the pilot script is generated, the TaskQueue/Site/Cloud director adds it to the list of files that compose the pilot tarball.
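How a director assembles the tarball is not specified here. As a rough illustration only, the generated script could be packed together with the other pilot files using the standard `tarfile` module. The helper `buildPilotTarball`, the file name `pilotScript.py` and the placeholder file contents are assumptions made for this sketch.

```python
import io
import tarfile


def buildPilotTarball(files):
    """Pack a dict of {file name: text content} into a gzipped tarball (bytes)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, text in sorted(files.items()):
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mode = 0o755  # scripts must be executable on the worker node
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Placeholder contents; a real director would read the actual files
pilotFiles = {"dirac-install.py": "# placeholder for the install script\n"}
# The generated pilot script is simply one more file in the list
pilotFiles["pilotScript.py"] = "InstallDIRAC().do()\n"

tarball = buildPilotTarball(pilotFiles)
with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
    print(tar.getnames())
```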