Pilots 2.0: generic, configurable pilots
- Authors: F.Stagni, C.Luzzi
- Created: 02.05.2014
This is a proposal for refactoring the "pilots" code. Its primary goal is to make today's pilots easy to configure and easy to extend.
We start with a definition of pilot. A pilot is what makes it possible to run jobs on a worker node. A pilot can be sent to the resource as a script to be run, or it can be fetched by the resource. A pilot can run on every computing resource, e.g. on CREAM Computing Elements, on DIRAC Computing Elements, on Virtual Machines in the form of a contextualization script, or on IaaC (Infrastructure as a Client) resources, provided that these machines are properly configured.
A pilot has, at a minimum, to:
- install DIRAC
- configure DIRAC
- run the JobAgent
A pilot has to run on each and every computing resource type.
The current solution lacks extensibility and maintainability. Its building block is the script dirac-pilot.py, which uses the script dirac-install.py for the installation. Communities can define their own pilot scripts, but it is currently impossible to:
- add capabilities to the current pilot while inheriting from the base one
- define different pilots based on the type of the computing resource.
We propose a solution where:
- The pilot script can be generated at runtime.
- The pilot script can be generated server side, for every computing resource embracing the Grid model (e.g. CREAM, ARC CEs) or the IaaS (Infrastructure as a Service) model (e.g. clouds), where it takes the form of a contextualization script. In DIRAC terminology, this is true for every resource for which a "Director" is needed.
- The pilot script can be generated client side. This is necessary for every resource embracing the IaaC (Infrastructure as a Client) model.
- A toolbox of pilot capabilities (that we will call "commands") is available for generating the pilot script.
- Each command implements a single, atomic function, e.g.:
  - run tests
  - install DIRAC
  - configure DIRAC
  - run JobAgent
  - run monitoring agent
  - report usage
  - ... and whatever else is needed
- VOs can easily extend the content of the toolbox, adding more commands.
- Different computing resource types can run different pilots.
The proposed solution requires that each command follows a coding convention, and can be found in a specific part of the code, e.g. "DIRAC.WorkloadManagementSystem.Command.MyCommand":
    class MyCommand(object):
        def do(self):
            """ Here, we specify what the command does
            """
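As a minimal sketch of this convention, the following shows how a VO might add a capability to a command while inheriting from the base one. The names `CommandBase`, `MyVOInstallDIRAC`, and the `log` attribute are hypothetical and only serve the illustration; `InstallDIRAC` is one of the command names used in this proposal, but its body here is a placeholder.

```python
class CommandBase(object):
    """Hypothetical base class: every command exposes a single do() method."""

    def __init__(self):
        # Records what the command did; stands in for real work and logging.
        self.log = []

    def do(self):
        raise NotImplementedError("each command must implement do()")


class InstallDIRAC(CommandBase):
    """Base command (placeholder body, no real installation happens)."""

    def do(self):
        self.log.append("install DIRAC")


class MyVOInstallDIRAC(InstallDIRAC):
    """Hypothetical VO extension: reuses the base step, then adds its own."""

    def do(self):
        super(MyVOInstallDIRAC, self).do()
        self.log.append("install VO extension")


command = MyVOInstallDIRAC()
command.do()
print(command.log)
```

Because every command exposes the same `do()` entry point, the code that assembles a pilot does not need to know whether it is running a base command or a VO extension.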
For each computing resource type, VOs can specify a pilot type by simply modifying a description field (represented here as key-value pairs in a Python dictionary), e.g.:
    {
        'CREAM': ['RunEnvironmentTest', 'InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent'],
        'CLOUD': ['RunEnvironmentTest', 'InstallDIRAC', 'ConfigureDIRAC', 'RunMonitoringAgent', 'RunJobAgent', 'UploadPilotOutput'],
        'BOINC': ['ConfigureDIRAC', 'RunJobAgent', 'UploadPilotOutput'],
        'VAC': ['ConfigureDIRAC', 'RunMonitoringAgent', 'RunJobAgent', 'UploadPilotOutput'],
    }
This information can be persisted in the Configuration System, for easy modification, while a default will be provided.
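The lookup with a default could be sketched as follows. This is only an illustration: a plain dictionary stands in for the Configuration System, the function name `getCommandsList` is hypothetical, and the default list is an assumption based on the minimal pilot described earlier (install DIRAC, configure DIRAC, run the JobAgent).

```python
# Assumed default, used when a resource type has no entry of its own.
DEFAULT_COMMANDS = ['InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent']


def getCommandsList(resourceType, pilotTypes, default=DEFAULT_COMMANDS):
    """Return a copy of the ordered command names for a resource type."""
    return list(pilotTypes.get(resourceType, default))


# Stand-in for what would be read from the Configuration System.
pilotTypes = {
    'CREAM': ['RunEnvironmentTest', 'InstallDIRAC', 'ConfigureDIRAC', 'RunJobAgent'],
    'BOINC': ['ConfigureDIRAC', 'RunJobAgent', 'UploadPilotOutput'],
}

print(getCommandsList('BOINC', pilotTypes))  # entry defined for this type
print(getCommandsList('ARC', pilotTypes))    # no entry: falls back to the default
```

Returning a copy keeps callers from accidentally modifying the shared default or the configured list.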
Server side, or client side, a pilot script is generated, e.g. (nota bene: this code is just a stub):
    def generatePilot( commandsList, vo = '' ):
        """ Stub: returns the text of a pilot script that runs the given commands in order.
            'vo' is the prefix of the VO extension, i.e. the first part of "<vo>DIRAC"
        """
        script = ''
        for command in commandsList:
            # Prefer the VO extension of the command, if it exists; otherwise use the base one
            module = "%sDIRAC.WorkloadManagementSystem.Command.%s" % ( vo, command )
            try:
                __import__( module )
            except ImportError:
                module = "DIRAC.WorkloadManagementSystem.Command.%s" % command
                __import__( module )
            script += "from %s import %s\n%s().do()\n" % ( module, command, command )
        return script
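To make the intended output concrete, here is a self-contained variant of the stub that can be run without DIRAC installed. It is a sketch under assumptions: the function name `generatePilotScript`, the `vo` and `importable` parameters, and the "MyVO" prefix are hypothetical. The `importable` callable replaces the real import check so that the VO-first, base-second lookup can be exercised in isolation; unlike the stub, it does not verify that the base module exists.

```python
import importlib

BASE_PATH = "DIRAC.WorkloadManagementSystem.Command"


def _importable(moduleName):
    """Default check: try to import the module for real."""
    try:
        importlib.import_module(moduleName)
        return True
    except ImportError:
        return False


def generatePilotScript(commandsList, vo="", importable=_importable):
    """Return the pilot script text: one import and one do() call per command."""
    lines = []
    for command in commandsList:
        # Prefer the VO extension when it provides the command.
        module = "%s%s.%s" % (vo, BASE_PATH, command)
        if not (vo and importable(module)):
            module = "%s.%s" % (BASE_PATH, command)
        lines.append("from %s import %s" % (module, command))
        lines.append("%s().do()" % command)
    return "\n".join(lines) + "\n"


# Pretend the hypothetical "MyVO" extension overrides only InstallDIRAC.
available = {"MyVODIRAC.WorkloadManagementSystem.Command.InstallDIRAC"}
script = generatePilotScript(
    ["InstallDIRAC", "RunJobAgent"],
    vo="MyVO",
    importable=lambda moduleName: moduleName in available,
)
print(script)
```

In this run the generated script imports `InstallDIRAC` from the extension and `RunJobAgent` from the base package, each followed by its `do()` call.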
For Grid and IaaS types of resources, once the pilot script is generated, the TaskQueue/Site/Cloud director adds it to the list of files that compose the pilot tarball.
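Adding a generated script to a tarball could look roughly like the sketch below, which uses only the Python standard library. The director code itself is not shown in this proposal, so the helper `addPilotScript` and the file name `pilot.py` are hypothetical, and an in-memory buffer stands in for the real tarball file.

```python
import io
import tarfile


def addPilotScript(tarball, scriptText, name="pilot.py"):
    """Add the generated script text to an open tarball as an executable file."""
    data = scriptText.encode("utf-8")
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    info.mode = 0o755
    tarball.addfile(info, io.BytesIO(data))


# Write the tarball into memory instead of onto disk.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tarball:
    addPilotScript(tarball, "print('pilot')\n")

# Read it back to check what it contains.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as tarball:
    names = tarball.getnames()
    content = tarball.extractfile("pilot.py").read().decode("utf-8")
print(names)
```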
For IaaC, the "generatePilot(flavor)" function becomes the pilot itself: its result can be written to disk and run, which keeps the way of running exactly the same as on other resources. To reach that point, however, the machines first need a partial installation.
The "generatePilot(flavor)" function therefore has to be coded so that it can run in a pristine environment, to avoid a chicken-and-egg problem.
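One possible way to keep the generator free of DIRAC imports is to defer the choice between VO extension and base command to the generated script itself, so that the generator needs only the Python standard library. The sketch below illustrates that idea and is not the proposed implementation; the function name `generateStandalonePilot`, the `vo` parameter, and the "MyVO" prefix are hypothetical. It addresses only the generation step: the generated script still needs the command modules to be importable when it runs.

```python
def generateStandalonePilot(commandsList, vo=""):
    """Build the pilot script as text without importing DIRAC at generation time."""
    lines = ["import importlib", ""]
    for command in commandsList:
        base = "DIRAC.WorkloadManagementSystem.Command.%s" % command
        # Modules to try at run time: VO extension first (if any), then base.
        candidates = [base]
        if vo:
            candidates.insert(0, "%s%s" % (vo, base))
        lines += [
            "for moduleName in %r:" % (candidates,),
            "    try:",
            "        module = importlib.import_module(moduleName)",
            "        break",
            "    except ImportError:",
            "        continue",
            "else:",
            "    raise ImportError(%r)" % command,
            "getattr(module, %r)().do()" % command,
            "",
        ]
    return "\n".join(lines)


script = generateStandalonePilot(["ConfigureDIRAC", "RunJobAgent"], vo="MyVO")
# Syntax check only: actually running the script requires the command modules.
compile(script, "<pilot>", "exec")
print(script)
```

The trade-off is that a missing command is detected only when the pilot runs on the resource, not when the script is generated.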