Home
The DIRAC (Distributed Infrastructure with Remote Agent Control) project is a complete Grid, Cloud, Host and Volunteer solution for a community of users such as the LHCb Collaboration, Belle Collaboration or NGI multi-VO portals (FranceGrilles, Ibergrid). DIRAC forms a layer between a particular community and various compute resources to allow optimized, transparent and reliable usage.
A more detailed description of the DIRAC system can be found on the DIRAC system page.
The DIRAC Workload Management System (WMS) implements the task scheduling paradigm with Generic Pilot Jobs. This scheduling method solves many of the problems of using the unstable computing resources found in distributed computing infrastructures. In particular, it helps manage user activities in large Virtual Organizations such as the LHC experiments. The DIRAC WMS with Pilot Jobs is described in more detail in the DIRAC pilots model.
Questions from new adopters (providers and users) are answered in the VMDIRAC FAQs.
VMDIRAC is the DIRAC extension that integrates Federated Clouds transparently for the user. You can install both the DIRAC core and the VMDIRAC extension with:
wget --no-check-certificate -O dirac-install 'https://github.com/DIRACGrid/DIRAC/raw/integration/Core/scripts/dirac-install.py'
su dirac -c 'python dirac-install -V "VMDIRAC"'
See the detailed DIRAC server installation procedure. Once you have configured a DIRAC Configuration Server (CS) instance, you can configure the VMDIRAC extension.
The following notes lead you through the installation steps of VMDIRAC.
VMDIRAC is based upon the two following packages:
Both need to be installed before attempting the VMDIRAC installation.
SL6 platform notes: no package is available, so install from sources:
See OpenNebula install opennebula-3.4.0.tar.gz
We currently maintain OpenNebula end-points > 3.4.0.
Installing only the client on the VMDIRAC server:
./install.sh -c
NEW feature in v0r8, including X509 authentication and generic ssh contextualization. The supported rOCCI client is > 4.1.0:
gem install rake
gem install occi-cli
SL5 platform notes: package incompatibility, manual installation instructions
Standard SL5 dependencies:
# rpm -qa|grep ruby
ruby-libs-1.8.5-29.el5_9.i386
ruby-1.8.5-29.el5_9.x86_64
ruby-libs-1.8.5-29.el5_9.x86_64
rubygems-1.3.1-1.el5.noarch
ruby-shadow-1.4.1-7.el5.x86_64
ruby-rdoc-1.8.5-29.el5_9.x86_64
ruby-devel-1.8.5-29.el5_9.i386
ruby-irb-1.8.5-29.el5_9.x86_64
ruby-augeas-0.3.0-1.el5.x86_64
libselinux-ruby-1.33.4-5.7.el5.x86_64
ruby-devel-1.8.5-29.el5_9.x86_64
occi needs ruby-1.9.3. On SL5 you can install and use the Ruby enVironment Manager (rvm) to set up a configurable Ruby environment; additional info is available at the rvm homepage.
Once rvm is installed, go to the dirac bashrc and add:
# RVM
source ~/.rvm/scripts/rvm
rvm use 1.9.3
Nova 1.1 (OpenStack) python libraries:
pip install apache-libcloud
NEW feature in v0r9 (development): trunk libcloud is currently needed. If you have a previous libcloud folder, move it to libcloud.bak and install from trunk:
cd /opt
git clone https://github.com/alvarolopez/libcloud.git
cd libcloud
python setup.py clean
python setup.py build
python setup.py install
If you want generic VM contextualization for any image with sshd available, you need to install Paramiko.
Paramiko:
pip install paramiko
Install the components either from the sysadmin tool or directly from the machine.
install DB WorkloadManagement/VirtualMachineDB
this will create a new DB on the MySQL server (without tables!)
install service WorkloadManagement/VirtualMachineManager
Be careful with the port: if it is already taken by another service, you may want to change it to avoid collisions. Check VMDIRAC.WorkloadManagement.ConfigTemplate for further information. Running the service also generates the necessary tables in the database. This service is contacted by the Web server and by ALL the virtual machines, so expect some load here, depending on the number of VMs running.
install agent WorkloadManagement/VirtualMachineScheduler
this agent is the one that takes care of booting virtual machines according to needs
install agent WorkloadManagement/VirtualMachineContextualization
optional agent when using ssh contextualization method
nothing to do if the extension is properly declared on dirac.cfg
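The port caveat for the VirtualMachineManager service can be checked before installation. The following is a minimal sketch using only the Python standard library; the helper name `port_is_free` and the example port number are assumptions made for illustration and are not part of DIRAC or VMDIRAC. The real port is the one declared in VMDIRAC.WorkloadManagement.ConfigTemplate.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port) right now.

    Hypothetical helper: it only tells whether the port is unused at the
    moment of the call, not whether it stays free afterwards.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another service owns the port.
            return False
        return True


if __name__ == "__main__":
    candidate_port = 9163  # assumed example value, replace with your configured port
    state = "free" if port_is_free(candidate_port) else "already taken"
    print("Port %d is %s" % (candidate_port, state))
```

If the port is reported as taken, choose another one in the service configuration before running the install command.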
The main VMDIRAC setup concerns Image and Contextualization management. There are three major sections to set up in the DIRAC Configuration Server:
VMDIRAC defines the Running Pod as a logical abstraction of a particular set of running conditions. A Running Pod matches an Image with the list of cloud end-points on which VMs of that Image can run.
The VMDIRAC concept of an Image includes a boot image and, optionally, the contextualization of that image.
A cloud manager has at least one end-point for some API (e.g. OCCI, EC2 or native APIs). An End-point section holds all the values specific to the use of a cloud manager end-point.
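The relationship between the three sections can be pictured with a small data model. This is only an illustrative sketch: the class and field names below are assumptions chosen for readability and do not reproduce the actual Configuration Server option names or any VMDIRAC class.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Endpoint:
    """One cloud manager end-point; field names are illustrative only."""
    name: str
    api: str  # e.g. "OCCI", "EC2" or a native API
    options: Dict[str, str] = field(default_factory=dict)  # end-point specific values


@dataclass
class Image:
    """A boot image plus an optional contextualization method."""
    name: str
    boot_image: str
    contextualization: Optional[str] = None


@dataclass
class RunningPod:
    """Matches one Image with the end-points allowed to run it."""
    name: str
    image: Image
    endpoints: List[Endpoint] = field(default_factory=list)

    def endpoint_names(self) -> List[str]:
        """Names of the end-points on which this pod may start VMs."""
        return [endpoint.name for endpoint in self.endpoints]


# Hypothetical example values, not taken from a real configuration.
pod = RunningPod(
    name="example-pod",
    image=Image(name="example-image", boot_image="example-boot", contextualization="ssh"),
    endpoints=[Endpoint("site-a", "OCCI"), Endpoint("site-b", "EC2")],
)
print(pod.endpoint_names())
```

The point of the sketch is the direction of the references: a Running Pod points to exactly one Image and to a list of End-points, while the End-point keeps its own provider-specific values.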
Ready to run without dynamic contextualization, this image has to be prepared to run on a specific Endpoint with a particular DIRAC configuration.
Image and Endpoint context is automatically configured by VMDIRAC. Therefore, a single "golden" image can be distributed to all the Endpoints.
This is the High Energy Physics contextualization approach, using CernVM images and the contextualization methods supported by OpenStack and OpenNebula. The CernVM approach can also be used with other scientific applications. VMDIRAC currently supports the following HEPiX methods:
The DIRAC image context is included in an ISO context image, which has to be uploaded to the IaaS provider beforehand so that the CernVM init process can mount it. The end-point context is passed to the VM at submission time: VMDIRAC gets the parameters from the corresponding end-point section and sets this environment using the OpenNebula context section, creating an on-the-fly ISO image, which CernVM then mounts to load the end-point context.
The DIRAC image context is provided by amiconfig tools, sending the scripts in nova 1.1 userdata. The end-point context is provided through nova 1.1 metadata, which is specific to each OpenStack IaaS end-point and selected at submission time from the DIRAC Configuration Server.
Instead of a golden image that depends on the CernVM platform, VMDIRAC also supports a generic golden image, which can be configured through ssh contextualization, provided the VM has in-bound connectivity for ssh and sftp operations.
VMDIRAC can be configured with different policies for creating and stopping VMs. Each end-point has an associated VM allocation policy (vmPolicy) and VM stoppage policy (vmStopPolicy).
The VM allocation policy can be elastic or static.
Static VM allocation is used when an IaaS provider defines a constant number of VM slots that can be accessed.
Elastic allocation creates new VMs when there are jobs queued in DIRAC. For this purpose, the Running Pod configuration section has the CPUPerInstance option, which defines the minimum overall CPU time of the DIRAC jobs waiting in the task queue required to submit a new VM. The parameter tunes the elasticity of VM delivery: CPUPerInstance can be set to a longer time to use the available resources more efficiently, saving creation overheads, or to a shorter time to use the available resources exhaustively, aiming to finish the production in a shorter total wall time but at a higher resource cost due to the additional overhead.
As a rule of thumb, reference values for CPUPerInstance, from shorter to longer, could be:
a) Zero, to submit a new VM with no minimum CPU required of the jobs in the task queue.
b) A longer value, such as the average CPU required by the jobs, as a compromise between VM efficiency and total wall time.
c) A very large value, to maximize efficiency in terms of VM creation overhead, for cases where the total wall time of the production is not a constraint.
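The effect of CPUPerInstance on elastic allocation can be sketched as a simple threshold test. This is a minimal illustration, not the VirtualMachineScheduler code: the function name, the use of seconds as the CPU unit, and the exact comparison are assumptions.

```python
from typing import Sequence


def should_submit_vm(waiting_job_cpu: Sequence[float], cpu_per_instance: float) -> bool:
    """Decide whether the queued work justifies booting one more VM.

    waiting_job_cpu: required CPU time of each waiting job (assumed seconds).
    cpu_per_instance: stand-in for the CPUPerInstance option (same unit).
    """
    if not waiting_job_cpu:
        # Elastic allocation only reacts to queued jobs.
        return False
    return sum(waiting_job_cpu) >= cpu_per_instance


queue = [3600.0, 7200.0]  # hypothetical waiting jobs: one hour and two hours of CPU

# a) zero: any queued job triggers a new VM
print(should_submit_vm(queue, 0))
# b) around the average job CPU: a compromise setting
print(should_submit_vm(queue, sum(queue) / len(queue)))
# c) a very large value: a VM is only created for a large backlog
print(should_submit_vm(queue, 10 ** 9))
```

With a larger threshold, fewer VMs are created and each one is more likely to stay busy; with a smaller threshold, VMs are created more eagerly at the price of more creation overhead.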
The VM stoppage policy can be set to elastic or never. In either case, VMs can be stopped by the VM operator or through the HEPiX stoppage mechanism for CernVM images, which is the responsibility of each IaaS provider. When a running VM is asked to stop, it shuts down in an orderly fashion and then halts.
With the never policy, VM stoppage does not depend on running jobs, only on an external stoppage request (VM operators at the VMDIRAC portal, or the IaaS provider).
The elastic policy stops the VM if no jobs have been running during the last VM halting margin time, which is a configurable option.
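The two stoppage policies can be summarized as a small decision function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, times are taken to be in seconds, and the real logic lives in the VMDIRAC components rather than in a single function like this one.

```python
def should_halt_vm(stop_policy, seconds_since_last_job, halting_margin,
                   external_stop_requested=False):
    """Return True if the VM should begin an orderly stop.

    stop_policy: "elastic" or "never" (the vmStopPolicy values).
    seconds_since_last_job: time elapsed since a job last ran on the VM.
    halting_margin: stand-in for the VM halting margin time option.
    external_stop_requested: stop asked by a VM operator or the IaaS provider.
    """
    if external_stop_requested:
        # External requests are honoured under both policies.
        return True
    if stop_policy == "never":
        return False
    if stop_policy == "elastic":
        return seconds_since_last_job >= halting_margin
    raise ValueError("unknown stop policy: %r" % (stop_policy,))


# Hypothetical values: 10 minutes idle, 5 minute halting margin.
print(should_halt_vm("elastic", 600, 300))
print(should_halt_vm("never", 600, 300))
print(should_halt_vm("never", 600, 300, external_stop_requested=True))
```

A longer halting margin keeps idle VMs alive in case new jobs arrive, while a shorter one releases resources sooner.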
[v0r8]
NEW: rOCCI 1.1 DIRAC driver.
rOCCI authentication by X509 proxies with third-party VOMS (rOCCI does the matching work transparently)
Generic SSH contextualization DIRAC client for all Operating System Images and Cloud Managers (software convergence).
In the current release, SSH contextualization has been tested for OpenNebula and OpenStack.
NEW: VM local dirac.cfg updater agent for pilot/dirac release is updated
CHANGE: OcciImage and Occi09 migrated to new WMS/Utilities style
[v0r7]
NEW: endpoint vmPolicy "static" -> slots driven, indicated by maxEndpointInstances, endpoint CS parameter
endpoint vmPolicy "elastic" -> jobs driven, one by one
NEW: endpoint vmStopPolicy "never"
endpoint vmStopPolicy "elastic" -> no more jobs + VM halting margin time
CHANGE: Both cases: VMs can be stopped from the "Instances Overview" screen: VirtualMachineDB.py in function instanceIDHeartBeat:
Send empty dir just in case we want to send flags (such as stop vm)
TODO: The particular HEPiX method to ask VMs to shut down from the IaaS provider site (requirement to be specified by HEPiX)
[v0r6]
NEW: nova-1.1 driver and ssh contextualization method, ready to extend to amiconfig contextualization.
[v0r5]
Multi-endpoint OpenNebula and CloudStack in a federated way. Running Pods, DIRAC Images and Endpoints scheme.
[v0r4]
FIX: Added missing occi and cloud director files
[v0r3]
NEW: Redesign of VMDirector to allow for more Cloud provider drivers without modifications to the VM core components.
NEW: An image is composed of a bootstrap image and context files. Optionally, a context image can also be included.
CHANGE: A CS Cloud Endpoint has all the contextualization and configuration parameters for a specific endpoint
CHANGE: A Running Pod contains a DIRAC image, a list of endpoints and the necessary parameters and requirements.
[v0r2]
Initial version of VMDIRAC. It includes support for Occi and Amazon clouds.
The VMDIRAC RFCs are descriptions of proposals for new functionalities which are exposed by the authors to the VMDIRAC development team for comments. The RFC description is maintained and updated in the corresponding VMDIRAC Wiki page. Each new RFC must have a distinct number by which it can be referred to, the author and the date of the first submission.
RFC #1: Renewal proxy for the VMs instead of a cert (pub,key)