This repository has been archived by the owner on Feb 23, 2023. It is now read-only.
Víctor Méndez edited this page Oct 8, 2013 · 81 revisions

VMDIRAC is an extension of DIRAC
================================

The DIRAC (Distributed Infrastructure with Remote Agent Control) project is a complete Grid, Cloud, Host and Volunteer solution for user communities such as the LHCb Collaboration, the Belle Collaboration or NGI multi-VO portals (FranceGrilles, Ibergrid). DIRAC forms a layer between a particular community and various compute resources to allow optimized, transparent and reliable usage.

DIRAC Documentation

DIRAC Overview

A more detailed description of the DIRAC system can be found on the DIRAC system page.

The DIRAC Workload Management System (WMS) implements the task scheduling paradigm with Generic Pilot Jobs. This scheduling method solves many of the problems of using the unstable computing resources available in distributed computing infrastructures. In particular, it helps to manage user activities in large Virtual Organizations such as the LHC experiments. The DIRAC WMS with Pilot Jobs is described in more detail on the DIRAC pilots model page.
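The pull model behind pilot jobs can be illustrated with a minimal Python sketch. It is not DIRAC code. The `Job`, `TaskQueue` and `run_pilot` names and the matching rule (same platform, enough CPU left) are hypothetical assumptions. They are chosen only to show late binding: a pilot first lands on a working resource, and only then pulls a job it can actually run.

```python
# Hypothetical sketch of the pilot-job (pull) model; not the DIRAC WMS API.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    job_id: int
    cpu_seconds: int  # CPU time requested by the job
    platform: str     # platform the job requires


@dataclass
class TaskQueue:
    waiting: List[Job] = field(default_factory=list)

    def match(self, platform: str, cpu_left: int) -> Optional[Job]:
        """Remove and return the first waiting job this pilot can run."""
        for job in self.waiting:
            if job.platform == platform and job.cpu_seconds <= cpu_left:
                self.waiting.remove(job)
                return job
        return None


def run_pilot(queue: TaskQueue, platform: str, cpu_left: int) -> List[int]:
    """Pull and 'run' matching jobs until none fits the remaining CPU."""
    done = []
    while True:
        job = queue.match(platform, cpu_left)
        if job is None:
            break  # nothing suitable left: the pilot exits and frees its slot
        cpu_left -= job.cpu_seconds
        done.append(job.job_id)
    return done


queue = TaskQueue([Job(1, 3000, "x86_64-slc6"),
                   Job(2, 500, "i686-slc5"),
                   Job(3, 2000, "x86_64-slc6")])
print(run_pilot(queue, "x86_64-slc6", 4000))  # [1]
```

Jobs that do not match stay in the queue for a better-suited pilot. Because a job is only bound to a resource after a pilot is already running there, a broken resource whose pilot never starts never receives user jobs.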

Questions from new adopters (providers and users) are collected in the VMDIRAC FAQs.

VMDIRAC Extension
=================

VMDIRAC is the DIRAC extension that integrates Federated Clouds transparently for the user. You can install both the DIRAC core and the VMDIRAC extension with:

wget --no-check-certificate -O dirac-install 'https://github.com/DIRACGrid/DIRAC/raw/integration/Core/scripts/dirac-install.py'
su dirac -c 'python dirac-install -V "VMDIRAC"'

See the DIRAC server installation procedure for details. Once you have configured a DIRAC Configuration Server (CS) instance, you can configure the VMDIRAC extension.


VMDIRAC installation instructions

These notes lead you through the installation steps of VMDIRAC.

DIRAC Requirements

VMDIRAC is based upon the two following packages:

They both need to be installed before attempting the VMDIRAC installation.

Externals

OpenNebula

OCCI 0.8 (OpenNebula) client installation

OpenNebula install

SL6 platform notes: there are no packages, so install from sources:

See the OpenNebula installation guide for opennebula-3.4.0.tar.gz.

Currently we maintain OpenNebula end-points > 3.4.0.

Installing only the client on the VMDIRAC server:

./install.sh -c
rOCCI 1.1 (OpenNebula) installation

NEW feature in v0r8, including X509 authentication and generic ssh contextualization. The supported rOCCI client version is > 4.1.0.

gem install rake
gem install occi-cli 

SL5 platform notes: because of package incompatibilities, install manually as follows.

Standard SL5 dependencies:

# rpm -qa|grep ruby
ruby-libs-1.8.5-29.el5_9.i386
ruby-1.8.5-29.el5_9.x86_64
ruby-libs-1.8.5-29.el5_9.x86_64
rubygems-1.3.1-1.el5.noarch
ruby-shadow-1.4.1-7.el5.x86_64
ruby-rdoc-1.8.5-29.el5_9.x86_64
ruby-devel-1.8.5-29.el5_9.i386
ruby-irb-1.8.5-29.el5_9.x86_64
ruby-augeas-0.3.0-1.el5.x86_64
libselinux-ruby-1.33.4-5.7.el5.x86_64
ruby-devel-1.8.5-29.el5_9.x86_64

occi needs ruby-1.9.3. On SL5 you can install and use the Ruby enVironment Manager (rvm) to set up a configurable Ruby environment; additional information is available at the rvm homepage.

Once rvm is installed, add the following to the DIRAC bashrc:

# RVM
source ~/.rvm/scripts/rvm
rvm use 1.9.3

OpenStack

OpenStack simple user/password auth

Nova 1.1 (OpenStack) python libraries:

pip install apache-libcloud
OpenStack with X509 VOMS auth

NEW feature in v0r9 (development). It currently needs trunk libcloud: if you have a previous libcloud folder, move it to libcloud.bak and install from trunk:

Nova 1.1 (OpenStack) libcloud with X509:
cd /opt
git clone https://github.com/alvarolopez/libcloud.git
cd libcloud
python setup.py clean
python setup.py build
python setup.py install

ssh contextualization dependencies

If you want generic VM contextualization for any image with sshd available, you need to install Paramiko.

Paramiko:

pip install paramiko

VMDIRAC install step by step

Install the components either from the sysadmin tool or directly from the machine.

DB : VirtualMachineDB

install DB WorkloadManagement/VirtualMachineDB

This creates a new database on the MySQL server (without tables!).

Service : VirtualMachineManagerHandler

install service WorkloadManagement/VirtualMachineManager

Be careful with the port: if it is already taken by another service, change it to avoid collisions. Check VMDIRAC.WorkloadManagement.ConfigTemplate for further information. Running the service also creates the necessary tables in the database. This service is contacted by the Web server and by ALL the virtual machines, so expect some load here, depending on the number of running VMs.

Agent : VirtualMachineScheduler

install agent WorkloadManagement/VirtualMachineScheduler

This agent takes care of booting virtual machines according to demand.

Agent : VirtualMachineContextualization

install agent WorkloadManagement/VirtualMachineContextualization

Optional agent, needed only when using the ssh contextualization method.

Web : VMDIRAC.Web

Nothing to do if the extension is properly declared in dirac.cfg.


VMDIRAC setup overview

The main VMDIRAC setup concerns Image and Contextualization management. There are three major sections to set up in the DIRAC Configuration Server:

Running Pods:

VMDIRAC defines a Running Pod as a logical abstraction of particular running conditions. A Running Pod matches an Image with the corresponding list of cloud end-points on which VMs of that Image are run.

Images:

In VMDIRAC, an Image includes a boot image and, optionally, the contextualization of that image.

End-points:

A cloud manager exposes at least one end-point through some API (e.g. OCCI, EC2 or a native API). An End-point section contains all the specific values needed to use a cloud manager end-point.
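How the three sections relate can be sketched as plain Python dictionaries. This is a hypothetical model: the section keys and option names (`cloudDriver`, `bootImageName`, `contextMethod`, `CloudEndpoints`), the example.org URLs and the `resolve_pod` helper are illustrative assumptions, not the exact VMDIRAC CS schema. The sketch shows only that a Running Pod is resolved into one Image plus the End-points it may use.

```python
# Hypothetical model of the three VMDIRAC CS sections. Section and option
# names are illustrative assumptions, not the exact VMDIRAC CS schema.
from typing import Dict, List, Tuple

Section = Dict[str, str]

# End-points: the values needed to use one cloud manager end-point
END_POINTS: Dict[str, Section] = {
    "CloudA-occi": {"cloudDriver": "rocci", "URL": "https://cloud-a.example.org:11443"},
    "CloudB-nova": {"cloudDriver": "nova-1.1", "URL": "https://cloud-b.example.org:5000/v2.0"},
}

# Images: a boot image plus, optionally, its contextualization method
IMAGES: Dict[str, Section] = {
    "golden-sl6": {"bootImageName": "sl6-golden", "contextMethod": "ssh"},
}

# Running Pods: an Image matched with the end-points allowed to run it
RUNNING_PODS: Dict[str, dict] = {
    "MC-production": {"Image": "golden-sl6",
                      "CloudEndpoints": ["CloudA-occi", "CloudB-nova"]},
}


def resolve_pod(pod_name: str) -> List[Tuple[Section, Section]]:
    """Return the (image, end-point) pairs on which this pod's VMs may boot."""
    pod = RUNNING_PODS[pod_name]
    image = IMAGES[pod["Image"]]
    return [(image, END_POINTS[name]) for name in pod["CloudEndpoints"]]
```

Keeping End-points separate from Images means the same golden image can be listed by several Running Pods without repeating any end-point details.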

Images setup to run DIRAC Virtual Machines

1 ad-hoc image

Ready to run without dynamic contextualization, this image has to be prepared to run in a specific Endpoint and a particular DIRAC configuration.

2 golden image and dynamic contextualization

Image and Endpoint context is automatically configured by VMDIRAC. Therefore, a single "golden" image can be distributed to all the Endpoints.

2.1 HEPiX contextualization

This is the High Energy Physics contextualization approach, using CernVM images and the contextualization methods supported by OpenStack and OpenNebula. The CernVM approach can also be used with other scientific applications. Currently VMDIRAC supports the following HEPiX methods:

2.1.1 HEPiX - OpenNebula

The DIRAC image context is included in an ISO context image, which has to be uploaded to the IaaS provider beforehand so that the CernVM init process can mount it. The end-point context is passed to the VM at submission time: VMDIRAC gets the parameters from the corresponding end-point section and sets this environment using the OpenNebula context section, which creates an on-the-fly ISO image; CernVM then mounts it and loads the end-point context.
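A minimal sketch of turning end-point parameters into the `CONTEXT = [ ... ]` section of an OpenNebula VM template follows. The parameter names are hypothetical. The default `TARGET` device (`hdc`) and the naive backslash quote escaping are assumptions. Check the contextualization guide of your OpenNebula version for the exact attributes it accepts.

```python
# Sketch: render end-point parameters as an OpenNebula CONTEXT section.
# Parameter names are hypothetical; TARGET is the device the context ISO
# is attached to inside the VM (default "hdc" is an assumption).
from typing import Mapping


def opennebula_context(endpoint_params: Mapping[str, object], target: str = "hdc") -> str:
    """Return a CONTEXT = [ ... ] block for an OpenNebula VM template."""
    items = [f'  TARGET = "{target}"']
    for key in sorted(endpoint_params):
        # Naive quote escaping (assumption about the template syntax)
        value = str(endpoint_params[key]).replace('"', '\\"')
        items.append(f'  {key} = "{value}"')
    return "CONTEXT = [\n" + ",\n".join(items) + " ]"


print(opennebula_context({"siteName": "CloudA", "cpuTime": 1800}))
```

Sorting the keys keeps the generated template stable from one submission to the next, which makes it easier to compare the contexts of different VMs.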

2.1.2 HEPiX - OpenStack

The DIRAC image context is provided by amiconfig tools, with the scripts sent in nova 1.1 userdata. The end-point context is provided through nova 1.1 metadata, which is specific to each OpenStack IaaS end-point and is selected at submission time from the DIRAC Configuration Server.
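The split between image context (userdata) and end-point context (metadata) can be sketched as follows. The `[name]` INI-like userdata layout and the option names are hypothetical. With apache-libcloud, the two values would typically be passed to the OpenStack driver's `create_node` through its `ex_userdata` and `ex_metadata` keyword arguments. Check this against the libcloud version you install, in particular the X509-enabled trunk fork.

```python
# Sketch: split the VM context for an OpenStack (nova 1.1) end-point into
# userdata (image context scripts) and metadata (end-point context).
# The [name] INI-like layout and option names are hypothetical.
from typing import Dict, Mapping, Tuple


def nova_context(image_scripts: Mapping[str, str],
                 endpoint: Mapping[str, object]) -> Tuple[str, Dict[str, str]]:
    """Return (userdata, metadata) for one VM submission."""
    # Image context: identical for every end-point, shipped as userdata
    userdata = "\n".join(f"[{name}]\n{body}"
                         for name, body in sorted(image_scripts.items()))
    # End-point context: flat string key/value pairs from the CS section
    metadata = {str(key): str(value) for key, value in endpoint.items()}
    return userdata, metadata


userdata, metadata = nova_context({"dirac": "bash /root/dirac-install.sh"},
                                  {"siteName": "CloudB", "maxCPU": 4})
```

Metadata values are converted to strings because nova metadata is a flat string-to-string map.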

2.2 ssh contextualization

Instead of a golden image that depends on the CernVM platform, VMDIRAC also supports a generic golden image, which can be configured using ssh contextualization, provided the VM accepts inbound connections for ssh and sftp operations.
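The ssh method can be sketched with Paramiko: upload the context scripts over sftp, then run each one over ssh and check its exit status. This is not the VirtualMachineContextualization agent's code. The host, user, key file, remote directory and script names are hypothetical, and accepting unknown host keys is a trade-off assumed here for freshly booted VMs.

```python
# Sketch of ssh contextualization with Paramiko: upload context scripts over
# sftp, then run them over ssh. Paths, user and script names are hypothetical.
import os
import posixpath
from typing import List


def context_commands(remote_dir: str, scripts: List[str]) -> List[str]:
    """Shell commands that run each uploaded script on the VM."""
    cmds = []
    for script in scripts:
        remote = posixpath.join(remote_dir, os.path.basename(script))
        cmds.append(f"chmod +x {remote} && {remote}")
    return cmds


def ssh_contextualize(host: str, user: str, key_file: str, scripts: List[str],
                      remote_dir: str = "/tmp/vmdirac-context") -> None:
    """Upload and run context scripts on a freshly booted VM."""
    import paramiko  # lazy import: context_commands works without Paramiko

    client = paramiko.SSHClient()
    # A new VM has an unknown host key; accepting it skips MITM protection.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=user, key_filename=key_file)
    try:
        sftp = client.open_sftp()
        try:
            try:
                sftp.mkdir(remote_dir)
            except IOError:
                pass  # assume the directory already exists
            for script in scripts:
                sftp.put(script, posixpath.join(remote_dir, os.path.basename(script)))
        finally:
            sftp.close()
        for cmd in context_commands(remote_dir, scripts):
            _, stdout, _ = client.exec_command(cmd)
            if stdout.channel.recv_exit_status() != 0:
                raise RuntimeError(f"context step failed on {host}: {cmd}")
    finally:
        client.close()
```

Running the scripts one at a time and stopping at the first non-zero exit status keeps a half-configured VM from reporting itself as ready.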

VM horizontal auto-scaling setup

VMDIRAC can be configured with different policies for creating and stopping VMs. Each end-point has an associated VM allocation policy (vmPolicy) and a VM stoppage policy (vmStopPolicy).

VM allocation policy

The VM allocation policy can be elastic or static.

vmPolicy = static

Static VM allocation is used when an IaaS provider defines a constant number of accessible VM slots, set with the maxEndpointInstances end-point parameter.

vmPolicy = elastic

The elastic allocation is used to create new VMs when there are jobs queued in DIRAC. For this purpose, the Running Pod configuration section has the CPUPerInstance option, which defines the minimum overall CPU of the DIRAC jobs waiting in the task queue required to submit a new VM. This parameter tunes the elasticity of VM delivery. CPUPerInstance can be set to a longer time to use the available resources more efficiently and save VM creation overhead. Alternatively, it can be set to a shorter time to make exhaustive use of the available resources and finish the production in a shorter total wall time, at a higher resource cost due to the additional overhead.

As a rule of thumb, typical CPUPerInstance values, from shorter to longer, are:

a) Zero, to submit a new VM with no minimum CPU requirement on the jobs in the task queue.
b) A longer value, such as the average CPU required by the jobs, as a compromise between VM efficiency and total wall time.
c) A very large value, to maximize efficiency in terms of VM creation overhead, when the total wall time of the production is not a constraint.
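The two allocation policies can be sketched as a per-cycle decision for one end-point. The function and argument names are hypothetical. Capping the elastic policy by the free slots of maxEndpointInstances is an assumption, not something this page states. CPU values are taken to be in the same time unit as CPUPerInstance.

```python
# Sketch of the two vmPolicy values for one end-point and one scheduling
# cycle. Function and argument names are hypothetical; CPU values share the
# time unit used for CPUPerInstance.
from typing import Sequence


def vms_to_submit(vm_policy: str, running_vms: int, max_instances: int,
                  waiting_job_cpu: Sequence[int], cpu_per_instance: int) -> int:
    """Return how many new VMs to request in this cycle."""
    free_slots = max(0, max_instances - running_vms)
    if vm_policy == "static":
        # Slot driven: fill the constant number of slots (maxEndpointInstances)
        return free_slots
    if vm_policy == "elastic":
        # Job driven, one VM per cycle, once the waiting jobs add up to at
        # least CPUPerInstance. Capping by free slots is an assumption here.
        enough_work = bool(waiting_job_cpu) and sum(waiting_job_cpu) >= cpu_per_instance
        return 1 if enough_work and free_slots > 0 else 0
    raise ValueError(f"unknown vmPolicy: {vm_policy!r}")
```

With CPUPerInstance = 0, any waiting job triggers a new VM (option a). Raising it toward the average job CPU (option b), or far above it (option c), makes the pod wait for more queued work before paying the VM creation overhead.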

VM stoppage policy

The VM stoppage policy can be set to elastic or never. In either case, VMs can also be stopped by the VM operator. With CernVM images, they can also be stopped through the HEPiX stoppage mechanism, which is the responsibility of each IaaS provider. When a running VM is asked to stop, it first shuts down in an orderly way and then halts.

vmStopPolicy = never

VM stoppage does not depend on running jobs; VMs are stopped only by an external stop request (VM operators at the VMDIRAC portal, or the IaaS provider).

vmStopPolicy = elastic

The elastic policy stops the VM if no jobs have been running during the last VM halting margin time, which is a configurable option.
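The stoppage decision on a running VM can be sketched as below. The function and argument names are hypothetical, and times are assumed to be seconds, such as `time.time()` values. External stop requests win under both policies. Under the elastic policy, the VM halts only when it is idle and no job has ended within the halting margin.

```python
# Sketch of the vmStopPolicy decision taken on a running VM. Names are
# hypothetical; times are in seconds (e.g. time.time() values).
def should_halt(vm_stop_policy: str, stop_requested: bool, running_jobs: int,
                last_job_end: float, now: float, halting_margin: float) -> bool:
    """Return True when the VM should begin its orderly shutdown."""
    if stop_requested:
        # External requests (VMDIRAC portal operator, IaaS provider) win
        # under both policies.
        return True
    if vm_stop_policy == "never":
        return False
    if vm_stop_policy == "elastic":
        # Idle now, and no job has been running during the halting margin
        return running_jobs == 0 and now - last_job_end >= halting_margin
    raise ValueError(f"unknown vmStopPolicy: {vm_stop_policy!r}")
```

A longer halting margin keeps idle VMs around for late-arriving jobs, at the cost of paying for idle time.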


VMDIRAC Release information:

[v0r8]
NEW: rOCCI 1.1 DIRAC driver.
     rOCCI authentication by X509 proxies with third-party VOMS (rOCCI does the matching work in a transparent manner)
     Generic SSH contextualization DIRAC client to all Operating System Images and Cloud Managers (software convergence).
     In current release, SSH contextualization has been tested for OpenNebula and OpenStack.
NEW: VM-local dirac.cfg updater agent, for when the pilot/DIRAC release is updated
CHANGE: OcciImage and Occi09 migrated to new WMS/Utilities style

[v0r7]
NEW: endpoint vmPolicy "static" -> slots driven, indicated by maxEndpointInstances, endpoint CS parameter
     endpoint vmPolicy "elastic" -> jobs driven, one by one

NEW: endpoint vmStopPolicy "never"
     endpoint vmStopPolicy "elastic" -> no more jobs + VM halting margin time
CHANGE: Both cases: VMs can be stopped from the "Instances Overview" screen: VirtualMachineDB.py in function instanceIDHeartBeat:
Sends an empty dir in case we want to send flags (such as stop VM)
TODO: The particular HEPiX method to ask VMs to shut down from the IaaS provider site (requirement to be specified by HEPiX)

[v0r6]

NEW: nova-1.1 driver and ssh contextualization method, ready to extend to amiconfig contextualization.

[v0r5]

Multi-endpoint OpenNebula and CloudStack in a federated way. Running Pods, DIRAC Images and Endpoints scheme.

[v0r4]

FIX: Added missing occi and cloud director files

[v0r3]

NEW: Redesign of VMDirector to allow for more Cloud provider drivers without modifications to the VM core components.
NEW: An image is composed of a bootstrap image and context files. Optionally, a context image can also be included.
CHANGE: A CS Cloud Endpoint has all the contextualization and configuration parameters for a specific endpoint
CHANGE: A Running Pod contains a DIRAC image, a list of endpoints and the necessary parameters and requirements.

[v0r2]

Initial version of VMDIRAC. It includes support for Occi and Amazon clouds.

RFC:

The VMDIRAC RFCs are descriptions of proposals for new functionalities which are exposed by the authors to the VMDIRAC development team for comments. The RFC description is maintained and updated in the corresponding VMDIRAC Wiki page. Each new RFC must have a distinct number by which it can be referred to, the author and the date of the first submission.

The current RFCs:

RFC #1: Renewal proxy for the VMs instead of a cert (pub,key)
