Vanessa Hamar edited this page Jan 26, 2015 · 10 revisions

4. Job Management Basics

To submit a job to DIRAC, you need to describe your job's requirements. You can specify several of them: for instance, the name of the executable program you want to run, the location of its output, and so on.

To describe these requirements you use a job description language (JDL) specific to DIRAC. This page gives examples of jobs and shows how to use DIRAC to submit them for execution, track their status, and retrieve their results.

4.1 Getting started: simple job

4.1.1 Submit your job

The following is the description of a simple job whose only purpose is to list the contents of the working directory. In your favorite text editor, create a file named Simple.jdl with the contents below:

JobName       = "Simple_Job";
Executable    = "/bin/ls";
Arguments     = "-ltr";
StdOutput     = "StdOut";
StdError      = "StdErr";
OutputSandbox = {"StdOut","StdErr"};

This is a description (using JDL syntax) of a simple job. In it, you ask DIRAC to select an execution site (among the ones it can use for executing your jobs) and to execute the file /bin/ls with arguments -ltr. The results of this execution (i.e. the standard output and standard error of the job) will be recorded in the files StdOut and StdErr respectively.
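If you prefer to generate such a file from a script, the following is a minimal Python sketch that renders the attributes above as JDL text. The function name render_jdl is hypothetical (it is not part of DIRAC), and the sketch assumes that values contain no double quotes and that only strings and lists of strings are needed, as in the example above.

```python
def render_jdl(attributes):
    """Render a dict of job attributes as JDL text, one 'Name = value;' per line.

    Hypothetical helper, not part of DIRAC. Assumes values are plain strings
    or lists of strings that contain no double quotes.
    """
    lines = []
    for name, value in attributes.items():
        if isinstance(value, (list, tuple)):
            # Lists become brace-enclosed, comma-separated quoted strings.
            rendered = "{" + ",".join('"%s"' % item for item in value) + "}"
        else:
            rendered = '"%s"' % value
        lines.append("%s = %s;" % (name, rendered))
    return "\n".join(lines) + "\n"


simple_job = {
    "JobName": "Simple_Job",
    "Executable": "/bin/ls",
    "Arguments": "-ltr",
    "StdOutput": "StdOut",
    "StdError": "StdErr",
    "OutputSandbox": ["StdOut", "StdErr"],
}

print(render_jdl(simple_job), end="")
```

Writing the returned text to Simple.jdl gives a file with the same attributes as the hand-written one, only without the column padding around the equals signs.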

To actually submit the job to DIRAC do:

$ dirac-wms-job-submit Simple.jdl
JobID = 123

Note that in this example we are assuming that the executable /bin/ls is already present on the machine where DIRAC will execute your job. This is indeed the case because it is a system command available on all Unix machines.

If the submission is successful, DIRAC prints the identifier it has assigned to your job (123 in this example). You will use this identifier for retrieving the status of your job and the job results when its execution is finished. More details on this below.
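When submission is scripted, the identifier has to be extracted from the command output. The following Python sketch does that, assuming the output contains a line of the form "JobID = 123" as shown above; other DIRAC versions or submission modes may print something different. The name parse_job_id is hypothetical.

```python
import re


def parse_job_id(submit_output):
    """Return the job identifier found in the text printed by dirac-wms-job-submit.

    Assumes a line of the form 'JobID = <number>', as in the example on this page.
    """
    match = re.search(r"^\s*JobID\s*=\s*(\d+)\s*$", submit_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'JobID = <number>' line found in the submit output")
    return int(match.group(1))


print(parse_job_id("JobID = 123\n"))
```

The function raises ValueError rather than returning a default, so a failed submission is not silently mistaken for a valid job.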

As with any other DIRAC command, you can use the option --help to get more information on the flags accepted by that command, for instance:

$ dirac-wms-job-submit --help
Submit jobs to DIRAC WMS
Usage:
   dirac-wms-job-submit [option|cfgfile] ... JDL ...
Arguments:
   JDL:      Path to JDL file
General options:
  -o:  --option=         : Option=value to add
  -s:  --section=        : Set base section for relative parsed options
  -c:  --cert=           : Use server certificate to connect to Core Services
  -d   --debug           : Set debug mode (-dd is extra debug)
  -h   --help            : Shows this help

For an exhaustive list of the JDL syntax and examples of usage please refer to DIRAC's Job Description Language Reference.

4.1.2 Query the execution status of your job

Once your job is successfully submitted, DIRAC will queue it up for execution. Eventually, it will be sent to the execution site where it will be launched. To know the current status of your job, use the command:

$ dirac-wms-job-status 123
JobID=123 Status=Waiting; MinorStatus=Pilot Agent Submission; Site=ANY;

Here 123 is the job identifier, the number displayed when you successfully submitted your job with the command dirac-wms-job-submit. Status=Waiting means that your job is queued, waiting to run at any of the sites DIRAC is configured with (Site=ANY).
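The status line has a regular shape, so a script can split it into fields. The following Python sketch assumes exactly the layout shown above (a leading JobID=<number>, then Key=Value pairs each terminated by a semicolon); the name parse_status_line is hypothetical.

```python
def parse_status_line(line):
    """Split one line printed by dirac-wms-job-status into a dict of fields.

    Assumes the layout shown on this page:
    'JobID=123 Status=Waiting; MinorStatus=Pilot Agent Submission; Site=ANY;'
    """
    # The first token (JobID=123) is separated from the rest by a space.
    head, _, rest = line.strip().partition(" ")
    key, _, value = head.partition("=")
    fields = {key: value}
    # The remaining fields are terminated by ';' and may contain spaces.
    for chunk in rest.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        key, _, value = chunk.partition("=")
        fields[key] = value
    return fields


line = "JobID=123 Status=Waiting; MinorStatus=Pilot Agent Submission; Site=ANY;"
print(parse_status_line(line)["Status"])
```

All values are kept as strings, including the job identifier.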

4.1.3 Retrieve the output of your job

Once your job has finished its execution, you can retrieve its output, that is, the set of files produced as a result of the execution of your job. For this purpose, use the command:

$ dirac-wms-job-get-output 123
Job output sandbox retrieved in /current/working/directory/123/

DIRAC will create a directory in your current working directory named after the job identifier (123 in this case) and put all the relevant files inside. You can also use the --Dir option of this command to tell DIRAC where you want your job's output to be stored. See dirac-wms-job-get-output --help for details.

In order to know when your job has finished, so that you can retrieve its results, you can use the command:

$ dirac-wms-job-status 123
JobID=123 Status=Done; MinorStatus=Execution Complete; Site=EGI.KEK.jp;

When this command displays Status=Done the job has completed its execution and its output is ready for retrieval.
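Checking the status repeatedly by hand can be replaced by a small polling loop. The following Python sketch is one way to structure it. It does not call DIRAC itself: fetch_status is a hypothetical callable that you would implement, for instance by running dirac-wms-job-status and reading the Status field. Treating "Failed" as a final state is an assumption, since this page only shows Waiting and Done.

```python
import time


def wait_for_job(job_id, fetch_status, interval=60.0, max_checks=120,
                 final_states=("Done", "Failed"), sleep=time.sleep):
    """Poll fetch_status(job_id) until it returns one of final_states.

    fetch_status is a caller-supplied function returning the Status string.
    'Failed' in final_states is an assumption; this page only shows 'Done'.
    Raises TimeoutError if no final state is seen after max_checks polls.
    """
    for attempt in range(max_checks):
        status = fetch_status(job_id)
        if status in final_states:
            return status
        if attempt < max_checks - 1:
            sleep(interval)  # wait before asking again
    raise TimeoutError("job %s not finished after %d checks" % (job_id, max_checks))


# Demonstration with canned replies instead of a real DIRAC call
# ("Running" is used here only as an illustrative intermediate value).
replies = iter(["Waiting", "Running", "Done"])
print(wait_for_job(123, lambda job_id: next(replies), interval=0, sleep=lambda s: None))
```

Passing the status function in as a parameter keeps the loop testable without a grid connection. A generous interval avoids putting needless load on the DIRAC services.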

4.2 A more realistic case: jobs with input and output sandbox

In most situations, you will want to execute your own programs, which are located on your computer. You can ask DIRAC to transfer those executable files together with your job to the execution site. You do this by specifying what is called a sandbox: a container of files you want to be transferred to or from the site where your job actually executes. The input sandbox is the set of files you want DIRAC to transfer to the execution site, and the output sandbox is the set of files you want to retrieve when your job has finished its execution (typically, files produced by your job).

This example shows how to submit a job for executing your own executable file (a shell script in this particular example).

  • Create the job description in file InputAndOuputSandbox.jdl with contents as shown below:

    JobName       = "InputAndOuputSandbox";
    Executable    = "testJob.sh";
    StdOutput     = "StdOut";
    StdError      = "StdErr";
    InputSandbox  = {"testJob.sh"};
    OutputSandbox = {"StdOut","StdErr"};
    

Using the attribute InputSandbox we ask DIRAC to transfer the local file named testJob.sh to the site where this job will be launched for execution. testJob.sh is a file on your computer, and it must exist when you submit the job.
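Because the files listed in InputSandbox must exist at submission time, a quick check before submitting can save a failed attempt. The following Python sketch is one way to do it; missing_sandbox_files is a hypothetical helper, and skipping entries that start with "LFN:" assumes the convention used later on this page for files stored on the Grid.

```python
import os


def missing_sandbox_files(input_sandbox, is_file=os.path.isfile):
    """Return the InputSandbox entries that are not existing local files.

    Entries starting with 'LFN:' are assumed to refer to grid files and are
    not checked. is_file can be replaced for testing.
    """
    missing = []
    for entry in input_sandbox:
        if entry.startswith("LFN:"):
            continue  # grid file, nothing to check locally
        if not is_file(entry):
            missing.append(entry)
    return missing


# Lists whichever of these files is absent from the current directory.
print(missing_sandbox_files(["testJob.sh"]))
```

An empty list means every local file named in the sandbox was found.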

  • Create the file testJob.sh with contents:

    #!/bin/sh
    /bin/hostname
    /bin/date
    /bin/ls -la
    

It is strongly recommended that you first test locally, on your own computer, that the executable you are going to submit to DIRAC actually works.
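One way to run such a local test from a script is sketched below in Python: it runs a command, captures its standard output and standard error separately (as the StdOut and StdErr attributes do for the grid job), and returns the exit code. The name run_locally is hypothetical, and a local run only approximates the environment of an execution site.

```python
import subprocess
import sys


def run_locally(command, timeout=60):
    """Run a command locally and return (exit_code, stdout_text, stderr_text)."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the captured bytes to text
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr


# Stand-in command so the sketch runs anywhere; for the job on this page
# you would pass ["/bin/sh", "testJob.sh"] instead.
code, out, err = run_locally([sys.executable, "-c", "print('hello')"])
print(code, out.strip())
```

A non-zero exit code or unexpected text in the error stream is a sign that the script should be fixed before it is submitted.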

  • Now you can submit this job with the command:

    $ dirac-wms-job-submit InputAndOuputSandbox.jdl
    

4.3 Jobs with Input and Output Data

When the data, programs, etc. are stored in a Grid Storage Element, they can be specified as part of InputSandbox or InputData. InputSandbox is declared as a comma-separated list, with each file name enclosed in double quotes.

Before a grid file can be used, it must first be uploaded to the Grid. This is done using the following command, where <SE> is the name of the destination Storage Element:

dirac-dms-add-file <LFN> <local_file> <SE>

For example:

bash-4.1$ dirac-dms-add-file /vo.france-asia.org/user/v/vhamar/TestPekin-1.txt Test-Pekin.txt CPPM-disk -o LogLevel=INFO
Uploading /vo.france-asia.org/user/v/vhamar/TestPekin-1.txt
putAndRegister: Checksum information not provided. Calculating adler32.
putAndRegister: Checksum calculated to be e4be4920.
__putFile: Executing transfer of file:Test-Pekin.txt to srm://marsedpm.in2p3.fr:8446/srm/managerv2?SFN= /dpm/in2p3.fr/home/vo.france-asia.org/user/v/vhamar/TestPekin-1.txt using 1 streams
__putFile: Successfully put file to storage.
putAndRegister: Sending accounting took 1.2 seconds
Successfully uploaded file to CPPM-disk
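To assemble this upload command in a script, a sketch along the following lines may help. Both function names are hypothetical. The LFN layout /<vo>/user/<first letter>/<username>/<file> is inferred from the example above and is an assumption; check the conventions of your own Virtual Organization.

```python
def user_lfn(vo, username, filename):
    """Build an LFN of the form /<vo>/user/<first letter>/<username>/<filename>.

    The layout is inferred from the example on this page and is an assumption.
    """
    return "/%s/user/%s/%s/%s" % (vo, username[0], username, filename)


def add_file_command(lfn, local_file, storage_element):
    """Return the argument list for: dirac-dms-add-file <LFN> <local_file> <SE>."""
    if not lfn.startswith("/"):
        raise ValueError("an LFN is expected to start with '/'")
    return ["dirac-dms-add-file", lfn, local_file, storage_element]


lfn = user_lfn("vo.france-asia.org", "vhamar", "TestPekin-1.txt")
print(" ".join(add_file_command(lfn, "Test-Pekin.txt", "CPPM-disk")))
```

The list returned by add_file_command can be passed to subprocess.run on a machine where the DIRAC client is installed and a valid proxy is available.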

  • Use the same testJob.sh shell script as in the previous exercise.

  • In the JDL we have to add OutputSE and OutputData:

    JobName = "LFNInputSandbox";
    Executable = "testJob.sh";
    StdOutput = "StdOut";
    StdError = "StdErr";
    InputSandbox = {"testJob.sh","LFN:/vo.france-asia.org/user/v/vhamar/test.txt"};
    OutputSandbox = {"StdOut","StdErr"};
    OutputSE = "CPPM-disk";
    OutputData = {"StdOut"};
    
  • After creating the JDL file, the next step is to submit the job using the command:

    dirac-wms-job-submit <JDL>
    

    The same effect can be achieved with the following JDL LFNInputData.jdl:

    JobName = "LFNInputData";
    Executable = "testJob.sh";
    StdOutput = "StdOut";
    StdError = "StdErr";
    InputSandbox = {"testJob.sh"};
    InputData = {"LFN:/vo.france-asia.org/user/v/vhamar/test.txt"};
    OutputSandbox = {"StdOut","StdErr"};
    OutputSE = "CPPM-disk";
    OutputData = {"StdOut"};
    

An important difference between specifying input data in InputSandbox and in InputData is that in the first case the data file is always downloaded to the local disk of the job running in the Grid. In the InputData case, the file can be either downloaded locally or accessed remotely using a remote access protocol, e.g. rfio or dcap, depending on the policies adopted by your Virtual Organization.

4.4 Managing Jobs

4.4.1 Submitting a Job

  • After creating the JDL file the next step is to submit a job using the command:

    dirac-wms-job-submit <JDL>
    

    For example:

    bash-3.2$ dirac-wms-job-submit Simple.jdl -o LogLevel=INFO
    2010-10-17 15:34:36 UTC dirac-wms-job-submit.py/DiracAPI  INFO: <=====DIRAC v5r10-pre2=====>
    2010-10-17 15:34:36 UTC dirac-wms-job-submit.py/DiracAPI  INFO: Will submit job to WMS
    JobID = 11
    

    In the output of the command you get the DIRAC job ID which is a unique job identifier. You will use it later for other job operations.

4.4.2 Getting the job status

  • The next step is to monitor the job status using the command:

    dirac-wms-job-status <Job_ID>
    
    bash-3.2$ dirac-wms-job-status 11
    JobID=11 Status=Waiting; MinorStatus=Pilot Agent Submission; Site=ANY;
    

4.4.3 Retrieving the job output

  • Finally, once the job reaches the Done status, you can retrieve the job's Output Sandbox:

    dirac-wms-job-get-output [--Dir output_directory] <Job_ID>
    

4.4.4 COMDIRAC job management

  • To continue with COMDIRAC commands, follow this link:

https://github.com/DIRACGrid/COMDIRAC/wiki/Job-Management
