This repository has been archived by the owner on Feb 23, 2023. It is now read-only.

Job Management

Andrei Tsaregorodtsev edited this page Jan 26, 2015 · 15 revisions

New in v0r4

COMDIRAC job management commands are designed to ease day-to-day use of the DIRAC Workload Management System. Many of these commands are inspired by cluster batch systems (such as Torque or Slurm).

Job submission

After Session Initialization, you have a valid local proxy that allows you to connect to DIRAC servers.

The dsub command can be used in several ways, described below. Let's submit your first COMDIRAC job with the dsub command:

$ dsub /bin/hostname
6723938

The dsub command has done two things:

  • built a JDL-formatted file describing your job
  • submitted this JDL to the DIRAC WMS, along with the needed files (the executable and, if any, the input sandbox, ...)

When the job is submitted, dsub prints its JobID on the terminal.

You can ask the server to print a list of active jobs with the dstat command:

$ dstat
 JobID  Owner JobName    OwnerGroup    JobGroup Site Status       MinorStatus         SubmissionTime    
------- ----- ------- ---------------- -------- ---- ------- ---------------------- ------------------- 
6723938 pgay  Unknown frangrilles_user NoGroup  ANY  Waiting Pilot Agent Submission 2013-10-26 15:44:31 

Here, we see the only active job, the one we just submitted, described by a list of fields.

If you don't see any job in the list, it is probably because the job we just launched has already finished. Since we sent a very simple command, it should finish shortly in any case. To list inactive jobs as well, add the -a option to dstat:

$ dstat -a
 JobID  Owner JobName    OwnerGroup    JobGroup      Site       Status    MinorStatus       SubmissionTime    
------- ----- ------- ---------------- -------- --------------- ------ ------------------ ------------------- 
6723823 pgay  Unknown frangrilles_user NoGroup   LCG.LPNHE.fr    Done  Execution Complete 2013-10-26 15:39:44 
6723938 pgay  Unknown frangrilles_user NoGroup  LCG.DATAGRID.fr  Done  Execution Complete 2013-10-26 15:44:31 

Here, we see a list of recent jobs (from the last 10 days by default; you can change this with the --jobDate command line option).
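
When you want to act on several finished jobs at once, it can help to filter the dstat table in a script. The following is a minimal sketch, assuming the default column layout shown above: two header lines, JobID in the first column and Status as the seventh whitespace-separated field (that is, none of Owner, JobName, OwnerGroup, JobGroup or Site contains spaces). The function name done_jobs is hypothetical and not part of COMDIRAC:

```shell
#!/bin/bash
# done_jobs (hypothetical helper): read a dstat table on stdin and print the
# JobID of every row whose Status is "Done".
# Assumes the default dstat layout: 2 header lines, JobID in column 1,
# Status in column 7 (no spaces in the fields before Status).
done_jobs() {
  awk 'NR > 2 && $7 == "Done" { print $1 }'
}
```

For example, `for id in $(dstat -a | done_jobs); do doutput "$id"; done` would retrieve the output of each finished job, provided dstat prints the same table when its output is piped.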

Back to the dsub command. In the previous example, we used /bin/hostname as an argument. The first argument to dsub is the executable, optionally followed by the executable's arguments. There are a few things to note about the executable and its arguments:

  • dsub sends the executable file in the InputSandbox if it is given as a relative path (not beginning with /), or if the --ForceExecUpload flag is used. Otherwise, the absolute path refers to a command that is expected to be already present on the Grid Worker Node, and that copy is used.
  • You may want to run a command with arguments that begin with - or --. Without precaution, dsub will try to parse them as its own options. To avoid this, place -- alone before the executable's options on the command line. Example:
$ dsub /bin/hostname -f
Error when parsing command line arguments: option -f not recognized 
  Submit jobs to DIRAC WMS
Usage:
  dsub [option|cfgfile] [executable [--] [arguments...]]
Arguments:
... snip ...
$ 
$ dsub /bin/hostname -- -f
6726366
  • If you don't supply an executable on the command line and no Executable field can be found in the provided or auto-generated JDL, dsub reads the script content from standard input:
$ dsub
echo Hello from `/bin/hostname -f`
... finish with <Ctrl+D> ...
6728192
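
The upload rule from the first item above can be restated as a small shell test. This is a hypothetical helper that only encodes the rule as documented; it is not part of COMDIRAC, does not call dsub, and models the --ForceExecUpload flag as an extra "force" argument:

```shell
#!/bin/bash
# uploads_executable (hypothetical helper): succeed if, per the documented rule,
# dsub would ship the executable in the InputSandbox.
#   $1: executable path as given to dsub
#   $2: pass "force" to model the --ForceExecUpload flag
uploads_executable() {
  local exe=$1 force=${2:-}
  if [[ $exe != /* || $force == force ]]; then
    return 0   # relative path, or upload forced: the file travels with the job
  fi
  return 1     # absolute path: the command must already exist on the Worker Node
}
```

With this rule, `uploads_executable ./myscript.sh` succeeds while `uploads_executable /bin/hostname` fails, which is why the /bin/hostname examples above run the Worker Node's own copy.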

There are many options to the dsub command. The most useful are probably --JobName, --JobGroup, --Site and --Parametric. As usual, you can list all available options with dsub -h.

You can also specify your own JDL file (or even an inline JDL) with the --JDL option. Additionally, if the default generated JDL doesn't fit your needs, you can customize it by placing your own version in your profile section of the COMDIRAC configuration file, using the jdl variable. Here is an example in which every submitted job requests a maximum CPU time of ten hours (36000 seconds) by default:

[frangrilles_user]
group_name = frangrilles_user
jdl = [CPUTime = 36000; OutputSandbox = {"std.out","std.err"};]
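
For the --JDL route, a hand-written JDL file can be created with a here-document. This is a sketch: the file name myjob.jdl and all values are illustrative, while the attribute names (JobName, Executable, Arguments, StdOutput, StdError, OutputSandbox, CPUTime) are standard DIRAC JDL fields:

```shell
#!/bin/bash
# Write an illustrative JDL file by hand (file name and values are examples).
# CPUTime is expressed in seconds, as in the profile example above.
cat > myjob.jdl <<'EOF'
[
  JobName = "hostname_test";
  Executable = "/bin/hostname";
  Arguments = "-f";
  StdOutput = "std.out";
  StdError = "std.err";
  OutputSandbox = {"std.out", "std.err"};
  CPUTime = 36000;
]
EOF
```

Assuming --JDL accepts a file path as described above, the job could then be submitted with `dsub --JDL myjob.jdl`. Because the file already defines Executable, dsub has no reason to read a script from standard input.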

New in v0r8:

When you submit a job with an interpreted script (shell, Python, etc.), you can insert JDL directives directly in your script using specially formatted comment lines. The dsub command parses the lines beginning with #JDL (notice the white space after JDL) and includes the rest of each such line in the DIRAC JDL generated for the job. For example:

#! /bin/bash

#JDL StdError = std_%s.err
#JDL StdOutput = std_%s.out
#JDL OutputSandbox={"std_%s.out","std_%s.err"}
#JDL Arguments="%s"
#JDL JobName = param_%s
#JDL JobGroup = myparametric
#JDL ParameterStart = 1
#JDL ParameterStep = 1
#JDL Parameters = 2

echo command line: \"$@\"
/bin/hostname
/bin/date

This is especially useful as it allows you to store your job configuration within the payload script itself.
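
To see what such a script contributes, the two steps involved can be approximated in shell. This is a sketch, not dsub's actual parser: jdl_directives only extracts the text after "#JDL " as described above, and parameter_values assumes that DIRAC expands a parametric job into the values ParameterStart + i * ParameterStep for i = 0 .. Parameters-1, each substituted for %s. Both function names are hypothetical:

```shell
#!/bin/bash
# jdl_directives (hypothetical helper): print the JDL fragments embedded in a
# script, i.e. the rest of every line starting with "#JDL " (space required).
jdl_directives() {
  sed -n 's/^#JDL //p' "$1"
}

# parameter_values (hypothetical helper): list the values of a parametric job,
# assuming an arithmetic progression start, start+step, ... (count values).
parameter_values() {
  local start=$1 step=$2 count=$3 i
  for (( i = 0; i < count; i++ )); do
    echo $(( start + i * step ))
  done
}
```

For the script above, `parameter_values 1 1 2` prints 1 and 2. Under that assumption, the job produces two jobs named param_1 and param_2, writing std_1.out/std_1.err and std_2.out/std_2.err.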

Retrieving Job Output

Once the job is finished, you can easily get its output from the DIRAC server:

$ doutput 6726366
# 
$ cat 6726366/std.out 
grid103.lal.in2p3.fr

New in v0r7:

You can alternatively download output from all jobs in a group (JobGroup JDL directive) with the command:

doutput -g <group name>

In this mode, if the jobs in the group produce output files with distinct names, it is usually more convenient not to create one directory per retrieved job. To do so, add the -n flag to download all output files into the current directory.
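
Before relying on -n, you may want to check that no two jobs produce a file with the same name. The following is a minimal sketch, assuming the outputs were first downloaded without -n into one directory per job (as in the single-job example above, where the output lands in 6726366/). The function name name_collisions is hypothetical:

```shell
#!/bin/bash
# name_collisions (hypothetical helper): given per-job output directories,
# print every file name that appears in more than one of them.
# Empty output means downloading into a single directory would not overwrite files.
name_collisions() {
  local d f
  for d in "$@"; do
    for f in "$d"/*; do
      if [ -f "$f" ]; then
        basename "$f"
      fi
    done
  done | sort | uniq -d
}
```

With a fixed name such as std.out, every job collides. The parametric script above avoids this by using %s in StdOutput and StdError.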

Other Job Management

Several additional commands are available:

  • dinput - retrieve InputSandbox (and optionally JDL file) of a given job
  • dlogging - print status history of a given job
  • dkill - delete a job