A Clojure Frontend to Drake

Overview

The drake.clj-frontend namespace provides the following functions for creating and running Drake workflows in Clojure.

new-workflow -- Create a new workflow.
cmd-step -- Add a step with commands to a workflow.
method -- Add a method to a workflow.
method-step -- Add a step using a method to a workflow.
template -- Add a template to a workflow.
template-step -- Add a step using a template to a workflow.
set-var -- Set a variable in a workflow.
base -- Change the value of the "BASE" variable in a workflow.
run-workflow -- Run a workflow.

With the exception of new-workflow, all of these functions accept a workflow as their first argument and return a modified workflow. This API structure was inspired by honeysql. Let's see how this works in practice by translating a trivial Drake workflow.

Minimal Example

Add Drake to your project

Your project.clj dependencies should include the latest Drake library, e.g.:

[factual/drake "0.1.6"]

Workflow in Drake

out <-
  echo "We are writing to a file here" > $OUTPUT

Workflow Translated into Clojure

;; Bring all the clj-frontend functions into the current namespace.
(use 'drake.clj-frontend)

;; Define a workflow called minimal-workflow.
(def minimal-workflow
  (->
   (new-workflow)                       ;Create a new workflow
   (cmd-step                            ;Add a command step with the
                                        ;following arguments
    ["out"]                             ;Vector of outputs
    []                                  ;Vector of inputs
    ["echo \"We are writing to a file here\" > $OUTPUT"] ;Vector of commands
    )))

Here (new-workflow) creates a new, empty workflow, and the -> macro passes it to cmd-step as its first argument. The remaining arguments to cmd-step are vectors of outputs, inputs, and commands. Just as in the original Drake workflow, outputs come before inputs.
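
To make the threading explicit, the following sketch uses macroexpand-1 on a quoted form to show the code that -> produces. Because the form is quoted, new-workflow and cmd-step are never called, so this runs in a plain Clojure REPL without Drake on the classpath. The shortened echo command is only for illustration.

```clojure
;; A quoted copy of a threaded workflow definition. Quoting means the
;; symbols new-workflow and cmd-step are treated as data, not called.
(def threaded-form
  '(-> (new-workflow)
       (cmd-step ["out"] [] ["echo hello > $OUTPUT"])))

;; macroexpand-1 returns the code that -> generates: the workflow form
;; is inserted as the first argument of cmd-step.
(def expanded-form (macroexpand-1 threaded-form))

(prn expanded-form)
;; prints (cmd-step (new-workflow) ["out"] [] ["echo hello > $OUTPUT"])
```

With more steps, each form receives the result of the previous one in the same first-argument position, which is why every step function takes the workflow first.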

With the workflow defined, we can preview it at a REPL to see what would happen if we actually ran it.

(run-workflow minimal-workflow :preview true) should generate the following preview.

The following steps will be run, in order:
  1: out <-  [missing output]

If we are satisfied, we can actually run the workflow with the following command.

(run-workflow minimal-workflow)

This will run our workflow and print output at the REPL similar to the following.

Workflow Started @ 16:35:58

1: out <-  [no-input step] Step Started @ 16:35:58
1: out <-  [no-input step] Step Finished @ 16:35:58

Workflow Finished @ 16:35:58

Practicalities

The most straightforward way to use clj-frontend is to create a new project with lein new and add Drake as a dependency in the project file, using a Clojars coordinate such as [factual/drake "0.1.6"]. Be sure to check Clojars for the most current version. Then write your workflow in an appropriate namespace in a file in the src directory. By default, the inputs and outputs of your workflow will end up in the root directory of the Leiningen project.

drake/demos/clj-frontend is a Leiningen project demonstrating this approach. Running lein repl inside drake/demos/clj-frontend lets you interact with the code examples from this page, which can be found in the clj-frontend.demo namespace in drake/demos/clj-frontend/src/clj_frontend/demo.clj.
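
As a rough sketch, the project.clj of a project created with lein new might look like the following once Drake is added. The project name my-workflow, the description, and the Clojure version are placeholders chosen for illustration and are not taken from the Drake demo; only the factual/drake coordinate comes from this page.

```clojure
;; project.clj (sketch; project name and Clojure version are assumptions)
(defproject my-workflow "0.1.0-SNAPSHOT"
  :description "Drake workflows written with drake.clj-frontend"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 ;; Check Clojars for the most current Drake version.
                 [factual/drake "0.1.6"]])
```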

Alternatively, you can avoid making a Leiningen project by using lein-exec to create a standalone clj script. The downside of a standalone lein-exec script is that lein-exec won't currently let you open a REPL from the command line. However, if you open the script in Emacs, you can use cider-jack-in to get a working REPL even though the script is not part of a Leiningen project and has no associated project.clj. Opening a lein-exec script in Emacs and then jacking into a REPL is a convenient way to run drake.clj-frontend.

Full Featured Example

This example features variables, methods, variable substitution and multiline commands.

Workflow in Drake

out1, out2 <- [-timecheck]
  echo "This is the first output." > $OUTPUT0
  echo "This is the second output." > $OUTPUT1

test_method()
  echo "Here we are using a method." > $OUTPUT

out_method <- [method:test_method]

test_var=TEST_VAR_VALUE
output_three=out3

$[output_three] <- out1
  echo "This is the third output." > $OUTPUT
  echo "test_var is set to $test_var -- $[test_var]." >> $OUTPUT
  echo "The file $INPUT contains:" | cat - $INPUT >> $[OUTPUT]

Workflow in Clojure

(def advanced-workflow
  (->
   (new-workflow)
   (cmd-step
    ["out1"
     "out2"]
    []
    ["echo \"This is the first output.\" > $OUTPUT0"
     "echo \"This is the second output.\" > $OUTPUT1"] ;multiple commands
    :timecheck false)                   ;options are key value pairs
   (method
    "test_method"
    ["echo \"Here we are using a method.\" > $OUTPUT"])
   (method-step
    ["out_method"]                      ;outputs
    []                                  ;inputs
    "test_method")                      ;method name
   (set-var "test_var" "TEST_VAR_VALUE") ;var name, var value
   (set-var "output_three" "out3")
   (cmd-step
    ["$[output_three]"]                 ;inputs and outputs can have
                                        ;$[XXX] substitution
    ["out1"]
    ;; $[XXX] substitution is allowed in commands.
    ["echo \"This is the third output.\" > $OUTPUT"
     "echo \"test_var is set to $test_var -- $[test_var].\" >> $OUTPUT"
     "echo \"The file $INPUT contains:\" | cat - $INPUT >> $[OUTPUT]"])))

(run-workflow advanced-workflow :preview true)

(run-workflow advanced-workflow)
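
The $[XXX] form used above can be pictured as a textual replacement of each $[name] with the value given to set-var. The following is only a conceptual sketch of that idea in plain Clojure, not Drake's actual implementation. The function substitute-vars and the vars map are hypothetical names introduced for this illustration.

```clojure
(require '[clojure.string :as str])

(defn substitute-vars
  "Replace each $[name] in s with the value of name in vars.
  Unknown names are left untouched. Hypothetical helper for
  illustration only; Drake performs its own substitution internally."
  [vars s]
  (str/replace s
               #"\$\[([^\]]+)\]"
               (fn [[whole var-name]]
                 (get vars var-name whole))))

;; The same variables that the workflow sets with set-var.
(def vars {"test_var"     "TEST_VAR_VALUE"
           "output_three" "out3"})

(substitute-vars vars "$[output_three]")
;; => "out3"

(substitute-vars vars "echo \"$test_var -- $[test_var]\"")
;; => "echo \"$test_var -- TEST_VAR_VALUE\""
```

Note that the plain $test_var form is not touched by this sketch; only the bracketed form is replaced.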

Functional Programming with Workflows

clj-frontend really gets powerful when you write functions that take and return workflows and then use reduce to generate a workflow based on a collection. Here is a simple example.

Let's say you want to take several raw data sources from the internet and for each source you want to create a directory, download some data into it, and do several processing steps on the data. We will express this as a map called dir->url-map between the directory names we want to create and the raw data sources we want to process.

(def dir->url-map
  "Hash map of:
  Directory Names => URLs"
  {"Dir1" "http://url1"
   "Dir2" "http://url2"
   "Dir3" "http://url3"})

Now we need a function that takes an existing workflow and adds new steps to it for a single directory => url pair from dir->url-map.

(defn download-and-process
  "I take an existing workflow and add steps to download and process the data at
  url into the directory dir"
  [w-flow [dir url]]                    ;note the argument
                                        ;destructuring
  (-> w-flow
      (base "")                         ;make sure we are in top
                                        ;directory
      (cmd-step
       [dir]
       []
       ["mkdir -p $OUTPUT"])
      (base dir)                        ;move into dir for our
                                        ;subsequent commands
      (cmd-step
       ["raw_data"]
       []
       [(str "wget -O $OUTPUT " url)]   ;get the data; str joins the
                                        ;command and url into one string
       :timecheck false)
      (cmd-step
       ["sorted_data"]
       ["raw_data"]
       ["sort -o $OUTPUT $INPUT"])      ;sort the data
      ;; more steps can be added here
      ))
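
Since cmd-step takes its commands as a vector of strings, a command that depends on a Clojure value such as url needs to be assembled into one string before it goes into the vector. The sketch below uses only clojure.core and a made-up helper name, wget-command, to show the difference between joining the pieces with str and simply listing them side by side.

```clojure
(defn wget-command
  "Build a single shell command string that downloads url to $OUTPUT.
  Hypothetical helper, not part of drake.clj-frontend."
  [url]
  (str "wget -O $OUTPUT " url))

;; One element: the whole command in a single string.
(def joined [(wget-command "http://url1")])
;; => ["wget -O $OUTPUT http://url1"]

;; Two elements: the URL sits in the vector as a separate entry and
;; would presumably be treated as a command of its own.
(def side-by-side ["wget -O $OUTPUT " "http://url1"])

(count joined)       ;; => 1
(count side-by-side) ;; => 2
```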

Finally, we can use reduce with download-and-process to add several workflow steps for each dir => url pair in dir->url-map.

(def reduce-workflow
  (reduce
   download-and-process
   (new-workflow)
   dir->url-map))
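
To see the shape of this reduction without running Drake, here is a sketch that replaces the workflow with a plain vector of step descriptions. The function describe-steps is a hypothetical stand-in for download-and-process: it has the same argument shape (an accumulator plus a destructured [dir url] pair) but only builds strings. The entries are sorted by key here because Clojure maps do not promise an iteration order in general.

```clojure
(def dir->url-map
  {"Dir1" "http://url1"
   "Dir2" "http://url2"
   "Dir3" "http://url3"})

(defn describe-steps
  "Append three step descriptions for one [dir url] pair.
  Hypothetical stand-in for download-and-process."
  [steps [dir url]]
  (conj steps
        (str dir " <-")
        (str dir "/raw_data <- " url)
        (str dir "/sorted_data <- " dir "/raw_data")))

;; reduce calls describe-steps once per map entry, passing the vector
;; returned by the previous call as the new accumulator.
(def described
  (reduce describe-steps [] (sort-by key dir->url-map)))

(count described) ;; => 9
(first described) ;; => "Dir1 <-"
(last described)  ;; => "Dir3/sorted_data <- Dir3/raw_data"
```

The real reduction works the same way, with (new-workflow) as the initial accumulator and each call returning a workflow with three more steps.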

(run-workflow reduce-workflow :preview true) should now give you the following preview:

The following steps will be run, in order:
  1: Dir1 <-  [missing output]
  2: Dir1/raw_data <-  [missing output]
  3: Dir1/sorted_data <- Dir1/raw_data [projected timestamped]
  4: Dir2 <-  [missing output]
  5: Dir2/raw_data <-  [missing output]
  6: Dir2/sorted_data <- Dir2/raw_data [projected timestamped]
  7: Dir3 <-  [missing output]
  8: Dir3/raw_data <-  [missing output]
  9: Dir3/sorted_data <- Dir3/raw_data [projected timestamped]

Since the URLs in this workflow are placeholders, we can't actually run it.
