wongiseng edited this page Oct 9, 2014 · 2 revisions

We have a cluster execution environment profile that allows users to scale their experiments from local machines up to clusters, or even to resources in the cloud. To do this, we leverage the PJ2 library from the Rochester Institute of Technology. Currently, only Java modules can be executed remotely with this library.

### Cluster Environment Execution

We achieve scalable execution by encapsulating the Ducktape workflow as a PJ2 Job and each individual module as a PJ2 Task. Jobs and Tasks are distributed by the PJ2 framework, and task distribution is mainly handled by the PJ2 tracker node. The configuration of the tracker and backend nodes is shown in the following figure:
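To make the Job/Task split more concrete, here is a minimal single-JVM sketch that does not use PJ2 at all. It assumes the workflow is wrapped as one job, each module becomes one task, and a fixed thread pool stands in for the tracker handing tasks to backend nodes. The names `WorkflowAsJob`, `Module`, `ModuleTask`, and `runWorkflowAsJob` are hypothetical, the modules here are independent (no data flows between them), and the real PJ2 `Job`/`Task` classes and Ducktape's own classes will look different. Java 16+ is assumed for records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Single-JVM model of the cluster profile's structure; this is NOT the PJ2 API.
// The workflow is wrapped as one job and each module as one task; a fixed thread
// pool stands in for the tracker handing tasks to backend nodes.
public class WorkflowAsJob {

    // Hypothetical stand-in for a Ducktape module: a name plus a function over its input.
    record Module(String name, Function<String, String> body) {}

    // Hypothetical stand-in for a PJ2 Task: runs one module on one input.
    record ModuleTask(Module module, String input) implements Callable<String> {
        @Override
        public String call() {
            return module.name() + ":" + module.body().apply(input);
        }
    }

    // Hypothetical stand-in for a PJ2 Job: submits one task per module, then collects results.
    static List<String> runWorkflowAsJob(List<Module> modules, String input, int backendNodes) {
        ExecutorService backends = Executors.newFixedThreadPool(backendNodes);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (Module m : modules) {
                pending.add(backends.submit(new ModuleTask(m, input))); // "distribute" the task
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                try {
                    results.add(f.get()); // the job waits for every module task, in submission order
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a task", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("module task failed", e.getCause());
                }
            }
            return results;
        } finally {
            backends.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Module> modules = List.of(
                new Module("upper", String::toUpperCase),
                new Module("length", s -> Integer.toString(s.length())));
        System.out.println(runWorkflowAsJob(modules, "ducktape", 2));
    }
}
```

Running `main` prints `[upper:DUCKTAPE, length:8]`. Results come back in submission order regardless of which pool thread finishes first, because the job reads the futures in the order it created them.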

In the cluster environment execution profile, tasks communicate via PJ2 Tuples. Within Ducktape we have ModuleTuples, which contain the data needed to execute a module as a PJ2 Task. The workflow job communicates with each distributed remote module through these tuples.
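The exchange can be sketched inside one JVM with a tiny Linda-style tuple space: `put` adds a tuple, and `take` removes the first tuple matching a template, blocking until one exists. This is not the PJ2 tuple API, and the `ModuleTuple` fields shown here (`module`, `id`, `isResult`, `payload`) are hypothetical; Ducktape's actual ModuleTuple may carry different data. `TupleSketch` and `TupleSpace` are illustrative names. Java 16+ is assumed for records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Single-JVM sketch of tuple-based communication, loosely modelled on a tuple space.
// This is NOT the PJ2 Tuple API; ModuleTuple's fields are hypothetical.
public class TupleSketch {

    // Hypothetical ModuleTuple: target module, a job-assigned id, request/result flag, data.
    record ModuleTuple(String module, int id, boolean isResult, String payload) {}

    // Minimal blocking tuple space: put adds a tuple, take removes the first match.
    static final class TupleSpace {
        private final List<ModuleTuple> tuples = new ArrayList<>();

        synchronized void put(ModuleTuple t) {
            tuples.add(t);
            notifyAll(); // wake any taker waiting for a matching tuple
        }

        synchronized ModuleTuple take(Predicate<ModuleTuple> template) throws InterruptedException {
            while (true) {
                for (int i = 0; i < tuples.size(); i++) {
                    if (template.test(tuples.get(i))) {
                        return tuples.remove(i);
                    }
                }
                wait(); // no match yet; release the lock until another put arrives
            }
        }
    }

    public static void main(String[] args) throws Exception {
        TupleSpace space = new TupleSpace();

        // "Remote" module task: takes its request tuple, runs, and puts back a result tuple.
        Thread upperTask = new Thread(() -> {
            try {
                ModuleTuple req = space.take(t -> t.module().equals("upper") && !t.isResult());
                space.put(new ModuleTuple("upper", req.id(), true, req.payload().toUpperCase()));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        upperTask.start();

        // Workflow job: sends input to the module, then waits for the matching result.
        space.put(new ModuleTuple("upper", 1, false, "ducktape"));
        ModuleTuple result = space.take(t -> t.id() == 1 && t.isResult());
        System.out.println(result.payload()); // prints DUCKTAPE
        upperTask.join();
    }
}
```

The job never takes its own request back because its template only matches result tuples, while the module task's template only matches requests addressed to it.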

### Configuration of tracker and launchers

Before running scalable experiments, we need to configure the tracker (which runs on the frontend node of the cluster) and a launcher on each node that will be used to run jobs and tasks. For more detail, check out the cluster installation section; the detailed configuration is described in the PJ2 documentation.

### Workflow execution

Once the tracker and launchers are set up, we can submit our Ducktape workflow with the cluster environment profile, on the same machine as the tracker:

```
java -jar ducktape-run.jar --profile CLUSTER --tracker [tracker host:port] --workflowjar [modules.jar]
```

You can supply the JAR files required to run the workflow in `modules.jar`.