## Workflow Composition
In Ducktape, we compose experiments by defining a collection of modules together with their inputs and how they are chained together. This experiment definition is captured in YAML.
The YAML first gives the workflow name and description, followed by a list of modules. Each module must provide its name, source, and inputs. Inputs to modules can be either raw YAML values (strings, doubles, ints, bools, or lists of these) or references to the outputs of other modules.
A reference occurs when one module uses the output of another module as one of its inputs, for example `reference: RDFDataSet.dataset`.
A sweep occurs when the input to a module (either by reference or raw) is a list of values, while the module expects a single value of the same type as the entries in the list. When a sweep is encountered, the execution branches: the module is executed once for each value in the list. If a module contains multiple sweeps, the execution branches once for each element of the Cartesian product of the individual sweeps.
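The Cartesian-product branching can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Ducktape's actual implementation: the function `expand_sweeps` and its `sweep_keys` argument are hypothetical, and the values are borrowed from the `RDFWLSubTreeKernel` module in the example workflow.

```python
from itertools import product


def expand_sweeps(inputs, sweep_keys):
    """Return one input dict per execution branch.

    `inputs` maps input names to values. `sweep_keys` names the inputs whose
    list value is a sweep (hypothetical: a real engine would presumably infer
    this from the type each module input expects).
    """
    keys = sorted(sweep_keys)
    branches = []
    # product() with no arguments yields a single empty tuple,
    # so a module without sweeps gets exactly one branch.
    for combo in product(*(inputs[k] for k in keys)):
        branch = dict(inputs)            # copy the non-swept inputs
        branch.update(zip(keys, combo))  # pin each swept input to one value
        branches.append(branch)
    return branches


inputs = {
    "iterations": [0, 2, 4],
    "depth": [1, 2],
    "dataset": "RDFDataSet.dataset",
}
branches = expand_sweeps(inputs, ["iterations", "depth"])
print(len(branches))  # prints 6 (3 iterations x 2 depths)
```

Each resulting dict holds a single value for `iterations` and `depth`, which is what the module would receive in one branch.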
Any module that depends on the output of a branched module is executed once for each branch. If another sweep is encountered downstream of an existing sweep, a new branch is created for each value of the second sweep.
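A minimal sketch of how branches might propagate downstream, assuming that a second sweep branches within every existing upstream branch (one reading of the rule above). The function `downstream_runs` and the downstream sweep values `[0.1, 1.0]` are hypothetical and do not come from Ducktape.

```python
def downstream_runs(upstream_branches, own_sweep=None):
    """List the executions of a module that depends on a branched module.

    Without a sweep of its own, the module runs once per upstream branch.
    With a sweep, it runs once per (upstream branch, sweep value) pair.
    """
    if own_sweep is None:
        return [(branch, None) for branch in upstream_branches]
    return [(branch, value) for branch in upstream_branches for value in own_sweep]


# Six upstream branches: iterations in [0, 2, 4] crossed with depth in [1, 2].
upstream = [(i, d) for i in [0, 2, 4] for d in [1, 2]]

print(len(downstream_runs(upstream)))              # prints 6
print(len(downstream_runs(upstream, [0.1, 1.0])))  # prints 12
```

Under this assumption the number of executions grows multiplicatively with every sweep along a dependency chain.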
#### Workflow

    workflow:
      name: "Affiliation Experiment Test"
      modules:
        - module:
            name: RDFDataSet
            source: org.data2semantics.exp.modules.RDFDataSetModule
            inputs:
              filename: "input.rdf"
              mimetype: "text/n3"
        - module:
            name: AffiliationDataSet
            source: org.data2semantics.exp.modules.AffiliationDataSetModule
            inputs:
              dataset:
                reference: RDFDataSet.dataset
              minSize: 0
              [...]
        - module:
            name: RDFWLSubTreeKernel
            source: org.data2semantics.exp.modules.RDFWLSubTreeKernelModule
            inputs:
              iterations: [0, 2, 4]
              depth: [1, 2]
              dataset:
                reference: RDFDataSet.dataset
              instances:
                reference: AffiliationDataSet.instances
              blacklist:
                reference: AffiliationDataSet.blacklist
              [...]