From 8e553ea23821a20a3be5d10b35cf31c8e19b2294 Mon Sep 17 00:00:00 2001 From: Richard Eckart de Castilho Date: Sat, 15 Aug 2015 16:29:28 +0200 Subject: [PATCH] No issue. Migrating wiki documentation to asciidoc. --- dkpro-lab-doc/pom.xml | 99 ++++++++++++ .../src/main/asciidoc/developer-guide.adoc | 31 ++++ .../developer-guide/DeveloperSetup.adoc | 19 +++ .../developer-guide/GuideToUsingWithGit.adoc | 73 +++++++++ .../developer-guide/ReleaseGuide.adoc | 27 ++++ .../src/main/asciidoc/user-guide.adoc | 27 ++++ .../asciidoc/user-guide/CrossValidation.adoc | 144 ++++++++++++++++++ .../asciidoc/user-guide/TaskLifecycle.adoc | 97 ++++++++++++ pom.xml | 1 + 9 files changed, 518 insertions(+) create mode 100644 dkpro-lab-doc/pom.xml create mode 100644 dkpro-lab-doc/src/main/asciidoc/developer-guide.adoc create mode 100644 dkpro-lab-doc/src/main/asciidoc/developer-guide/DeveloperSetup.adoc create mode 100644 dkpro-lab-doc/src/main/asciidoc/developer-guide/GuideToUsingWithGit.adoc create mode 100644 dkpro-lab-doc/src/main/asciidoc/developer-guide/ReleaseGuide.adoc create mode 100644 dkpro-lab-doc/src/main/asciidoc/user-guide.adoc create mode 100644 dkpro-lab-doc/src/main/asciidoc/user-guide/CrossValidation.adoc create mode 100644 dkpro-lab-doc/src/main/asciidoc/user-guide/TaskLifecycle.adoc diff --git a/dkpro-lab-doc/pom.xml b/dkpro-lab-doc/pom.xml new file mode 100644 index 0000000..11adf5a --- /dev/null +++ b/dkpro-lab-doc/pom.xml @@ -0,0 +1,99 @@ + + + 4.0.0 + + de.tudarmstadt.ukp.dkpro.lab + dkpro-lab + 0.12.0-SNAPSHOT + + dkpro-lab-doc + pom + DKPro Lab - Documentation + + + + org.asciidoctor + asciidoctor-maven-plugin + + + user-guide-html + generate-resources + + process-asciidoc + + + html5 + coderay + user-guide.adoc + ./user-guide/images + + left + ./user-guide/ + + + + + developer-guide-html + generate-resources + + process-asciidoc + + + html5 + coderay + developer-guide.adoc + ./developer-guide/images + + left + ./developer-guide/ + + + + + + + + + + org.asciidoctor + asciidoctor-maven-plugin + 1.5.2.1 + + + 8 + true + true + ${project.version} + ${project.version} + font + + + + + org.asciidoctor + asciidoctorj-pdf + 1.5.0-alpha.9 + + + + + + + \ No newline at end of file diff --git a/dkpro-lab-doc/src/main/asciidoc/developer-guide.adoc b/dkpro-lab-doc/src/main/asciidoc/developer-guide.adoc new file mode 100644 index 0000000..8068f46 --- /dev/null +++ b/dkpro-lab-doc/src/main/asciidoc/developer-guide.adoc @@ -0,0 +1,31 @@ +// Copyright 2015 +// Ubiquitous Knowledge Processing (UKP) Lab +// Technische Universität Darmstadt +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + += DKPro Lab™ Developer Guide +:Author: The DKPro Lab Team +:toc-title: Developer Guide + +This document targets developers of DKPro Lab components. 
+
+include::{include-dir}DeveloperSetup.adoc[]
+
+<<<
+
+include::{include-dir}ReleaseGuide.adoc[]
+
+<<<
+
+include::{include-dir}GuideToUsingWithGit.adoc
diff --git a/dkpro-lab-doc/src/main/asciidoc/developer-guide/DeveloperSetup.adoc b/dkpro-lab-doc/src/main/asciidoc/developer-guide/DeveloperSetup.adoc
new file mode 100644
index 0000000..e080740
--- /dev/null
+++ b/dkpro-lab-doc/src/main/asciidoc/developer-guide/DeveloperSetup.adoc
@@ -0,0 +1,19 @@
+// Copyright 2015
+// Ubiquitous Knowledge Processing (UKP) Lab
+// Technische Universität Darmstadt
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+## Developer Setup
+
+Nothing notable right now.
diff --git a/dkpro-lab-doc/src/main/asciidoc/developer-guide/GuideToUsingWithGit.adoc b/dkpro-lab-doc/src/main/asciidoc/developer-guide/GuideToUsingWithGit.adoc
new file mode 100644
index 0000000..27d007f
--- /dev/null
+++ b/dkpro-lab-doc/src/main/asciidoc/developer-guide/GuideToUsingWithGit.adoc
@@ -0,0 +1,73 @@
+// Copyright 2015
+// Ubiquitous Knowledge Processing (UKP) Lab
+// Technische Universität Darmstadt
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+## Git and Eclipse
+
+This is an introductory guide to using DKPro Lab with Git and Eclipse 4.3.1.
+
+### One-time preparation
+
+We recommend installing `m2e-egit`, an Eclipse plug-in which adds the option "Import Maven Projects..." to the context menu of the Git repository view.
+
+. In Eclipse, go to `Window` -> `Preferences` -> `Maven` -> `Discovery`. Open `Catalog`. Search for "m2e-egit". Select it and click `Finish`.
+. Accept all defaults, agree to the license, and click `OK`. Keep installing unsigned content.
+. Restart Eclipse when directed.
+
+### Create your local clone
+
+First, you need to create a clone of the remote repository on your local machine.
+
+. Open the Eclipse Git Repository Perspective. Click "clone a git repository".
+. In a browser, go to the DKPro Lab Google Code page, then `Source`. Copy the "git clone" address and paste it into the Eclipse `Clone Location URI`; the other fields should auto-fill. You will also need your Google Code password (shown on the Google Code DKPro Lab page when you are signed in); enter it here. The username does not need to be your full Gmail address, unlike when committing to Google Code via SVN. Use all other default options.
+. Check out both "develop" and "master". Click `Next`.
+. Change `Initial branch` to `develop`. This is the branch where your commits will go. Click `Finish`.
+
+Now, DKPro Lab should be listed in your Eclipse Git Repository view. You have made a local clone and have also checked out a branch to work on. The next step is to make the Java side of Eclipse aware of the local clone and the checked-out branch.
+
+Next, import the checked-out sources as Maven projects into your Package Explorer:
+
+. In Eclipse's Git Repository Perspective, open DKPro Lab, open `Working Directory`, right-click `de.tudarmstadt.ukp.dkpro.lab`, and choose "Import Maven Projects..."
+. Optionally, add the projects to the working set of your choice, then click `Finish`.
+
+Congratulations! You are all set to begin developing DKPro Lab.
+
+### Update your project
+
+. Go to the Git Repository Perspective, right-click, "pull". This is just like `svn update`. Now your local clone and your checked-out branch are both updated and you are all set.
+
+### Commit your work
+
+When you are ready to merge your contributions with the main project, you can either commit entire files at once or sets of changes from those files.
+
+#### To commit entire files
+
+. Right-click on the Package Explorer files with your changes -> `Team` -> `Add to Index`. The snowflake icon appears.
+. Right-click on the files with the changes -> `Team` -> `Commit`. Add a commit message and click on the files you want to include. Then, commit and push. This is just like `svn commit`.
+
+#### To commit individual changes
+
+. Go to the Git Repository Perspective, then at the bottom of the screen go to `Git Staging`. Click on your file. It opens in a compare view.
+. Between the two versions, click the arrow button for the changes you want to commit. The changes are placed in the index view.
+. Save this editor (Ctrl+S).
+. Add a commit message, then commit and push.
+
+#### Committing with remote changes
+
+When you want to commit but someone else has committed since you last updated, you must stash your changes, update your local clone, and then re-apply the stash on top. Git will not allow you to push your changes to the remote repository otherwise.
+
+. Go to the Git Repository Perspective. Right-click on DKPro Lab -> "stash changes". Give the stash a name, e.g. "temp". Your local changes on the checked-out branch are now gone and saved in the Git Repository under DKPro Lab -> "Stashed Commits".
+. Right-click on the project -> "pull".
+. Right-click on your stashed commit ("temp") -> "apply stashed changes". Your changes are back on your checked-out branch. Delete the stashed copy.
+. Now you are ready to commit.
\ No newline at end of file
diff --git a/dkpro-lab-doc/src/main/asciidoc/developer-guide/ReleaseGuide.adoc b/dkpro-lab-doc/src/main/asciidoc/developer-guide/ReleaseGuide.adoc
new file mode 100644
index 0000000..01b1c36
--- /dev/null
+++ b/dkpro-lab-doc/src/main/asciidoc/developer-guide/ReleaseGuide.adoc
@@ -0,0 +1,27 @@
+// Copyright 2015
+// Ubiquitous Knowledge Processing (UKP) Lab
+// Technische Universität Darmstadt
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+## Release guide
+
+ * Merge the changes from the development branch into master.
+ * Make sure you are on the master branch.
+ * On OS X, for good measure, also change the system language to English in the System Preferences and restart the terminal session.
+ * `$ export LANG='en_US.UTF-8'` -- switch to English to work around http://jira.codehaus.org/browse/MRELEASE-812[MRELEASE-812]
+ * `$ mvn release:prepare -DautoVersionSubmodules=true` -- prepare the release
+ * `$ mvn release:perform` -- perform the release
+ * `$ cd target/checkout/de.tudarmstadt.ukp.dkpro.lab`
+ * `$ mvn javadoc:aggregate`
+ * Check out the *gh-pages* branch and place the new Javadoc under the appropriate release folder.
\ No newline at end of file
diff --git a/dkpro-lab-doc/src/main/asciidoc/user-guide.adoc b/dkpro-lab-doc/src/main/asciidoc/user-guide.adoc
new file mode 100644
index 0000000..7c77767
--- /dev/null
+++ b/dkpro-lab-doc/src/main/asciidoc/user-guide.adoc
@@ -0,0 +1,27 @@
+// Copyright 2015
+// Ubiquitous Knowledge Processing (UKP) Lab
+// Technische Universität Darmstadt
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+= DKPro Lab™ User Guide and Reference
+:Author: The DKPro Lab Team
+:toc-title: User Guide
+
+This document targets users of DKPro Lab.
+
+include::{include-dir}TaskLifecycle.adoc[]
+
+<<<
+
+include::{include-dir}CrossValidation.adoc[]
diff --git a/dkpro-lab-doc/src/main/asciidoc/user-guide/CrossValidation.adoc b/dkpro-lab-doc/src/main/asciidoc/user-guide/CrossValidation.adoc
new file mode 100644
index 0000000..f2f095e
--- /dev/null
+++ b/dkpro-lab-doc/src/main/asciidoc/user-guide/CrossValidation.adoc
@@ -0,0 +1,144 @@
+// Copyright 2015
+// Ubiquitous Knowledge Processing (UKP) Lab
+// Technische Universität Darmstadt
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+## Cross validation
+
+### Cross validation tutorial
+
+This is a brief introduction to running an experiment that has a cross-validation part. The most important things to learn in this example are how to set up a parameter space dimension that is created dynamically via a nested `BatchTask` and how to use a `FoldDimensionBundle`.
+
+#### Basic tasks
+
+Assume you have these tasks:
+
+ * `preprocessingTask`: this task does some basic preprocessing of your data. The data produced by this task is the data that you want to run your cross-validation experiment on.
+So in several iterations, some part of this data is used to train a classifier (_trainingSet_), while the rest of the data is used to evaluate the quality of the classifier (_testSet_).
+ * `featureExtractionTask`: this task extracts the features used to train the classifier from the preprocessed data of the _trainingSet_. Separating this task from the `preprocessingTask` allows you to run your experiment with different feature sets while doing the preprocessing only once.
+ * `trainingTask`: this task trains a classifier from the extracted features.
+ * `validationTask`: this task finally evaluates the classifier on the _testSet_.
+ * `batchTask`: this task runs all four tasks above.
+
+#### Cross-validation dimension
+
+With such a setup, the data which is the input for the cross validation is unknown until the `preprocessingTask` has completed. Only when it is done can the n-fold cross validation be set up, during which part of the data is assigned to the _trainingSet_ and the rest to the _testSet_.
+
+In the conceptualization of the Lab, the values assigned to the _trainingSet_ and _testSet_ in each fold correspond to a parameter space dimension. The Lab provides a special `FoldDimensionBundle` to create an n-fold assignment to these sets. An illustration:
+
+[source,java]
+----
+Dimension<String> data = Dimension.create("data", "1", "2", "3", "4", "5", "6");
+FoldDimensionBundle<String> foldBundle = new FoldDimensionBundle<String>("fold", data, 3);
+----
+
+The _foldBundle_ dimension produces three assignments for two parameters: `fold_training` (the _trainingSet_) and `fold_validation` (the _testSet_ - sorry, this should probably be called `fold_test` ...):
+
+[cols="3*", options="header"]
+|====
+| Assignment | fold_training | fold_validation
+
+| 1
+| 2, 3, 5, 6
+| 1, 4
+
+| 2
+| 1, 3, 4, 6
+| 2, 5
+
+| 3
+| 1, 2, 4, 5
+| 3, 6
+|====
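+
+Inside a task, these parameter values arrive through `@Discriminator` fields named after the parameters. The following is only a sketch using a hypothetical task (not part of the tutorial setup) to show how `fold_training` and `fold_validation` could be consumed; it assumes the `ExecutableTaskBase` convenience base class.
+
+[source,java]
+----
+Task demoTask = new ExecutableTaskBase() {
+    // Injected from the parameter space by name.
+    @Discriminator
+    private Collection<String> fold_training;
+    @Discriminator
+    private Collection<String> fold_validation;
+
+    @Override
+    public void execute(TaskContext aContext) throws Exception {
+        // A real task would train on fold_training and evaluate on fold_validation.
+        System.out.println("training set: " + fold_training);
+        System.out.println("test set:     " + fold_validation);
+    }
+};
+----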
+
+The problem in this illustration is that the `foldBundle` dimension is created statically from the `data` dimension, but for our cross-validation experiment we need to create it from the data that was created by the `preprocessingTask`. So how can we create the `foldBundle` dynamically?
+
+To solve this problem, we introduce another batch task and change the batch task we already have:
+
+ * `crossValidationTask`: this task runs the `featureExtractionTask`, the `trainingTask`, and the `validationTask`.
+ * `batchTask` (modified): this task now runs the `preprocessingTask` and the `crossValidationTask`.
+
+#### Cross-validation task
+
+Now, instead of statically setting up the parameter space for the `crossValidationTask`, we set it up dynamically by overriding the `execute()` method. In the following example, we assume that the `preprocessingTask` has produced some `XMI` data, which is the data on which we want to perform the cross-validation. The `featureExtractionTask` requires two parameters, `filesRoot` (the base directory of the data) and `files_training` (the data in the _trainingSet_). The `validationTask` also requires `filesRoot`, and it needs `files_validation` (the _testSet_) to evaluate the classifier.
+
+[source,java,numbered]
+----
+BatchTask crossValidationTask = new BatchTask() {
+    public void execute(TaskContext aContext) throws Exception {
+        File xmiPathRoot = aContext.getStorageLocation("XMI", AccessMode.READONLY);
+        Collection<File> files = FileUtils.listFiles(xmiPathRoot, new String[] { "xmi" }, false);
+        String[] fileNames = new String[files.size()];
+        int i = 0;
+        for (File f : files) { fileNames[i] = f.getName(); i++; }
+        Arrays.sort(fileNames);
+        FoldDimensionBundle<String> foldDim = new FoldDimensionBundle<String>(
+                "files", Dimension.create("", fileNames), 10);
+        Dimension filesRootDim = Dimension.create("filesRoot", xmiPathRoot);
+
+        ParameterSpace pSpace = new ParameterSpace(foldDim, filesRootDim);
+        setParameterSpace(pSpace);
+
+        super.execute(aContext);
+    }
+};
+crossValidationTask.addImportLatest("XMI", "XMI", preprocessTask.getType());
+crossValidationTask.addTask(featureExtractionTask);
+crossValidationTask.addTask(trainingTask);
+crossValidationTask.addTask(validationTask);
+
+BatchTask batchTask = new BatchTask();
+batchTask.setParameterSpace(pSpace);
+batchTask.addTask(preprocessTask);
+batchTask.addTask(crossValidationTask);
+----
+
+ * *lines 3-8*: create a sorted list of the file names which the `preprocessingTask` has stored under the key `XMI`.
+ * *lines 9-10*: create a 10-fold `FoldDimensionBundle` called `files` which will set the parameters `files_training` and `files_validation`.
+ * *line 11*: create a constant dimension which sets the parameter `filesRoot`.
+ * *lines 13-14*: create the parameter space for the `crossValidationTask` for the parameters `files_training`, `files_validation`, and `filesRoot`. Since we later add the `crossValidationTask` to the `batchTask`, the `crossValidationTask` also inherits all parameters from the parameter space of the `batchTask`.
+ * *line 16*: finally, run the cross validation.
+ * *line 19*: import the `XMI` key from the `preprocessingTask` into the `crossValidationTask`.
+ * *lines 20-22*: add the sub-tasks that perform the cross-validation.
+ * *lines 24-27*: configure the overall `batchTask` that runs the preprocessing and the cross-validation. Note that `pSpace` here refers to the parameter space of the overall experiment, which is defined elsewhere; it is not the local `pSpace` created inside `execute()`.
+
+#### Reader
+
+The following snippet illustrates how to use the _trainingSet_ parameter `files_training` and the `filesRoot` parameter to configure the DKPro Core ASL `XmiReader` component.
+
+[source,java,numbered]
+----
+Task featureExtractionTask = new UimaTaskBase() {
+    @Discriminator
+    private File filesRoot;
+    @Discriminator
+    private Collection<String> files_training;
+
+    public CollectionReaderDescription getCollectionReaderDescription(TaskContext aContext)
+        throws ResourceInitializationException, IOException
+    {
+        Collection<String> patterns = new ArrayList<String>();
+        for (String f : files_training) {
+            patterns.add(XmiReader.INCLUDE_PREFIX + f);
+        }
+
+        return createReader(XmiReader.class,
+                XmiReader.PARAM_PATH, filesRoot,
+                XmiReader.PARAM_PATTERNS, patterns);
+    }
+
+    /** ... getAnalysisEngineDescription() omitted ... */
+};
+----
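+
+The `validationTask` can read its input in the same way; the following sketch (not part of the original tutorial) merely mirrors the reader above, using the `files_validation` parameter to access the held-out _testSet_.
+
+[source,java]
+----
+Task validationTask = new UimaTaskBase() {
+    @Discriminator
+    private File filesRoot;
+    @Discriminator
+    private Collection<String> files_validation;
+
+    public CollectionReaderDescription getCollectionReaderDescription(TaskContext aContext)
+        throws ResourceInitializationException, IOException
+    {
+        Collection<String> patterns = new ArrayList<String>();
+        for (String f : files_validation) {
+            patterns.add(XmiReader.INCLUDE_PREFIX + f);
+        }
+
+        return createReader(XmiReader.class,
+                XmiReader.PARAM_PATH, filesRoot,
+                XmiReader.PARAM_PATTERNS, patterns);
+    }
+
+    /** ... getAnalysisEngineDescription() omitted ... */
+};
+----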
+
+#### Caveat
+
+Mind that importing data across batch task boundaries is currently not tested. That is, in the example above, the `featureExtractionTask` does not directly import data from the `preprocessingTask`. Instead, the `crossValidationTask` imports the data from the `preprocessingTask` and forwards it to the `featureExtractionTask` via the file names in the fold dimension.
+
+#### Summary
+
+If a cross-validation task depends on the output of a preprocessing task, it is impossible to set up a static parameter dimension for the _trainingSet_ and _testSet_, because the folds depend on the data created by the preprocessing task. This tutorial has illustrated how to create a nested batch task which dynamically creates its own parameter space using a `FoldDimensionBundle` based on the output of the preprocessing task.
\ No newline at end of file
diff --git a/dkpro-lab-doc/src/main/asciidoc/user-guide/TaskLifecycle.adoc b/dkpro-lab-doc/src/main/asciidoc/user-guide/TaskLifecycle.adoc
new file mode 100644
index 0000000..f1fe459
--- /dev/null
+++ b/dkpro-lab-doc/src/main/asciidoc/user-guide/TaskLifecycle.adoc
@@ -0,0 +1,97 @@
+// Copyright 2015
+// Ubiquitous Knowledge Processing (UKP) Lab
+// Technische Universität Darmstadt
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+## Task Lifecycle and Configuration
+
+### Lifecycle basics
+
+In DKPro Lab, a task is instantiated by the user. This allows the user to pre-configure the task before it is handed over to the framework. While still under the control of the user, the task usually passes through the following stages:
+
+ * *instantiation:* the task is instantiated by the user.
+ * *configuration:* the task is configured by the user.
+ * *run:* the user invokes `Lab.run()`. At this point, control over the task is handed over to the framework.
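+
+As a minimal sketch of these three stages, assume a batch task and parameter space like those built in the cross-validation tutorial later in this guide (`pSpace`, `preprocessTask`, and `crossValidationTask` are taken from there), and assume that the framework is invoked via `Lab.getInstance().run(...)`:
+
+[source,java]
+----
+// 1) instantiation: the user creates the task
+BatchTask batchTask = new BatchTask();
+
+// 2) configuration: the user configures it, e.g. the parameter space and data imports
+batchTask.setParameterSpace(pSpace);
+batchTask.addTask(preprocessTask);
+batchTask.addTask(crossValidationTask);
+
+// 3) run: control over the task is handed over to the framework
+Lab.getInstance().run(batchTask);
+----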
+
+#### Primitive task lifecycle
+
+A primitive task, i.e. one that is not running within a batch task, then passes through the following stages:
+
+ * *engine lookup:* the framework locates the engine responsible for executing the task.
+ * *context setup:* the execution engine creates a new task context. The framework tries to resolve data dependencies at this point - if the dependencies cannot be resolved, this fails.
+ * *engine setup:* the engine performs preparatory steps necessary before the task can run.
+ * *lifecycle event "initialize":* signals that the task has been fully initialized. The default lifecycle manager stores the values of the discriminators and task properties in the task context at this point.
+ * *lifecycle event "begin":* signals that the task execution is about to begin. The default lifecycle manager enforces several JVM garbage collection runs to free up memory and records the start time.
+ * *engine execution:* the engine performs the actual task.
+ * *lifecycle event "complete":* signals that the task completed successfully. The default lifecycle manager records the end time of the task, runs all reports registered on the task, and stores the task metadata in the context to record that the task has completed.
+ * *lifecycle event "fail":* signals that the task failed. The default lifecycle manager deletes the task context and logs a failure message.
+
+#### Batch task and subtask lifecycle
+
+The framework was built for parameter sweeping experiments, so most setups involve one or more batch tasks. The batch task itself is a primitive task which passes through the stages outlined above. Subtasks running within the batch task, however, pass through additional stages which are executed for every location in the parameter space:
+
+ * *configuration:* all tasks in the batch are configured with the parameter values at the current location in the parameter space.
+ * *check for previous execution:* usually only a subset of the parameters in the parameter space apply to a particular task. Consequently, the output produced by a task for a particular parameter configuration is valid for every position in the parameter space in which the parameters applicable to the task are constant. If the task has already been executed for a particular parameter combination, it does not need to be executed again.
+ * *context setup:* if the data-dependencies cannot be resolved, the task execution is deferred and the next task in the batch is tried (cf. _context setup_ in the primitive task lifecycle).
+ * *loop detection:* if all task executions fail due to unresolvable data-dependencies, the batch task is aborted.
+ * *subtask execution:* the primitive task lifecycle for the subtask is executed (see above, except _context setup_).
+ * *scope extension:* the task execution is added to the batch task scope.
+
+### Static configuration
+
+Since a task is instantiated by the user, it is possible to pre-configure the task before it is handed over to the framework. In a typical scenario, data dependencies are pre-configured by the user while all other parameters are managed via the parameter space.
+All invariant parameters and data-dependencies can be configured at this point. Some users prefer this over maintaining even invariant parameters in the parameter space.
+
+### Dynamic data-dependencies
+
+The framework resolves data-dependencies only after the task has been configured. This allows the task to configure data-dependencies dynamically, depending on parameters. To configure a data-dependency based on a parameter X, implement a setter for X in the task and use the `addImportLatest()` method in the setter to configure the dependency, e.g.:
+
+[source,java]
+----
+@Discriminator
+String preprocessingTaskType;
+
+void setPreprocessingTask(String aPreprocessingTaskType) {
+    preprocessingTaskType = aPreprocessingTaskType;
+    addImportLatest("KEY", "KEY", preprocessingTaskType);
+}
+----
+
+If a certain data-dependency is not required, it is important that the dependency is explicitly removed. The easiest way to do this in a setter is:
+
+[source,java]
+----
+getImports().remove("KEY");
+----
+
+### Dynamic workflows
+
+The framework allows completely dynamic workflows, because it creates no global execution plan. Not only is it possible to configure data-dependencies based on parameter configurations, but it is also possible to dynamically create the whole set of tasks, including additional parameters, that should be executed for a particular parameter configuration. To achieve this, create a custom batch task class and override the `execute()` method of the batch task. In your custom `execute()` method, you can use methods such as `setTasks()` and `setParameterSpace()` to configure the tasks that need to be executed and their parameters. Your custom batch task can have discriminator fields just like any other task. Be aware that the parameter configurations of nested batch tasks are additive: assume an outer batch task O with a parameter space `{A: 1}` and an inner batch task I with a parameter space `{B: 1}`; a task T running within the inner batch task is then configured with `{A: 1, B: 1}`.
+
+An example for a dynamically created parameter space is given in the cross-validation tutorial in this guide.
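+
+As a complementary illustration, here is a sketch (not taken from the tutorial) of a custom batch task that chooses its subtasks and parameters at execution time based on one of its own discriminators. The tasks `taskA` and `taskB` and the `mode` discriminator are hypothetical, and the sketch assumes that `setTasks()` accepts the set of tasks to execute.
+
+[source,java]
+----
+public class DynamicBatchTask extends BatchTask {
+    @Discriminator
+    private String mode;
+
+    private final Task taskA;
+    private final Task taskB;
+
+    public DynamicBatchTask(Task aTaskA, Task aTaskB) {
+        taskA = aTaskA;
+        taskB = aTaskB;
+    }
+
+    @Override
+    public void execute(TaskContext aContext) throws Exception {
+        // Decide at runtime which tasks and which parameters apply to the
+        // current configuration, then let the regular batch execution run.
+        Set<Task> tasks = new HashSet<Task>();
+        tasks.add("fast".equals(mode) ? taskA : taskB);
+        setTasks(tasks);
+        setParameterSpace(new ParameterSpace(
+                Dimension.create("threshold", 1, 5, 10)));
+        super.execute(aContext);
+    }
+}
+----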
+
+#### Data-dependencies with nested batch tasks
+
+Assume an outer batch task O, within which a primitive task T1 and an inner batch task I run. Within I runs a second primitive task T2. T2 can declare a data-dependency on data produced by T1, because the inner batch task I inherits the scope of the outer task O.
+
+However, a task T3 running within the outer batch task O cannot declare a data-dependency on T2. While T3 lives in the parameter space of O, T2 lives in the combined parameter space of O and I. Thus, T2 is potentially executed more than once while T3 is executed only once. Since a data-dependency must be uniquely resolvable to a particular task context, and for one execution of T3 there can be several applicable executions of T2, a direct data-dependency is not possible. It is possible, though, to implement and register a report on I which aggregates the data of all executions of T2 and stores it in the task context of I. T3 can then declare a dependency on I to fetch the aggregated data.
+
+### Dynamic reports
+
+Reports can be dynamically configured as necessary, either by implementing a setter or as part of setting up a dynamic workflow. If set up using a setter, reports need to be explicitly removed from a task if they are not needed. The easiest way to do this in a setter is:
+
+[source,java]
+----
+getReports().removeReport(ReportClass.class);
+----
\ No newline at end of file
diff --git a/pom.xml b/pom.xml
index c0b6d27..d2f8112 100644
--- a/pom.xml
+++ b/pom.xml
@@ -61,6 +61,7 @@
 		<module>dkpro-lab-uima-engine-simple</module>
 		<module>dkpro-lab-uima-engine-cpe</module>
+		<module>dkpro-lab-doc</module>