From 84f55f1a38cc2efdc024175a2a15031802f9292d Mon Sep 17 00:00:00 2001
From: Ralph Soika
Date: Sun, 30 Jul 2017 11:11:00 +0200
Subject: [PATCH] doc

---
 imixs-archive-hadoop/README.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/imixs-archive-hadoop/README.md b/imixs-archive-hadoop/README.md
index abc9a3ad..62e87f5b 100644
--- a/imixs-archive-hadoop/README.md
+++ b/imixs-archive-hadoop/README.md
@@ -4,6 +4,27 @@ The Imixs-Archive-Hadoop project provides a API to store workitems into a Hadoop
 
 Imixs-Archive-Hadoop is communicating with a hadoop cluster via the [WebHDFS Rest API](https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html).
 
+## Synchronous Push Mode
+
+This implementation follows the architecture of a synchronous push mode. With this strategy the archive process is directly coupled to the workflow process, so archiving can be controlled by the workflow model. The implementation is realized as an Imixs plug-in that is directly controlled by the workflow engine. The plug-in accesses the Hadoop cluster via the WebHDFS REST API. In this scenario the plug-in can store archive data, such as the checksum, immediately in the workitem. This makes it a tightly coupled archive strategy.
+
+### Pros
+
+* The archive process can be directly controlled by the workflow engine (via a plug-in)
+* The data in Hadoop and in Imixs-Workflow is in sync at all times
+* A workitem can store archive information synchronously (e.g. the checksum)
+
+### Cons
+
+* The process is time-consuming and slows down the overall performance of the workflow engine
+* The process is memory-consuming
+* The process has to be embedded into the running transaction, which increases complexity
+* Hadoop must be accessible via the internet, and additional security must be implemented on both sides
+
+
+## Implementation
+
+The service is implemented as a stateful session EJB together with a plug-in.
 The statefull session EJB synchronizes the transaction and decided in the afterCommit(boolean) method either to comit or rolback the changes in hadoop. This approach is a little bit complex, time and memory consuming but has the advantage that the workitem is always synchron with the data in the hadoop cluster.
 
 ## CDI Support
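The transaction handling described in the README text above can be illustrated with a small sketch. In Java EE a stateful session bean joins the container transaction by implementing `javax.ejb.SessionSynchronization`, whose callbacks are `afterBegin()`, `beforeCompletion()` and `afterCompletion(boolean committed)`; the README refers to the last one as `afterCommit(boolean)`. The code below is not the project's implementation. It mimics those three callbacks in plain Java so it runs without an application server, and the `HadoopClient` interface, the `ArchiveSession` class and the `/archive/...` paths are hypothetical. It assumes the plug-in writes each workitem to Hadoop while the transaction is running (so a checksum can be stored right away) and that a rollback is compensated by deleting the files written in that transaction.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical abstraction over the WebHDFS calls used by the archive plug-in. */
interface HadoopClient {
    void create(String path, byte[] data);
    void delete(String path);
}

/** In-memory stand-in for a Hadoop cluster, used only to run the sketch. */
class InMemoryHadoopClient implements HadoopClient {
    final Map<String, byte[]> files = new LinkedHashMap<>();

    public void create(String path, byte[] data) { files.put(path, data.clone()); }

    public void delete(String path) { files.remove(path); }
}

/**
 * Mirrors the callbacks of javax.ejb.SessionSynchronization. In a real stateful
 * session bean the EJB container calls them; in this sketch they are invoked by hand.
 */
class ArchiveSession {
    private final HadoopClient hadoop;
    // Paths written to Hadoop during the current transaction.
    private final List<String> pendingPaths = new ArrayList<>();

    ArchiveSession(HadoopClient hadoop) { this.hadoop = hadoop; }

    void afterBegin() { pendingPaths.clear(); }

    /** Called by the (hypothetical) plug-in while the workflow transaction runs. */
    void archive(String path, byte[] data) {
        hadoop.create(path, data);
        pendingPaths.add(path);
    }

    void beforeCompletion() { /* nothing to flush in this sketch */ }

    /** committed == false means the workflow transaction was rolled back. */
    void afterCompletion(boolean committed) {
        if (!committed) {
            // Compensate: remove every file this transaction wrote to Hadoop.
            for (String path : pendingPaths) {
                hadoop.delete(path);
            }
        }
        pendingPaths.clear();
    }
}

public class ArchiveSessionDemo {
    public static void main(String[] args) {
        InMemoryHadoopClient hadoop = new InMemoryHadoopClient();
        ArchiveSession session = new ArchiveSession(hadoop);

        // Transaction 1 commits: the archived file stays.
        session.afterBegin();
        session.archive("/archive/wi-1.xml", "<workitem/>".getBytes());
        session.beforeCompletion();
        session.afterCompletion(true);

        // Transaction 2 rolls back: the archived file is deleted again.
        session.afterBegin();
        session.archive("/archive/wi-2.xml", "<workitem/>".getBytes());
        session.beforeCompletion();
        session.afterCompletion(false);

        System.out.println(hadoop.files.keySet()); // prints [/archive/wi-1.xml]
    }
}
```

A compensating delete keeps the sketch short, but it cannot restore a file that the rolled-back transaction overwrote, and a failure inside `afterCompletion` can no longer undo the already finished workflow transaction. Writing to a temporary path and renaming it on commit would be one alternative under the same assumptions.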
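The plug-in reaches the cluster through WebHDFS, whose operations are plain HTTP requests of the form `http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=...`. Creating a file is a two-step exchange: a `PUT ...?op=CREATE` sent to the NameNode answers with `307 Temporary Redirect` and a `Location` header, and the file content is then sent with a second `PUT` to that DataNode URL, which answers `201 Created`. `op=GETFILECHECKSUM` returns the file checksum as a JSON `FileChecksum` object, and `op=DELETE` removes a file. The sketch below assumes simple (pseudo) authentication via the `user.name` query parameter, a NameNode at `http://namenode:50070` (50070 is the Hadoop 2.x default HTTP port) and a user named `imixs`; the `WebHdfsClient` class and its method names are hypothetical, and paths are not URL-encoded.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper that builds and calls WebHDFS REST URLs. */
class WebHdfsClient {
    private final String baseUrl; // assumed NameNode HTTP address, e.g. "http://namenode:50070"
    private final String user;    // sent as user.name (simple authentication)

    WebHdfsClient(String baseUrl, String user) {
        this.baseUrl = baseUrl;
        this.user = user;
    }

    /** path must start with "/"; it is not URL-encoded in this sketch. */
    String url(String path, String op, String extraParams) {
        String url = baseUrl + "/webhdfs/v1" + path + "?op=" + op + "&user.name=" + user;
        return extraParams.isEmpty() ? url : url + "&" + extraParams;
    }

    String create(String path)   { return url(path, "CREATE", "overwrite=true"); }
    String checksum(String path) { return url(path, "GETFILECHECKSUM", ""); }
    String delete(String path)   { return url(path, "DELETE", ""); }

    /** Two-step CREATE: the NameNode redirects (307) to a DataNode, which receives the data. */
    void upload(String path, byte[] data) throws IOException {
        HttpURLConnection nn = (HttpURLConnection) URI.create(create(path)).toURL().openConnection();
        nn.setRequestMethod("PUT");
        nn.setInstanceFollowRedirects(false); // read the Location header ourselves
        int status = nn.getResponseCode();
        String location = nn.getHeaderField("Location");
        nn.disconnect();
        if (status != 307 || location == null) {
            throw new IOException("expected 307 redirect from NameNode, got HTTP " + status);
        }

        HttpURLConnection dn = (HttpURLConnection) URI.create(location).toURL().openConnection();
        dn.setRequestMethod("PUT");
        dn.setDoOutput(true);
        try (OutputStream out = dn.getOutputStream()) {
            out.write(data);
        }
        int created = dn.getResponseCode();
        dn.disconnect();
        if (created != HttpURLConnection.HTTP_CREATED) {
            throw new IOException("upload to DataNode failed, HTTP " + created);
        }
    }
}

public class WebHdfsDemo {
    public static void main(String[] args) {
        WebHdfsClient hdfs = new WebHdfsClient("http://namenode:50070", "imixs");
        // prints http://namenode:50070/webhdfs/v1/archive/wi-1.xml?op=CREATE&user.name=imixs&overwrite=true
        System.out.println(hdfs.create("/archive/wi-1.xml"));
        System.out.println(hdfs.checksum("/archive/wi-1.xml"));
    }
}
```

The demo only builds URLs so it runs without a cluster; `upload` would be called from the archive plug-in, and the same client could issue the `DELETE` request when a transaction is rolled back.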