diff --git a/imixs-archive-api/README.md b/imixs-archive-api/README.md
index ee97e51f..c936500f 100644
--- a/imixs-archive-api/README.md
+++ b/imixs-archive-api/README.md
@@ -32,45 +32,37 @@ A _snapshot workitem_ is an immutable copy of a workitem (origin-workitem) inclu
 
 The snapshot process includes the following stages:
 
-1. A process instance is processed by the Imixs-Workflow engine based on a BPMN 2.0 model.
-2. After processing the process instance is persisted into the local workflow storage by the DocumentService.
-3. The DocumentService sends a notification of the new or updated process instance to the SnapshotService.
-4. The SnapshotService creates a immutable copy of the process instance - called snapshot-workitem.
-5. The snapshot workitem is stored into the local workflow storage
-6. The origin process instance is returned to the application
-7. An external archive system receives the new snapshot-workitem and stores it into a archive storage.
+1. A workitem is processed by the Imixs-Workflow engine based on a BPMN 2.0 model.
+2. The WorkflowService sends a notification event.
+3. The DMS Service collects the DMS meta data.
+4. The DMS meta data is stored into the process instance.
+5. After processing is completed, the process instance is persisted into the local workflow storage by the DocumentService.
+6. The DocumentService sends a notification event.
+7. The SnapshotService creates a immutable copy of the process instance - called snapshot-workitem.
+8. The SnapshotService detaches the file content form the workitem.
+9. The snapshot workitem is stored into the local workflow storage
+10. The origin process instance is returned to the application
+11. An external archive system polls new snapshot-workitems
+12. An external archive system stores the snapshot-workitems into a archive storage.
+
 A snapshot-workitem holds a reference to the origin-workitem by its own $UniqueID which is always the $UniqueID from the origin-workitem suffixed with a timestamp.
 During the snapshot creation the snapshot $UniqueID is stored into the origin-workitem.
 
-### The SnapshotPlugin
+### The DMS Service
+The _DMSService_ collects meta data from attached documents during the processing phase. This meta data contains also extracted text information added into the lucene full-text-index. The DMS meta data is stored into the item '_dms_'.
 
-The snapshot process includes the following stages:
+The _DMSService_ is parsing the the content of attachments from the type .pdf, .doc, .xls and .ppt. The service uses the libraries of [Apache POI](http://poi.apache.org/) and [Apache PDFBox](https://pdfbox.apache.org/) to extract the content of those documents.
 
-1. create a copy of the origin workitem instance
-2. compute a snapshot $uniqueId based on the origin workitem suffixed with a timestamp.
-3. change the type of the snapshot-workitem with the prefix 'archive-'
-4. If an old snapshot already exists, Files are compared to the current $files and, if necessary, stored in the Snapshot applied
-5. remove the file content form the origin-workitem
-6. store the snapshot uniqeId into the origin-workitem as a reference ($snapshotID)
-7. remove deprecated snapshots
-
-A snapshot-workitem holds a reference to the origin-workitem by its own $UniqueID which is
-always the $UniqueID from the origin-workitem suffixed with a timestamp.
-During the snapshot creation the snapshot $UniqueID is stored into the origin-workitem.
 
-Why did we use a Plugin to implement the Snapshot-Architecture? You could say that it is easier to have the snapshot-workitem directly generated by the engine. This avoids that someone can forget to include the plugin in his model.
-But this also accounted for every possibility of control when and if data is archived. With the plugin the control is by the model. And this is important when you are considering legal provisions such as the EU data protection law.
-
-
-### How the SnapshotPlugin Works
-The SnapshotPlugin implements the ObserverPlugin interface and is tied to the transaction context of the imixs-workflow engine. The process of creating a new snapshot workitem is aware of the current transaction in a transparent way and will automatically role back any snapshots workitems in case of a EJB Exception. The SnapShotPlugin can be included into any model that manages business-critical data.
-
+### CDI Events
+The communication between the service layers is implemented by the CDI Observer pattern. The CDI Events are tied to the transaction context of the imixs-workflow engine.
+See the [DocumentService](http://www.imixs.org/doc/engine/documentservice.html#CDI_Events) and [WorkflowService](http://www.imixs.org/doc/engine/workflowservice.html#CDI_Events) for further information.
 
 ### The Access Control (ACL)
 The access to archive data, written into the Imixs-Archive, is controlled completely by the [Imixs-Workflow engine ACL](http://www.imixs.org/doc/engine/acl.html). Imixs-Workflow supports a multiple-level security model, that offers a great space of flexibility while controlling the access to all parts of a workitem.
@@ -98,13 +90,6 @@ Thus, in this exmple a system processing 1 million process instances per year ca
 
 
 
-# Document Fulltext Search
-
-The EJB _LuceneDocumentService_ provides method to index the content of attachments of the type .pdf, .doc, .xls and .ppt in a Lucene Fulltext Serach index. The service uses the libraries of [Apache POI](http://poi.apache.org/) and [Apache PDFBox](https://pdfbox.apache.org/) to extract the content of those documents.
-
-The indexing process is controlled by a timer servcie class called 'LuceneDocumentScheduler'.
- - # Deployment diff --git a/imixs-archive-api/src/main/java/org/imixs/archive/dms/FileParserService.java b/imixs-archive-api/src/main/java/org/imixs/archive/dms/FileParserService.java deleted file mode 100644 index 39bb5642..00000000 --- a/imixs-archive-api/src/main/java/org/imixs/archive/dms/FileParserService.java +++ /dev/null @@ -1,187 +0,0 @@ -package org.imixs.archive.dms; - -import java.io.ByteArrayInputStream; -import java.io.IOException; -import java.io.StringWriter; -import java.util.List; -import java.util.logging.Logger; - -import javax.ejb.EJB; -import javax.ejb.Stateless; - -import org.apache.lucene.document.Document; -import org.apache.pdfbox.io.RandomAccessBuffer; -import org.apache.pdfbox.io.RandomAccessRead; -import org.apache.pdfbox.pdfparser.PDFParser; -import org.apache.pdfbox.pdmodel.PDDocument; -import org.apache.pdfbox.pdmodel.encryption.InvalidPasswordException; -import org.apache.pdfbox.text.PDFTextStripper; -import org.apache.poi.hslf.extractor.PowerPointExtractor; -import org.apache.poi.hssf.extractor.ExcelExtractor; -import org.apache.poi.hwpf.HWPFDocument; -import org.apache.poi.hwpf.extractor.WordExtractor; -import org.apache.poi.poifs.filesystem.POIFSFileSystem; -import org.apache.poi.xwpf.extractor.XWPFWordExtractor; -import org.apache.poi.xwpf.usermodel.XWPFDocument; -import org.imixs.workflow.engine.DocumentService; - -/** - * This service component provides a mechanism to index attachments into lucene. - * - * The service parses the content of .pdf, .doc, .xls, .ppt or .docx files. - * - * The service is called by the DMSService - * - * @version 1.0 - * @author rsoika - */ -@Stateless -public class FileParserService { - @EJB - DocumentService documentService; - private PDFTextStripper stripper = null; - private static Logger logger = Logger.getLogger(FileParserService.class.getName()); - - /** - * If the content is a .pdf, .doc, .xls, .ppt or .docx the content will be - * parsed and returned as a string. 
- * - */ - public String parse(String fileName, List fileData) { - - String result = null; - // ... - String contentType = (String) fileData.get(0); - byte[] data = (byte[]) fileData.get(1); - if (data.length > 0) { - try { - - if (fileName.toLowerCase().endsWith(".pdf")) { - long l = System.currentTimeMillis(); - logger.fine("Lucene - parsing pdf document '" + fileName + "'....."); - - result = parsePDF(data); - - logger.fine("Lucene - parsing completed in " + (System.currentTimeMillis() - l) + "ms"); - - } - - if (fileName.toLowerCase().endsWith(".doc") || fileName.toLowerCase().endsWith(".docx")) { - long l = System.currentTimeMillis(); - logger.fine("Lucene - parsing MS-DOC document '" + fileName + "'....."); - - result = parseMSDOC(data, fileName); - - logger.fine("Lucene - parsing completed in " + (System.currentTimeMillis() - l) + "ms"); - - } - - } catch (IOException e) { - // TODO Auto-generated catch block - e.printStackTrace(); - } - } else { - - } - return result; - } - - /** - * This method parses the text content of a pdf document and adds the contents - * to a lucene document. - * - * @param document - * The document to add the contents to. - * @param is - * The stream to get the contents from. - * @param documentLocation - * The location of the document, used just for debug messages. - * - * @throws IOException - * If there is an error parsing the document. - */ - private String parsePDF(byte[] pdfData) throws IOException { - - RandomAccessRead source = new RandomAccessBuffer(pdfData); - PDFParser parser = new PDFParser(source); - parser.parse(); - PDDocument pdfDocument = parser.getPDDocument(); - - try { - // create a writer where to append the text content. 
- StringWriter writer = new StringWriter(); - if (stripper == null) { - stripper = new PDFTextStripper(); - } - stripper.writeText(pdfDocument, writer); - - // Note: the buffer to string operation is costless; - // the char array value of the writer buffer and the content string - // is shared as long as the buffer content is not modified, which will - // not occur here. - String contents = writer.getBuffer().toString(); - - logger.info("Länge=" + contents.length()); - logger.info(contents); - - return contents; - - } catch (InvalidPasswordException e) { - // they didn't suppply a password and the default of "" was wrong. - throw new IOException("Error: The document is encrypted and will not be indexed.", e); - } finally { - if (pdfDocument != null) { - pdfDocument.close(); - } - } - } - - /** - * parse ms document.... - * - * - * @param document - * @param pdfData - * @param fileName - * @throws IOException - */ - private String parseMSDOC(byte[] pdfData, String fileName) throws IOException { - - String contents = null; - POIFSFileSystem fs = null; - try { - - if (fileName.endsWith(".xls")) { // if the file is excel file - ExcelExtractor ex = new ExcelExtractor(fs); - contents = ex.getText(); // returns text of the excel file - } else if (fileName.endsWith(".ppt")) { // if the file is power point file - PowerPointExtractor extractor = new PowerPointExtractor(fs); - contents = extractor.getText(); // returns text of the power point file - - } else if (fileName.endsWith(".doc")) { - ByteArrayInputStream source = new ByteArrayInputStream(pdfData); - // else for .doc file - fs = new POIFSFileSystem(source); - HWPFDocument doc = new HWPFDocument(fs); - WordExtractor we = new WordExtractor(doc); - contents = we.getText();// if the extension is .doc - - we.close(); - } else if (fileName.endsWith(".docx")) { - ByteArrayInputStream source = new ByteArrayInputStream(pdfData); - XWPFDocument doc = new XWPFDocument(source); - XWPFWordExtractor we = new XWPFWordExtractor(doc); 
- contents = we.getText();// if the extension is .doc - - we.close(); - } - } catch (Exception e) { - System.out.println("document file cant be indexed"); - } - logger.info("Länge=" + contents.length()); - logger.info(contents); - - return contents; - } - -} \ No newline at end of file diff --git a/imixs-archive-api/src/main/java/org/imixs/archive/experimental/CircuitBreakerPlugin.java b/imixs-archive-api/src/main/java/org/imixs/archive/experimental/CircuitBreakerPlugin.java deleted file mode 100644 index 66975e89..00000000 --- a/imixs-archive-api/src/main/java/org/imixs/archive/experimental/CircuitBreakerPlugin.java +++ /dev/null @@ -1,25 +0,0 @@ -package org.imixs.archive.experimental; - -import org.imixs.workflow.ItemCollection; -import org.imixs.workflow.engine.plugins.AbstractPlugin; -import org.imixs.workflow.exceptions.PluginException; -/** - * Plugin class to simulate a roleback scenario. - * @author rsoika - * - */ -public class CircuitBreakerPlugin extends AbstractPlugin { - static final String ERROR = "ERROR"; - - @SuppressWarnings("unused") - @Override - public ItemCollection run(ItemCollection document, ItemCollection event) throws PluginException { - - if (true) { - throw new PluginException(this.getClass().getSimpleName(), ERROR, - "forced plugin exception!"); - } - return document; - } - -} diff --git a/imixs-archive-api/src/main/java/org/imixs/archive/experimental/DocFileParser.java b/imixs-archive-api/src/main/java/org/imixs/archive/experimental/DocFileParser.java deleted file mode 100644 index 45198fa3..00000000 --- a/imixs-archive-api/src/main/java/org/imixs/archive/experimental/DocFileParser.java +++ /dev/null @@ -1,59 +0,0 @@ -package org.imixs.archive.experimental; - - -import java.io.FileInputStream; - -import org.apache.poi.hslf.extractor.PowerPointExtractor; -import org.apache.poi.hssf.extractor.ExcelExtractor; -import org.apache.poi.hwpf.HWPFDocument; -import org.apache.poi.hwpf.extractor.WordExtractor; -import 
org.apache.poi.poifs.filesystem.POIFSFileSystem; - -/** - * This class parses the microsoft word files except .docx,.pptx and - * latest MSword files. - * - * @author Mubin Shrestha - */ -public class DocFileParser { - /** - * This method parses the content of the .doc file. - * i.e. this method will return all the text of the file passed to it. - * @param fileName : file name of which you want the content of. - * @return : returns the content of the file - */ - public String DocFileContentParser(String fileName) { - POIFSFileSystem fs = null; - try { - - if (fileName.endsWith(".xls")) { //if the file is excel file - ExcelExtractor ex = new ExcelExtractor(fs); - return ex.getText(); //returns text of the excel file - } else if (fileName.endsWith(".ppt")) { //if the file is power point file - PowerPointExtractor extractor = new PowerPointExtractor(fs); - return extractor.getText(); //returns text of the power point file - - } - - //else for .doc file - fs = new POIFSFileSystem(new FileInputStream(fileName)); - HWPFDocument doc = new HWPFDocument(fs); - WordExtractor we = new WordExtractor(doc); - return we.getText();//if the extension is .doc - } catch (Exception e) { - System.out.println("document file cant be indexed"); - } - return ""; - } - - /** - * Main method. 
- * @param args - */ - public static void main(String args[]) - { - String filepath = "H:/Filtering.ppt"; - System.out.println(new DocFileParser().DocFileContentParser(filepath)); - - } - } \ No newline at end of file diff --git a/imixs-archive-api/src/main/java/org/imixs/archive/experimental/LuceneDocumentSchedulerService.java b/imixs-archive-api/src/main/java/org/imixs/archive/experimental/LuceneDocumentSchedulerService.java deleted file mode 100644 index 38faac9d..00000000 --- a/imixs-archive-api/src/main/java/org/imixs/archive/experimental/LuceneDocumentSchedulerService.java +++ /dev/null @@ -1,560 +0,0 @@ -package org.imixs.archive.experimental; - -/******************************************************************************* - * Imixs Workflow - * Copyright (C) 2001, 2011 Imixs Software Solutions GmbH, - * http://www.imixs.com - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version 2 - * of the License, or (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * General Public License for more details. 
- * - * You can receive a copy of the GNU General Public - * License at http://www.gnu.org/licenses/gpl.html - * - * Project: - * http://www.imixs.org - * http://java.net/projects/imixs-workflow - * - * Contributors: - * Imixs Software Solutions GmbH - initial API and implementation - * Ralph Soika - Software Developer - *******************************************************************************/ - -import java.text.ParseException; -import java.text.SimpleDateFormat; -import java.util.ArrayList; -import java.util.Calendar; -import java.util.Collection; -import java.util.Date; -import java.util.List; -import java.util.Vector; -import java.util.logging.Level; -import java.util.logging.Logger; - -import javax.annotation.Resource; -import javax.annotation.security.DeclareRoles; -import javax.annotation.security.RunAs; -import javax.ejb.EJB; -import javax.ejb.LocalBean; -import javax.ejb.ScheduleExpression; -import javax.ejb.SessionContext; -import javax.ejb.Stateless; -import javax.ejb.Timeout; -import javax.ejb.Timer; -import javax.ejb.TimerConfig; -import javax.ejb.TransactionAttribute; -import javax.ejb.TransactionAttributeType; - -import org.imixs.archive.lucene.LuceneDocumentService; -import org.imixs.workflow.ItemCollection; -import org.imixs.workflow.WorkflowKernel; -import org.imixs.workflow.engine.DocumentService; -import org.imixs.workflow.exceptions.AccessDeniedException; -import org.imixs.workflow.exceptions.InvalidAccessException; -import org.imixs.workflow.exceptions.ModelException; -import org.imixs.workflow.exceptions.PluginException; -import org.imixs.workflow.exceptions.ProcessingErrorException; -import org.imixs.workflow.exceptions.QueryException; - -/** - * This EJB implements a TimerService which scans snapshot-workitems and - * updates the imixs-archive lucene index. 
- * - * @see LuceneDocumentService - * - * @author rsoika - * - */ -@Stateless -@LocalBean -@DeclareRoles({ "org.imixs.ACCESSLEVEL.MANAGERACCESS" }) -@RunAs("org.imixs.ACCESSLEVEL.MANAGERACCESS") -public class LuceneDocumentSchedulerService { - - final static public String TYPE_CONFIGURATION = "configuration"; - final static public String NAME = "org.imixs.archive.lucene.scheduler"; - - final static private int MAX_RESULT=100; - - private static Logger logger = Logger.getLogger(LuceneDocumentSchedulerService.class.getName()); - - @EJB - DocumentService documentService; - - @Resource - javax.ejb.TimerService timerService; - - @Resource - SessionContext ctx; - - int iProcessWorkItems = 0; - List unprocessedIDs = null; - - /** - * This method loads the current scheduler configuration. If no configuration - * entity yet exists the method returns an empty ItemCollection. - * - * The method updates the timer details for a running timer. - * - * @return configuration ItemCollection - */ - public ItemCollection loadConfiguration() { - ItemCollection configItemCollection = null; - String searchTerm = "(type:\"" + TYPE_CONFIGURATION + "\" AND txtname:\"" + NAME + "\")"; - - Collection col; - try { - col = documentService.find(searchTerm, 2, 0); - } catch (QueryException e) { - logger.severe("loadConfiguration - invalid param: " + e.getMessage()); - throw new InvalidAccessException(InvalidAccessException.INVALID_ID, e.getMessage(), e); - } - - if (col.size() > 1) { - String message = "loadConfiguration - more than on timer configuration found! 
Check configuration (type:\"configuration\" txtname:\"org.imixs.workflow.scheduler\") "; - logger.severe(message); - throw new InvalidAccessException(InvalidAccessException.INVALID_ID, message); - } - - if (col.size() == 1) { - logger.fine("loading existing timer configuration..."); - configItemCollection = col.iterator().next(); - } else { - logger.fine("creating new timer configuration..."); - // create default values - configItemCollection = new ItemCollection(); - configItemCollection.replaceItemValue("type", TYPE_CONFIGURATION); - configItemCollection.replaceItemValue("txtname", NAME); - configItemCollection.replaceItemValue(WorkflowKernel.UNIQUEID, WorkflowKernel.generateUniqueID()); - } - configItemCollection = updateTimerDetails(configItemCollection); - return configItemCollection; - } - - /** - * This method saves the timer configuration. The method ensures that the - * following properties are set to default. - * - * - * The method also updates the timer details of a running timer. - * - * @return - * @throws AccessDeniedException - */ - public ItemCollection saveConfiguration(ItemCollection configItemCollection) throws AccessDeniedException { - // update write and read access - configItemCollection.replaceItemValue("type", TYPE_CONFIGURATION); - configItemCollection.replaceItemValue("txtName", NAME); - configItemCollection.replaceItemValue("$writeAccess", "org.imixs.ACCESSLEVEL.MANAGERACCESS"); - configItemCollection.replaceItemValue("$readAccess", "org.imixs.ACCESSLEVEL.MANAGERACCESS"); - - // configItemCollection.replaceItemValue("$writeAccess", ""); - // configItemCollection.replaceItemValue("$readAccess", ""); - - configItemCollection = updateTimerDetails(configItemCollection); - // save entity - configItemCollection = documentService.save(configItemCollection); - - return configItemCollection; - } - - /** - * This Method starts the TimerService. 
- * - * The Timer can be started based on a Calendar setting stored in the property - * txtConfiguration, or by interval based on the properties datStart, datStop, - * numIntervall. - * - * - * The method loads the configuration entity and evaluates the timer - * configuration. THe $UniqueID of the configuration entity is the id of the - * timer to be controlled. - * - * $uniqueid - String - identifier for the Timer Service. - * - * txtConfiguration - calendarBasedTimer configuration - * - * datstart - Date Object - * - * datstop - Date Object - * - * numInterval - Integer Object (interval in seconds) - * - * - * The method throws an exception if the configuration entity contains invalid - * attributes or values. - * - * After the timer was started the configuration is updated with the latest - * statusmessage - * - * The method returns the current configuration - * - * @throws AccessDeniedException - * @throws ParseException - */ - public ItemCollection start() throws AccessDeniedException, ParseException { - ItemCollection configItemCollection = loadConfiguration(); - Timer timer = null; - if (configItemCollection == null) - return null; - - String id = configItemCollection.getUniqueID(); - - // try to cancel an existing timer for this workflowInstance - while (this.findTimer(id) != null) { - this.findTimer(id).cancel(); - } - - String sConfiguation = configItemCollection.getItemValueString("txtConfiguration"); - - if (!sConfiguation.isEmpty()) { - // New timer will be started on calendar confiugration - timer = createTimerOnCalendar(configItemCollection); - } else { - // update the interval based on hour/minute configuration - int hours = configItemCollection.getItemValueInteger("hours"); - int minutes = configItemCollection.getItemValueInteger("minutes"); - long interval = (hours * 60 + minutes) * 60 * 1000; - configItemCollection.replaceItemValue("numInterval", new Long(interval)); - - timer = createTimerOnInterval(configItemCollection); - } - - // start the 
timer and set a status message - if (timer != null) { - - Calendar calNow = Calendar.getInstance(); - SimpleDateFormat dateFormatDE = new SimpleDateFormat("dd.MM.yy hh:mm:ss"); - String msg = "started at " + dateFormatDE.format(calNow.getTime()) + " by " - + ctx.getCallerPrincipal().getName(); - configItemCollection.replaceItemValue("statusmessage", msg); - - if (timer.isCalendarTimer()) { - configItemCollection.replaceItemValue("Schedule", timer.getSchedule().toString()); - } else { - configItemCollection.replaceItemValue("Schedule", ""); - - } - logger.info(configItemCollection.getItemValueString("txtName") + " started: " + id); - } - - configItemCollection = saveConfiguration(configItemCollection); - - return configItemCollection; - } - - /** - * Stops a running timer instance. After the timer was canceled the - * configuration will be updated. - * - * @throws AccessDeniedException - * - */ - public ItemCollection stop() throws AccessDeniedException { - ItemCollection configItemCollection = loadConfiguration(); - - String id = configItemCollection.getUniqueID(); - boolean found = false; - while (this.findTimer(id) != null) { - this.findTimer(id).cancel(); - found = true; - } - if (found) { - Calendar calNow = Calendar.getInstance(); - SimpleDateFormat dateFormatDE = new SimpleDateFormat("dd.MM.yy hh:mm:ss"); - String msg = "stopped at " + dateFormatDE.format(calNow.getTime()) + " by " - + ctx.getCallerPrincipal().getName(); - configItemCollection.replaceItemValue("statusmessage", msg); - logger.info(configItemCollection.getItemValueString("txtName") + " stopped: " + id); - } else { - configItemCollection.replaceItemValue("statusmessage", ""); - } - configItemCollection = saveConfiguration(configItemCollection); - - return configItemCollection; - } - - /** - * Returns true if the workflowSchedulerService was started - */ - public boolean isRunning() { - try { - ItemCollection configItemCollection = loadConfiguration(); - if (configItemCollection == null) - return 
false; - - return (findTimer(configItemCollection.getUniqueID()) != null); - } catch (Exception e) { - e.printStackTrace(); - return false; - } - } - - /** - * This method process scheduled workitems. The method updates the property - * 'datLastRun' - * - * Because of bug: https://java.net/jira/browse/GLASSFISH-20673 we check the - * imixsDayOfWeek - * - * @param timer - * @throws AccessDeniedException - */ - @Timeout - void runTimer(javax.ejb.Timer timer) throws AccessDeniedException { - - ItemCollection configItemCollection = loadConfiguration(); - logger.info(" started...."); - - - - configItemCollection.replaceItemValue("datLastRun", new Date()); - - /* - * Now we process all scheduled worktitems for each model - */ - iProcessWorkItems = 0; - unprocessedIDs = new ArrayList(); - - /** - * find modification since last run - */ - - String query = "SELECT document FROM Document AS document "; - query += " WHERE document.modified > '2017-01-01'"; - query += " ORDER BY document.modified DESC"; - List result = documentService.getDocumentsByQuery(query, MAX_RESULT); - if (result.size() >= 1) { - - } else { - - } - - - - - logger.info("finished successfull"); - - logger.info(iProcessWorkItems + " workitems processed"); - - if (unprocessedIDs.size() > 0) { - logger.warning(unprocessedIDs.size() + " workitems could be processed!"); - for (String aid : unprocessedIDs) { - logger.warning(" " + aid); - } - - } - - Date endDate = configItemCollection.getItemValueDate("datstop"); - String sTimerID = configItemCollection.getItemValueString("$uniqueid"); - - // update statistic of last run - configItemCollection.replaceItemValue("numWorkItemsProcessed", iProcessWorkItems); - configItemCollection.replaceItemValue("numWorkItemsUnprocessed", unprocessedIDs.size()); - - /* - * Check if Timer should be canceled now? - only by interval configuration. In - * case of calenderBasedTimer the timer will stop automatically. 
- */ - String sConfiguation = configItemCollection.getItemValueString("txtConfiguration"); - - if (sConfiguation.isEmpty()) { - - Calendar calNow = Calendar.getInstance(); - if (endDate != null && calNow.getTime().after(endDate)) { - timer.cancel(); - System.out.println("Timeout - sevice stopped: " + sTimerID); - - SimpleDateFormat dateFormatDE = new SimpleDateFormat("dd.MM.yy hh:mm:ss"); - String msg = "stopped at " + dateFormatDE.format(calNow.getTime()) + " by datstop=" - + dateFormatDE.format(endDate); - configItemCollection.replaceItemValue("statusmessage", msg); - - } - } - - // save configuration - configItemCollection = saveConfiguration(configItemCollection); - - } - - /** - * Create an interval timer whose first expiration occurs at a given point in - * time and whose subsequent expirations occur after a specified interval. - **/ - Timer createTimerOnInterval(ItemCollection configItemCollection) { - // Create an interval timer - Date startDate = configItemCollection.getItemValueDate("datstart"); - Date endDate = configItemCollection.getItemValueDate("datstop"); - long interval = configItemCollection.getItemValueInteger("numInterval"); - - // set default start date? - if (startDate == null) { - // set start date to now - startDate = new Date(); - } - - // check if endDate is before start date, than we do not start the - // timer! - if (endDate != null) { - Calendar calStart = Calendar.getInstance(); - calStart.setTime(startDate); - Calendar calEnd = Calendar.getInstance(); - calEnd.setTime(endDate); - if (calStart.after(calEnd)) { - logger.warning(configItemCollection.getItemValueString("txtName") + " stop-date (" + startDate - + ") is before start-date (" + endDate + "). 
Timer will not be started!"); - return null; - } - } - Timer timer = null; - // create timer object ($uniqueId) - timer = timerService.createTimer(startDate, interval, configItemCollection.getUniqueID()); - return timer; - - } - - /** - * Create a calendar-based timer based on a input schedule expression. The - * expression will be parsed by this method. - * - * Example: - * second=0 - * minute=0 - * hour=* - * dayOfWeek= - * dayOfMonth=25–Last,1–5 - * month= - * year=* - * - * - * @param sConfiguation - * @return - * @throws ParseException - */ - Timer createTimerOnCalendar(ItemCollection configItemCollection) throws ParseException { - - TimerConfig timerConfig = new TimerConfig(); - - timerConfig.setInfo(configItemCollection.getUniqueID()); - ScheduleExpression scheduerExpression = new ScheduleExpression(); - - @SuppressWarnings("unchecked") - List calendarConfiguation = (List) configItemCollection.getItemValue("txtConfiguration"); - // try to parse the configuration list.... - for (String confgEntry : calendarConfiguation) { - - if (confgEntry.startsWith("second=")) { - scheduerExpression.second(confgEntry.substring(confgEntry.indexOf('=') + 1)); - } - if (confgEntry.startsWith("minute=")) { - scheduerExpression.minute(confgEntry.substring(confgEntry.indexOf('=') + 1)); - } - if (confgEntry.startsWith("hour=")) { - scheduerExpression.hour(confgEntry.substring(confgEntry.indexOf('=') + 1)); - } - if (confgEntry.startsWith("dayOfWeek=")) { - scheduerExpression.dayOfWeek(confgEntry.substring(confgEntry.indexOf('=') + 1)); - } - if (confgEntry.startsWith("dayOfMonth=")) { - scheduerExpression.dayOfMonth(confgEntry.substring(confgEntry.indexOf('=') + 1)); - } - if (confgEntry.startsWith("month=")) { - scheduerExpression.month(confgEntry.substring(confgEntry.indexOf('=') + 1)); - } - if (confgEntry.startsWith("year=")) { - scheduerExpression.year(confgEntry.substring(confgEntry.indexOf('=') + 1)); - } - if (confgEntry.startsWith("timezone=")) { - 
scheduerExpression.timezone(confgEntry.substring(confgEntry.indexOf('=') + 1)); - } - - /* Start date */ - if (confgEntry.startsWith("start=")) { - SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy/MM/dd"); - Date convertedDate = dateFormat.parse(confgEntry.substring(confgEntry.indexOf('=') + 1)); - scheduerExpression.start(convertedDate); - } - - /* End date */ - if (confgEntry.startsWith("end=")) { - SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy/MM/dd"); - Date convertedDate = dateFormat.parse(confgEntry.substring(confgEntry.indexOf('=') + 1)); - scheduerExpression.end(convertedDate); - } - - } - - Timer timer = timerService.createCalendarTimer(scheduerExpression, timerConfig); - - return timer; - - } - - /** - * This method returns a timer for a corresponding id if such a timer object - * exists. - * - * @param id - * @return Timer - * @throws Exception - */ - Timer findTimer(String id) { - Timer timer = null; - for (Object obj : timerService.getTimers()) { - Timer atimer = (javax.ejb.Timer) obj; - String timerID = atimer.getInfo().toString(); - if (id.equals(timerID)) { - if (timer != null) { - logger.severe("more then one timer with id " + id + " was found!"); - } - timer = atimer; - } - } - return timer; - } - - /** - * Update the timer details of a running timer service. The method updates the - * properties netxtTimeout and timeRemaining and store them into the timer - * configuration. 
- * - * @param configuration - */ - private ItemCollection updateTimerDetails(ItemCollection configuration) { - if (configuration == null) - return configuration; - String id = configuration.getUniqueID(); - Timer timer; - try { - timer = this.findTimer(id); - - if (timer != null) { - // load current timer details - configuration.replaceItemValue("nextTimeout", timer.getNextTimeout()); - configuration.replaceItemValue("timeRemaining", timer.getTimeRemaining()); - } else { - configuration.removeItem("nextTimeout"); - configuration.removeItem("timeRemaining"); - } - } catch (Exception e) { - logger.warning("unable to updateTimerDetails: " + e.getMessage()); - configuration.removeItem("nextTimeout"); - configuration.removeItem("timeRemaining"); - } - return configuration; - } - -} diff --git a/imixs-archive-api/src/main/java/org/imixs/archive/experimental/PdfFileParser.java b/imixs-archive-api/src/main/java/org/imixs/archive/experimental/PdfFileParser.java deleted file mode 100644 index e8cf3a50..00000000 --- a/imixs-archive-api/src/main/java/org/imixs/archive/experimental/PdfFileParser.java +++ /dev/null @@ -1,55 +0,0 @@ -package org.imixs.archive.experimental; - -import java.io.File; -import java.io.FileInputStream; -import java.io.FileNotFoundException; -import java.io.IOException; - -import org.apache.pdfbox.cos.COSDocument; -import org.apache.pdfbox.io.RandomAccessRead; -import org.apache.pdfbox.pdfparser.PDFParser; -import org.apache.pdfbox.pdmodel.PDDocument; -import org.apache.pdfbox.text.PDFTextStripper; - -/** - * This class parses the pdf file. - * i.e this class returns the text from the pdf file. - * @author Mubin Shrestha - */ -public class PdfFileParser { - /** - * This method returns the pdf content in text form. 
- * @param pdffilePath : pdf file path of which you want to parse text - * @return : texts from pdf file - * @throws FileNotFoundException - * @throws IOException - */ - public String PdfFileParser(String pdffilePath) throws FileNotFoundException, IOException - { - String content; - FileInputStream fi = new FileInputStream(new File(pdffilePath)); - PDFParser parser = new PDFParser((RandomAccessRead) fi); - parser.parse(); - COSDocument cd = parser.getDocument(); - PDFTextStripper stripper = new PDFTextStripper(); - content = stripper.getText(new PDDocument(cd)); - cd.close(); - return content; - } - - /** - * Main method. - * @param args - * @throws FileNotFoundException - * @throws IOException - */ - public static void main(String args[]) throws FileNotFoundException, IOException - { - String filepath = "H:/lab.pdf"; - System.out.println(new PdfFileParser().PdfFileParser(filepath)); - - - // doc = LucenePDFDocument.getDocument(file); - - } - } \ No newline at end of file diff --git a/imixs-archive-api/src/main/java/org/imixs/archive/lucene/LuceneDocumentService.java b/imixs-archive-api/src/main/java/org/imixs/archive/lucene/LuceneDocumentService.java deleted file mode 100644 index a656082f..00000000 --- a/imixs-archive-api/src/main/java/org/imixs/archive/lucene/LuceneDocumentService.java +++ /dev/null @@ -1,228 +0,0 @@ -/******************************************************************************* - * Imixs Workflow - * Copyright (C) 2001, 2011 Imixs Software Solutions GmbH, - * http://www.imixs.com - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version 2 - * of the License, or (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU - * General Public License for more details. - * - * You can receive a copy of the GNU General Public - * License at http://www.gnu.org/licenses/gpl.html - * - * Project: - * http://www.imixs.org - * http://java.net/projects/imixs-workflow - * - * Contributors: - * Imixs Software Solutions GmbH - initial API and implementation - * Ralph Soika - Software Developer - *******************************************************************************/ - -package org.imixs.archive.lucene; - -import java.io.FileNotFoundException; -import java.io.IOException; -import java.nio.file.Paths; -import java.text.SimpleDateFormat; -import java.util.ArrayList; -import java.util.Arrays; -import java.util.Calendar; -import java.util.Collection; -import java.util.Date; -import java.util.List; -import java.util.Properties; -import java.util.StringTokenizer; -import java.util.logging.Level; -import java.util.logging.Logger; - -import javax.annotation.PostConstruct; -import javax.ejb.EJB; -import javax.ejb.Singleton; - -import org.apache.lucene.analysis.standard.ClassicAnalyzer; -import org.apache.lucene.document.Document; -import org.apache.lucene.document.Field; -import org.apache.lucene.document.Field.Store; -import org.apache.lucene.document.SortedDocValuesField; -import org.apache.lucene.document.StringField; -import org.apache.lucene.document.TextField; -import org.apache.lucene.index.CorruptIndexException; -import org.apache.lucene.index.IndexWriter; -import org.apache.lucene.index.IndexWriterConfig; -import org.apache.lucene.index.Term; -import org.apache.lucene.store.Directory; -import org.apache.lucene.store.FSDirectory; -import org.apache.lucene.store.LockObtainFailedException; -import org.apache.lucene.util.BytesRef; -import org.imixs.workflow.ItemCollection; -import org.imixs.workflow.WorkflowKernel; -import org.imixs.workflow.engine.PropertyService; -import org.imixs.workflow.exceptions.IndexException; -import org.imixs.workflow.exceptions.PluginException; - -/** 
- * The LuceneDocumentService provides methods to index .doc, .txt and .pdf - * files. - * - * @see http://stackoverflow.com/questions/34880347/why-did-lucene-indexwriter- - * did-not-update-the-index-when-called-from-a-web-modul - * @see LucenePlugin - * @version 1.2 - * @author rsoika - */ -@Singleton -public class LuceneDocumentService { - - protected static final String DEFAULT_ANALYSER = "org.apache.lucene.analysis.standard.ClassicAnalyzer"; - protected static final String DEFAULT_INDEX_DIRECTORY = "imixs-workflow-index"; - protected static final String ANONYMOUS = "ANONYMOUS"; - - private String indexDirectoryPath = null; - private String analyserClass = null; - private Properties properties = null; - - // default field lists - private static List DEFAULT_SEARCH_FIELD_LIST = Arrays.asList("$workflowsummary", "$workflowabstract"); - private static List DEFAULT_NOANALYSE_FIELD_LIST = Arrays.asList("$modelversion", "$processid", - "$workitemid", "$uniqueidref", "type", "$writeaccess", "$modified", "$created", "namcreator", "$creator", - "$editor", "$lasteditor", "$workflowgroup", "$workflowstatus", "txtworkflowgroup", "txtname", "namowner", - "txtworkitemref"); - - @EJB - PropertyService propertyService; - - private static Logger logger = Logger.getLogger(LuceneDocumentService.class.getName()); - - /** - * PostContruct event - The method loads the lucene index properties from the - * imixs.properties file from the classpath. If no properties are defined the - * method terminates. 
- * - */ - @PostConstruct - void init() { - - // read configuration - properties = propertyService.getProperties(); - indexDirectoryPath = properties.getProperty("lucence.indexDir", DEFAULT_INDEX_DIRECTORY); - // add path sufix - indexDirectoryPath=indexDirectoryPath+"-documents"; - - // luceneLockFactory = properties.getProperty("lucence.lockFactory"); - // get Analyzer Class - - // default=org.apache.lucene.analysis.standard.ClassicAnalyzer - analyserClass = properties.getProperty("lucence.analyzerClass", DEFAULT_ANALYSER); - - } - - /** - * A method to index undindexed files in workitems.. - * - * - * @throws FileNotFoundException - * @throws CorruptIndexException - * @throws IOException - */ - public void updateDocuments(Collection documents) { - try { - long start = System.currentTimeMillis(); - - logger.fine("starting indexing " + documents.size() + " documents..."); - - createIndexWriter(); -// checkFileValidity(); -// closeIndexWriter(); -// long end = System.currentTimeMillis(); -// System.out.println("Total Document Indexed : " + TotalDocumentsIndexed()); -// System.out.println("Total time" + (end - start) / (100 * 60)); - } catch (Exception e) { - System.out.println("Sorry task cannot be completed"); - } - } - - - - - /** - * Returns the Lucene configuration - * - * @return - */ - public ItemCollection getConfiguration() { - ItemCollection config = new ItemCollection(); - - config.replaceItemValue("lucence.indexDir", indexDirectoryPath); - // config.replaceItemValue("lucence.lockFactory", luceneLockFactory); - config.replaceItemValue("lucence.analyzerClass", analyserClass); - - return config; - } - - /** - * This method removes a single Document from the search index. 
- * - * @param uniqueID - * of the workitem to be removed - * @throws PluginException - */ - public void removeDocument(String uniqueID) { - IndexWriter awriter = null; - long ltime = System.currentTimeMillis(); - try { - awriter = createIndexWriter(); - Term term = new Term("$uniqueid", uniqueID); - awriter.deleteDocuments(term); - } catch (CorruptIndexException e) { - throw new IndexException(IndexException.INVALID_INDEX, - "Unable to remove workitem '" + uniqueID + "' from search index", e); - } catch (LockObtainFailedException e) { - throw new IndexException(IndexException.INVALID_INDEX, - "Unable to remove workitem '" + uniqueID + "' from search index", e); - } catch (IOException e) { - throw new IndexException(IndexException.INVALID_INDEX, - "Unable to remove workitem '" + uniqueID + "' from search index", e); - } finally { - // close writer! - if (awriter != null) { - logger.finest("lucene close IndexWriter..."); - try { - awriter.close(); - } catch (CorruptIndexException e) { - throw new IndexException(IndexException.INVALID_INDEX, "Unable to close lucene IndexWriter: ", e); - } catch (IOException e) { - throw new IndexException(IndexException.INVALID_INDEX, "Unable to close lucene IndexWriter: ", e); - } - } - } - - logger.fine("lucene removeDocument in " + (System.currentTimeMillis() - ltime) + " ms"); - - } - - /** - * This method creates a new instance of a lucene IndexWriter. 
- * - * The location of the lucene index in the filesystem is read from the - * imixs.properties - * - * @return - * @throws IOException - * @throws Exception - */ - IndexWriter createIndexWriter() throws IOException { - // create a IndexWriter Instance - Directory indexDir = FSDirectory.open(Paths.get(indexDirectoryPath)); - IndexWriterConfig indexWriterConfig; - indexWriterConfig = new IndexWriterConfig(new ClassicAnalyzer()); - - return new IndexWriter(indexDir, indexWriterConfig); - } - -} diff --git a/imixs-archive-api/src/uml/snapshot-service.png b/imixs-archive-api/src/uml/snapshot-service.png index 19783c07..b81c16f7 100644 Binary files a/imixs-archive-api/src/uml/snapshot-service.png and b/imixs-archive-api/src/uml/snapshot-service.png differ diff --git a/imixs-archive-api/src/uml/snapshot-service.uml b/imixs-archive-api/src/uml/snapshot-service.uml index 1f45316e..af0e7aa0 100644 --- a/imixs-archive-api/src/uml/snapshot-service.uml +++ b/imixs-archive-api/src/uml/snapshot-service.uml @@ -1,8 +1,9 @@ @startuml autonumber -box "Imixs-Workflow" #LightBlue +box "Imixs-Archive" #LightBlue participant WorkflowService +participant DMSService participant DocumentService participant SnapshotService end box @@ -12,21 +13,38 @@ participant Archive end box +== Processing Life-Cycle == + activate WorkflowService #EEEEEE -WorkflowService --> WorkflowService : process process instance +WorkflowService --> WorkflowService : processing workitem + +WorkflowService -> DMSService: send notification +activate DMSService +DMSService --> DMSService: collect DMS meta data +DMSService -> WorkflowService +deactivate DMSService + WorkflowService -> DocumentService: save process instance activate DocumentService -DocumentService -> SnapshotService : send notification event +DocumentService -> SnapshotService : send notification activate SnapshotService SnapshotService --> SnapshotService: create snapshot-workitem + +SnapshotService --> DocumentService: detach files + SnapshotService 
-> DocumentService: save snapshot-workitem deactivate SnapshotService DocumentService -> WorkflowService: return new process instance deactivate DocumentService deactivate WorkflowService -Archive --> SnapshotService : archive snapshot-workitem +== Archive Process == + +Archive --> SnapshotService : poll snapshot +activate Archive +SnapshotService --> Archive : archive snapshot +deactivate Archive @enduml