Refactor template parser #3

emeka · 2020-03-02T09:18:57Z

Here, finally, is the PR for the document service. It is a large PR: the XML/template parsing has been fully replaced and the renderer/compiler code has been refactored.

The best way to review is to start from the API. The two entry points that have been changed are /compile and /vars.

The old XML parsing reimplemented an XML parser and generated a custom DOM tree whose nodes were XML and JTwig template nodes. It has been replaced by a standard Java StAX XML parser and a template extractor, which is responsible for finding and interpreting template code elements. The two parts are clearly separated: a TemplateParser (in this case a JTwigParser) handles the template, and a TemplateExtractor class handles the generic extraction of the template code that is inside the visible document text. The result must be a well-formed template made of XML and template instructions.
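To make the StAX part concrete, here is a minimal sketch of reading a document as a stream of events and picking out the characters events, which is where template code can appear. The class and method names (`StaxCharactersSketch`, `visibleText`) are made up for illustration and are not classes from this PR; only the `javax.xml.stream` calls are real.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Illustrative only: not a class from the document service.
public class StaxCharactersSketch {

    // Returns the text of every non-whitespace characters event, in document order.
    static List<String> visibleText(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Ask the parser to merge adjacent text chunks into one characters event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            List<String> texts = new ArrayList<>();
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isCharacters() && !event.asCharacters().isWhiteSpace()) {
                    texts.add(event.asCharacters().getData());
                }
            }
            reader.close();
            return texts;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        // The template expression sits in its own characters event inside <b>.
        System.out.println(visibleText("<p>Hello <b>{{ name }}</b>!</p>"));
    }
}
```

Because the XML structure is handled entirely by the standard parser, the template-specific code only ever has to look at the text of these events.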

The TemplateParser's responsibility is to recognize the template instructions in the characters events of the XML document, to maintain a state machine whose states indicate whether we are in XML, a code element, a content element or a comment, and to track the current code block. In addition, the TemplateParser segments each XML characters event to isolate the template instructions from the rest of the document text, which simplifies downstream event processing.
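The segmentation idea can be sketched as follows. This is a simplified illustration, not the JTwigParser from this PR: it assumes the usual JTwig delimiters (`{% %}` for code, `{{ }}` for content, `{# #}` for comments), works on a single string, and does not carry state across characters events as the real parser has to. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a single-string approximation of the parser's state machine.
public class TemplateSegmenterSketch {
    enum State { XML, CODE, CONTENT, COMMENT }

    static final class Segment {
        final State state;
        final String text;
        Segment(State state, String text) { this.state = state; this.text = text; }
        @Override public String toString() { return state + ":" + text; }
    }

    // Splits text so that each template island becomes its own segment.
    static List<Segment> segment(String text) {
        List<Segment> out = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int open = nextOpen(text, pos);
            if (open < 0) {                       // no more islands: the rest is plain text
                out.add(new Segment(State.XML, text.substring(pos)));
                break;
            }
            if (open > pos) {                     // plain text before the island
                out.add(new Segment(State.XML, text.substring(pos, open)));
            }
            State state = stateFor(text.charAt(open + 1));
            int close = text.indexOf(closerFor(state), open + 2);
            // An unclosed island swallows the rest of the text in this sketch.
            int end = close < 0 ? text.length() : close + 2;
            out.add(new Segment(state, text.substring(open, end)));
            pos = end;
        }
        return out;
    }

    // Index of the next "{%", "{{" or "{#" at or after 'from', or -1 if there is none.
    static int nextOpen(String text, int from) {
        for (int i = from; i + 1 < text.length(); i++) {
            if (text.charAt(i) == '{' && "%{#".indexOf(text.charAt(i + 1)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    static State stateFor(char second) {
        switch (second) {
            case '%': return State.CODE;
            case '{': return State.CONTENT;
            default:  return State.COMMENT;
        }
    }

    static String closerFor(State state) {
        switch (state) {
            case CODE:    return "%}";
            case CONTENT: return "}}";
            default:      return "#}";
        }
    }
}
```

For example, `segment("Dear {{ name }}{% if vip %}!{% endif %}")` yields five segments: plain text, a content island, a code island, plain text, and a closing code island.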

The TemplateExtractor reads XML events, sends characters to the TemplateParser, interprets the new parser state, moves XML elements out of the template instructions, and ensures that matching XML start and end elements are located in the same template code block.
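As a rough illustration of "moving XML elements out of the template instructions", the sketch below defers any tag that appears while a `{% ... %}` instruction is still open and re-emits it once the instruction closes. It is a toy model under stated assumptions: tokens are plain strings instead of StAX events, only `{% %}` islands are handled, and the pairing of start and end elements per code block is not attempted. `ExtractorSketch` and `extract` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: string tokens stand in for XML events.
public class ExtractorSketch {

    // Tokens starting with '<' are treated as tags, everything else as text.
    static String extract(List<String> tokens) {
        StringBuilder out = new StringBuilder();
        StringBuilder deferred = new StringBuilder(); // tags seen inside an open instruction
        boolean inCode = false;
        for (String token : tokens) {
            if (token.startsWith("<")) {
                (inCode ? deferred : out).append(token);
                continue;
            }
            int pos = 0;
            while (pos < token.length()) {
                // Look for the delimiter that would flip the current state.
                int at = token.indexOf(inCode ? "%}" : "{%", pos);
                if (at < 0) {
                    out.append(token, pos, token.length());
                    break;
                }
                out.append(token, pos, at + 2);
                pos = at + 2;
                inCode = !inCode;
                if (!inCode) {                    // instruction closed: release deferred tags
                    out.append(deferred);
                    deferred.setLength(0);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A document editor split the instruction with a <span> start tag.
        System.out.println(extract(Arrays.asList(
                "<p>", "{% if ", "<span>", "vip %}", "</span>", "Hi", "</p>")));
    }
}
```

In the example in `main`, the `<span>` that interrupted `{% if vip %}` ends up after the instruction, so the instruction text is contiguous and the XML stays well formed.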

The whole XML handling has been migrated to an event pipeline composed of XMLEventProcessors. The template extractor, image adjuster, empty element remover, ODT manifest processor, and template and image variable extractors therefore all implement the XMLEventProcessor interface. It is now easy to add more XML processing to the chain and still understand the full system. Each processor can also easily be tested separately.
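The pipeline shape can be sketched like this. The `Stage` interface is a hypothetical stand-in for XMLEventProcessor (whose actual signature is not shown here), and the sketch buffers events in a list for brevity, whereas a real pipeline would more likely stream them. The two stages are simplified examples, not the processors from this PR.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Illustrative only: a list-based approximation of an event pipeline.
public class EventPipelineSketch {

    /** Hypothetical stand-in for the XMLEventProcessor interface. */
    interface Stage {
        List<XMLEvent> process(List<XMLEvent> events);
    }

    /** Drops XML comments. */
    static final Stage COMMENT_REMOVER = events -> {
        List<XMLEvent> out = new ArrayList<>();
        for (XMLEvent event : events) {
            if (event.getEventType() != XMLStreamConstants.COMMENT) out.add(event);
        }
        return out;
    };

    /** Drops a start element that is immediately followed by its end element. */
    static final Stage EMPTY_ELEMENT_REMOVER = events -> {
        List<XMLEvent> out = new ArrayList<>();
        for (XMLEvent event : events) {
            XMLEvent last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (event.isEndElement() && last != null && last.isStartElement()
                    && last.asStartElement().getName().equals(event.asEndElement().getName())) {
                out.remove(out.size() - 1);       // cancel the start/end pair
            } else {
                out.add(event);
            }
        }
        return out;
    };

    // Parses the XML, runs every stage in order, and serializes the remaining events.
    static String run(String xml, List<Stage> stages) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            List<XMLEvent> events = new ArrayList<>();
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                // Skip document start/end so no XML declaration is written.
                if (!event.isStartDocument() && !event.isEndDocument()) events.add(event);
            }
            for (Stage stage : stages) events = stage.process(events);
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            for (XMLEvent event : events) writer.add(event);
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML processing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<a><b><!-- note --></b><c>x</c></a>",
                Arrays.asList(COMMENT_REMOVER, EMPTY_ELEMENT_REMOVER)));
    }
}
```

Stage order matters: here the comment is removed first, which leaves `<b>` empty so that the next stage can drop it. Each stage can also be exercised on its own with a hand-built event list, which is what makes the processors easy to test separately.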

The XML processor is only one part of the full document service. It is responsible for rendering an ODT document used as a document template into another valid ODT document by executing the template with a given data context. The rendered ODT document is then formatted using a DocumentCompiler. At the moment, only the LibreOffice ODTDocumentCompiler is implemented. The compiler is responsible for reading the input files, rendering the template, gathering the additional assets (including additional fonts), and then formatting the rendered document using a TemplateFormatter, currently the LibreOfficeAssistant, which was not changed in this PR.

The ODTCompiler and the associated ODTRenderer (renamed from the old ODTContext) have been refactored to use the new XML event processor architecture, and the functionality that was inside the old ODTContext has been extracted into separate processors.

Testing has been improved. The XML processing, including the template extraction, is now well tested and covered. The ODTCompiler area testing has been improved slightly. The new document service has been tested with the Proxeus test-api suite with success.

There are still improvements to make. For example, the LibreOffice-based formatter inside the document service, which increases the complexity of the service, could be entirely externalized using projects like https://github.com/thecodingmachine/gotenberg, which uses the unoconv project, which in turn uses LibreOffice to handle its tasks. Unoconv is a ten-year-old project specializing in document conversion.

The handling of table-row for loops available in the old version has not been migrated yet due to time constraints. It is a small task: it just needs to move template for-loop instructions written inside table cells (the only place possible when editing the document in a document editor) outside the row. This will need to be part of a new PR.

In addition, the client has been removed and replaced by curl. The API has been extended to accept additional content types that are easier to use with standard HTTP command-line tools. Please refer to the updated README.

The opaque run and ui commands have been removed and replaced with a plain java -jar command, and the Dockerfile has been updated to use a multi-stage build.

This is a big job:
* added more processor utilities
* converted optional behaviour configured in config to processors
* refactored compiler to separate getting the vars and rendering
* created the var and image adjuster processors and removed the corresponding code from the compiler
* added factories

Still a few TODOs in the code before it can run. Still more tests to write.

src/main/java/com/proxeus/document/Template.java

src/main/java/com/proxeus/document/odt/ODTManifestProcessor.java

src/main/java/com/proxeus/SparkServer.java

src/main/java/com/proxeus/document/odt/ODTRenderer.java

src/main/java/com/proxeus/xml/template/DefaultTemplateHandler.java

src/main/java/com/proxeus/xml/template/TemplateExtractor.java

emeka added 2 commits March 2, 2020 10:22

Refactors XML to template conversion

5815152

The previous implementation re-implemented a full XML parser. This version uses StAX as the XML parser and adds parsing of the characters events to extract JTwig code islands.

emeka force-pushed the refactor-template-parser branch from 285fe7d to 77046ae March 2, 2020 09:23

Silvio Rainoldi and others added 18 commits March 2, 2020 13:39

Config tests

d6e98fe

Added unit tests to cover Config.java. Also removed a couple of unused methods and made attributes private, as they should be.

Renamed DocumentCompilerIF to DocumentCompiler

2c9e0ed

FileResult test

5eff579

Added unit test for FileResult. Testing the other cases would require a refactoring which I'll leave for now

wip: fixed or disabled tests

d39dc6d

Cleanup and few tests

fc2a95e

Upgraded gradle wrapper

0573318

The Gradle wrapper has been updated from 2.13 to 4.8.1, as IntelliJ doesn't support old versions.

Code tests & fixes

2965ae4

Cleaned up the Code class a bit: fixed bugs, removed dead code, made attributes private.

wip: xml tests run

41982b6

wip: delete old code and format code

c4cf898

Improved TemplateCompilerTest and removed TemplateCompilerIntegrationTest

9d37567

TemplateCompilerIntegrationTest was in fact testing LibreOffice PDF conversion, which is beyond its scope.

Add test for AssetFile

f9cceb7

Add test for DecimalAndUnit

8e196c6

Add test for ImageAdjusterRunnable

9a3fe0c

wip: replace client app with curl

c5c8a5f

Merge branch 'refactor-template-parser' of github.com:ProxeusApp/document-service into refactor-template-parser

3b7fabf

wip: remove DEBUG logs

2ff5d3f

wip: Fix Dockerfile and README

1596bf4

wip: fix failing proxeus test-api

78f3797

emeka requested review from alexblockfactory, ianaz and lukarth March 14, 2020 22:00

ianaz suggested changes Mar 16, 2020

ianaz reviewed Mar 16, 2020

src/main/java/com/proxeus/document/odt/ODTRenderer.java (outdated, resolved)

emeka added 2 commits March 17, 2020 10:22

wip: PR review fixes

5702c80

wip: PR review fix

a1f7776

ianaz approved these changes Mar 19, 2020

alexblockfactory reviewed Mar 19, 2020

src/main/java/com/proxeus/document/Template.java (outdated, resolved)

alexblockfactory reviewed Mar 19, 2020

src/main/java/com/proxeus/document/Template.java (resolved)

alexblockfactory reviewed Mar 19, 2020

src/main/java/com/proxeus/xml/template/DefaultTemplateHandler.java (outdated, resolved)

alexblockfactory reviewed Mar 19, 2020

src/main/java/com/proxeus/xml/template/TemplateExtractor.java (resolved)

alexblockfactory approved these changes Mar 19, 2020

emeka added 4 commits March 20, 2020 14:08

wip: Alex PR review fixes

c0a2b85

wip: update circleci config

4e432a6

wip: made latest the production docker version

fc32da9

wip: dummy changes to trigger circleci

3536257

emeka merged commit dab08b5 into master Mar 21, 2020

emeka deleted the refactor-template-parser branch March 21, 2020 15:16

emeka mentioned this pull request Mar 21, 2020

Use document service staging ProxeusApp/proxeus-core#160 (merged)

loleg mentioned this pull request Mar 24, 2021

Warning in TemplateExtractor #19 (open)
