Refactor template parser #3

Merged
merged 26 commits into from
Mar 21, 2020

Conversation

emeka
Contributor

@emeka emeka commented Mar 2, 2020

Finally, the PR for the document service. It is a big PR: the XML/template parsing has been fully replaced and the renderer/compiler code has been refactored.

The best way to review is to start from the API. The two entry points that have been changed are /compile and /vars.

The old XML parsing reimplemented an XML parser and generated a custom DOM tree whose nodes were XML nodes and JTwig template nodes. It has been replaced by the standard Java StAX XML parser and a template extractor, which is responsible for finding and interpreting template code elements. The two parts are clearly separated: a TemplateParser (in this case a JTwigParser) handles the template syntax, and a TemplateExtractor class handles the generic extraction of the template code that sits inside the visible document text. The result must be a well-formed template made of XML and template instructions.
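As a rough illustration of the StAX side only, the sketch below reads a small XML string with the standard `javax.xml.stream` API and collects the text of its characters events, which is where template code would show up. The class name `StaxCharactersSketch`, the method `visibleText` and the sample XML are assumptions made for this example; they are not taken from the PR.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper, not part of the PR: lists the text runs a StAX reader reports.
final class StaxCharactersSketch {

    /** Returns the text of every non-blank characters event, in document order. */
    static List<String> visibleText(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent character data so one text run arrives as one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        List<String> texts = new ArrayList<>();
        try {
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isCharacters() && !event.asCharacters().isWhiteSpace()) {
                    texts.add(event.asCharacters().getData());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return texts;
    }

    public static void main(String[] args) {
        // The template expression sits in its own text run, inside a <span>.
        System.out.println(visibleText("<p>Hello <span>{{ name }}</span>!</p>"));
    }
}
```

The sample yields three text runs, one of which is the `{{ name }}` expression. In a real ODT file a single instruction may be spread over several such runs, which is presumably why a separate extractor is needed on top of the XML parser.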

The TemplateParser's responsibility is to recognize the template instructions in the characters events of the XML document, to maintain a state machine whose states indicate whether we are in XML, a code element, a content element or a comment, and to track the current code block. In addition, the TemplateParser segments the XML characters events to isolate the template instructions from the rest of the document text and to simplify downstream event processing.
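To make the state machine more concrete, here is a minimal sketch that segments character data using the usual Jtwig-style delimiters (`{{ }}` for content, `{% %}` for code, `{# #}` for comments). It is an assumption-laden illustration, not the PR's JTwigParser: the class `TemplateSegmenterSketch`, its methods and the `Segment` type are invented for this example, and it ignores delimiters split across two chunks as well as delimiters inside string literals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a four-state segmenter; names are not from the PR.
final class TemplateSegmenterSketch {

    enum State { XML, CODE, CONTENT, COMMENT }

    /** One slice of the input together with the state it was read in. */
    static final class Segment {
        final State state;
        final String text;
        Segment(State state, String text) { this.state = state; this.text = text; }
        @Override public String toString() { return state + ":" + text; }
    }

    private State state = State.XML;
    private final StringBuilder current = new StringBuilder();
    private final List<Segment> segments = new ArrayList<>();

    /** Feeds one chunk of character data; the state carries over between calls. */
    void process(String chars) {
        int i = 0;
        while (i < chars.length()) {
            if (state == State.XML) {
                State opened = openedBy(chars, i);
                if (opened != null) {
                    flush();                       // close the plain-text segment
                    state = opened;
                    current.append(chars, i, i + 2);
                    i += 2;
                    continue;
                }
            } else if (chars.startsWith(closerOf(state), i)) {
                current.append(chars, i, i + 2);
                i += 2;
                flush();                           // close the instruction segment
                state = State.XML;
                continue;
            }
            current.append(chars.charAt(i++));
        }
    }

    /** Flushes any pending text and returns all segments seen so far. */
    List<Segment> finish() {
        flush();
        return segments;
    }

    private void flush() {
        if (current.length() > 0) {
            segments.add(new Segment(state, current.toString()));
            current.setLength(0);
        }
    }

    private static State openedBy(String s, int i) {
        if (s.startsWith("{{", i)) return State.CONTENT;
        if (s.startsWith("{%", i)) return State.CODE;
        if (s.startsWith("{#", i)) return State.COMMENT;
        return null;
    }

    private static String closerOf(State s) {
        switch (s) {
            case CONTENT: return "}}";
            case CODE:    return "%}";
            default:      return "#}";             // only reached for COMMENT
        }
    }
}
```

Feeding `"{% for x"` and then `" in xs %}"` in two calls produces a single CODE segment, because the pending instruction is kept open until its closing delimiter arrives. That is the property that lets an instruction span several characters events.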

The TemplateExtractor reads XML events, sends characters to the TemplateParser, interprets the new parser state, moves XML elements out of the template instructions, and ensures that matching XML start and end elements are located in the same template code block.
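One way to read the last requirement is as a nesting invariant: if XML tags and template block boundaries are viewed as a single token stream, the stream must be properly nested. The sketch below only checks that invariant; it does not reproduce how the TemplateExtractor rearranges events. `BlockNestingSketch`, `Token` and the factory methods are hypothetical names chosen for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical checker for the invariant described above; not code from the PR.
final class BlockNestingSketch {

    enum Kind { START_ELEMENT, END_ELEMENT, BLOCK_OPEN, BLOCK_CLOSE }

    static final class Token {
        final Kind kind;
        final String name;
        Token(Kind kind, String name) { this.kind = kind; this.name = name; }
    }

    static Token start(String element) { return new Token(Kind.START_ELEMENT, element); }
    static Token end(String element)   { return new Token(Kind.END_ELEMENT, element); }
    static Token open(String block)    { return new Token(Kind.BLOCK_OPEN, block); }
    static Token close(String block)   { return new Token(Kind.BLOCK_CLOSE, block); }

    /** True if every element opens and closes inside the same template block. */
    static boolean wellNested(List<Token> tokens) {
        Deque<String> open = new ArrayDeque<>();
        for (Token t : tokens) {
            switch (t.kind) {
                case START_ELEMENT:
                    open.push("xml:" + t.name);
                    break;
                case BLOCK_OPEN:
                    open.push("block:" + t.name);
                    break;
                case END_ELEMENT:
                    if (open.isEmpty() || !open.pop().equals("xml:" + t.name)) return false;
                    break;
                case BLOCK_CLOSE:
                    if (open.isEmpty() || !open.pop().equals("block:" + t.name)) return false;
                    break;
            }
        }
        return open.isEmpty();
    }
}
```

For instance, `<p>{% if x %}</p><p>{% endif %}</p>` fails the check because the first `</p>` arrives while the `if` block is still the innermost open item, whereas `{% if x %}<p></p>{% endif %}` passes.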

The whole XML handling has been migrated to an event pipeline composed of XMLEventProcessors. The template extractor, image adjuster, empty element remover, ODT manifest processor, and template and image variable extractors therefore all implement the XMLEventProcessor interface. It is now possible to easily add more XML processing to the chain and still understand the full system. Also, each processor can easily be tested separately.
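The pipeline idea can be sketched as follows. The `Processor` interface here is a hypothetical, list-based stand-in for the PR's XMLEventProcessor, whose real signature is not shown in this description and is probably streaming; `EmptyElementRemover` and `TextCollector` are likewise simplified inventions that only mimic the roles named above. Only the `javax.xml.stream` calls are real API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical pipeline sketch; the PR's actual interface may look different.
final class EventPipelineSketch {

    /** Assumed shape of a processing stage: events in, events out. */
    interface Processor {
        List<XMLEvent> process(List<XMLEvent> events);
    }

    /** Drops every start tag that is immediately followed by an end tag (single pass). */
    static final class EmptyElementRemover implements Processor {
        @Override public List<XMLEvent> process(List<XMLEvent> events) {
            List<XMLEvent> out = new ArrayList<>();
            for (int i = 0; i < events.size(); i++) {
                boolean emptyPair = events.get(i).isStartElement()
                        && i + 1 < events.size()
                        && events.get(i + 1).isEndElement();
                if (emptyPair) {
                    i++;                            // skip the end tag as well
                } else {
                    out.add(events.get(i));
                }
            }
            return out;
        }
    }

    /** Passes events through unchanged while recording non-blank text. */
    static final class TextCollector implements Processor {
        final List<String> texts = new ArrayList<>();
        @Override public List<XMLEvent> process(List<XMLEvent> events) {
            for (XMLEvent e : events) {
                if (e.isCharacters() && !e.asCharacters().isWhiteSpace()) {
                    texts.add(e.asCharacters().getData());
                }
            }
            return events;
        }
    }

    /** Parses the XML, runs each processor in order, and serializes the result. */
    static String run(String xml, List<Processor> chain) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            List<XMLEvent> events = new ArrayList<>();
            while (reader.hasNext()) {
                events.add(reader.nextEvent());
            }
            reader.close();
            for (Processor p : chain) {
                events = p.process(events);
            }
            StringWriter sink = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(sink);
            for (XMLEvent e : events) {
                // Skip the document markers so the output is a plain fragment.
                if (!e.isStartDocument() && !e.isEndDocument()) {
                    writer.add(e);
                }
            }
            writer.flush();
            writer.close();
            return sink.toString();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }
}
```

Because each stage only sees events, a stage such as `EmptyElementRemover` can be unit-tested with a few lines of XML and no ODT file, which matches the testability point made above. Note that this single-pass remover does not collapse elements that only become empty after an inner element is removed.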

The XML processor is only part of the full document service. It is responsible for actually rendering an ODT document used as a document template into another valid ODT document by executing the template with a given data context. The ODT document is then formatted using a DocumentCompiler. At the moment, only the LibreOffice ODTDocumentCompiler is implemented. The compiler is responsible for reading the input files, rendering the template, gathering the additional assets (including additional fonts), and then formatting the rendered document using a TemplateFormatter, currently the LibreOfficeAssistant, which was not changed in this PR.

The ODTCompiler and the associated ODTRenderer (renamed from the old ODTContext) have been refactored to use the new XML event processor architecture, and the functionality that was inside the old ODTContext has been extracted into separate processors.

Testing has been improved. The XML processing, including the template extraction, is now well tested and covered. Testing of the ODTCompiler area has been improved slightly. The new document service has been tested successfully with the Proxeus test-api suite.

There are still improvements to make. For example, the formatter that uses LibreOffice inside the document service, which increases the complexity of the service, could be entirely externalized using projects like https://github.com/thecodingmachine/gotenberg, which uses the unoconv project, which in turn uses LibreOffice to handle its tasks. Unoconv is a ten-year-old project specializing in document conversion.

The handling of table-row for-loops available in the old version has not been migrated yet due to time constraints. It is a small task, as it only needs to move the template for-loop instructions written inside table cells (the only place possible when you edit the document in your document editor) outside the row. This will need to be part of a new PR.

In addition, the client has been removed and replaced by curl. The API has been extended to accept additional content types that are easier to use with standard HTTP command-line tools. Please refer to the updated README.

The opaque run and ui commands have been removed and replaced with a plain java -jar command, and the Dockerfile has been updated to use a multi-stage build.

emeka added 2 commits March 2, 2020 10:22
Previous implementation was re-implementing the full XML parser.
This version uses StAX as XML parser and adds parsing of the characters
events to extract JTwig code islands.
This is a big job:
* added more processor utilities
* converted optional behaviour configured in config to processors
* refactored compiler to separate getting the vars and rendering
* created the var and image adjuster processors and removed the
corresponding code from the compiler
* added factories

Still a few TODOs in the code before it can run.
Still more tests to write.
@emeka emeka force-pushed the refactor-template-parser branch from 285fe7d to 77046ae Compare March 2, 2020 09:23
Silvio Rainoldi and others added 18 commits March 2, 2020 13:39
Added unit tests to cover Config.java
Also removed a couple of unused methods & made attributes private as they should be
Added unit test for FileResult. Testing the other cases would require a refactoring which I'll leave for now
Gradle wrapper has been updated from 2.13 to 4.8.1 as IntelliJ doesn't support old versions
Cleaned up the Code class a bit: fixed bugs, removed dead code, made attributes private
…Test

TemplateCompilerIntegrationTest was in fact testing LibreOffice pdf
conversion which is beyond its scope.
Resolved review threads (now outdated) on:
src/main/java/com/proxeus/document/Template.java
src/main/java/com/proxeus/SparkServer.java
src/main/java/com/proxeus/document/odt/ODTRenderer.java
@emeka emeka merged commit dab08b5 into master Mar 21, 2020
@emeka emeka deleted the refactor-template-parser branch March 21, 2020 15:16
3 participants