Customizing Docx Rendering
The DocxRenderer is found in the flexmark-docx-converter module.
The rendering process involves creating an empty WordprocessingMLPackage and passing it to the renderer so it can add rendered markdown contents to this package.
The renderer uses the styles defined in the package to apply formatting for markdown elements. Customizing the output is a matter of changing the styles you pass to the renderer.
Since the rendering appends contents to the package, you can pass a non-empty document to have markdown contents appended to it. This includes rendering several markdown documents into a single package.
A default empty document in XML form is included in the module, and you can get it with the static DocxRenderer.getDefaultTemplate() method.
There are other convenience methods to read in a template from a stream or a resource.
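The flow described above can be sketched as follows. This is a minimal sketch that follows the pattern of the DocxConverter CommonMark Sample, not a definitive implementation. The class name RenderToDocx, the method renderAll and the output file name combined.docx are hypothetical. The flexmark import packages shown are assumed to match recent releases; older releases keep Node and MutableDataSet in different packages, so adjust them to your version.

```java
import com.vladsch.flexmark.docx.converter.DocxRenderer;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.util.ast.Node;
import com.vladsch.flexmark.util.data.MutableDataSet;
import org.docx4j.Docx4J;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

import java.io.File;

public class RenderToDocx {
    /** Renders each markdown source, in order, into one package built from the default template. */
    static WordprocessingMLPackage renderAll(String... markdownSources) {
        MutableDataSet options = new MutableDataSet();
        Parser parser = Parser.builder(options).build();
        DocxRenderer renderer = DocxRenderer.builder(options).build();

        // Start from the default template so the expected styles are present.
        WordprocessingMLPackage pkg = DocxRenderer.getDefaultTemplate();
        for (String markdown : markdownSources) {
            Node document = parser.parse(markdown);
            // Each call appends the rendered content to the same package.
            renderer.render(document, pkg);
        }
        return pkg;
    }

    public static void main(String[] args) throws Exception {
        WordprocessingMLPackage pkg = renderAll(
                "# First\n\nSome *text*.\n",
                "# Second\n\nMore text.\n");
        // Hypothetical output location; save as a zipped docx file.
        pkg.save(new File("combined.docx"), Docx4J.FLAG_SAVE_ZIP_FILE);
    }
}
```

Because every render call appends to the package it is given, the second document follows the first in the saved file.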
The styles in the package are expected to have IDs which will be associated with the various markdown elements being rendered. The default values are those used in the empty.xml document template, which is the template used by default.
The renderer will read the styles from the package and do its best to propagate them to nested markdown elements.
For convenience there is also a DocxContextImpl class that can be used to create docx content via code. It is used by the renderer but also by the ComboDocxConverterSpecTest test file, which appends the status of all tests, markdown source and resulting docx conversion into a single document. It is a good source for sample code should you need it.
The DocxConverter CommonMark Sample and DocxConverter Pegdown Sample files show how to use this module.
The renderer relies on the style configuration to produce the document. If you modify the styles inconsistently then your results will reflect this.
The styles are in XML form in the empty.xml default template file used for rendering, and you can make a copy of it and modify it for your use. You can then pass either the string or an input stream with that content to DocxRenderer.getDefaultTemplate(). When passing a stream, the data is not limited to XML and can be a docx document stream.
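Loading a modified template from a file might look like the sketch below. It assumes the input stream overload of DocxRenderer.getDefaultTemplate() described above. The class name CustomTemplate, the method loadTemplate and the file name my-template.docx are placeholders, not part of the module.

```java
import com.vladsch.flexmark.docx.converter.DocxRenderer;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CustomTemplate {
    /** Reads a template (XML or docx) from a file and returns it as a package. */
    static WordprocessingMLPackage loadTemplate(Path templateFile) throws Exception {
        try (InputStream in = Files.newInputStream(templateFile)) {
            // Assumption: the InputStream overload accepts XML or docx content.
            return DocxRenderer.getDefaultTemplate(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file name for your modified copy of the template.
        WordprocessingMLPackage template = loadTemplate(Paths.get("my-template.docx"));
        // Pass 'template' to the renderer in place of the default package.
        System.out.println(template != null ? "template loaded" : "template could not be loaded");
    }
}
```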
Unless you are very comfortable with manipulating the Docx format and don't mind debugging wrong style assignments, it is recommended that you keep the default style names and change only the style definitions to reflect your desired style.
Another way to make it work is to open the empty.xml package in a word processor that supports it (LibreOffice, Word), modify the styles, and save the document as XML or docx.
Modifying styles with MS Word is also possible but for that you will need to start with a docx document that already contains all the needed styles. In the root directory of the distribution jar file you will find the flexmark-empty-template.docx file. This file contains all the markdown elements and styles used in the conversion that you can modify. It also contains the instructions on the best way to do this with success.
The file was produced by running the docx conversion on the empty.md file, also found in the jar, with the default template.
A sample which uses empty.md and the empty.xml template to generate flexmark-empty-template.docx can be found in DocxConverterEmpty Sample.
If you do create your own styles template document, it is highly recommended to run the empty.md through the conversion, using your modified template document to make sure all styles are present and nothing got messed up in the process.
The renderer uses the DocxLinkResolver for basic link resolution for document relative and site relative URLs. For this to work you will need to provide a DocxRenderer.DOC_RELATIVE_URL and DocxRenderer.DOC_ROOT_URL so that your links can be properly mapped to files or http:// resources.
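Following the DocxConverter CommonMark Sample, the two keys are set on the options that are passed to the parser and renderer builders. In this sketch the file:///path/to/docs base URL is a placeholder, and the class and method names are hypothetical. The import packages are assumed to match recent flexmark releases.

```java
import com.vladsch.flexmark.docx.converter.DocxRenderer;
import com.vladsch.flexmark.util.data.DataHolder;
import com.vladsch.flexmark.util.data.MutableDataSet;

public class LinkOptions {
    /** Options that give the link resolver a base for relative URLs. */
    static DataHolder linkOptions() {
        return new MutableDataSet()
                // Base for document relative URLs such as 'images/...', './...' or '../...'
                .set(DocxRenderer.DOC_RELATIVE_URL, "file:///path/to/docs")
                // Base for site relative URLs such as '/...'
                .set(DocxRenderer.DOC_ROOT_URL, "file:///path/to/docs");
    }
}
```

The returned options would then be passed to DocxRenderer.builder(...) when building the renderer.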
The renderer will embed in the document any images linked through images or image references. Resolved links can use the http: or file: protocol. In either case the renderer will load the file and embed it in the document.
You can provide your own link resolver to customize link resolution rules. The link resolver used by the DocxRenderer is the same as the ones used by HtmlRenderer, so you can reuse your existing custom HTML link resolvers.
Docx format does not allow for the same flexibility as HTML/CSS in a browser. Specifically, nested borders are not available for text paragraphs. Therefore if an element in markdown is nested within several parents that render a border then the border from the most recent parent will be rendered. This specifically affects block quotes in the default template.
Additionally, in docx, the border offset from the left margin is specified in pt (points, 1/72 in.) and limited to 31 pt maximum. Other measurements are in twips (1/20 of a point) with no practical limitation. This creates a condition where the child indent combined with a parent indent can easily exceed the 31 pt limit, making it impossible to keep the child's border (really the parent's extended to the child) aligned with the parent's border. When this happens the 31 pt limit is respected and the child border will be offset from the parent's.
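The arithmetic behind this limit can be illustrated with a small sketch. It is not the renderer's code. It assumes a simplified model in which a paragraph's border is drawn at its left indent minus the border offset, and all names in it are hypothetical.

```java
public class BorderOffset {
    static final int TWIPS_PER_POINT = 20;      // 1 pt = 20 twips
    static final int MAX_BORDER_OFFSET_PT = 31; // docx limit on the border offset

    /** Border offset in whole points needed to reach the parent's border, clamped to the limit. */
    static int borderOffsetPt(int childIndentTwips, int parentBorderTwips) {
        int wanted = (childIndentTwips - parentBorderTwips) / TWIPS_PER_POINT;
        return Math.max(0, Math.min(wanted, MAX_BORDER_OFFSET_PT));
    }

    /** Distance in twips between where the child's border lands and the parent's border. */
    static int misalignmentTwips(int childIndentTwips, int parentBorderTwips) {
        int offsetTwips = borderOffsetPt(childIndentTwips, parentBorderTwips) * TWIPS_PER_POINT;
        return (childIndentTwips - offsetTwips) - parentBorderTwips;
    }

    public static void main(String[] args) {
        // Child indented 36 pt, parent border at 12 pt: 24 pt offset, borders line up.
        System.out.println(borderOffsetPt(720, 240) + " pt, off by " + misalignmentTwips(720, 240) + " twips");
        // Child indented 72 pt: 60 pt would be needed, so the offset is clamped to 31 pt.
        System.out.println(borderOffsetPt(1440, 240) + " pt, off by " + misalignmentTwips(1440, 240) + " twips");
    }
}
```

In the second case the child's border ends up 580 twips (29 pt) to the right of the parent's, which is the offset described above.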
Another caveat is that the left margin and hanging indents have 20x the resolution of the border offset, which means that it is impossible to keep the border of the child aligned with the parent's unless the child indent and the parent indent differ by a whole number of points.
The renderer detects when this is not the case, which would cause the child border to be visibly misaligned, and adjusts the child's left margin to eliminate this misalignment. Unless your eyes are very sensitive you will not notice the less than 1 pt shift in the text left margin, whereas anyone can notice a 1/2 pt break in a straight line.
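One way to picture this adjustment is the sketch below. It is only an illustration of the idea, not the renderer's actual code: rounding to the nearest whole point is an assumption, and the class and method names are hypothetical.

```java
public class IndentSnap {
    static final int TWIPS_PER_POINT = 20; // 1 pt = 20 twips

    /**
     * Returns a child indent (in twips) whose distance from the parent indent
     * is a whole number of points, moving the child by less than 1 pt.
     */
    static int alignChildIndent(int parentIndentTwips, int childIndentTwips) {
        int diff = childIndentTwips - parentIndentTwips;
        int remainder = Math.floorMod(diff, TWIPS_PER_POINT);
        if (remainder == 0) {
            return childIndentTwips; // already a whole number of points apart
        }
        // Assumption: snap to the nearest whole point; ties move the text right.
        int shift = remainder < TWIPS_PER_POINT / 2 ? -remainder : TWIPS_PER_POINT - remainder;
        return childIndentTwips + shift;
    }

    public static void main(String[] args) {
        System.out.println(alignChildIndent(240, 725)); // 485 twips apart, moved left by 5 twips
        System.out.println(alignChildIndent(240, 735)); // 495 twips apart, moved right by 5 twips
    }
}
```

With nearest-point rounding the text margin never moves by more than half a point, while the border offset can be expressed exactly in whole points.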
The development of this module was sponsored by Johner Institut GmbH. It was needed to allow easy conversion of their internal documentation in markdown to the docx format preferred by users at large.
The module uses the docx4j library for handling all docx manipulations.