html2openxml is a Java library that converts HTML content to OpenXML (the Microsoft Word .docx format). It supports common HTML elements such as paragraphs, bold, italic, underline, and tables. The library is built on docx4j and Jsoup. A live demo is available at https://html2openxml-demo.herokuapp.com/.
Add the following dependency to your pom.xml:

    <dependency>
        <groupId>com.denisfesenko</groupId>
        <artifactId>html2openxml</artifactId>
        <version>1.0.0</version>
    </dependency>

Or, if you're using Gradle, add this to your build.gradle:

    implementation 'com.denisfesenko:html2openxml:1.0.0'

Here's a simple example of how to use html2openxml to convert HTML to a .docx file:

    import com.denisfesenko.converter.HtmlToOpenXMLConverter;
    import org.docx4j.openpackaging.exceptions.Docx4JException;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

    import java.io.File;

    public class Main {
        public static void main(String[] args) {
            String htmlContent = "<html><body><p>Hello, world!</p></body></html>";
            try {
                // Convert the HTML string into an in-memory Word document
                HtmlToOpenXMLConverter converter = new HtmlToOpenXMLConverter();
                WordprocessingMLPackage wordDocument = converter.convert(htmlContent);

                // Write the document to disk
                File outputFile = new File("output.docx");
                wordDocument.save(outputFile);
            } catch (Docx4JException e) {
                // save() declares Docx4JException, the parent of InvalidFormatException
                e.printStackTrace();
            }
        }
    }

html2openxml supports the following HTML elements:

<p> - Paragraph
<b>, <strong> - Bold
<i>, <em> - Italic
<u> - Underline
<sub> - Subscript
<sup> - Superscript
<table> - Table
<tr> - Table Row
<td> - Table Cell
<span> - Specifically dealing with background colors
<hr> - Horizontal rule
<pb> - Page Break
<br> - Line Break

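To see which tags in a fragment fall outside the list above before converting it, a small pre-check can help. The following is a minimal sketch and not part of html2openxml: the class `SupportedTagCheck` and its method `unsupportedTags` are hypothetical names, the regular expression is a rough approximation of tag matching rather than a real HTML parser, and treating `html` and `body` as accepted wrappers is an assumption based on the usage example.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of html2openxml.
final class SupportedTagCheck {

    // Tag names copied from the list above; "html" and "body" are assumed
    // to be accepted as wrappers because the usage example includes them.
    static final Set<String> SUPPORTED = Set.of(
            "html", "body", "p", "b", "strong", "i", "em", "u", "sub", "sup",
            "table", "tr", "td", "span", "hr", "pb", "br");

    // Rough match for opening and closing tag names; not a full HTML parser.
    private static final Pattern TAG = Pattern.compile("</?([a-zA-Z][a-zA-Z0-9-]*)");

    // Returns the tag names found in the input that are not in SUPPORTED,
    // in order of first appearance.
    static Set<String> unsupportedTags(String html) {
        Set<String> unsupported = new LinkedHashSet<>();
        Matcher matcher = TAG.matcher(html);
        while (matcher.find()) {
            String name = matcher.group(1).toLowerCase(Locale.ROOT);
            if (!SUPPORTED.contains(name)) {
                unsupported.add(name);
            }
        }
        return unsupported;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello, <b>world</b>!</p>"
                + "<div><img src=\"a.png\"></div></body></html>";
        // Prints the tags that are not in the supported list
        System.out.println(unsupportedTags(html));
    }
}
```

An empty result means every tag in the fragment appears in the list. It does not guarantee that every attribute or nesting combination is handled.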
You can extend the functionality of html2openxml by implementing your own custom tag handlers. Simply implement the TagHandler interface and register your handler with the HtmlToOpenXMLConverter:

    TagHandler customTagHandler = new TagHandler() {
        @Override
        public void handleTag(Node node, WordprocessingMLPackage wordMLPackage) {
            // Custom implementation
        }

        @Override
        public boolean isRepeatable() {
            return true;
        }
    };
    HtmlToOpenXMLConverter converter = new HtmlToOpenXMLConverter(Map.of("custom-tag", customTagHandler));

Now, when the converter encounters an element with the tag name <custom-tag>, it will call the handleTag method of your customTagHandler instance.
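The lookup-by-tag-name idea can be pictured with a small, self-contained sketch. It is not the library's implementation: `SimpleNode`, `SimpleHandler`, and `Dispatcher` are hypothetical stand-ins for Jsoup's `Node`, the `TagHandler` interface, and the converter, and the output is a plain list of strings rather than a `WordprocessingMLPackage`. It only illustrates how a map of handlers might be consulted while walking an element tree.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parsed HTML element.
final class SimpleNode {
    final String tagName;
    final List<SimpleNode> children = new ArrayList<>();

    SimpleNode(String tagName, SimpleNode... children) {
        this.tagName = tagName;
        Collections.addAll(this.children, children);
    }
}

// Hypothetical stand-in for TagHandler; writes to a list instead of a Word document.
interface SimpleHandler {
    void handleTag(SimpleNode node, List<String> output);
}

// Walks the tree depth-first and calls the handler registered for each tag name.
final class Dispatcher {
    private final Map<String, SimpleHandler> handlers;

    Dispatcher(Map<String, SimpleHandler> handlers) {
        this.handlers = handlers;
    }

    List<String> convert(SimpleNode root) {
        List<String> output = new ArrayList<>();
        visit(root, output);
        return output;
    }

    private void visit(SimpleNode node, List<String> output) {
        SimpleHandler handler = handlers.get(node.tagName);
        if (handler != null) {
            handler.handleTag(node, output);
        }
        // In this sketch, tags without a handler are skipped but their children are still visited
        for (SimpleNode child : node.children) {
            visit(child, output);
        }
    }
}

final class DispatchDemo {
    public static void main(String[] args) {
        SimpleHandler paragraph = (node, out) -> out.add("paragraph");
        SimpleHandler custom = (node, out) -> out.add("custom content");
        Dispatcher dispatcher = new Dispatcher(Map.of("p", paragraph, "custom-tag", custom));

        SimpleNode root = new SimpleNode("body",
                new SimpleNode("p"),
                new SimpleNode("custom-tag"),
                new SimpleNode("div"));
        System.out.println(dispatcher.convert(root));
    }
}
```

Keeping handlers in a map keyed by tag name means new tags can be supported by adding an entry, without changing the traversal code. How the real converter treats tags with no registered handler is not specified here.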
This library is designed to handle a subset of HTML elements and does not provide support for all HTML5 tags and attributes. It also does not handle CSS styles. If you need more advanced conversion features, you may need to consider other options or extend this library with custom tag handlers.
html2openxml is released under the MIT License. See the LICENSE file for more details.