EPWING Exporter converts EPWING dictionaries to XDXF or HTML5

homocomputeris/epwing-exporter

 
 

EPWING Exporter

EPWING Exporter is a program that converts dictionaries from EPWING format to either XDXF or HTML5. It is intended for use as the first stage of an ETL process to load an EPWING dictionary into a full-text search engine.

It performs a best-effort re-encoding of the dictionary to UTF-8; the conversion may drop invalid characters. It also deduplicates dictionary entries within each sub-book, but some duplicates are difficult to detect without more sophisticated analysis.
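To illustrate why exact-match deduplication misses some duplicates, here is a minimal Python sketch. It is not the exporter's actual algorithm, which this README does not describe. The `(sub_book, headword, body)` tuple shape and the function name are assumptions made only for this example.

```python
from typing import Iterable, List, Tuple

# Hypothetical entry shape: (sub_book, headword, body).
Entry = Tuple[str, str, str]


def dedupe_per_subbook(entries: Iterable[Entry]) -> List[Entry]:
    """Keep the first occurrence of each entry, comparing only within a sub-book.

    The sub-book name is part of the key, so identical entries that live in
    different sub-books are both kept.
    """
    seen = set()
    unique = []
    for entry in entries:
        if entry not in seen:
            seen.add(entry)
            unique.append(entry)
    return unique
```

Under this scheme two entries whose bodies differ only by trailing whitespace or markup are treated as distinct, which is the kind of near-duplicate that needs more sophisticated analysis to catch.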

After a conversion completes, a file with conversion metrics is generated in the YAML format.

Acknowledgements

Gaiji tables were adapted from various applications by Christopher Brochtrup (CB4960).

For dictionaries without gaiji tables, gaiji characters are escaped in the output as:

{{ wide <gaiji-id-number> }}
{{ narrow <gaiji-id-number> }}
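
A downstream ETL stage may want to know which gaiji were left unresolved. The following Python sketch counts the escaped placeholders in exported text. The function name is hypothetical, and because the format of the ID is not specified here, the pattern accepts any run of non-whitespace characters as the ID.

```python
import re
from collections import Counter

# Matches "{{ wide <id> }}" and "{{ narrow <id> }}".
# Assumption: the ID contains no whitespace; its exact format is not documented here.
GAIJI_PLACEHOLDER = re.compile(r"\{\{\s*(wide|narrow)\s+(\S+?)\s*\}\}")


def count_gaiji_placeholders(text: str) -> Counter:
    """Return a Counter keyed by (width, gaiji_id) for each placeholder found."""
    return Counter(GAIJI_PLACEHOLDER.findall(text))
```

Calling `count_gaiji_placeholders` on the contents of the exported file gives a tally that can be used to decide whether a gaiji table is worth adding for that dictionary.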

Usage

The program may be run with the following arguments to export XDXF:

# -epwing_directory: the top-level directory of the EPWING dictionary
# -xdxf_file:        output file for the rendered XDXF
# -metrics_file:     output file for the conversion metrics report
# -tidy_errors_file: output file for any warnings or errors during validation
epwing-exporter \
    -epwing_directory "${epwing_directory}" \
    -xdxf_file out/output.xml \
    -metrics_file out/output.xml.yaml \
    -tidy_errors_file out/output.xml.err.txt

To export HTML5:

epwing-exporter -epwing_directory "${epwing_directory}" \
                -html5_file out/output.html5 \
                -metrics_file out/output.html5.yaml \
                -tidy_errors_file out/output.html5.err.txt
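
When the exporter is driven from a script as the first stage of an ETL process, the argument list can be assembled programmatically. The Python sketch below uses only the flags shown above. The helper name and the convention of deriving the metrics and error file names from the output file are assumptions that mirror the examples, not requirements of the program.

```python
from typing import List


def build_export_command(epwing_directory: str, output_file: str,
                         fmt: str = "xdxf") -> List[str]:
    """Build the epwing-exporter argument list for XDXF or HTML5 output."""
    # Map each output format to the flag shown in the usage examples.
    output_flags = {"xdxf": "-xdxf_file", "html5": "-html5_file"}
    if fmt not in output_flags:
        raise ValueError(f"unsupported format: {fmt!r}")
    return [
        "epwing-exporter",
        "-epwing_directory", epwing_directory,
        output_flags[fmt], output_file,
        # Assumed naming convention, copied from the examples above.
        "-metrics_file", output_file + ".yaml",
        "-tidy_errors_file", output_file + ".err.txt",
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)`, which avoids shell quoting problems when the dictionary path contains spaces.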

Building

Linux and Docker users may issue the following to obtain a build environment:

docker build -t epwing-exporter .
docker run --rm -it -v "$PWD:/workspace" epwing-exporter

EPWING Exporter can be built from source on Linux and OSX with CMake.

For example, to build on Linux:

# CMake will fail before the first build due to an issue
# with the way ExternalProject interacts with ordinary build phases.
( mkdir build && cd build && cmake .. ; make )
# Re-generating and building a second time should succeed.
( cd build && cmake .. && make )
# Run the resulting binary.
build/cmd/epwing-exporter

Builds have been tested with Clang 6 and 7 (OSX and Linux) and GCC 8 (Linux), using CMake 3.13. zlib must be installed on the system.

Note that CMake segmentation faults were observed when attempting to build with CMake 3.18 and 3.19.

Dependencies

If building on OSX, libiconv must be installed with brew install iconv. All other dependencies are built from source and statically linked.

Documentation

Documentation on the EPWING format is difficult to come by. The following collection of links was helpful during the development process:
