0.8.2
Bugfix release for an edge case in hOCR parsing.
Bugfixes:
- hOCR: Fix stack overflow when handling empty words in combination with a partially hyphenated word
Other Changes:
- Improved the error message for errors during highlighting. The message now includes the source pointer of the failed document or, if OCR is stored in the index, the beginning of the broken content. It also includes the internal Lucene document identifier. Adding the [docid] field to the returned fields of the failing query adds the internal id to every document in the result set, which should allow quick identification of the documents that cause issues during highlighting.
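
As a rough illustration of the [docid] hint above, the following sketch builds a Solr select URL that requests the internal document id alongside the regular id field. The host, core name (`ocr`), field names (`id`, `ocr_text`) and the helper `build_debug_query_url` are hypothetical and need to be adapted to the actual setup; the highlighting parameters are assumed to match the ones already used in the failing query.

```python
from urllib.parse import urlencode


def build_debug_query_url(base_url, query, ocr_field, id_field="id"):
    """Build a select URL that also returns Solr's internal [docid].

    All argument values are placeholders for illustration; the function
    only assembles a URL and does not contact a Solr server.
    """
    params = {
        "q": query,
        "hl": "true",
        # Assumed OCR highlighting field parameter; reuse whatever the
        # failing query already sets.
        "hl.ocr.fl": ocr_field,
        # [docid] asks Solr to add the internal Lucene document id
        # to every document in the result set.
        "fl": f"{id_field},[docid]",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host, core and field names.
url = build_debug_query_url(
    "http://localhost:8983/solr/ocr", "ocr_text:example", "ocr_text"
)
print(url)
```

The internal id reported in the error message can then be compared against the [docid] values in the response to find the document that failed.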