-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support RTL/bidi text, some font and harfbuzz fixes #309
Conversation
Skip elements among siblings that are not list items. By @pkb from buggins/coolreader#105
@@ -13364,6 +13364,10 @@ bool ldomNode::getNodeListMarker( int & counterValue, lString16 & marker, int & | |||
css_style_ref_t cs = child->getStyle(); | |||
if ( cs.isNull() ) | |||
continue; | |||
if ( cs->display!=css_d_list_item_block && cs->display!=css_d_list_item) { | |||
// Alien element among list item nodes, skip it to not mess numbering |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alien, hehe. :-)
crengine/src/hyphman.cpp
Outdated
{ | ||
bool soft_hyphens_found = false; | ||
for ( int i = 0; i<len; i++ ) { | ||
if ( widths[i] + hyphCharWidth > maxWidth ) | ||
break; | ||
if ( str[i] == UNICODE_SOFT_HYPHEN_CODE ) { | ||
flags[i] |= LCHAR_ALLOW_HYPH_WRAP_AFTER; | ||
switch ( flagSize ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless you ultimately intend to support more flagSize values, which seems unlikely, I'd swap that switch to a simple if (with the most common branch first) ;).
crengine/src/hyphman.cpp
Outdated
@@ -894,18 +906,30 @@ bool TexHyph::hyphenate( const lChar16 * str, int len, lUInt16 * widths, lUInt8 | |||
// p+2 because: +1 because word has a space prepended, and +1 because | |||
// mask[] holds the flag for char n on slot n+1 | |||
if ( (mask[p+2-soft_hyphens_skipped]&1) && nw <= maxWidth ) { | |||
flags[p] |= LCHAR_ALLOW_HYPH_WRAP_AFTER; | |||
switch ( flagSize ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
crengine/src/hyphman.cpp
Outdated
@@ -967,7 +991,19 @@ bool AlgoHyph::hyphenate( const lChar16 * str, int len, lUInt16 * widths, lUInt8 | |||
break; | |||
} | |||
if (!disabled) | |||
flags[i] |= LCHAR_ALLOW_HYPH_WRAP_AFTER; | |||
switch ( flagSize ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
crengine/src/lvfntman.cpp
Outdated
// correct gamma | ||
if ( gammaIndex!=GAMMA_NO_CORRECTION_INDEX ) | ||
cr_correct_gamma_buf(item->bmp, w*h, gammaIndex); | ||
} else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't checked in context, so it's hard to tell from the diff view, is the removed {
after the else
expected here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You got me worried :) but no pb, the #endif
is the counterpart of a ... #if 0
, and that } else
is still in it. So, from space, it looks like:
}
else {
#if 0
if ( bitmap->pixel_mode==FT_PIXEL_MODE_MONO ) {
memset( item->bmp, 0, w*h );
lUInt8 * srcrow = bitmap->buffer;
lUInt8 * dstrow = item->bmp;
for ( int y=0; y<h; y++ ) {
lUInt8 * src = srcrow;
for ( int x=0; x<w; x++ ) {
dstrow[x] = ( (*src)&(0x80>>(x&7)) ) ? 255 : 0;
if ((x&7)==7)
src++;
}
srcrow += bitmap->pitch;
dstrow += w;
}
} else
#endif
memcpy( item->bmp, bitmap->buffer, w*h );
// correct gamma
if ( gammaIndex!=GAMMA_NO_CORRECTION_INDEX )
cr_correct_gamma_buf(item->bmp, w*h, gammaIndex);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but that commented out block would break if uncommented as-is, right?
(i.e., only the memcpy would be part of the "else" branch).
(Sidebar: this is partly why I'm really not a fan of the broken down } else {
syntax and non-mandatory braces even for one-liners ;)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Better then to just add a // : } // else:
?
Otherwise, I'd need to add after these 4 active lines:
#if 0
}
#endif
Small bug noticed while looking at the first screenshot: footnote links are recorded in the left to right order, so a bit dis-ordered in the in-page footnotes. |
No logical changes: just tab and indentation cleanups, and comments.
We'll need more flags for char classification. HyphMan hyphenate() (used by some other parts of the code) then needs to be tweaked so it can work on a table of lUInt8, as well as on our upgraded table of lUInt16 flags.
Use Freetype embolden API instead of LVFontBoldTransform to make fake bold (for fonts that do not provide a bold face). This allows Harfbuzz to work with them (even if they won't look as nice as if they came with a real bold font). (LVFontBoldTransform is a wrapper that enlarge glyphs and advances, so killing any Harfbuzz measurement and glyph replacements.) See comments for more details.
Even if the original text is italic and/or bold, when requesting the fallback font, we always got the regular unbold variant - so part of a sentence or word would jump out as strange. With this and previous commit (Freetype embolden), we'll get many things nicer when using our FreeSerif (which has no italic nor bold variant) as the fallback font. Also loosen fonts charset detection by no more looking for "azAZ09" in a font (NotoSansMyanmar.ttf, for example, comes without them and could then not even be used as a fallback font). Also switch symbol fonts to the FT_ENCODING_MS_SYMBOL charmap immediately on load - otherwise Harfbuzz would not see/use the symbol glyphs.
measureText(): accepts an added "hints" parameter, thru which we can pass text direction and additional hints for Harfbuzz (begin/end of paragraph for now). For freetype and harbuzz light, draw chars in the reverse order when direction is RTL: this allows individual RTL words to be drawn correctly, even when not using Harfbuzz (and to witness the additional Harfbuzz magic when switching to it). DrawTextString(): returns advance, so we can draw subsegments of the text with the fallback font, instead of only individual chars are previously. This is needed to get correct shaping with Harfbuzz with the fallback font. Harbuzz, in both measureText() and DrawTextString(): Don't use filterChar() with the main font, as the not-found chars might be found in the fallback font; no need to replace them thar early. Rework glyph/cluster/text walking and drawing to be more generic: should work with "one char > multiple glyphs" situations (we previously handled correctly only "multiple chars merged into a single glyph", like ligatures), and with RTL text. Also accumulate not-found glyphs so we can draw them as a single segment with the fallback font, with harfbuzz (instead of with Freetype previously): this is needed when using a latin font as the main font, and be able to see nice arabic drawn with the fallback font. Drop letterspacing when the detected script is cursive. Harfbuzz light: hbCalcCharWidth(): skip triplet when any of the 3 chars is not found, as it would mess up the result (and cached values) when some char is combined with mutiple rare diacritic marks (eg. Hebrew). This could mess the drawing of the whole text. Fallback to more robust Freetype measurement in such cases.
Shouldn't have any impact on pure LTR text. Additions to the usual LTR processing: copyText(): detect if we have RTL chars (arabic, hebrew, unicode bidi chars...) in a paragraph text, and use fribidi only when we have some (fribidi processing is quite expensive, so avoid it when not needed). When we do, compute bidi levels, to be used for visual reordering in addLine(). measureText(): Split measuring on bidi level change (and also on letter spacing change - upstream fix by @pkb). Provide fribidi segment direction and harfbuzz hints to font->measureText() for correct Harfbuzz measurements. Allow for ignoring some chars when measuring and drawing (for now: the set of unicode bidirectionality chars that can be used to tweak the unicode bidi algorithm, as some font have glyphs for them). When splitting the paragraph into lines, we continue to process chars in the logical order. Only in addLine(), with bidi paragraphs, we reorder the chars (and flags, widths...) from the logical line segment, into their visual order, and we make words from the result the usual way, with some additional flags (to help later with drawing, createXPointer() and getRect()). alignLine(): simple tweaks for RTL paragraphs: - put text-indent on the right - for justified text, align last (or single) line to the right More work needs to be done on the block rendering code for proper RTL layout (list items bullets on the right, table columns ordered from right to left...) lvrend.cpp: handle <bdi> and <bdo> elements, used to tweak the unicode bidirectional algorithm. Also add simple support for <q> by using a hardcoded single set of quotes.
To allow links/text to be selected and highlighted in bidi/RTL text. Text selection should work fine in pure RTL text, but may get bogus in bidi text when selection cross bidi levels (where a single selection suddenly becomes 2 as we are steping over a segment of the opposite direction, and moving back towards the previous segment...)
In EPUB, the <html> node of each embedded HTML file is not included in the generated single DOM. We now parse its attributes and forward them to be included as attribute of the followup <docFragment> element, so they are part of the DOM.
12e542e
to
ce616f5
Compare
_embolden = true; | ||
// A real bold font has weight 700, vs 400 for the regular. | ||
// LVFontBoldTransform did +200, so we get 600 (demibold). | ||
// Let's do the same (even if I don't see why not +300). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Errm maybe let's just do 300 then? :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not because "I don't see why" that there is not a why :)
It has no impact on the visible weight of the font, it's just used in the font match scoring, so may be there is a reason for our fake bold font to not be 700, as a real bold font will be 700 and then might be prefered - that's speculation, but not taking any risk :)
In-page footnotes are now vertically stacked in the order they are met by the reading direction in a same line.
b93e0aa
to
6ed9d00
Compare
6ed9d00
to
c2727e7
Compare
See individual commit messages for details. Some technical notes in #307.
Some additional work on the block rendering code (list item bullets on the right, ordering of table cells/columns from right to left, H1,H2
text-align
should bestart
instead ofleft
in our epub.css...) is still needed to have full support for RTL documents - but a few style tweaks to force text right alignment may help in the meantime.For now, we get the following (showing these limitations):
More screenshots of arabic and hebrew texts in koreader/koreader#5359 (comment).
Note: we won't draw highlights well when they span bidi segment boundaries, as Firefox would do well (I have no idea how to do that as well):
The whole
<bdi>
and<bdo>
handling in lvrend.cpp was to have these kind of samples (found on the web) work as expected:Some visual examples regarding the first commits related to fonts:
This is some text rendered with Harfbuzz with a font with its bold and italic variants present (note the different italic
f
, and thefi
ligature in all variants):If I remove the italic, bold and bold italic font files, it previously rendered (with former crengine embolden code loosing the harfbuzz
fi
ligature on the bold):With this PR, it will render as (note the regular
f
in italic, and the preservedfi
ligature):This helps correctly rendering arabic drawn with the fallback font (here, our regular-only FreeSerif):
Previously, with our limited to western ligatures harfbuzz code, it was this mess (even messier with a bigger font size):