Hanging punctuation: new implementation #355
My gut-feeling would be to keep allowing the LSB to overflow, no matter what, yeah. |
Even with such italic as the highlighted italicized Jogof in the 3rd sample. Should I try to account for the available space on the left/right so we can let things overflow when they can - and limit to what's available when they can't ? |
To illustrate another thing I mentioned in a comment: This struck me immediately when switching to this page, not sure if it's because of the small standout or the boldness. |
I would go for the main paragraph font. It does not look good if the bold hyph hangs more. |
@poire-z about LSB, +1 to the above, totally. I remember showing some examples from Chrome and IE, anyway this seems natural to me. These things have metrics that tell about the overflows and allow them afaik. I'm on mobile so I also have trouble recalling the discussion, but also remember showing some screenshots from fontforge. There's also |
Also, it's super frigging awesome to see floating punctuation! I second @zwim that bold could stick out the same as non-bold, it would look more even. As such, I think that's one of the points of floating punctuation, paradoxically :D |
@poire-z: Yeah, those commits mainly seem to be aimed at avoiding clipping, more than anything else ;). |
I checked how browsers deal with LSB and RSB, and the results show they just don't :) KOReader with this PR, hanging punctuation disabled and overflow prevented Sharing the HTML file and an EPUB version test-Jogof-html+epub.zip in case someone wants to test with others browsers/readers. Anyway: I remember I wrote this in koreader/koreader#4577 (comment):
So, I'm not so sure it's just fine to ignore LSB/RSB. We might be used to it on computers, but strict fitting like in old mechanical typography might be preferred by some :) |
Alas, that's not so easy: <p>
<span style="font-size: 0.8em"> <!-- I've seen that quite often in publishers books :( -->
<span style="font-size: 1.2em">S</span>ome text withawordthatwillhyphenate
and some other text with a <big>bigwordthatiwllhyphenate</big> and later,
this time, a <small>smallwordthatwillhyphenate</small> and done.
</span>
</p> In this, it's not obvious to guess what should be the normalized hyphen length :/ |
I'm all for bleeding / protruding LSB / RSB, otherwise in my reading I would experience a lot of jerkiness :) So I'm glad there is a switch for that planned :) |
Discussed rather than planned :) I really don't know which way to go, I'm a bit stuck :/ On the other hand, I spent quite some time measuring correctly by accounting for these overflows, adding caches for LSB and RSB... so it would break my heart to throw that work away :) But there are still some bugs there that I would then not have to fix if I throw that away. And having a toggle with various options would in some cases need a full re-rendering that I didn't need with this hanging punctuation up to now with this PR. I somehow would like to have this line stretching free from needing a full re-rendering - but needing to re-measure table cells or floats would need such a full re-rendering. Or maybe I could just add the extra LSB/RSB when measuring - whether we end up using it or not when drawing - which might make text not centered/fitted in its container...
I guess most of you mostly think and care about the full-width main flow main text as most books don't have floats or tables - or if they do, publishers add enough padding. But I'd like us to do the right thing (which might be "doing nothing" :) with edge cases and degenerate cases. So we have simple rules and can, without too much investigation, declare "not our bug" or "our bug". So, still thinking... (Also pinging upstream @virxkane @pkb for their opinions - maybe there are some things more noticeable with italic cyrillic that really need special attention?) |
Hello, shouldn't the ellipsis (U+2026 "…") be added, too? Maybe with similar weight as the emdash? |
Yeah, I think the font metrics do the bleed non-bleed decisions, so it might be safe to escalate the blame ;)
True, at least I do 😁 And I understand handling all other cases is much more complex. My bottom line is, then, please leave a css rule, preference lua variable or even build flag to allow the bleeding ;) |
I would also change the % for guillemets Also, for French readers (dunno if this happens in other languages), as we often use a space after such guillemets, or before some punctuations ( (I sometimes get the feeling it makes the hole (the space) more noticeable, although on these screenshots I don't feel it much.) |
I have played around with some ebooks and hanging punctuation. For German (we do NOT have a space before a single punctuation mark) The look of the hanging punctuation depends very strongly on the margin width: I think it would be intuitive that we have a switch So the user can first change the margin to his taste and adapt the hanging of the punctuation afterwards. |
I didn't expect we'd need to go toward such granularity :)
I'll have a try. |
The holes caught my eyes a bit more with 50% - possibly because I'm now used to 70% :) That feeling also depends on the font I use :/ Will keep using this 50% build thus. Anyway, here's a Kobo build (with the values from this PR and the paper, so hyphen 70%) for Kobo users to have a try and possibly give some feedback: libcrengine-hangingpunct-kobo.zip |
As usual when weird French rules are involved, Québec to the rescue: http://bdl.oqlf.gouv.qc.ca/bdl/gabarit_bdl.asp?id=2039 (which details what needs a space, what doesn't, and what prefers a smaller space) :). And, no, I don't think it should matter here. Hanging means stuff should hang, no matter the language-specific rules and/or punctuation marks ;). |
I thought the point of the hanging punctuation is to make right edge more even. So I would make hang out chars that are actually lighter than regular symbols such as PS: @poire-z did you know there is a property for hanging punctuation in css3? https://drafts.csswg.org/css-text-3/#hanging-punctuation-property |
I know about the CSS property, but to me, it looks like it's really targeted at CJK (some property to do it only on first line, or on last line) and a bit limited (ideographic full stop, full width comma...) There is a microtypography thesis, and some implementation in somethingTEX, that use fractions of the width of a char, and, to me and others, it looks really better. Some details and links in koreader/koreader#6235 (comment). So, we're more targeting https://en.wikipedia.org/wiki/Optical_margin_alignment than https://en.wikipedia.org/wiki/Hanging_punctuation :) If you really prefer 100% for a smaller set of chars, well, we might then make that multi-values setting like proposed above :/ |
So, we're more targeting
https://en.wikipedia.org/wiki/Optical_margin_alignment than
https://en.wikipedia.org/wiki/Hanging_punctuation :)
❤️. Also, what @pkb said about evening the edges of a paragraph, to me also
seems to match better the optical option of partial outdents. That's what I
encounter in printed books as well. And as always, having another one
switchable through css is good to me :)
|
But we're the ones deciding which stuff should hang and by how much :) It's not written in specs/stones :) I'm still bothered by these ones: [0x2039] = { 0.70, 0.70 }, -- left single guillemet ‹
[0x203A] = { 0.70, 0.70 }, -- right single guillemet ›
[0x00AB] = { 0.50, 0.50 }, -- left guillemet «
[0x00BB] = { 0.50, 0.50 }, -- right guillemet » |
Doesn't bother me in the least, FWIW (especially when it's NOT highlighted ;p). But then again, me and my almost-0 horizontal margins are not the target audience ^^. |
So far I am totally thrilled with the hanging punctuation. Yesterday I had installed an older KOReader by accident, it looked strange. I must say I am so used to the hanging thing, that I don't want to miss it anymore. Two small points: |
Good, I also quite like it, disabling it feels bogus now :) Still thinking (#355 (comment)) , mostly just contemplating the code for now :/ |
While talking about naming things, and as it's more pleasant coding when you get a good name for the thing and it makes you write nice prose: Assuming we're going to render by default as browsers do, so not caring about glyphs overflowing their line boxes and borders, or overriding each other, crengine/crengine/src/lvstsheet.cpp Lines 1736 to 1768 in 0a46f76
to get back the current behaviour of trying to ensure no glyph overflow on the edges and on previous and next text nodes or images - with some variation to get that with italic only ? table { -cr-hint: no-glyph-overflow; }
table { -cr-hint: glyphs-confined; }
table { -cr-hint: strict-edge-fitting; }
table { -cr-hint: strict-edge-fitting-if-italic; } Any better idea? |
@poire-z Is there a technical name for it? (For potential inspiration.) |
@pazos: You added that glyphs only hang if they are "alone". I think this was because of the guillemets in French. This suppression has a bad side effect on the emdash (please look at the screenshot). In German almost all punctuation marks are not separated by spaces. Either they are leading or trailing other glyphs directly. The only exceptions to this rule I remember: endash, emdash and ellipsis can be surrounded by spaces. At least in German it would be desirable that all glyphs hang. Do you see a method to implement this? |
Indeed: crengine/crengine/src/textlang.cpp Lines 739 to 763 in a0d0d1f
We could make that dependent on language - or on the involved punctuation itself (but I wouldn't like it with an emdash either I guess, it would bother me as much as the guillemet). If you do like it, well, I dunno how to handle that if it really comes down to personal taste. |
I realize my test for a surrounding space is done a bit too early. There's no reason for one of these two lines to have the A hang differently (it hangs only by 5% when it hangs - but enough to notice the misalignment I guess):
I guess my feeling of hole also depends on the shape of what hangs - thin or tall, and by how much it hangs, and how much will stay inside. |
May be something like this?: --- a/crengine/src/textlang.cpp
+++ b/crengine/src/textlang.cpp
@@ -738,27 +738,29 @@ int TextLangCfg::getHangingPercent( bool right_hanging, bool & check_font, const
// In French, there's usually a space before and after guillemets,
// or before a quotation mark. Having them hanging, and then a
// space, looks like there's a hole in the margin.
- // So, avoid hanging if the next/prev char is a space char.
+ // So, for some chars, we'll avoid hanging or reduce the hanging
+ // ratio if the next/prev char is a space char.
// This might not happen in other languages, so let's do that
// prevention generically. If needed, make that dependant on
// a boolean member, set to true if LANG_STARTS_WITH(("fr")).
+ bool space_alongside = false;
if ( right_hanging ) {
if ( pos > 0 ) {
lChar16 prev_ch = text[pos-1];
if ( prev_ch == 0x0020 || prev_ch == 0x00A0 || (prev_ch >= 0x2000 && prev_ch <= 0x200A ) ) {
// Normal space, no-break space, and other unicode spaces (except zero-width ones)
- return 0;
+ space_alongside = true;
}
}
}
else {
if ( next_usable > 0 ) {
lChar16 next_ch = text[pos+1];
if ( next_ch == 0x0020 || next_ch == 0x00A0 || (next_ch >= 0x2000 && next_ch <= 0x200A ) ) {
// Normal space, no-break space, and other unicode spaces (except zero-width ones)
- return 0;
+ space_alongside = true;
}
}
}
@@ -786,66 +788,78 @@ int TextLangCfg::getHangingPercent( bool right_hanging, bool & check_font, const
case 0x002E: // . period
case 0x0060: // ` back quote
// case 0x00AD: // soft hyphen (we don't draw them, so don't handle them)
case 0x060C: // ، arabic comma
case 0x06D4: // ۔ arabic full stop
case 0x2010: // ‐ hyphen
case 0x2018: // ‘ left single quotation mark
case 0x2019: // ’ right single quotation mark
case 0x201A: // ‚ single low-9 quotation mark
case 0x201B: // ‛ single high-reversed-9 quotation mark
+ ratio = 70;
+ break;
case 0x2039: // ‹ left single guillemet
case 0x203A: // › right single guillemet
- ratio = 70;
+ // These are wider than the previous ones, and hanging
+ // by 70% can give a feeling of wrong justification
+ ratio = space_alongside ? 0 : 70;
break;
case 0x0022: // " double quote
case 0x003A: // : colon
case 0x003B: // ; semicolon
- case 0x00AB: // « left guillemet
- case 0x00BB: // » right guillemet
case 0x061B: // ؛ arabic semicolon
case 0x201C: // “ left double quotation mark
case 0x201D: // ” right double quotation mark
case 0x201E: // „ double low-9 quotation mark
case 0x201F: // ‟ double high-reversed-9 quotation mark
ratio = 50;
break;
+ case 0x00AB: // « left guillemet
+ case 0x00BB: // » right guillemet
+ // These are wider than the previous ones, and hanging
+ // by 50% can give a feeling of wrong justification
+ ratio = space_alongside ? 0 : 50;
+ break;
case 0x2013: // – endash
+ // Should have enough body inside (with only 30% hanging)
ratio = 30;
break;
case 0x0021: // !
case 0x003F: // ?
case 0x00A1: // ¡
case 0x00BF: // ¿
case 0x061F: // ؟
case 0x2014: // — emdash
case 0x2026: // … ellipsis
+ // These will have enough body inside (with only 20% hanging),
+ // so they shouldn't hurt when space_alongside.
ratio = 20;
break;
case 0x0028: // (
case 0x0029: // )
case 0x005B: // [
case 0x005D: // ]
case 0x007B: // {
case 0x007D: // }
ratio = 5;
break;
default:
break;
}
if ( ratio ) {
check_font = false;
return ratio;
}
// Other are non punctuation but slight adjustment for some letters,
// that might be ignored if the font already include some negative
// left side bearing.
+ // The hanging ratio is small, so no need to correct if space_alongside.
check_font = true;
if ( right_hanging ) {
switch (ch) {
case 'A':
case 'F':
case 'K':
case 'L':
case 'T':
case 'V':
case 'W': |
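For anyone wanting to play with the idea outside crengine, here is a minimal, self-contained sketch of the space_alongside logic from the diff above. It is not the crengine code: char32_t and std::u32string stand in for lChar16 and the engine's text buffer, the helper names isVisibleSpace, hasSpaceAlongside and hangingPercent are made up for illustration, and only a handful of characters are covered.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True for the spaces the patch checks: normal space, no-break space,
// and U+2000..U+200A (zero-width spaces are deliberately excluded).
static bool isVisibleSpace(char32_t ch) {
    return ch == 0x0020 || ch == 0x00A0 || (ch >= 0x2000 && ch <= 0x200A);
}

// Hypothetical helper: is there a space on the inner side of text[pos]?
// For right hanging the inner neighbour is the previous char,
// for left hanging it is the next one.
static bool hasSpaceAlongside(const std::u32string& text, std::size_t pos,
                              bool right_hanging) {
    assert(pos < text.size());
    if (right_hanging)
        return pos > 0 && isVisibleSpace(text[pos - 1]);
    return pos + 1 < text.size() && isVisibleSpace(text[pos + 1]);
}

// Ratios (in percent) as proposed in the diff above, for a few chars only.
// Anything not listed returns 0 in this sketch.
static int hangingPercent(const std::u32string& text, std::size_t pos,
                          bool right_hanging) {
    const bool space_alongside = hasSpaceAlongside(text, pos, right_hanging);
    switch (text[pos]) {
        case 0x2039: case 0x203A:           // single guillemets
            return space_alongside ? 0 : 70;
        case 0x00AB: case 0x00BB:           // double guillemets
            return space_alongside ? 0 : 50;
        case 0x2013:                        // endash
            return 30;
        case 0x2014: case 0x2026:           // emdash, ellipsis
            return 20;                      // unaffected by space_alongside
        default:
            return 0;
    }
}
```

With this, a French-style guillemet followed by a space does not hang, a German-style one glued to the word hangs by 50%, and an emdash surrounded by spaces still hangs by 20%.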
@poire-z: I totally agree with you that different readers in different languages will have different tastes about the punctuation thing. I also do not know (I haven't read the whole paper, but I surely will) how he exactly got the protruding factors: I think he did it mostly with English, and maybe with Czech texts. For me as not speaking Czech, the punctuation looks very similar to English and German. We know that French is a bit different. So I am totally fine when you decide that in French some factors need to be adapted. I have applied the above diff on my KOReader. The texts look very good, no, almost perfect! |
@poire-z: Do you want to stay with the term "hanging punctuation" or do you want to change it to something like "optical margin compensation"? |
I don't mind having it called "Hanging punctuation", mostly because that's how we've been calling it for months :) and it's actually not a lie: punctuation hangs, maybe not fully but it does (and some other non-punctuation can too, but by so little it's hardly noticeable).
@zwim: so you're fine with this patch? and so no need (for now) to do anything by language ? I'll be reading on my device with this patch until tomorrow, I'll add it to #363 if nothing hurts me :)
Wondering if I should still use a small number like 20, instead of 0 :) |
@poire-z: Yes I am very fine. Thank you! |
Hanging punctuation is fine by me, and is probably more user-friendly than the alternative ;). |
I think hanging punctuation is clearer too. |
In 6f923c4, I reduce the hanging ratio for « and » when there is a space after or before (so, for French :) Instead of making this punctuation less noticeable, which is the purpose of hanging punctuation, it makes it even more noticeable. I think it should even be worse for German, which gets them inverted. I get 20% only in the margin for French when there is a space after - but even this feels like too much. Was going to go with 10%, but if the issue with German and inverted guillemets is confirmed, I might just go with 0 and never think about « and » anymore :) |
@poire-z : Yes I can confirm that the » and « are in the margin. Especially on the right side. |
I would go for the middle one with 0%. (at least at the left side of the text). How does it look on the right edge? |
After looking a bit longer at your three pictures I would say 20% is a bit better than 0%. |
It's a bit harder to witness these on the right with my hack - but usually, it's more bearable on the right because there are other punctuations hanging - so it's less solitary and surprising. The crengine/crengine/src/textlang.cpp Lines 1129 to 1134 in 08dc3d4
These may also be wide depending on the font, and 70% can feel too wide (my feeling when seeing them in your test book - but I never meet them in French): crengine/crengine/src/textlang.cpp Lines 1113 to 1118 in 08dc3d4
(The comments |
I have tried several fonts and different hanging percentages. What I liked most is:
and
Normally the |
Thanks for the tests! I was going to go with (pardon my latin1): @@ -1110,12 +1112,21 @@ int TextLangCfg::getHangingPercent( bool right_hanging, bool & check_font, const
case 0x201B: // ‛ single high-reversed-9 quotation mark
ratio = 70;
break;
+ /* This early idea feels not the best: these have a side taller than
+ * the other, and the glyph may have some strong body with some fonts:
case 0x2039: // ‹ left single guillemet
case 0x203A: // › right single guillemet
// These are wider than the previous ones, and hanging by 70% with a space
// alongside can give a feeling of bad justification. So, hang less.
ratio = space_alongside ? 20 : 70;
break;
+ */
+ case 0x2039: // ‹ left single guillemet
+ ratio = right_hanging ? 20 : 40;
+ break;
+ case 0x203A: // › right single guillemet
+ ratio = right_hanging ? 40 : 20;
+ break;
case 0x0022: // " double quote
case 0x003A: // : colon
case 0x003B: // ; semicolon
@@ -1126,11 +1137,22 @@ int TextLangCfg::getHangingPercent( bool right_hanging, bool & check_font, const
case 0x201F: // ‟ double high-reversed-9 quotation mark
ratio = 50;
break;
+ /* This early idea feels not the best: these have a side taller than
+ * the other, and the glyph may have some strong body with some fonts:
case 0x00AB: // « left guillemet
case 0x00BB: // » right guillemet
// These are wider than the previous ones, and hanging by 50% with a space
// alongside can give a feeling of bad justification. So, hang less.
- ratio = space_alongside ? 20 : 50;
+ // ratio = space_alongside ? 20 : 50;
+ // But 50% is still too much with some fonts
+ ratio = 20;
+ break;
+ */
+ case 0x00AB: // « left guillemet
+ ratio = right_hanging ? 10 : 20;
+ break;
+ case 0x00BB: // » right guillemet
+ ratio = right_hanging ? 20 : 10;
break;
case 0x2013: // – endash
// Should have enough body inside (with only 30% hanging) I think my (random) 40 is really near your 35 - but I think you've been testing them inverted right ? › on the left as it should be in German ?
Can you give some thoughts to this new way of handling things, and adjust my values if you feel they need some ? |
Yes in German (Austria, Germany) we write: >Hallo< or >>Hallo<<. (In Switzerland I think the guillemets are swapped.) Are x and y swapped in For me (German) this would look good:
and
|
No, it should read as a Your imbalance for the ratio on the left vs right feels strange (40/35 and 20/15) : the glyphs for these should be nearly mirrors of each other, so their hanging should be the same. Any reason for this ? And we probably should not hang less on the small side than on the tall side :) |
I'm slowly starting to think that I am really a dyslexic :)
Isn't that the same? If it hangs on the right side of a word, then it is on the right margin and vice versa.
It looks better so. Maybe because on the left side we have almost no hanging (a few percent for some letters AXYWV), so there is a straight line. My eyes are trained to the left start position for reading. If you have a text on the beginning of a paragraph like
and the guillemet is too much on the left side, this looks ugly. On the right side we have letters and punctuation marks and hyphens which hang. On the two books I have viewed, a bit more hanging of the guillemet looks more balanced. If I am honest, (as most of us) I don't always read the whole word letter by letter. Maybe therefore the "blackness" matters on that side more than on the left.
That would be the theory, but if you look at the right hanging
In German (Austria, Germany) we never have a So
|
Update to the last message:
|
Right, we are on the same page.
OK then. I'm still surprised that you are able to notice the 5% difference :) But I'm fine and I may agree with the idea: I also notice them too much on the left margin. /* This early idea feels not the best: these have a side taller than
* the other, and the glyph may have some strong body with some fonts:
case 0x2039: // ‹ left single guillemet
case 0x203A: // › right single guillemet
// These are wider than the previous ones, and hanging by 70% with a space
// alongside can give a feeling of bad justification. So, hang less.
ratio = space_alongside ? 20 : 70;
break;
case 0x00AB: // « left guillemet
case 0x00BB: // » right guillemet
// These are wider than the previous ones, and hanging by 50% with a space
// alongside can give a feeling of bad justification. So, hang less.
ratio = space_alongside ? 20 : 50;
break;
*/
// It feels better to not bother about any space alongside and use smaller values.
// We also go with a tad smaller value on the left margin as hanging there is rare.
// In the right margin, hanging a bit more feels better, as it will blend in with
// the more probable other punctuations hanging on the right.
case 0x2039: // ‹ left single guillemet
ratio = right_hanging ? 40 : 35;
break;
case 0x203A: // › right single guillemet
ratio = right_hanging ? 40 : 35;
break;
case 0x00AB: // « left guillemet
ratio = right_hanging ? 20 : 15;
break;
case 0x00BB: // » right guillemet
ratio = right_hanging ? 20 : 15;
break; I still have to test this - but does this code match what you expressed with words ? :) |
Yes, I will also test this tomorrow with some different texts (with emdashes & Co.) and And yes, we can drop some branches here :) |
I wouldn't add too much of this asymmetry: you and I read LTR text, so our right is end of line - but for people who read RTL (Arabic, Hebrew, which may include bits of Latin text), they will find some of the punctuation we are tweaking (maybe not the guillemets, but the others, dashes, parens more probably) on the left: their end of line (where all normal punctuation will be) is their left. |
emdashes and Co. look good as they are. So nothing to do on this side. |
Added that to #461, just for the sake of having it for future tweaks - and made the guillemets tweaks to use it. |
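One way to keep the small asymmetry without hard-coding "right" as "end of line" would be to reason in terms of line start and line end. The following is only a sketch of that idea, not something this thread shows crengine doing: LineEdge, edgeFor, doubleGuillemetPercent and the paragraph_is_rtl flag are hypothetical names, and the 20/15 values are the ones discussed above for « and ».

```cpp
// Logical edge of a line, independent of the writing direction.
enum class LineEdge { Start, End };

// Hypothetical helper: map the physical side to the logical edge.
// In LTR text the right margin is where lines end; in RTL text it is
// where they start.
constexpr LineEdge edgeFor(bool right_hanging, bool paragraph_is_rtl) {
    return (right_hanging != paragraph_is_rtl) ? LineEdge::End
                                               : LineEdge::Start;
}

// Values discussed above for the double guillemets: hang a bit more
// where lines end (other punctuation hangs there too), a bit less
// where lines start.
constexpr int doubleGuillemetPercent(bool right_hanging,
                                     bool paragraph_is_rtl) {
    return edgeFor(right_hanging, paragraph_is_rtl) == LineEdge::End ? 20
                                                                     : 15;
}

// Compile-time checks: LTR matches "right_hanging ? 20 : 15",
// RTL mirrors it.
static_assert(doubleGuillemetPercent(true, false) == 20, "LTR line end");
static_assert(doubleGuillemetPercent(false, true) == 20, "RTL line end");
```

For LTR paragraphs this gives the same numbers as the snippet above; for RTL paragraphs the larger value moves to the left margin.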
case 0x00A1: // ¡
case 0x00BF: // ¿
case 0x061F: // ؟
case 0x2014: // — emdash
You might want to add
case 0x2212: // − Minus
If it is not added here, the Minus does not hang, and that looks ugly
Why not - but with which ratio (hanging percent) ? ie. how wide this character is, in various fonts, when compared to the others: the ascii minus, endash, emdash ?
In which context did you see it ? Mathematical stuff, or it was used (by error/laziness?) by the publisher instead of a more proper character?
in various fonts, when compared to the others: the ascii minus, endash, emdash ?
I think similar to endash is the most typical, but I believe some make the stylistic choice to give it a size closer to emdash. Are the others not calculated based on their size in the font?
(by error/laziness?)
I think people can be forgiven for not being able to tell endash and minus apart. I mean, other than some of us who's going to be looking at the code point. :-)
I think similar to endash is the most typical, but I believe some make the stylistic choice to give it a size closer to emdash.
I guess this one relates to the width of the + sign, and should have the same size as it, as well as the size of this neighbour character:
https://www.compart.com/en/unicode/U+2213
Are the others not calculated based on their size in the font?
No, here, we return a %, the % of the char that can hang - so it's more about the generic/expected shape of the Unicode char, its blankness, its expected width and padding.
We don't look at the glyph - and as we return a %, the font size is handled later.
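To make the "we return a %" point concrete, here is a tiny sketch of how such a percentage could be turned into pixels later, once the glyph width at the current font size is known. This is an illustration under assumptions: hangingWidthPx is a hypothetical name, and whether crengine applies the ratio to the advance width exactly like this is not stated in this thread.

```cpp
#include <cassert>

// Hypothetical helper: convert a hanging percentage into a pixel amount.
// glyph_advance_px is the glyph's width at the font size in use, so the
// same percentage scales naturally with small, regular or big text.
int hangingWidthPx(int glyph_advance_px, int hanging_percent) {
    assert(glyph_advance_px >= 0);
    assert(hanging_percent >= 0 && hanging_percent <= 100);
    return glyph_advance_px * hanging_percent / 100;  // integer, rounds down
}
```

So a hyphen that is 10px wide hangs by 7px at 70%, and the same hyphen at twice the font size (20px wide) hangs by 14px, without the table of ratios knowing anything about fonts.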
I think people can be forgiven for not being able to tell endash and minus apart. I mean, other than some of us who's going to be looking at the code point. :-)
Yes, but that's not just the generic minus - somebody did hunt for it :)
But then, if it relates to the + sign, it feels odd to hang the - but not the + (which, having a fuller body, may need it less) as in some usages, they would then not align with each other:
- con 1
+ pro 1
- con2
+ pro2
all that because some publisher used the wrong char for its purpose.
So, I dunno. Waiting for @zwim to say/show how this char was used in the book where he witnessed it.
I guess this one relates to the width of the + sign
Perhaps ideally, but I'm not quite sure if that's how all math fonts I've used in LaTeX worked.
Yes, but that's not just the generic minus - somebody did hunt for it :)
I don't think hunt is necessarily the right word. It means they used special math/formula edit mode in MS Word/LO Writer for example, or LaTeX math mode in Markdown.
Hunt seems to imply something closer to unicode input (e.g., if I'm not mistaken Alt+2212 on Windows).
The U+2212 is used like an emdash in that book.
And no it is not from an indie author, it is from Heyne (one of the largest German publishers).
I don't think hunt is necessarily the right word. It means they used special math/formula edit mode in MS Word/LO Writer for example, or LaTeX math mode in Markdown.
That might be true.
If a linebreak happens within a math environment hanging would not be so important, but if someone uses the U+2212 as a replacement for an endash/emdash the glyph should hang.
if someone uses the U+2212 as a replacement for an endash/emdash the glyph should hang.
The issue is that we don't know if "someone uses the U+2212 as a replacement for an endash/emdash" :)
And the thing is that they shouldn't:
https://unicode.org/charts/PDF/U2200.pdf
Unicode doesn't mention that they can be used for other purposes (like they do for some other symbols). They do for other symbols:
So, the question is whether this is a common publisher error and we should account for it, to the point of having misalignment when it is used properly in a math context.
And no it is not from an indie author, it is from Heyne (one of the largest German publishers).
Will you allow a large pig in your living room, just because it is large or you like ham? :)
Many EPUBs from classic French publishers have various errors/bad styling.
Also, where did you see it? As a first char of each part of a dialog ? Inline in the text, and you happened to have a wrap before it ? How common ?
If it's just on one line you have seen it among many pages, well, I'd say it's a proper way to let the bug (bad char used) be known to the reader :)
Answering koreader/koreader#9875 in context:
Please allow me a very last friendly remark: I am happy that in the future I don't have to argue about whether a fucking weird unicode U+2212 in a text is the problem of some fucking publisher. The publisher won't even notice what we write and think here (and the time spent here) nor does the publisher notice the outcome of the discussion. If I wanted to improve an eBook reader software, I would ask: What are the publishers doing wrong, and how can I fix this? Similar to the style tweaks, which are awesome. And as a teacher I could correct the publishers; good luck with that.
The point of arguing is to make the right decision, together.
You may have an idea that it's the right thing to do to hang U+2212 - and it may be.
I looked around, and posted some stuff hinting at why/how it may cause issues in other cases.
So, the point is to balance fixing publishers errors vs. messing with correct usages (but probably rare - rarer than publishers errors?).
I asked for more context about where you met it.
We have not yet come to a decision.
Fixing the publisher's mistake may not be the best behavior due to side effects. You also have to accept that sometimes you just have to live with it and that you can't fix everything (within the rendering context).
NB That's not the conclusion we reached.
Updated details below at #355 (comment)
Initial comment:
Pushing this early so @zwim (and others!) can play with it and provide some feedback and help take decisions.
This kills the original hanging punctuation implementation (aimed at CJK only), to implement hanging punctuation as some kind of visual margin alignment, as envisioned in #307.
It no longer adds padding, it just allows the line to be extended a bit outside its container inner box (which is fine if there's some padding, or if we're the main text and the user has kept some page margins - otherwise in the margins, borders, or over other elements :)
I previously detailed 2 technical options for handling this in koreader/koreader#6235 (comment) - and I went with option B, for the simplicity explained there, and because of this:
processParagraph() only handles text in the logical order: it cuts slices that fit in the container width, without bothering with BiDi re-ordering. Handling left/right hanging punctuation and italic overflows there is not right, as with BiDi, the start/end of the slice in logical order might not be at the start/end of the line in visual order, as processed in addLine().
So, I moved the italic overflows handling from processParagraph() to addLine().
This makes to me both processParagraph() (that just cuts lines at wrap opportunities) and addLine() (that does a lot more work for adjusting words, and small shifts at start or end of lines) a lot clearer in what they do.
So there's the risk (but we already had it) that with bad luck with italic, bidi, hanging, we end up in addLine() needing more or less width than when measured - hopefully, this can be compensated in the space chars width expansion or reduction.
I had the thought that we can allow hanging punctuation in the main flow, but that we should not in table cells/inline-blocks/floats. crengine was already preventing it for table cells.
So, this would give this with hanging punctuation enabled:
In this, it would be ensured on the main text, but not in the float.
Should it be ensured or prevented on the main text part alongside the float ? We have enough padding here (in both the main flow and the float) so it doesn't hurt (and wouldn't hurt doing it in the float too)- but I dunno.
Or, when rendering a paragraph, should we go look as its containers to see how much room we have until one with some border (or to the top non-main-flow container) happens, and limit hanging to the available width ? (It could also look strange if none has any border, but some have some background color/image: some glyphs could overflow outside the colored background.)
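The "look at its containers and limit hanging to the available width" option could be sketched roughly as below. This is purely illustrative and assumes a lot: Box, roomUntilBorder and clampedHangPx are hypothetical names, only padding is counted (margins and backgrounds are ignored), and the containers are assumed to be given innermost first.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified view of a container on one side of the text.
struct Box {
    int padding_px;   // padding on the side we want to hang into
    bool has_border;  // true if a visible border sits outside that padding
};

// Sum the padding of the containers (innermost first) until one with a
// border is met: its own padding is still usable, nothing beyond is.
int roomUntilBorder(const std::vector<Box>& ancestors) {
    int room = 0;
    for (const Box& box : ancestors) {
        room += box.padding_px;
        if (box.has_border)
            break;
    }
    return room;
}

// Let things overflow when they can, limit to what's available otherwise.
int clampedHangPx(int wanted_hang_px, int room_px) {
    assert(wanted_hang_px >= 0);
    return std::min(wanted_hang_px, std::max(room_px, 0));
}
```

With a paragraph (4px padding, no border) inside a float (6px padding, with border), there are 10px of room, so a 14px hang would be limited to 10px while a 7px hang would be left alone.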
Also, we have with this the opportunity to rework a bit the handling of glyph overflows at start or end of line. May be we should not do it only for italic?
When not hanging the main flow (because it's disabled, our default), should we ensure (like I did in this PR) that nothing hangs at all, not even J, which can overflow on the left even when not italic?
Taking the sample I used in koreader/koreader#4577, with a BaskervaldX font that exhibits strange glyph overflows, and comparing:
KOReader current | this PR with hanging punctuation disabled | this PR with hanging punct enabled
So, for the first non-italic J: should we let it hanging when hanging punctuation is disabled (as currently), to the risk of it leaking in some border (yellow border at bottom) - or should we try to avoid that generally so no text overflows its container (to the risk of possibly breaking nice layout if the font designer expects with this that stuff should hang a bit, no matter how the option we have is set?)
(Pinging @ptrm too as we might have discussed that in koreader/koreader#4577 - haven't re-read the whole thing :)