-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try to improve em-dash linebreaks #365
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly stylistic comments.
(Either more expressive tests, or more comments to say what/why/then :)
crengine/src/textlang.cpp
Outdated
{ | ||
lChar16 replacement = text[pos]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you shift the whole section away on the right of the {
- it's hard to tell the }
that is closing your if ( next_usable == 0 ) {
is actually not closing this one :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
crengine/src/textlang.cpp
Outdated
if ( pos == 0 ) replacement = '{'; | ||
if ( next_usable == 0 ) { | ||
replacement = (replacement == text[pos] ) ? '}' : '"'; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these the final decision for replacement ?
If yes, have them return here.
If not, make these 2 if
similar (both with {}
or both without).
Also, (replacement == text[pos] ) ?
is a bit unclear as the test. As it looks like this can only happen if pos!=0, so:
pos != 0 ? '}' : '"';
feels clearer to express why you replacce by this or that.
Or 3 ifs:
if ( pos == 0 && next_usable == 0 )
else if ( pos == 0 )
else if ( next_usable == 0 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are not final, unless they both happen to be true, in which case the while
loops are empty.
I've reordered the loops, made the replacement logic more explicit and added some comments, hope it's better now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better :) thank you.
crengine/src/textlang.cpp
Outdated
replacement = (replacement == text[pos] ) ? '}' : '"'; | ||
} | ||
while ( new_pos > 0) { | ||
new_pos--; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
crengine/src/textlang.cpp
Outdated
} | ||
else if ( new_lbc == LBP_SP || new_pos == 0 ) { | ||
// found space or beginning | ||
replacement = (replacement == text[pos] ) ? '{' : '"'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also a bit unclear here and below this (replacement == text[pos] ) ?
May be if you're using that test or pos != 0
often, define bool is_start
and bool is_end
, or bool is_replaced
early and use them instead.
Unless it's more complicated that what I manage to get ? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replacement
could have been changed by pos == 0
or by while (new_pos > 0)
. Rather than defining is_replaced
, I check the relevant values explicitly. Is it clearer now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep it is, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
We'll merge/bump this after the 2020.08 release (probably this weekend?) so we get a full month to validate it's fine.
@poire-z Probably. I mean, I would've done it last weekend but the temperature was extremely inconducive to doing pretty much anything at all… |
@virxkane : if you're picking this, you'll need a little patch (that you can find in koreader/koreader-base#1160 ) to apply on your libunibreak copy |
@poire-z Ok, thank you. |
Possible crash with this fixed by #368. |
This implements the crengine part of the changes discussed in #364, it requires exporting
lb_get_char_class
from libunibreak.Problem to solve By default line breaks are allowed at both sides of an em-dash. This causes problems when a dash is at a paragraph or quotation boundary, which may leave a dangling dash in one line. Additionally, in some languages dashes are not supposed to allow line breaks unless there are spaces.
Solution for easy languages Just replace the line breaking class of the em-dash with that of the "horizontal bar" (or quotation dash), i.e. a regular letter. This applies to Spanish, possibly French and other languages:
has_em_dash_alphabetic = true
.Solution for Engish The text has to be parsed and decide whether a particular dash should allow breaks or not. This is done by searching for a space or a letter/number on either side of the dash, ignoring other characters. If a space is found first, the dash should be deemed non-breaking, if letters are found on both sides first, the dash is the default breaking dash. Since it is currently not possible to examine the following text beyond the current text node, the actual solution is a bit more involved and not bullet-proof.
Before, some line breaks look ugly or or wrong (red), others are acceptable (green):
After, only acceptable line breaks:
This change is