Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to improve em-dash linebreaks #365

Merged
merged 1 commit into from
Aug 17, 2020
Merged

Try to improve em-dash linebreaks #365

merged 1 commit into from
Aug 17, 2020

Conversation

Jellby
Copy link
Contributor

@Jellby Jellby commented Aug 12, 2020

This implements the crengine part of the changes discussed in #364, it requires exporting lb_get_char_class from libunibreak.

Problem to solve By default line breaks are allowed at both sides of an em-dash. This causes problems when a dash is at a paragraph or quotation boundary, which may leave a dangling dash in one line. Additionally, in some languages dashes are not supposed to allow line breaks unless there are spaces.

Solution for easy languages Just replace the line breaking class of the em-dash with that of the "horizontal bar" (or quotation dash), i.e. a regular letter. This applies to Spanish, possibly French and other languages: has_em_dash_alphabetic = true.

Solution for Engish The text has to be parsed and decide whether a particular dash should allow breaks or not. This is done by searching for a space or a letter/number on either side of the dash, ignoring other characters. If a space is found first, the dash should be deemed non-breaking, if letters are found on both sides first, the dash is the default breaking dash. Since it is currently not possible to examine the following text beyond the current text node, the actual solution is a bit more involved and not bullet-proof.

Before, some line breaks look ugly or or wrong (red), others are acceptable (green):
before

After, only acceptable line breaks:
after


This change is Reviewable

Copy link
Contributor

@poire-z poire-z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly stylistic comments.
(Either more expressive tests, or more comments to say what/why/then :)

Comment on lines 458 to 459
{
lChar16 replacement = text[pos];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you shift the whole section away on the right of the { - it's hard to tell the } that is closing your if ( next_usable == 0 ) { is actually not closing this one :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 463 to 466
if ( pos == 0 ) replacement = '{';
if ( next_usable == 0 ) {
replacement = (replacement == text[pos] ) ? '}' : '"';
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these the final decision for replacement ?
If yes, have them return here.
If not, make these 2 if similar (both with {} or both without).
Also, (replacement == text[pos] ) ? is a bit unclear as the test. As it looks like this can only happen if pos!=0, so:
pos != 0 ? '}' : '"'; feels clearer to express why you replacce by this or that.
Or 3 ifs:

if ( pos == 0 && next_usable == 0  )
else if ( pos == 0 )
else if ( next_usable == 0  )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not final, unless they both happen to be true, in which case the while loops are empty.

I've reordered the loops, made the replacement logic more explicit and added some comments, hope it's better now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better :) thank you.

replacement = (replacement == text[pos] ) ? '}' : '"';
}
while ( new_pos > 0) {
new_pos--;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
else if ( new_lbc == LBP_SP || new_pos == 0 ) {
// found space or beginning
replacement = (replacement == text[pos] ) ? '{' : '"';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a bit unclear here and below this (replacement == text[pos] ) ?
May be if you're using that test or pos != 0 often, define bool is_start and bool is_end , or bool is_replaced early and use them instead.
Unless it's more complicated that what I manage to get ? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replacement could have been changed by pos == 0 or by while (new_pos > 0). Rather than defining is_replaced, I check the relevant values explicitly. Is it clearer now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep it is, thanks.

Copy link
Contributor

@poire-z poire-z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
We'll merge/bump this after the 2020.08 release (probably this weekend?) so we get a full month to validate it's fine.

@Frenzie
Copy link
Member

Frenzie commented Aug 13, 2020

@poire-z Probably. I mean, I would've done it last weekend but the temperature was extremely inconducive to doing pretty much anything at all…

@poire-z
Copy link
Contributor

poire-z commented Aug 17, 2020

@virxkane : if you're picking this, you'll need a little patch (that you can find in koreader/koreader-base#1160 ) to apply on your libunibreak copy

@virxkane
Copy link
Contributor

@poire-z Ok, thank you.

@poire-z
Copy link
Contributor

poire-z commented Aug 17, 2020

Possible crash with this fixed by #368.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants