Internationalization #26

Closed
JayPanoz opened this issue Dec 12, 2017 · 21 comments

Comments

@JayPanoz
Collaborator

JayPanoz commented Dec 12, 2017

Note: I’ll use “i18n” instead of “internationalization”.

So let’s be honest, this issue will quite probably stand as “The Readium CSS issue” since it blocks the roadmap, impacts other parts of Readium 2 (streamer, navigator, API, apps developed by EDRLab), and will need a lot of documentation. In other words, it’s a project of its own, nested in the Readium CSS project.

I’ve spent the last 3 weeks documenting this issue, and you can think of this post as a summary of the research that has been done. I won’t list everything here, only what is critical to provide implementers and readers with a solid baseline. I’m willing to make this baseline the best we can get (say, bulletproof although rough around the edges), but it’s worth noting we’ll need help from experts in these various languages and their typography to turn it into an excellent user experience.

Roadmap

First and foremost, I’d be in favor of updating our current roadmap. Vertical writing is indeed blocking and I’d like to move it to the beta version.

What this means is that we could ship support for horizontal-tb CJK, right-to-left and Indic languages in the alpha version relatively quickly, and then focus on vertical writing, since our work on the a11y baseline has been ahead of schedule since the beginning of this project.

I believe we would all agree that the prototype has proved a solid-enough bedrock (of course we have edge cases to deal with, but it’s fine for the vast majority of content), and pushing the small and easy wins for RTL/horizontal-tb CJK/Indic on the develop branch would allow us to release an alpha on the master branch in early 2018. This would probably send a good signal too, since the prototype was released 3 months ago already.

On a related note, we’ll start documenting column handling (e.g. page progression) in January 2018, so it would make sense to prioritize LTR/RTL (horizontal-tb), especially as we’ll be able to document vertical writing immediately after. Quite frankly, this will be critical, since vertical writing involves conceptual changes that have to be taken into account.

Global needs outside the scope of Readium CSS

Obviously, Readium CSS won’t be able to fly on autopilot here. It needs either flags it can target or smart handling of its resources depending on the publication.

Minimal set of features

What we’ll need:

  • checking the page-progression-direction for the spine (streamer);
  • checking the language, knowing there can be multiple <dc:language> elements (streamer);
  • loading specific stylesheets based on those previous indications (API);
  • appending xml:lang and/or lang attribute if it’s missing in XHTML documents (API);
  • appending dir="rtl" attribute if it’s missing in XHTML documents (API);
  • loading specific fonts’ lists for user settings, based on language (Apps);
  • adding/removing specific user settings, based on language (Apps);
  • having the TOC and at least some user settings (e.g. text-align) use an rtl direction for RTL languages (Apps);
  • page-progression from right to left (navigator).
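To illustrate how the streamer’s indications could drive the API step that loads stylesheets, here is a minimal sketch. The `StylesheetSet` names and the `selectStylesheetSet` function are hypothetical, it assumes the streamer already exposes a primary language and the spine’s page-progression-direction, and it deliberately leaves Mongolian out.

```typescript
// Hypothetical names for the stylesheet variants an app might load.
type StylesheetSet = "default" | "rtl" | "cjk-horizontal" | "cjk-vertical";

// Spine page-progression-direction as the streamer might expose it (assumed).
type PageProgression = "ltr" | "rtl" | "default";

// CJK primary subtags, for which rtl progression implies vertical writing.
const CJK_LANGUAGES = new Set(["zh", "ja", "ko"]);

// Primary language subtag, lowercased: "zh-Hant" -> "zh".
function primaryLanguage(tag: string): string {
  return tag.split("-")[0].toLowerCase();
}

// Sketch: choose which stylesheet set to inject for a publication.
function selectStylesheetSet(lang: string, ppd?: PageProgression): StylesheetSet {
  if (CJK_LANGUAGES.has(primaryLanguage(lang))) {
    return ppd === "rtl" ? "cjk-vertical" : "cjk-horizontal";
  }
  // Outside CJK, rtl progression signals a right-to-left script.
  return ppd === "rtl" ? "rtl" : "default";
}

console.log(selectStylesheetSet("ja", "rtl")); // cjk-vertical
console.log(selectStylesheetSet("zh-Hans")); // cjk-horizontal
console.log(selectStylesheetSet("ar", "rtl")); // rtl
```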

A longer-term issue will be localization, should you want this need covered in the apps, as implementers might want an easy way to translate strings, etc. But it’s up to EDRLab, obviously.

Writing-mode and RTL mapping

For writing mode, these are the writing-mode values we should apply based on the language and page-progression-direction:

Language | IANA tag | page-progression-direction | writing-mode
--- | --- | --- | ---
Chinese | zh | LTR / Default / None | horizontal-tb
Chinese | zh | RTL | vertical-rl
Chinese (Simplified) | zh-Hans | DNA (?) | horizontal-tb
Chinese (Traditional) | zh-Hant | DNA (?) | vertical-rl
Chinese (Taiwan) | zh-TW | LTR / Default / None | horizontal-tb
Chinese (Taiwan) | zh-TW | RTL | vertical-rl
Chinese (Hong Kong) | zh-HK | LTR / Default / None | horizontal-tb
Chinese (Hong Kong) | zh-HK | RTL | vertical-rl
Hangul | ko | LTR / Default / None | horizontal-tb
Hangul | ko | RTL | vertical-rl
Japanese | ja | LTR / Default / None | horizontal-tb
Japanese | ja | RTL | vertical-rl
Mongolian | mn-Cyrl | LTR / Default / None | horizontal-tb
Mongolian | mn-Mong | DNA | vertical-lr

I propose we simplify this model for Chinese and rely on page-progression-direction with an extra check for language (zh), and not bother with all those variants.
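A minimal sketch of that simplified model: CJK languages rely on page-progression-direction plus the primary language subtag, and regional or script variants such as zh-TW or zh-Hant are intentionally not inspected. The `writingModeFor` name and its types are illustrative only; the Mongolian check mirrors the mn-Mong row of the table.

```typescript
type WritingMode = "horizontal-tb" | "vertical-rl" | "vertical-lr";

// Sketch of the simplified mapping: only the primary subtag and the spine's
// page-progression-direction matter for CJK; Mongolian script is checked apart.
function writingModeFor(lang: string, ppd?: "ltr" | "rtl" | "default"): WritingMode {
  const subtags = lang.toLowerCase().split("-");
  const primary = subtags[0];
  // Traditional Mongolian script: vertical lines stacked left to right.
  if (primary === "mn" && subtags.includes("mong")) {
    return "vertical-lr";
  }
  if (["zh", "ja", "ko"].includes(primary)) {
    // For CJK, rtl progression means vertical-rl, not right-to-left text.
    return ppd === "rtl" ? "vertical-rl" : "horizontal-tb";
  }
  return "horizontal-tb";
}
```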

Note that we should not add dir="rtl" for CJK languages: there, an rtl page-progression-direction signals vertical writing, not right-to-left text.

For right to left, we can simply rely on page-progression-direction if the language is not CJK (or Mongolian), but here is a mapping of languages you might encounter, just for your information:

Language | IANA tag | page-progression-direction | dir attribute
--- | --- | --- | ---
Arabic | ar | RTL | rtl
Farsi (Persian) | fa | RTL | rtl
Hebrew | he | RTL | rtl
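The API-side steps (appending lang/xml:lang and dir="rtl" only when missing, and never dir="rtl" for CJK or Mongolian) could be sketched as follows. Both `dirAttributeFor` and `injectRootAttributes` are hypothetical helpers, and the regex-based injection is only for brevity; a real implementation would work on a parsed XML tree.

```typescript
// Primary subtags for which rtl page-progression-direction means vertical
// writing rather than right-to-left text, so no dir="rtl" should be added.
const NO_DIR_RTL = new Set(["zh", "ja", "ko", "mn"]);

// Sketch: the dir value to add on <html> when missing, or undefined for none.
function dirAttributeFor(lang: string, ppd?: "ltr" | "rtl" | "default"): "rtl" | undefined {
  const primary = lang.toLowerCase().split("-")[0];
  return ppd === "rtl" && !NO_DIR_RTL.has(primary) ? "rtl" : undefined;
}

// Sketch: add xml:lang/lang and dir to the <html> start tag when missing.
// Language attributes are only added when neither is present, so an
// author-declared language is never contradicted.
function injectRootAttributes(xhtml: string, lang: string, dir?: "rtl"): string {
  return xhtml.replace(/<html\b([^>]*)>/, (_match: string, attrs: string) => {
    let extra = "";
    if (!/\s(?:xml:)?lang\s*=/.test(attrs)) {
      extra += ` xml:lang="${lang}" lang="${lang}"`;
    }
    if (dir !== undefined && !/\sdir\s*=/.test(attrs)) {
      extra += ` dir="${dir}"`;
    }
    return `<html${attrs}${extra}>`;
  });
}
```

For example, `injectRootAttributes('<html lang="he">', "ar", "rtl")` keeps the author’s lang value and only appends `dir="rtl"`.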

Right to Left

This shouldn’t be a huge issue in Readium CSS, as we only need a few adjustments, specific base and default styles, and typefaces.

Hopefully, this doesn’t impact our views (paged and scrolled) since columns will behave as expected.

Our pagination model is the following:

 _________________    _________________
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
|      Col 1      |  |      Col 2      |
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
 —————————————————    ————————————————— 

CSS Multicol in horizontal-tb (x-axis)

When dir="rtl" is set on html, it becomes:

 _________________    _________________
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
|      Col 2      |  |      Col 1      |
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
 —————————————————    ————————————————— 

CSS Multicol in horizontal-tb + dir="rtl" (x-axis)

Our main CSS-related concern there should be typefaces, as we’ll need outstanding fonts to deal with typography requirements (ligatures, multi-baseline levels, joining rules, etc.).

CJK (horizontal-tb)

Similar to RTL: we only need a few adjustments, specific base and default styles, and typefaces.

This should already support the vast majority of content in Chinese (vertical writing is rarely used in mainland China, mostly in Taiwan, Hong Kong and Macao) and Korean.

Chinese, Japanese and Hangul share a lot in terms of typography, so the differences are quite minor, but a few per-language adjustments would still be a plus.

Other languages

For the time being, we’re only focusing on Devanagari, which should not have a huge impact. Once again, we’ll need a few adjustments, with the main focus being typefaces.

Vertical Writing

This is by far our biggest issue in Readium CSS, since we can’t necessarily handle it well across platforms.

We don’t have anything to force the column-axis in CSS, which means that our spread model (two columns next to each other)

 _________________    _________________
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
|      Col 1      |  |      Col 2      |
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
|                 |  |                 |
 —————————————————    ————————————————— 

CSS Multicol in horizontal-tb (x-axis)

will automatically become the following in vertical-rl:

 _____________________________________
|                                     |
|                                     |
|                Col 1                |
|                                     |
|                                     |
 —————————————————————————————————————
 _____________________________________
|                                     |
|                                     |
|                Col 2                |
|                                     |
|                                     |
 ————————————————————————————————————— 

CSS Multicol in vertical-* (y-axis)

So the best we can do right now is a fragmented scrolled-view:

  _____________________________________
 |                                     |
 |                                     |
 |                                     |
 |                                     |
 |                                     |
 |                Col 1                |
 |                                     |
 |                                     |
 |                                     |
 |                                     |
 |                                     |
 |                                     |
  ————————————————————————————————————— 
- - - - - - - - - - - - - - - - - - - - - (Overflow begins here)
  _____________________________________
 |                                     |
 |                                     |
 |                                     |
 |                                     |
 |                                     |
 |              Overflowed             |
 |                 Col                 |
 |                                     |
 |                                     |
 |                                     |
 |                                     |
 |                                     |
  ————————————————————————————————————— 

New fragmented scrolled-view for vertical-* (y-axis)

In other words, one column with overflowed columns on the y-axis, which 1) will force implementers to map left/right (swipes/buttons) to bottom/top and 2) won’t allow them to have page-transition animations.
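A hypothetical sketch of how an app could map next/previous actions (swipes or buttons) onto scroll offsets in this model. The function name, the one-viewport step, and the sign convention for RTL (negative x meaning leftwards) are all assumptions; scroll offset conventions for RTL content have historically differed between browser engines.

```typescript
type WritingMode = "horizontal-tb" | "vertical-rl" | "vertical-lr";
type Direction = "ltr" | "rtl";
type NavAction = "next" | "previous";

interface ScrollDelta {
  x: number;
  y: number;
}

// Sketch: translate a reader's next/previous action into a scroll offset.
// Horizontal-tb pages along the x-axis (reversed for rtl); vertical-* content
// in the fragmented scrolled view overflows along the y-axis, so "next"
// means scrolling down by one viewport.
function scrollDeltaFor(
  action: NavAction,
  mode: WritingMode,
  dir: Direction,
  viewport: { width: number; height: number },
): ScrollDelta {
  const sign = action === "next" ? 1 : -1;
  if (mode === "horizontal-tb") {
    const step = dir === "rtl" ? -viewport.width : viewport.width;
    return { x: sign * step, y: 0 };
  }
  return { x: 0, y: sign * viewport.height };
}
```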

Note: The only alternative to solve those issues at the moment would be writing a renderer in JavaScript. It’s worth noting that if you’re only targeting iOS, there is a solution in pure CSS though.

What’s even worse is that the same typefaces can’t necessarily be used (proportional/fixed-width depending on writing-mode), and I’ll have to make adjustments for quotes and other details in the base and default stylesheets based on writing-mode…

Note: We won’t try to handle horizontal-tb documents in vertical-rl publications in a smart way for the time being. This use case is not defined in the EPUB spec. Besides, we’ve got nothing at the OPF level to deal with it, and checking the writing-mode at runtime would hurt performance in extreme ways, e.g. 15 seconds to render some XHTML files… which would be worse, in terms of UX, than not supporting this use case.

Longer-term issues include:

  • polyfilling -epub- prefixed properties for web apps;
  • support for alternate stylesheets (which is critical if the implementer wants to offer a horizontal/vertical-writing user setting);
  • support for rendition:align-x-center;
  • support for the ibooks:respect-image-size-class (gaiji) and ibooks:scroll-axis metas (see EPUB Compat doc);
  • user settings (some, like letter- and word-spacing, might have to be removed, and not only for CJK);
  • rendition:flow of scrolled-doc.

Out of scope

There are some typography and layout issues which are not our responsibility but rendering engines’. Those issues include:

  • line-adjustment and justification (RTL and CJK);
  • run-in headings (display: run-in), which is popular in CJK;
  • ruby and its styling;
  • bidi;
  • Kashida Elongation (Arabic);
  • joining forms (Arabic);
  • single-letter styling (Arabic).

Documentation

In theory, I would only have to document the new fragmented scrolled-view model for vertical-writing, and adjustments for user settings.

In practice, I’m willing to go the extra mile and will document typographic and layout concepts, and make glossaries, so that Western implementers have everything at hand to deal with requirements and issues in CJK and languages they might not be familiar with.

This will obviously take time but will fix a huge pain point.

Overarching Issues

  • We don’t have any text layout requirements for Hebrew, so we’ll need help
  • Hangul, Chinese, Arabic and Indic Text Layout requirements are incomplete, we’ll have to keep track of updates and changes
  • We have not defined a precise baseline for i18n, it may be wise to do it
  • It would be useful to have data about writing-mode, RTL, typefaces used/expected, etc.

Resources

@r12a

r12a commented Dec 13, 2017

I recommend using BCP47 language tags for language identification (per HTML standard). That would allow you for example to refer to az-Latn vs az-Arab vs az-Cyrl above. For more info about BCP47 language tags see https://www.w3.org/International/articles/language-tags/
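To illustrate the point, here is a deliberately simplified sketch that pulls the language, script and region subtags out of a BCP 47 tag, so that, say, mn-Mong and mn-Cyrl can be told apart. `parseLanguageTag` is a hypothetical helper, not a full RFC 5646 parser: extlang, variant, extension and private-use subtags simply stop the parsing.

```typescript
interface LanguageTag {
  language: string;
  script?: string;
  region?: string;
}

// Simplified BCP 47 parsing: language, then an optional script (4 letters),
// then an optional region (2 letters or 3 digits). Anything else ends parsing.
function parseLanguageTag(tag: string): LanguageTag {
  const [language, ...rest] = tag.split("-");
  const result: LanguageTag = { language: language.toLowerCase() };
  for (const subtag of rest) {
    if (result.script === undefined && result.region === undefined && /^[A-Za-z]{4}$/.test(subtag)) {
      // Script subtags are conventionally title-cased, e.g. "Hant".
      result.script = subtag[0].toUpperCase() + subtag.slice(1).toLowerCase();
    } else if (result.region === undefined && /^([A-Za-z]{2}|[0-9]{3})$/.test(subtag)) {
      // Region subtags are conventionally upper-cased, e.g. "TW".
      result.region = subtag.toUpperCase();
    } else {
      break; // extlang, variants, extensions, private use: out of scope here
    }
  }
  return result;
}
```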

@r12a

r12a commented Dec 13, 2017

Javanese^3 | jv | RTL | rtl

I'd have thought that Javanese written in the Javanese script (ltr) was more likely to be encountered than Javanese written in the Arabic script. (See https://r12a.github.io/scripts/javanese/ for info about the Javanese script.)

@r12a

r12a commented Dec 13, 2017

Forgive my ignorance, but are you using XHTML only? HTML5 has a number of important additions for support of bidirectional text, introducing control for directional isolation and auto-detection of base direction for injected text. For more information see https://www.w3.org/International/articles/inline-bidi-markup/ (updated just now). Hope these comments are helpful.

@JayPanoz
Collaborator Author

@r12a Oh yeah that’s super useful, thank you!

To answer your questions:

  1. your first comment made me double-check the EPUB specs for the publication’s language metadata, and it turns out the value must conform to RFC 5646, so I can’t even tell why I listed ISO codes in the first place. So thank you for pointing that out.

  2. I indeed had doubts about Javanese when checking multiple references. To sum things up, I took a look at the Script direction and languages doc and may have misunderstood the footnote, then I checked other docs in which LTR was mentioned, so thank you for clarifying this.

  3. In EPUB 3, documents must be HTML5 documents that conform to the XHTML syntax. Does this imply we will benefit from those additions? There indeed were talks about having HTML as a core media type but it was rejected for EPUB 3.1.

@JayPanoz
Collaborator Author

JayPanoz commented Dec 23, 2017

We’ll have an issue with Arabic in EPUB 2 though.

Yesterday Kevin Callahan triggered what appeared to be a bug in iBooks at first sight. Except it wasn’t.

To sum things up, the EPUB opened backwards (as if page-progression-direction="rtl" were set on the spine). Turns out the app used the last language element it found in the metadata, which was ar-SA, to render that.

This reminded me that EPUB 2 doesn’t have the page-progression-direction attribute, but supports everything else needed for right to left (including the dir attribute and the direction CSS property). I can indeed remember EPUB 2 publications in Arabic (at some point, a prospect asked whether I could produce one, and while looking for who could, I discovered services offering EPUB 2 output).

What it means:

  1. there are major Reading Systems supporting RTL in EPUB 2;
  2. content providers have been probably using this version because it just works;
  3. the only hint we get is the language then;
  4. we don’t know what the main language is when multiple elements are declared;
  5. I couldn’t find any guidance on multiple elements handling in the specs (yet);
  6. if there is no guidance, it explains the huge interoperability issues† authors have to deal with.

How should we handle this case then?

† Footnote
  • Authors having to put all the authors in the same metadata element, or else only one will be displayed in the running header.
  • Nobody agreeing whether you should have one element per keyword or all keywords in the same subject element (separated with commas or semi-colons).
  • Multiple languages, which you don’t know how to order since some RS may take the first one they can parse while others take the last one. I’ve also just discovered that Kindle requires authors to declare the main one only, at least in some cases, or its KDP validator will throw an error.

@JayPanoz
Collaborator Author

JayPanoz commented Dec 23, 2017

After further testing, this is how some major Reading Systems handle this particular case (with 2 <dc:language> declarations):

Reading System | EPUB version | Trusted language | page-progression-direction | rendition | User Settings
--- | --- | --- | --- | --- | ---
iBooks | 2.0.1 | last value | No | language value | language value
Play Books | 2.0.1 | none | No | LTR | English / DNA
Kobo Mac | 2.0.1 | none | No | LTR | DNA
Readium | 2.0.1 | none | No | LTR | DNA
iBooks | 3.0.1 | last value | No | language value | language value
Play Books | 3.0.1 | none | No | LTR | English / DNA
Kobo Mac | 3.0.1 | none | No | LTR | DNA
Readium | 3.0.1 | none | No | LTR | DNA
iBooks | 3.0.1 | last value | Yes | page-progression-direction | language value
Play Books | 3.0.1 | none | Yes | page-progression-direction | English / DNA
Kobo Mac | 3.0.1 | none | Yes | page-progression-direction | DNA
Readium | 3.0.1 | none | Yes | page-progression-direction | DNA

Methodology: 6 files were tested – 2 EPUB2 files, 4 EPUB3 files. The only differences were:

  • order of <dc:language>
  • page-progression-direction attribute on <spine>.

For instance,

<metadata>
  <dc:language>he</dc:language>
  <dc:language>en</dc:language>
</metadata>
…
<spine>
  …
</spine>

and

<metadata>
  <dc:language>en</dc:language>
  <dc:language>he</dc:language>
</metadata>
…
<spine page-progression-direction="rtl">
  …
</spine>
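The two inputs that varied across these test files are exactly what a streamer has to extract. A regex-based sketch, for illustration only (a real streamer would use a namespace-aware XML parser); `readRenditionHints` and its return shape are hypothetical.

```typescript
interface RenditionHints {
  languages: string[]; // <dc:language> values, in document order
  pageProgression?: string; // page-progression-direction on <spine>, if any
}

// Sketch: collect every <dc:language> value and the spine's
// page-progression-direction attribute from raw OPF text.
function readRenditionHints(opf: string): RenditionHints {
  const languages: string[] = [];
  const langRe = /<dc:language\b[^>]*>([^<]*)<\/dc:language>/g;
  let m: RegExpExecArray | null;
  while ((m = langRe.exec(opf)) !== null) {
    languages.push(m[1].trim());
  }
  const spine = /<spine\b[^>]*>/.exec(opf);
  const ppd = spine ? /\spage-progression-direction\s*=\s*["']([^"']*)["']/.exec(spine[0]) : null;
  return { languages, pageProgression: ppd ? ppd[1] : undefined };
}
```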

Note: per their guidelines, Kobo allows authors to use page-progression-direction in ePub 2, and advises them to ignore the ePubCheck error it might raise.

@BNGOBooks

BNGOBooks commented Dec 23, 2017 via email

@JayPanoz
Collaborator Author

JayPanoz commented Dec 23, 2017

Ah thanks for the clarification that I should have made myself in the previous message.

So yeah, I can confirm that their strategy is to trust the last <dc:language> they can find (if there is no page-progression-direction set on the spine). So the EPUB version doesn’t even make any difference in that case.
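The iBooks behaviour observed in these tests (spine attribute first, otherwise the last <dc:language>) can be sketched as follows. This reproduces observed behaviour rather than recommending it; `inferProgression` is a hypothetical name and the RTL language set only covers the languages from the mapping earlier in this issue.

```typescript
type Progression = "ltr" | "rtl";

// Non-exhaustive: only the RTL languages listed earlier in this issue.
const RTL_LANGUAGES = new Set(["ar", "fa", "he"]);

// Mimics the observed iBooks strategy: trust page-progression-direction on
// the spine when present, otherwise derive it from the LAST <dc:language>.
function inferProgression(languages: string[], spinePpd?: string): Progression {
  if (spinePpd === "rtl") return "rtl";
  if (spinePpd === "ltr") return "ltr";
  const last = languages[languages.length - 1];
  if (last !== undefined && RTL_LANGUAGES.has(last.toLowerCase().split("-")[0])) {
    return "rtl";
  }
  return "ltr";
}
```

With `["en", "ar-SA"]` and no spine attribute, this returns `"rtl"`, which is the “opened backwards” case reported above.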

@JayPanoz
Collaborator Author

Samples are now available to test and improve i18n support.

They are available in docs.

@JayPanoz
Collaborator Author

JayPanoz commented Feb 1, 2018

Took a quick look at the iBooks doc to check some details related to rendition and oh boy do they put the burden on authors.

  1. Language:
    • it should be declared twice, in the OPF and iTunes Metadata, and values must match – I guess it helps them sanitize the OPF when several <dc:language> can be found since authoring software can output multiple values without the author even knowing;
    • for Chinese language books, you must specify both the language (zh) and the script (Hans or Hant) portions of the language code.
  2. Text direction:
    • each content document can support a single writing-mode value. If you want both horizontal and vertical text in your book, then each text direction must be split into separate content documents;
    • if you want to use a Japanese font that is available in macOS or iOS, it is strongly recommended that you use Hiragino Kaku ProN and Hiragino Mincho ProN as both are pre-installed for the reader;
    • if the Table of Contents in the iBooks menu needs to be rendered vertically as opposed to horizontally, the text direction must be specified for the TOC in the Navigation document.
  3. Scroll:
    • by default, Japanese and Chinese books scroll horizontally, while all other languages scroll vertically;
    • to redefine the scroll direction, the book must include the following metadata in the .opf file: ibooks:scroll-axis. Possible values are vertical, horizontal, and default.
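The scroll rules quoted above boil down to a small decision function. In this sketch, the `ibooks:scroll-axis` values come from the iBooks doc as summarised above, while `resolveScrollAxis` and its types are hypothetical.

```typescript
type ScrollAxis = "horizontal" | "vertical";

// Sketch: ibooks:scroll-axis overrides the default unless it is "default"
// (or absent); by default, Japanese and Chinese scroll horizontally and all
// other languages scroll vertically.
function resolveScrollAxis(lang: string, scrollAxisMeta?: string): ScrollAxis {
  if (scrollAxisMeta === "horizontal") return "horizontal";
  if (scrollAxisMeta === "vertical") return "vertical";
  const primary = lang.toLowerCase().split("-")[0];
  return primary === "ja" || primary === "zh" ? "horizontal" : "vertical";
}
```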

What worries me is that this is consistent with the issues we quickly discovered when starting work on vertical writing in the EPUB context (pagination, page progression, mixed writing modes, etc.), and they impose constraints on authors that we can’t necessarily impose ourselves.

It also reminds me that the Japanese industry is pushing for the EPUB 3.2 revision while the current spec is barely vertical-writing-friendly. I mean, I’m pretty sure lots of content relies on the -epub-writing-mode CSS property, which hangs on an alias WebKit deigned to implement for iBooks back in the early EPUB 3 days.

The constraints Apple has designed are just there to fill the gaps in the spec, but we simply can’t do that, so we’ll have to design heuristics instead. cc @llemeurfr

@JayPanoz
Copy link
Collaborator Author

JayPanoz commented Feb 3, 2018

Added control captures for i18n samples. They are all available in the i18n folder.

I can’t tell whether we have any text rendering issues, so any review would be greatly appreciated. The only thing I can say for sure is that Mongolian doesn’t render as expected (we have no system font to handle it).

The tests those captures cover:

  • pagination;
  • specific font-stack for each language (base module);
  • direction (ltr/rtl);
  • writing mode (horizontal/vertical);
  • default for unstyled ebooks.

@llemeurfr
Contributor

llemeurfr commented Feb 21, 2018

@JayPanoz, in the scope of the EPUB 3.2 revision, can you define what would be useful for handling vertical writing perfectly (or at least better)?

@JayPanoz
Collaborator Author

Yeah no problem.

1/ I’ve already raised an issue about multiple metadata elements and the lack of guidance a.k.a. “which one should be considered the primary language?” on the EPUB-revision issue (<dc:language>) as it is part of this more global issue: w3c/epub-specs#992

In practice, vendors already have to deal with this, and may impose constraints on authors, e.g. only one language item, specifying the Han Traditional or Han Simplified script for Chinese, using another prefixed meta to force the scroll direction, etc.

But that’s only because there is no better way to know we should use vertical-writing from the OPF…

2/ We quickly discussed something like page-progression-direction but for vertical writing in the OPF with @HadrienGardeur during our meeting with Florian. But I’m not sure people will warmly welcome yet another rendering meta.

The thing is it’s unrealistic to use the writing-mode computed style at runtime, as it imposes a huge burden both on authors (time to rendering is abysmal, like 5–10 times more than when written horizontally) and users (UX).

3/ Then there are the mixed directions and writing-modes issues i.e. when the dir or writing-mode of a whole document/XHTML file differs from the one we can get from page-progression-direction.

A warning/note that not all reading systems can manage these cases, and that authors must be extra cautious about them, would be the least the spec can do. It currently isn’t discussed at all, and the process has just started for Web Publications, but it impacts EPUB 3 as well.

@JayPanoz
Collaborator Author

Oh and yeah, a super important one.

4/ Authors shouldn’t rely on -epub- prefixed properties; as guidance, they should use all the needed prefixed properties (-ms-, -webkit-, -moz-) plus the standard one whenever needed. We have an increasingly large corpus of files which don’t, and it builds a huge debt for everyone.
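Applied mechanically, that guidance means emitting the vendor-prefixed declarations first and the standard one last, so engines that support the standard property apply it. `expandDeclaration` is a hypothetical authoring-tool helper that copies values as-is; some prefixed properties historically accepted different values (Internet Explorer’s -ms-writing-mode used values such as tb-rl), so a real tool would need per-property mappings.

```typescript
const DEFAULT_PREFIXES = ["-webkit-", "-moz-", "-ms-"];

// Sketch: prefixed declarations first, standard declaration last, so the
// standard one wins in the cascade wherever it is supported.
function expandDeclaration(property: string, value: string, prefixes: string[] = DEFAULT_PREFIXES): string {
  const lines = prefixes.map((prefix) => `${prefix}${property}: ${value};`);
  lines.push(`${property}: ${value};`);
  return lines.join("\n");
}

console.log(expandDeclaration("hyphens", "auto"));
```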

@SeldomScene

Hi Jiminy,
Just for your info, be careful with prefixed properties, as Google asked us to remove all prefixed properties except the ones they use (i.e. -webkit-), because their programs didn't ingest our files correctly (it's new).
So we have to remove any -ms- and -moz- existing in our files...
Vincent
EPUB3 ISIcrunch platform for education

@JayPanoz
Collaborator Author

Well, thanks for the info.

If that’s Google Play Books, they’re consequently creating an interoperability issue, and we already have an awful lot of those because vendors decide to do such things unilaterally. It also breaks yet another fundamental rule of CSS and changes the way it works, so this is bad. This should be raised at a higher level, because it impacts the whole ecosystem and, more importantly, competitors.

How are they parsing CSS in the first place? You can’t encounter this issue unless you create it and are extremely lazy – in the worst case scenario, stripping those declarations is a 2-line script…

@JayPanoz
Collaborator Author

Opened a specific issue (#32) about prefixed properties as we’d better keep this one for i18n only.

@JayPanoz
Collaborator Author

Oh joy… Kindle being Kindle all over again:

If the page propagation direction is not left-to-right, page propagation direction should be provided either in the metadata or the spine. (Example: <meta name="primary-writing-mode" content="horizontal-rl"/>)

  1. it’s interesting they called that primary-writing-mode (emphasis on primary)
  2. horizontal-rl doesn’t exist in the CSS spec
  3. Why offer an alternative to <spine page-progression-direction="rtl"> in the first place? (and yeah, possible future interop issue…)

@llemeurfr
Contributor

Just a question: why would we need to support this Kindle thing, given that, as your point 3 says, the spine attribute is the standard way to do it? Shouldn't we recommend that authors use the standard way, and warn them that they'll also need to add the Amazon meta if they target Kindle?

@JayPanoz
Collaborator Author

To clarify, I would not necessarily worry about that in the short term. If you ask me “do we need to support that?”, my answer would be “No, not at the moment.”

Now, [company we shouldn’t name] happens to finally care about internationalization after a decade of not caring at all. And if you had to support right to left scripts in the easiest way possible, a way that’s EPUB2-compatible and won’t raise an ePubCheck issue, a way that doesn’t require extra namespaces and doesn’t disrupt workflows, etc., guess what you’ll end up with? Yes, a <meta> in the .opf.

In other words, in the longer term, my answer could become “yes, we need to support that, because usage is significant.” But that’s relatively new, so we should probably keep it on our radar and that’s it.
