Add multilingual support #5
Multiple languages for documents? |
Yes, I think GitBook does support something like that. Instead of having the Markdown files directly in the source folder, you would have some subfolders like this:
And there would be an easy way to change the language in the rendered book. It's definitely something I would like to add, but it's not the highest priority at the moment |
Multiple designs possible:
|
I don't think one SUMMARY.md for everything is a good idea. I consider consistency within a translated version more important than consistency with the original. Otherwise, we can easily end up with broken links because upstream renamed some chapter and the translation hasn't caught up yet. I believe a book with no broken links is the minimum standard. Also, I don't support the idea of "pushing" translations to be up to date. AFAIK, translations (not only ours) are done by enthusiasts, and it's not always possible to keep up at all times. Moreover, a 1-to-1 mapping of pages doesn't look straightforward to me, even with a single SUMMARY. Words have different lengths in different languages, and in the Russian translation we consistently have sentences that are noticeably longer than the original. But I'd love it if one click could show the same point in the text in the original language. I think this can be handled by tracking a 1-to-1 mapping of paragraphs; sections, aka Markdown files, are too big. Paragraphs also seem a good candidate because sentences sometimes get paraphrased and reordered, but the paragraphs stay in the same order and have the same gist. |
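To make the paragraph-level mapping idea concrete, here is a minimal Rust sketch. It assumes paragraphs are runs of non-blank lines separated by blank lines (code fences, lists and other block constructs are not special-cased) and that translators keep the paragraph order, as suggested above. `paragraphs` and `align_paragraphs` are hypothetical helpers, not part of mdBook.

```rust
/// Split Markdown into paragraphs: runs of non-blank lines separated by
/// blank lines. Simplified: code fences, lists, etc. are not special-cased.
fn paragraphs(text: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    for line in text.lines() {
        if line.trim().is_empty() {
            // A blank line closes the paragraph being collected, if any.
            if !current.is_empty() {
                out.push(current.join("\n"));
                current.clear();
            }
        } else {
            current.push(line);
        }
    }
    if !current.is_empty() {
        out.push(current.join("\n"));
    }
    out
}

/// Pair paragraph i of the original with paragraph i of the translation.
/// Returns None when the counts differ, i.e. the 1-to-1 assumption is broken.
fn align_paragraphs(original: &str, translation: &str) -> Option<Vec<(String, String)>> {
    let orig = paragraphs(original);
    let trans = paragraphs(translation);
    if orig.len() != trans.len() {
        return None;
    }
    Some(orig.into_iter().zip(trans).collect())
}
```

Returning `None` when the paragraph counts differ is a deliberate choice: a translator who merged or split paragraphs breaks the positional mapping, and falling back to file-level linking is safer than jumping to the wrong paragraph.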
Thanks for the input! I really appreciate the feedback :)
When I talk about 1-to-1 mapping, I mean page-to-page mapping, not sentence-to-sentence (that would be insane 😉). Let's take a hypothetical situation with the Rust book. Say I am reading a blog post that references some chapter in the Rust book, for example the chapter about ownership. But English is not my main language, and it would be a lot easier to understand the chapter in my native language. If we have a 1-to-1 mapping at the page / chapter level, the user could then select their language (if it is supported) from a dropdown menu and land on the exact same page in that language. However, for this to work correctly we need a guarantee that every page in one language has an equivalent page in the other language. If you allow a different
Of course, I totally agree with you. But the If there is one
To be honest, once a book has its definitive structure the I think both designs have advantages and drawbacks; we need to figure out which one we want / need the most.
Idea for Rust book workflow when translations are in tree
When / if translations are moved into the official repository, we could create a more elaborate pull request process. This is only an idea, and it may be flawed 😉 When a pull request contains changes that need translation (i.e. not typos), we could wait to merge it until translations have been made for all officially supported languages. The pull request could track which translations have been made using a checklist like this:
Once all the translations are ready, the pull request is merged. This would add a little / lot of overhead for the English version, but it would solve the two big issues with translations.
There may be organizational problems I haven't considered though. @steveklabnik |
The biggest problem with blocking English changes on non-English changes is that I am paid for my work, but others are not. This places a big burden on them; I'm gonna want to land changes ASAP, and that's not fair to people who can't do this as a day job. |
That's true, I didn't think of that. Anyway, do you have a preference for either of the two design choices (one vs. multiple |
I think I prefer a single for the reasons you've stated, but since I'm not doing the translations themselves, I don't think my opinions matter much :) And yeah, tracking might be different/better than actually blocking on them landing. |
OK, I think what I was trying to say but couldn't get across is this: page-to-page mapping isn't enough for printed versions, as the same pages will have different content. And if by page you meant a web page, that is not enough either. Some sections (pages) are tens of screens long, and to provide a smooth transition from one version to another we should track smaller units than entire files (web pages). I originally thought you were talking about printed pages and wrote the following, but I'm not sure now. For printed versions, depending on the length of the section and the sentence-length difference from the original, this can vary from "I see not the beginning of the paragraph that talks about Foo feature, but the end" to "I don't see the paragraph that talks about Foo feature on screen at all", when linked to "page 83 of PDF". So let's clarify the terms before continuing, as apparently I misunderstood something 😄 |
OK, yes, I will do my best to explain what I envision. In this issue I am not talking at all about tracking changes for translations, only about how to support multiple languages in the same folder / book. Before I continue, let's explain what the When you render the book (
That is the "only" information we get from the If we want to support multiple languages for one book, there are two possible designs (that I could think of):
Let's look at both in more detail.
One SUMMARY.md for all languages
Consider this:

```markdown
# Summary

- [hello world](hello-world.md)
- [second chapter](second-chapter.md)
```

and this directory structure:
As you can see here, every language has the same Markdown files defined in the global
Advantages
Having a guarantee that every chapter in one language has a corresponding chapter in another language lets us change the language from any chapter and land on that same chapter in the other language. Example: I am reading the "borrowing" chapter of the Rust book. I want to see that same chapter in French. I just select "French" from the dropdown button in the menu bar and land on the French version of the chapter.
Drawbacks
When the Problems that could occur:
Content is not modified by the Another drawback is that I am not sure yet how translations will provide translated chapter titles for the sidebar (
One SUMMARY.md for EVERY language
Let's consider this directory structure:
As you can see here, every language has its own There is no longer any guarantee that the French version contains the same chapters as the English version. No 1-to-1 mapping. Essentially every language is its own separate book; they could have exactly the same structure or totally different chapters, and there is no way for the program to know that. It is thus impossible to change the language from a chapter. You would have to navigate to the French version manually and search for the chapter you were reading, if it exists in the French version at all!
Advantages
Translations have a lot more freedom, but this can also be seen as a drawback. Translations do not need to have the same structure, so when the
Drawbacks
There is no guarantee that a chapter in one language has an equivalent in another language (no 1-to-1 mapping). The program cannot know which chapters are equivalent in the different languages, so it would be impossible to change the language from a chapter and land on the same chapter in the other language. I hope this made it clearer; if there is still something you don't understand, I can elaborate on some specific area. 😉 EDIT: A little quote from a response I made on Rust's internals forum:
You can already group the multiple translations in one directory as different books, each with its own |
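For the "one SUMMARY.md for all languages" design, the 1-to-1 guarantee only holds if every language directory actually contains every file listed in the shared SUMMARY.md. Below is a minimal Rust sketch of such a consistency check. The `<lang>/<chapter>` layout and the `missing_translations` helper are assumptions for illustration, and a set of paths stands in for the file system.

```rust
use std::collections::HashSet;

/// With one shared SUMMARY.md, every listed chapter must exist in every
/// language directory. Returns the (language, chapter) pairs that are missing.
/// Hypothetical layout: `<lang>/<chapter>`.
fn missing_translations(
    chapters: &[&str],
    languages: &[&str],
    existing: &HashSet<String>,
) -> Vec<(String, String)> {
    let mut missing = Vec::new();
    for lang in languages {
        for chapter in chapters {
            let path = format!("{lang}/{chapter}");
            if !existing.contains(&path) {
                missing.push((lang.to_string(), chapter.to_string()));
            }
        }
    }
    missing
}
```

A build could warn, or refuse to render, whenever this list is non-empty; that is where the drawbacks of a shared SUMMARY.md show up in practice.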
Regarding the Rust Book translation process, this is not a disadvantage of some solution but simply a fact. I think other projects that use mdBook with multiple languages will have the same problem.
Can we keep it simple and assume that files with the same name in different languages are the same chapter? Then we can offer the option to switch to another language. I think this approach will satisfy both cases:
|
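The "same file name means same chapter" rule proposed above could work roughly like this, assuming each language is a separate book under a hypothetical `<lang>/` directory with its own SUMMARY.md. `switch_language` is a hypothetical helper; returning `None` lets a page indicate that no translation exists yet instead of hiding the language switcher.

```rust
use std::collections::HashSet;

/// Map the current page (e.g. "ru/ch01-00.md") to the page with the same
/// relative file name in `target_lang`, if that translation exists.
/// Hypothetical layout: each language is a separate book under `<lang>/`.
fn switch_language(current: &str, target_lang: &str, existing: &HashSet<String>) -> Option<String> {
    // Drop the leading "<lang>/" component; the rest is the shared file name.
    let (_, relative) = current.split_once('/')?;
    let candidate = format!("{target_lang}/{relative}");
    if existing.contains(&candidate) {
        Some(candidate)
    } else {
        // Not translated yet: the UI can say so instead of failing.
        None
    }
}
```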
Also, I don't like the idea that when I read the book in Russian, I'll see the TOC in English. I think we should not assume that the reader is familiar enough with the language of the original to understand the chapter titles. |
How would you handle that? On some pages you can change the language and on others not? That would be really confusing for users I think.
Of course that was not the plan; I just hadn't found a good solution for it yet, so I didn't discuss it much |
Why not? We can clearly indicate that the translation for this chapter is not available yet. Another possible situation is that a translation is available for some languages but not for others. |
Another example that I care about. Let's compare the structure of the "Getting started" section in the nightly and stable books. As you can see, Steve joined 4 chapters into one. Imagine that not all language versions have adopted this change yet. If we have a common TOC, there is no way to open the "Installing Rust", "Hello World" and "Hello Cargo" chapters in a non-English version of the book, because they no longer exist in the original TOC. |
Yes, I totally agree with you! This would be a big problem. However, I am not sure I want to settle for the solution GitBook proposes either. Maybe we can come up with something better that combines all the advantages and none of the drawbacks? (even if it's a little more complex) GitBook uses the "one I think you could already achieve something very similar with mdBook by using multiple books and configuring the source and output directories as you want. The only difference is that GitBook makes it just a little bit easier to set up. |
My suggestion is to have "one SUMMARY.md per language", but support page-to-page cross-linking between the different languages. The easiest way to do this is to consider files with the same name to be the same chapter. This should work in 99% of cases. A more complex way is to add some kind of identifier to each file (something like a UUID). If the identifiers of two files are identical, we can cross-link them. |
Hmm yes, that might be a good compromise, at least if the translations don't diverge too much from the original. I will try to think about this a little more and see if I can come up with other ideas. Thanks for the valuable input! :) |
FWIW, there are tools for handling translations that I haven't seen mentioned here yet. For example, Crowdin is used (or was, when I was involved) over at FreeCAD to translate the documentation on their wiki. Notably, when an English page was updated, the plugin would notify you that the other translations of that specific section needed updating or they would be out of date. The page linked above actually lists how complete each language's translation is and maintains that information. A tool like Crowdin could perhaps be added to the build process as a plugin that is told which files require translating. It would then maintain its own database somewhere, and you could tell mdbook where the translated files are located. A solution like this seems worth exploring before spending effort on a new, ground-up approach to the same problem. EDIT: Also note that they offer free support to open source projects |
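The identifier-based variant could look like the following sketch. The `<!-- id: ... -->` marker and the helpers `chapter_id`, `index_by_id` and `counterpart` are all hypothetical; an HTML comment is used only because it does not show up in rendered Markdown. Unlike file-name matching, this keeps working when a translation renames or moves a file.

```rust
use std::collections::HashMap;

/// Extract a hypothetical `<!-- id: XYZ -->` marker from a chapter's text.
fn chapter_id(content: &str) -> Option<&str> {
    content.lines().find_map(|line| {
        line.trim()
            .strip_prefix("<!-- id:")?
            .strip_suffix("-->")
            .map(str::trim)
    })
}

/// Index one language's chapters, given as (path, content), by identifier.
fn index_by_id<'a>(chapters: &[(&'a str, &'a str)]) -> HashMap<&'a str, &'a str> {
    chapters
        .iter()
        .filter_map(|&(path, content)| chapter_id(content).map(|id| (id, path)))
        .collect()
}

/// Find the page in the target language that carries the same identifier
/// as the page currently being read.
fn counterpart<'a>(current_content: &str, target: &HashMap<&str, &'a str>) -> Option<&'a str> {
    let id = chapter_id(current_content)?;
    target.get(id).copied()
}
```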
This implements a translation pipeline using the industry-standard Gettext[1] system. I picked Gettext for the reasons described in [2] and [3]:
* It’s widely used in open source software. This means that there are graphical editors which will help you in editing the `.po` files. An example is Poedit[4], which is available for all major platforms. There are also many online systems for doing translations. An example is Pontoon[5], which is used for the Rust website itself. We can consider setting up such an instance ourselves.
* It is a light-weight yet structured format. This means that nothing changes with regards to how you update the original English text. We can still accept fixes and PRs like normal. The structure means that translators can see exactly which part of the course they need to update after a change. This is completely lost if you simply copy over the original text and translate it in-place in the Markdown files.
The code here only adds support for translations. They are not yet tested, published or used for anything. Next steps will be:
* Add support for switching languages via a bit of JavaScript on each page.
* Update the speaker notes feature to support translations (right now “Speaker Notes” is hard-coded into the generated HTML). I think we should turn it into an mdbook preprocessor instead.
* Add testing: We should test that the `.po` files are well-formed. We should also run `mdbook test` on each language since the translations can alter the embedded code.
Fixes #115.
[1]: https://www.gnu.org/software/gettext/manual/html_node/index.html
[2]: rust-lang/mdBook#1864
[3]: rust-lang/mdBook#5 (comment)
[4]: https://poedit.net/
[5]: https://pontoon.rust-lang.org/
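As a rough illustration of the Gettext idea in the commit message above, the sketch below turns each Markdown paragraph into a PO template entry with an empty `msgstr`. It is a simplification written for this discussion, not the code used by Comprehensive Rust or mdbook-i18n-helpers: paragraphs are split naively on blank lines, the POT header entry is omitted, and only backslashes, quotes, newlines and tabs are escaped.

```rust
/// Escape text for use inside a double-quoted PO string.
fn po_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            _ => out.push(c),
        }
    }
    out
}

/// Build a minimal PO template: one entry per paragraph, with a `#:`
/// reference comment naming the source file and an empty msgstr.
fn to_pot(source_path: &str, markdown: &str) -> String {
    let mut pot = String::new();
    for para in markdown.split("\n\n").map(str::trim).filter(|p| !p.is_empty()) {
        pot.push_str(&format!("#: {source_path}\n"));
        pot.push_str(&format!("msgid \"{}\"\n", po_escape(para)));
        pot.push_str("msgstr \"\"\n\n");
    }
    pot
}
```

Because the English text itself is the `msgid`, editing a paragraph changes its entry; when a new template is merged into existing `.po` files (for example with GNU `msgmerge`), the affected translations are flagged as fuzzy or obsolete, which is how translators see exactly which part needs updating.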
Hi all, I've published the plugins for a Gettext i18n translation workflow as a separate crate! You can install it with the usual `cargo install mdbook-i18n-helpers`. Please see https://crates.io/crates/mdbook-i18n-helpers and let me know what you think in https://github.com/google/mdbook-i18n-helpers. We've been using this infrastructure for 4 months now in the Comprehensive Rust 🦀 project. People have translated the course into Korean and Brazilian Portuguese, and we have a few more languages in the pipeline. What I like about this approach is that it's a very classic one: Gettext is more than 30 years old now, and there are a lot of tools out there which can help translators wrangle the |
This is indeed a good idea, but unfortunately, every time you switch languages some styles need to be reloaded; maybe mdbook needs some changes to provide a better experience here. |
Are you talking about how the different languages are completely independent books (with their own assets such as stylesheets, images, etc.)? I agree that it's a bit unfortunate.
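Per-language completeness, like the statistics Crowdin shows (mentioned earlier in the thread), can be computed directly from `.po` files; GNU gettext's `msgfmt --statistics` reports translated, fuzzy and untranslated counts. The Rust sketch below approximates that count with a deliberately small parser: it handles multi-line strings, the header entry and the `fuzzy` flag, but treats plural entries as untranslated and does not unescape strings. `Stats` and `po_stats` are hypothetical names.

```rust
/// Message counts for one PO file.
#[derive(Debug, Default, PartialEq)]
struct Stats {
    translated: usize,
    fuzzy: usize,
    untranslated: usize,
}

impl Stats {
    fn record(&mut self, msgid: &str, msgstr: &str, fuzzy: bool) {
        if msgid.is_empty() {
            return; // the entry with an empty msgid is the PO header
        }
        if msgstr.is_empty() {
            self.untranslated += 1;
        } else if fuzzy {
            self.fuzzy += 1;
        } else {
            self.translated += 1;
        }
    }
}

/// Which string a continuation line (`"..."`) belongs to.
enum Field {
    Id,
    Str,
    Other,
}

/// Text between the surrounding double quotes (escapes are left as-is).
fn quoted(s: &str) -> &str {
    s.trim()
        .strip_prefix('"')
        .and_then(|t| t.strip_suffix('"'))
        .unwrap_or("")
}

fn po_stats(po: &str) -> Stats {
    let mut stats = Stats::default();
    let (mut msgid, mut msgstr) = (String::new(), String::new());
    let (mut fuzzy, mut next_fuzzy, mut in_entry) = (false, false, false);
    let mut field = Field::Other;
    for line in po.lines().map(str::trim) {
        if line.starts_with("#,") {
            // Flags such as "#, fuzzy" apply to the entry that follows.
            next_fuzzy = line.contains("fuzzy");
        } else if let Some(rest) = line.strip_prefix("msgid ") {
            if in_entry {
                stats.record(&msgid, &msgstr, fuzzy);
            }
            msgid = quoted(rest).to_string();
            msgstr.clear();
            fuzzy = next_fuzzy;
            next_fuzzy = false;
            in_entry = true;
            field = Field::Id;
        } else if let Some(rest) = line.strip_prefix("msgstr ") {
            msgstr = quoted(rest).to_string();
            field = Field::Str;
        } else if line.starts_with('"') {
            match field {
                Field::Id => msgid.push_str(quoted(line)),
                Field::Str => msgstr.push_str(quoted(line)),
                Field::Other => {}
            }
        } else if !line.is_empty() && !line.starts_with('#') {
            // msgctxt, msgid_plural, msgstr[n]: not tracked by this sketch.
            field = Field::Other;
        }
    }
    if in_entry {
        stats.record(&msgid, &msgstr, fuzzy);
    }
    stats
}
```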
Yes, it could certainly be made easier! One pain point right now is that I need to copy the |
Hi again, just wanted to let people here know that I've released version 0.2 of mdbook-i18n-helpers. This version changes how the text is extracted: paragraphs are now unwrapped, headings are stripped of A normalization tool is included to help you convert old translation files to the new format; we have ~18 translations now for Comprehensive Rust, so it's important for us to have a migration path for those files. I would be very interested in feedback if you try it out! Thanks 🙂 |
As I haven't really made a lot of progress on this front besides setting up the template, I guess it makes more sense to reinitialize the whole .po/.pot files with the new version. I'll do that when I can spend time on it. 👍🏽 Thanks for keeping us up to date! <3 |
I know you might want to use Rust here, but at KDE we have written and used a Python program to do i18n with gettext for our Hugo websites for some years. Recently I separated the Markdown handling from the Hugo-specific parts, so if you want to do i18n and l10n for individual Markdown files, markdown-gettext might be helpful for you. It is compliant with CommonMark and supports all core Markdown elements, as well as YAML front matter, tables, and definition lists. Support here means that only text is processed (internationalized/localized); all formatting characters (at block level) are ignored during i18n, but the file structure stays the same after l10n. I understand this package might not be a 100% fit for mdBook; however, writing an extension for the library behind it is not difficult. I hope that by using the package you won't have to reimplement the processing of common Markdown elements and can focus on the differences. |
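Unwrapping paragraphs means that re-wrapping the English source at a different line width no longer changes the extracted `msgid`. A minimal Rust sketch of the idea, assuming plain soft-wrapped paragraphs (hard line breaks and code blocks would need special handling, and this is not the crate's actual implementation):

```rust
/// Join the soft-wrapped lines of one Markdown paragraph into a single
/// line. Simplified: hard line breaks and code blocks are not preserved.
fn unwrap_paragraph(paragraph: &str) -> String {
    paragraph
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect::<Vec<_>>()
        .join(" ")
}
```

Two wrappings of the same sentence, such as `"a b\nc"` and `"a\nb c"`, both become `"a b c"`, so no translation is marked fuzzy just because the source was re-wrapped.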
Hi @PhuNH,
That sounds nice, and it sounds similar to the processing done by mdbook-i18n-helpers. An mdbook preprocessor can be written in any language; the manual has a Python example. It's probably very easy to create a wrapper around your library. I recently found another tool for translating Markdown: https://github.com/mondeja/mdpo, also written in Python. There is also https://po4a.org/index.php.en, which handles even more formats. |
- commit 8664faea083017b1ec7c9d811be28427b8408bef (Kara, Thu Sep 28 10:47:37 2023 -0500): Update for new mdbook version
- commit 1b45e7a7a6521b4df6d441788a7fff105eba9240 (Kara, Thu Sep 28 10:03:55 2023 -0500; merge of e74fdb1 and 79edc75): Merge branch 'master' into localization. Conflicts: Cargo.lock, Cargo.toml, src/book/book.rs, src/book/init.rs, src/book/mod.rs, src/cmd/build.rs, src/cmd/clean.rs, src/cmd/serve.rs, src/cmd/test.rs, src/cmd/watch.rs, src/config.rs, src/preprocess/links.rs, src/renderer/html_handlebars/hbs_renderer.rs, src/renderer/markdown_renderer.rs, src/utils/mod.rs, tests/init.rs
- commit e74fdb1 (Ruin0x11, Fri Feb 25 14:30:38 2022 -0800): Make `chapter_titles` optional in Book
- commit 7305e8c (Ruin0x11, Fri Feb 25 14:13:22 2022 -0800; merge of 9d8147c and 5921f59): Merge remote-tracking branch 'upstream/master' into localization. Conflicts: .gitignore, guide/src/en/cli/completions.md, guide/src/en/format/images/rust-logo-blk.svg, guide/src/en/format/markdown.md, guide/src/en/misc/introduction.md, src/renderer/html_handlebars/hbs_renderer.rs, src/utils/mod.rs
- commit 9d8147c (Ruin0x11, Wed Sep 15 21:49:58 2021 -0700): Remove extra `localization.md`
- commit 56e72a2 (Ruin0x11, Wed Sep 15 15:33:28 2021 -0700): [localization] rustfmt
- commit 92ec3dd (Ruin0x11, Wed Sep 15 15:25:31 2021 -0700): [localization] Fixes for latest master
- commit d6c27ab (Ruin0x11, Sat Aug 29 16:11:47 2020 -0700): Implement translation fallback of files included with preprocessing
- commit 5fed5e8 (Ruin0x11, Wed Sep 15 14:29:30 2021 -0700): Update mdBook manual to have information about translations
- commit 09a8b66 (Ruin0x11, Sat Aug 29 14:41:08 2020 -0700): Improve robustness of link rewriting
- commit 8d1c086 (Ruin0x11, Fri Aug 28 16:33:02 2020 -0700): Fix {{#include}} directives for default language
- commit 98c3a04 (Ruin0x11, Fri Aug 28 16:11:21 2020 -0700): Move example book to multilingual structure
- commit c72ce18 (Ruin0x11, Fri Aug 28 14:50:04 2020 -0700): Rewrite links in Markdown to point to fallback if missing in translation. It will follow relative links to other pages and embedded images.
- commit ee740ac (Ruin0x11, Fri Aug 28 12:26:08 2020 -0700): Remove 'default' property on languages, use book.language instead
- commit a042cfc (Ruin0x11, Fri Aug 28 11:35:42 2020 -0700): Make `mdbook init` output multilingual structure
- commit 5e223e0 (Ruin0x11, Fri Aug 28 03:17:26 2020 -0700): Support localizing book title/description
- commit e17ce64 (Ruin0x11, Fri Aug 28 02:29:07 2020 -0700): Fix test using create_missing
- commit 282fdaa (Ruin0x11, Fri Aug 28 02:05:21 2020 -0700): Redirect to a 404 page when serving translated. We can't redirect in warp based on the URL, so redirect to the default language's 404 page instead. See: seanmonstar/warp#171
- commit 85ab4d3 (Ruin0x11, Fri Aug 28 01:36:22 2020 -0700): Redirect to translation index page in serve command
- commit 8869c2c (Ruin0x11, Fri Aug 28 00:24:33 2020 -0700): Build multiple books from localizations at once. Changes how the `book` module loads books. Now it is possible to load all of the translations of a book and put them into a single output folder. If a book is generated this way, a menu will be created in the handlebars renderer for switching between languages.
- commit 96d9271 (Ruin0x11, Thu Aug 27 19:44:24 2020 -0700): Specify language for book in command line args
  - Add a [language] table to book.toml. Each key in the table defines a new language with `name` and `default` properties.
  - Changes the directory structure of localized books. If the [language] table exists, mdBook will now assume the src/ directory contains subdirectories named after the keys in [language]. The behavior is backwards-compatible if you don't specify [language].
  - Specify which language of book to build using the -l/--language argument to `mdbook build` and similar, or omit to use the default language.
  - Specify the default language by setting the `default` property to `true` in an entry in [language]. Exactly one language must have `default` set to `true` if the [language] table is defined.
  - Each language has its own SUMMARY.md. It can include links to files not in other translations. If a link in SUMMARY.md refers to a nonexistent file that is specified in the default language, the renderer will gracefully degrade the link to the default language's page. If it still doesn't exist, the config's `create_missing` option will be respected instead.
- commit 3049d9f (Ruin0x11, Thu Aug 27 16:35:00 2020 -0700): Actually, don't change source root. The book paths have to gracefully degrade to the default language if they aren't available.
- commit 24e6d6b (Ruin0x11, Thu Aug 27 16:26:07 2020 -0700): Change book source root depending on language
- commit e4b443c (Ruin0x11, Thu Aug 27 13:27:47 2020 -0700): Add language config section. Referencing rust-lang#5 (comment).
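The fallback rule from commit 96d9271 (a chapter missing from a translation degrades to the default language's page) can be sketched as follows. The `src/<lang>/` layout comes from that commit message; `resolve_chapter` is a hypothetical helper, and a set of paths stands in for the file system. When neither file exists, the branch defers to the `create_missing` option, which this sketch simply reports as `None`.

```rust
use std::collections::HashSet;

/// Resolve `chapter` for `lang`, falling back to `default_lang` when the
/// translation does not contain the file. Returns None if neither exists.
fn resolve_chapter(
    chapter: &str,
    lang: &str,
    default_lang: &str,
    existing: &HashSet<String>,
) -> Option<String> {
    [lang, default_lang]
        .iter()
        .map(|l| format!("src/{l}/{chapter}"))
        .find(|path| existing.contains(path))
}
```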
IMO it would be a real value add if mdbook supported what https://github.com/google/mdbook-i18n-helpers/ provides out of the box. What was the consensus about adding this feature to mdbook directly? |
@sassman I totally agree. I wanted to add multi-language support to the Rust design patterns repository but haven't gotten around to it (yet), because I think it's another system to get into and it requires changes to the standard mdbook build process, which is more effort to maintain and more likely to break due to the lack of official support and testing in mdbook. So adding this to mdbook would be really welcome, also to take the maintenance burden off this essential (for documentation, IMHO) feature set. P.S.: This is in no way a critique of the awesome work (props!) @mgeisler is doing with the |
@simonsan if project leadership is actually fine with the added complexity, I'm sure we'll find a way to contribute this back to mdbook. For now, it's just not clear to me whether such a change is even welcome. This issue does not send a clear message; I cannot really distill an actual way forward. I mean, creating a PR would require effort that should not be in vain, right? |
Add support for multiple languages.