Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build translations from mdbook #84

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 16 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions i18n-helpers/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ pulldown-cmark = { version = "0.9.2", default-features = false }
pulldown-cmark-to-cmark = "11.0.0"
regex = "1.9.4"
semver = "1.0.16"
serde = "1.0.130"
serde_json = "1.0.91"

[dev-dependencies]
Expand Down
107 changes: 107 additions & 0 deletions i18n-helpers/src/bin/mdbook-i18n.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@
use mdbook::renderer::RenderContext;
use serde::Deserialize;
use std::collections::BTreeMap;
use std::io;

/// Parameters for the i18n renderer.
///
/// They are set in the `output.i18n` section of the book's `book.toml` file.
#[derive(Deserialize)]
struct I18nConfiguration {
/// A map of language codes to language names.
///
/// ## Example
///
/// ```toml
/// [output.i18n.languages]
/// "en" = "English"
/// "es" = "Spanish (Español)"
/// "ko" = "Korean (한국어)"
/// "pt-BR" = "Brazilian Portuguese (Português do Brasil)"
/// ```
#[serde(default)]
languages: BTreeMap<String, String>,
/// Default language code. It will not be translated.
default_language: Option<String>,

/// Whether to translate all languages or just the selected language, defaults to false.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "selected language"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The language set in the env variable MDBOOK_BOOK__LANGUAGE

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay... Setting MDBOOK_BOOK__LANGUAGE is just another way of setting the language field in book.toml. So it's not strictly "selected", it's just what the config happens to be when the mdbook-gettext preprocessor is invoked.

The current system has the elegant approach that you can set language to what you want and it will then translate the text on the fly to this language — because it's a preprocessor which is independent of the renderer!

The new renderer is a bit different: it takes a list of languages from book.toml and renders the book for each of them. If the user only want to render a single language, the simplest way would be for the user to remove the other languages from the list in book.toml!

The users could do this on the command line with a suitable MDBOOK_OUTPUT__I18n__LANGUAGES setting (I think). This would be better since it uses less infrastructure: all you need to know is that the output.i18n.languages key controls the list of languages to generate for. If it has 1 item, then you get a single language, if it has $n$ items, then you get $n$ languages.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this variable is that if you do MDBOOK_BOOK__LANGUAGE=fr mdbook build with translate_all_languages to false, it will just translate fr otherwise, it will translate all languages. We can skip that, it's just for convenience.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this variable is that if you do MDBOOK_BOOK__LANGUAGE=fr mdbook build with translate_all_languages to false, it will just translate fr otherwise, it will translate all languages. We can skip that, it's just for convenience.

Can we unify this behavior? We stop setting MDBOOK_BOOK__LANGUAGE and instead drive the behavior of the backend using solely a new output.i18n.languages field. It should behave like this:

  • The book.toml file checked in has all languages for the book.

  • To build fewer languages, people can remove the unwanted languages from book.toml. Alternativley, they can use the environment variable support and run

    MDBOOK_OUTPUT__I18N__LANGUAGES='["fr"]' mdbook build

    to restrict the languages to just French.

Why have a list in book.toml instead of just using po/*.po? Because there could be extra meta data compared to what we find in the PO files. I'm thinking of things like english_name = "French", name = "Français" and maybe even available_in_language_picker = true.

Does that sounds like a good idea? So instead of a "all languages or a single language" based on MDBOOK_BOOK__LANGUAGE (which doesn't make a ton of sense for a multi-language book), we have a more flexible system that people can override on the command line.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea, it would require refactoring a bit.

In the current state, it works like this:

mdbook build runs normally on whatever language is set in the environment or manually in the book, then we execute a variable number of backends before reaching this one. From this backend, we loop over the languages defined in book.toml and do mdbook build for each language.

What you're suggesting would mean intercepting everything right at the beginning, so it would need to be a preprocessor that would need to run first and then orchestrate the language directives. It works too but would require some refactoring.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you're suggesting would mean intercepting everything right at the beginning,

I'm not actually suggesting that — I was only suggesting changing the logic for which subset of the languages to build.

#[serde(default)]
translate_all_languages: bool,
/// Whether to move the translations to the html directory, defaults to false.
///
/// By default, translations' output will live in `book/i18n/<language>/<renderer>`.
/// For all renderers in this list, we will move individual translations to `book/<renderer>/<language>`.
#[serde(default)]
move_translations_directories: Vec<String>,
Comment on lines +34 to +36
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're saying here that you'll break the contract defined by mdbook.

I don't want the new renderer to touch files in the output directories of other renderers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very convenient with mdbook serve to view the output like it will be on the final site. It doesn't need to be enabled otherwise and by default does nothing

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're saying here that you'll break the contract defined by mdbook.

I don't want the new renderer to touch files in the output directories of other renderers.

It's very convenient with mdbook serve to view the output like it will be on the final site. It doesn't need to be enabled otherwise and by default does nothing

I don't think you addressed my concern: if the code is writing outside of the output directly, then the code is doing something wrong.

I don't doubt that it is convenient: we should just write everything inside of the designated output directory.

Maybe you're not aware, but I often use mdbook build -d /tmp/foo and similar to generate output in non-standard directories. So this renderer must respect the output setting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, we don't break the contract but if we enable this, we do. That's also what we're already doing in comprehensive rust, so I don't really see a problem as it is opt-in only.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's also what we're already doing in comprehensive rust, so I don't really see a problem as it is opt-in only.

I hope our mdbook plugins are not breaking this?

Are you talking about the code in publish.yml, then that is quite different: the code there relies on the contract of mdbook which says that it will put things into directory so-and-so. After calling mdbook build, we are of course free to move things around and mess with the book/ directory — we own it then!

This is the main difference between implementing this functionality as a wrapper script (as suggested in #18!) and writing it as a mdbook plugin. As a plugin, you are subjected to the rules of the surrounding system.

It might be a subtle difference, but I believe it's an important one. By following the principle carefully on every level of the stack, we will be able to build bigger "stacks" of software which behaves in a predictable manner when seen as a whole.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script does this:

for language in languages:
    cmd(f"MDBOOK_BOOK__LANGUAGE={language} mdbook build . -d i18n-helpers/{language}")
    for renderer_to_move in move_translations_directories:
        cmd(f"mv i18n-helpers/{language}/{renderer_to_move} {renderer_to_move}/{language}")

Note that the move happens after the mdbook build command ran. Which respects this statement:

After calling mdbook build, we are of course free to move things around and mess with the book/ directory — we own it then!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script does this

Sorry, I don't think I know what script you're referring to here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This renderer I mean

}

fn main() {
let mut stdin = io::stdin();

// Get the configs
sakex marked this conversation as resolved.
Show resolved Hide resolved
let ctx = RenderContext::from_json(&mut stdin).unwrap();
let i18n_config: I18nConfiguration = ctx
.config
.get_deserialized_opt("output.i18n")
.unwrap()
.unwrap();

if !i18n_config.translate_all_languages {
return;
}

let mut mdbook = mdbook::MDBook::load(&ctx.root).expect("Failed to load book");
// Overwrite with current values from stdin. This is necessary because mdbook will add data to the config.
mdbook.book = ctx.book.clone();
mdbook.config = ctx.config.clone();
mdbook.root = ctx.root.clone();
sakex marked this conversation as resolved.
Show resolved Hide resolved

let book_config = mdbook
.config
.get_mut("output.i18n")
.expect("No output.i18n config in book.toml");
// Set translate_all_languages to false for nested builds to prevent infinite recursion.
book_config
.as_table_mut()
.expect("output.i18n config in book.toml is not a table")
.insert(String::from("translate_all_languages"), false.into());

let output_directory = ctx.destination;
let default_language = &i18n_config.default_language;

for language in i18n_config.languages.keys() {
// Skip current language and default language.
if Some(language) == ctx.config.book.language.as_ref() {
continue;
}
if let Some(default_language) = default_language {
if default_language == language {
continue;
}
}
sakex marked this conversation as resolved.
Show resolved Hide resolved
let translation_path = output_directory.join(language);

// Book doesn't implement clone, so we just mutate in place.
mdbook.config.book.language = Some(language.clone());
mdbook.config.book.multilingual = true;
mdbook.config.build.build_dir = translation_path;
mdbook
.build()
.unwrap_or_else(|_| panic!("Failed to build translation for language: {}", language));
sakex marked this conversation as resolved.
Show resolved Hide resolved
for renderer in &i18n_config.move_translations_directories {
std::fs::create_dir_all(output_directory.parent().unwrap().join(renderer))
.unwrap_or_else(|_| panic!("Failed to create html directory in output directory"));
sakex marked this conversation as resolved.
Show resolved Hide resolved
std::fs::rename(
output_directory.join(language).join(renderer),
output_directory
.parent()
.unwrap()
.join(renderer)
.join(language),
)
.unwrap_or_else(|_| {
panic!("Failed to move translation for language {language} to output directory")
});
}
}
}