Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Improve MathJax Support #626

Closed
wants to merge 23 commits into from

Conversation

dvberkel
Copy link
Contributor

This pull request is work in progress towards solving issue #400.

Feedback is appreciated.

@dvberkel
Copy link
Contributor Author

dvberkel commented Feb 20, 2018

@Michael-F-Bryan The bulk of the work for this pull-request is done. There are a few points left to do, but maybe you want to take a look to get a feel for the overall quality of the work.

I have taken the links pre-processor as inspiration and modified it where I saw fit.

This still can't be merged for the following things need to be addressed for sure.

  • Improve the regular expression.
  • Expand the test cases.
  • Run rustfmt.
  • Incorporate into the system.
  • Expand in tests/rendered_output.rs.

Furthermore, maybe squash the commits down to a single commit?

Copy link
Contributor

@Michael-F-Bryan Michael-F-Bryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a couple comments about points we may want to spend more time on, but they mostly come under the dot-points you've already mentioned (e.g. test cases and checking the regex).

One thing I think we should do is add a bunch of mathjax and non-mathjax expressions to the "dummy" book then add tests to tests/rendered_output.rs to make sure the appropriate <div class="math">$...$</div> tag is rendered.


fn find_mathematics(content: &str) -> MathematicsIterator {
lazy_static! {
static ref REGEXP: Regex = Regex::new(r"(?x) # insignificant whitespace mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to split this regex up and spell out what it's doing! Regular expressions are usually unreadable, so when you mentioned you weren't too confident with the regex I was a little afraid of what I'd see 😜

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I has been said that regular expressions are a write only tool. I am pretty confident with them, but I know for a fact that it currently is incorrect.

match delimiter.as_str() {
"$$" => Kind::Block,
"$" => Kind::Inline,
"\\\\[" => Kind::LegacyBlock,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can always use a raw string here, r"\\[" to avoid escaping backslashes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a very good suggestion. I wasn't aware of raw strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

start_index: m.start(),
end_index: m.end(),
kind: kind,
text: kind.text(m.as_str()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the text field be the body of the mathjax expression, ignoring the surrounding $'s? I've never used mathjax, so I don't know what form it wants things in.

Copy link
Contributor Author

@dvberkel dvberkel Feb 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For mathjax to work properly it needs the surrounding dollar signs. I strip them here so we can put them in when we do text replacement. This way we can treat the intended and the legacy case the same. ✔️


#[test]
fn should_find_legacy_inline_mathematics() {
let content = "Pythagorean theorem: \\\\(a^{2} + b^{2} = c^{2}\\\\)";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to use raw strings here as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

let mut replacement = String::new();
match self.kind {
Kind::Block | Kind::LegacyBlock => {
replacement.push_str("<div class=\"math\">$$");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These match arms can probably be turned into format!() calls:

format!(r#"<div class="math">$${}$$</div>"#, self.text)

... Or do you think the incremental string building is more readable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think I prefer the format! calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

}

impl Kind {
fn text<'a>(&self, delimited_text: &'a str) -> &'a str {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have thought this method belongs on the Mathematics type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I placed it on Kind because I needed the body of the delimited text when I construct a Mathematics structure. This could well be done in a constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted for a auxiliary function with the same functionality. ✔️

}

#[cfg(test)]
mod tests {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you've mentioned, we probably want to flesh these test out a bit. Off the top of my head we probably want to check

  • Strings with just one $
  • What happens when you have a newline between delimiters (e.g. "$foo\nbar$)
  • Is there a limit to the length of an expression? (e.g. you have a $ at the start of your chapter and another towards the end of the page)
  • What happens if I overlap delimiters? e.g. $a^{2} + b^{2} = c^{2}\\] and so on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of these kinds of tests. I will add them in.

@dvberkel
Copy link
Contributor Author

I will work on your suggestion and the ones I already had in mind. I have added the tests/rendered_output.rs as a separate bullet point.

One further question. When the work for this issue is done, would you prefer a squashed commit? Or do you want to keep the history?

@dvberkel
Copy link
Contributor Author

dvberkel commented Mar 5, 2018

I am starting to incorporate the mathjax pre-processor into the system and we need to discuss how, because there are several options.

In src/book/mod.rs the determine_preprocessors function returns a list of pre-processors. It does this by a lookup on the name of the pre-processor configured in the config.rs BuildConfig.

Mathjax is only relevant for the HTML renderer and a mathjax_support option can be found in the HtmlConfig.

In order to enable the mathjax pre-processor I see the following options. If a user wants mathjax support

  1. She should add the mathjax key in the pre-processors in the build configuration and set mathjax_support to true in the HTML configuration.
  2. She should set the mathjax_support to true in the HTML configuration. This automatically add the mathjax pre-processor to the list of pre-processors.

Maybe other options are possible as well.

@Michael-F-Bryan How would you like me to proceed?

@Michael-F-Bryan
Copy link
Contributor

When the work for this issue is done, would you prefer a squashed commit? Or do you want to keep the history?

I usually merge PRs with the "squash and merge" button because it leads to a much cleaner git history. GitHub will do the squashing for us, so you shouldn't need to do anything.

Mathjax is only relevant for the HTML renderer and a mathjax_support option can be found in the HtmlConfig.

This is one of the things I was referring to in #629 (comment) when I said:

... I'm stumped on a couple things, for instance some preprocessors will only be relevant when combined with a specific backend ...

I was thinking a user would add the mathjax preprocessor in their book.toml (or we'd add it as part of the default list if the user doesn't specify any preprocessors) and then for each backend we'll run the entire preprocessor pipeline independently. You can then tell each preprocessor which renderer it's being run for (the Renderer trait requires a name() method), and it then becomes the preprocessor's responsibility to change its behaviour (e.g. run or return early) based on that.

Or expressed as code:

// when generating the book

for renderer in &self.renderers {
  let mut book = self.book.clone();
  let name = renderer.name();
  
  for preprocessor in &self.preprocessors {
    let ctx = /* ... */;
    preprocessor.run(ctx, name, &mut book)?;
  }

  renderer.render(...)?;
}

// in the MathJax preprocessor

const RENDERER_WHITELIST: &[&str] = &["html"];

impl Preprocessor for MathJax {
  fn run(&self, ctx: &PreprocessorContext, renderer: &'static str, book: &mut Book) -> Result<()> {
    if !RENDERER_WHITELIST.contains(renderer) {
      return Ok(());  // there's nothing to do. Bail early.
    }

    /* do usual preprocessing */
  }
}

This is probably going to be the most scalable approach because we aren't using an ad-hoc side-channel for communicating whether the preprocessor should run. It does allow you to couple preprocessors to specific renderers, but in this case I'd say the MathJax preprocessor is already innately coupled to the HTML renderer so it's not necessarily a problem. If a preprocessor doesn't really care which renderer it's being run for (e.g. the current link rewriters), it can always ignore the renderer argument.

@dvberkel sorry for the wall of text! It's a non-trivial problem and I can imagine as we start using preprocessors for more stuff we'll need to find a more generic solution. I'm kinda hesitant to add custom logic to the determine_preprocessor() function (your second point) because that couples it to the HTML renderer and won't work for 3rd party preprocessors.

@dvberkel
Copy link
Contributor Author

dvberkel commented Mar 5, 2018

@Michael-F-Bryan Don't worry about the wall of text. It conveys your point very well. I would rather want to read more, think and do a well-informed job then a rush to a mediocre solution.

Is what you describe already in place? Or is it something that needs to be done?

@Michael-F-Bryan
Copy link
Contributor

Is what you describe already in place? Or is it something that needs to be done?

It's one of the ideas I've been thinking about, but nothing is implemented yet. I started #631 so we have a place to discuss these sorts of things.

Of itself, introducing this renderer argument to Preprocessor::run() is a fairly trivial change to make. I just want to hear more people's opinions before implementing anything.

For example what do we do if a user wants to say "run the MathJax preprocessor when building an EPUB"? With this system the preprocessor's whitelist would be hard-coded and wouldn't allow that. A preprocessor could add a setting under its table in book.toml (e.g. renderers = ["html", "epub"]), but then we're just swapping one ad-hoc side-channel for another.

Another option is to let mdbook decide whether to run a preprocessor using the previous renderers = ... setting, falling back to some sort of hint from the preprocessor if it's not available.

e.g. the user could manually specify which renderer to run the MathJax preprocessor for:

[preprocessor.mathjax]
renderers = ["html", "epub"]

And if the renderers key isn't provided, we use a fallback, probably by adding a fn hint_renderer_is_supported(&self, render: &str) -> bool method to the Preprocessor trait.

@dvberkel
Copy link
Contributor Author

@Michael-F-Bryan I was away for a while. My mom died and I needed to take care of things.

What is the best way forward? should we first decide on a #631? It seems like the best option.

@Michael-F-Bryan
Copy link
Contributor

Hi @dvberkel I'm really sorry to hear about that.

I'm thinking we should just make a decision on how renderers and preprocessors should interact and that'll unblock this issue.

@Dylan-DPC-zz
Copy link

@dvberkel since this PR is still WIP and it was created a while back, would be a good idea to close it. Feel free to submit a new PR and thanks for contributing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants