WIP: Improve MathJax Support #626

Closed
wants to merge 23 commits into from
23 commits
2f89760
WIP: Improve MathJax Support
dvberkel Feb 17, 2018
643edbe
Create a skeletal NOOP implementation of MathJaxPreprocessor
dvberkel Feb 17, 2018
7d0a1f1
Iterate over section without replacing mathematics
dvberkel Feb 18, 2018
16e22e1
Create a skeletal find_mathematics function
dvberkel Feb 18, 2018
7657044
Describe different kind of mathematics
dvberkel Feb 19, 2018
b4f77eb
Implement find_mathemtics loop inside replace_all
dvberkel Feb 19, 2018
784b814
Implement find_mathematics for inline maths
dvberkel Feb 19, 2018
4aa0df1
Include mathematics text
dvberkel Feb 19, 2018
534dfb4
Recognize both inline as block mathematics
dvberkel Feb 19, 2018
2a05964
Recognize both types of legacy mathematics
dvberkel Feb 20, 2018
92259f1
Capture mathematics text in regular expression
dvberkel Feb 20, 2018
e5be09d
Reveal kind of mathematics earlier
dvberkel Feb 20, 2018
121f638
Remove delimiters from mathematics text
dvberkel Feb 20, 2018
ea4f998
Do actual replacement of text
dvberkel Feb 20, 2018
a55e640
Use raw strings to prevent excessive exscaping
dvberkel Feb 22, 2018
1b0abf7
Use the format! macro instead of push_str
dvberkel Feb 22, 2018
e68bee5
Demote text method on kind to auxiliary function
dvberkel Feb 22, 2018
1648126
Don't capture the mathematics text
dvberkel Feb 24, 2018
2512b87
Correct regexp for mathematics
dvberkel Feb 26, 2018
1871b87
Test text with a single dollar sign
dvberkel Mar 3, 2018
92b5fad
Detect mathematics over multiple lines
dvberkel Mar 3, 2018
629c778
Test non-matching delimiters
dvberkel Mar 3, 2018
7540a3e
Run `cargo fmt` on preprocess/mathjax.rs
dvberkel Mar 3, 2018
303 changes: 303 additions & 0 deletions src/preprocess/mathjax.rs
@@ -0,0 +1,303 @@
//! Preprocessor that converts mathematical expressions into MathJax.
//!
//! This preprocessor takes inline expressions wrapped in `$`-pairs and block
//! expressions wrapped in `$$`-pairs and transforms them into valid MathJax
//! expressions that do not interfere with the markdown parser.

use errors::Result;
use regex::{CaptureMatches, Captures, Regex};

use super::{Preprocessor, PreprocessorContext};
use book::{Book, BookItem};

/// A preprocessor for expanding `$`- and `$$`-pairs into valid MathJax expressions.
pub struct MathJaxPreprocessor;

impl MathJaxPreprocessor {
/// Create a `MathJaxPreprocessor`.
pub fn new() -> Self {
MathJaxPreprocessor
}
}

impl Preprocessor for MathJaxPreprocessor {
fn name(&self) -> &str {
"mathjax"
}

fn run(&self, _ctx: &PreprocessorContext, book: &mut Book) -> Result<()> {
book.for_each_mut(|section: &mut BookItem| {
if let BookItem::Chapter(ref mut chapter) = *section {
let content = replace_all_mathematics(&chapter.content);
chapter.content = content;
}
});

Ok(())
}
}

fn replace_all_mathematics(content: &str) -> String {
let mut previous_end_index = 0;
let mut replaced = String::new();

for math in find_mathematics(content) {
replaced.push_str(&content[previous_end_index..math.start_index]);
replaced.push_str(&math.replacement());
previous_end_index = math.end_index;
}

replaced.push_str(&content[previous_end_index..]);

replaced
}

fn find_mathematics(content: &str) -> MathematicsIterator {
lazy_static! {
static ref REGEXP: Regex = Regex::new(r"(?x) # insignificant whitespace mode
Contributor

Thanks for taking the time to split this regex up and spell out what it's doing! Regular expressions are usually unreadable, so when you mentioned you weren't too confident with the regex I was a little afraid of what I'd see 😜

Contributor Author

It has been said that regular expressions are a write-only tool. I am pretty confident with them, but I know for a fact that this one is currently incorrect.

# Mathematics is

# Block mathematics is
(\$\$) # a double dollar sign
(?: # followed by one or more
[^$] # things other than a dollar sign
| # or
\\\$ # an escaped dollar sign
)+
(\$\$) # followed by a closing double dollar sign.

| # or

# Inline mathematics is
(\$) # a dollar sign
(?: # followed by one or more
[^$] # things other than a dollar sign
| # or
\\\$ # an escaped dollar sign
)+
(\$) # followed by a closing dollar sign.

| # or

# Legacy inline mathematics
(\\\\\() # An escaped opening bracket `\\(`
.+? # followed by one or more other things, but lazily
(\\\\\)) # followed by a closing bracket `\\)`

| # or

# Legacy block mathematics
(\\\\\[) # An escaped opening bracket `\\[`
.+? # followed by one or more other things, but lazily
(\\\\\]) # followed by a closing bracket `\\]`
").unwrap();
}
MathematicsIterator(REGEXP.captures_iter(content))
}

struct MathematicsIterator<'a>(CaptureMatches<'a, 'a>);

impl<'a> Iterator for MathematicsIterator<'a> {
type Item = Mathematics<'a>;

fn next(&mut self) -> Option<Self::Item> {
for capture in &mut self.0 {
if let mathematics @ Some(_) = Mathematics::from_capture(capture) {
return mathematics;
}
}
None
}
}

#[derive(Debug, PartialEq, Eq)]
struct Mathematics<'a> {
start_index: usize,
end_index: usize,
kind: Kind,
text: &'a str,
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Kind {
Inline,
Block,
LegacyInline,
LegacyBlock,
}

impl<'a> Mathematics<'a> {
fn from_capture(captures: Captures<'a>) -> Option<Self> {
let kind = captures
.get(1)
.or(captures.get(3))
.or(captures.get(5))
.or(captures.get(7))
.map(|delimiter| match delimiter.as_str() {
"$$" => Kind::Block,
"$" => Kind::Inline,
r"\\[" => Kind::LegacyBlock,
_ => Kind::LegacyInline,
})
.expect("captured mathematics should have opening delimiter at the provided indices");

captures.get(0).map(|m| Mathematics {
start_index: m.start(),
end_index: m.end(),
kind: kind,
text: strip_delimiters_from_delimited_text(&kind, m.as_str()),
})
}

fn replacement(&self) -> String {
let replacement: String = match self.kind {
Kind::Block | Kind::LegacyBlock => {
format!("<div class=\"math\">$${}$$</div>", self.text)
}
Kind::Inline | Kind::LegacyInline => {
format!("<span class=\"inline math\">${}$</span>", self.text)
}
};
replacement
}
}

fn strip_delimiters_from_delimited_text<'a>(kind: &Kind, delimited_text: &'a str) -> &'a str {
let end = delimited_text.len();
match *kind {
Kind::Block => &delimited_text[2..end - 2],
Kind::Inline => &delimited_text[1..end - 1],
Kind::LegacyBlock => &delimited_text[3..end - 3],
Kind::LegacyInline => &delimited_text[3..end - 3],
}
}

#[cfg(test)]
mod tests {
Contributor

As you've mentioned, we probably want to flesh these tests out a bit. Off the top of my head we probably want to check

  • Strings with just one $
  • What happens when you have a newline between delimiters (e.g. "$foo\nbar$")
  • Is there a limit to the length of an expression? (e.g. you have a $ at the start of your chapter and another towards the end of the page)
  • What happens if I overlap delimiters? e.g. $a^{2} + b^{2} = c^{2}\\] and so on.

Contributor Author

I was thinking of these kinds of tests. I will add them in.

use super::*;

#[test]
fn should_find_no_mathematics_in_regular_text() {
let content = "Text without mathematics";

assert_eq!(find_mathematics(content).count(), 0);
}

#[test]
fn should_find_no_mathematics_in_regular_text_with_a_single_dollar_sign() {
let content = "Text with a single $ mathematics";

assert_eq!(find_mathematics(content).count(), 0);
}

#[test]
fn should_find_no_mathematics_when_delimiters_do_not_match() {
let content = "$$Text with a non matching delimiters mathematics\\]";

assert_eq!(find_mathematics(content).count(), 0);
}

#[test]
fn should_find_mathematics_spanning_over_multiple_lines() {
let content = "Mathematics $a +\n b$ over multiple lines";

assert_eq!(find_mathematics(content).count(), 1);
}

#[test]
fn should_find_inline_mathematics() {
let content = "Pythagorean theorem: $a^{2} + b^{2} = c^{2}$";

let result = find_mathematics(content).collect::<Vec<_>>();

assert_eq!(result.len(), 1);
assert_eq!(
result[0],
Mathematics {
start_index: 21,
end_index: 44,
kind: Kind::Inline,
text: "a^{2} + b^{2} = c^{2}",
}
)
}

#[test]
fn should_find_block_mathematics() {
let content = "Euler's identity: $$e^{i\\pi} + 1 = 0$$";

let result = find_mathematics(content).collect::<Vec<_>>();

assert_eq!(result.len(), 1);
assert_eq!(
result[0],
Mathematics {
start_index: 18,
end_index: 38,
kind: Kind::Block,
text: "e^{i\\pi} + 1 = 0",
}
)
}

#[test]
fn should_find_legacy_inline_mathematics() {
let content = r"Pythagorean theorem: \\(a^{2} + b^{2} = c^{2}\\)";

let result = find_mathematics(content).collect::<Vec<_>>();

assert_eq!(result.len(), 1);
assert_eq!(
result[0],
Mathematics {
start_index: 21,
end_index: 48,
kind: Kind::LegacyInline,
text: "a^{2} + b^{2} = c^{2}",
}
)
}

#[test]
fn should_find_legacy_block_mathematics() {
let content = r"Euler's identity: \\[e^{i\pi} + 1 = 0\\]";

let result = find_mathematics(content).collect::<Vec<_>>();

assert_eq!(result.len(), 1);
assert_eq!(
result[0],
Mathematics {
start_index: 18,
end_index: 40,
kind: Kind::LegacyBlock,
text: "e^{i\\pi} + 1 = 0",
}
)
}

#[test]
fn should_replace_inline_mathematics() {
let content = "Pythagorean theorem: $a^{2} + b^{2} = c^{2}$";

let result = replace_all_mathematics(content);

assert_eq!(
result,
"Pythagorean theorem: <span class=\"inline math\">$a^{2} + b^{2} = c^{2}$</span>"
)
}

#[test]
fn should_replace_block_mathematics() {
let content = "Euler's identity: $$e^{i\\pi} + 1 = 0$$";

let result = replace_all_mathematics(content);

assert_eq!(
result,
"Euler's identity: <div class=\"math\">$$e^{i\\pi} + 1 = 0$$</div>"
)
}

}
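A note on the regular expression in find_mathematics, which the author says is known to be incorrect. One likely cause, as far as can be told from reading the pattern: in both `$` alternatives `[^$]` is tried before `\\\$`, so the backslash of an escaped dollar sign is consumed as an ordinary character and the following `$` is then taken as the closing delimiter. The sketch below is not the PR's code and does not use the regex crate. It is a minimal hand-written scanner, under the hypothetical name `find_inline_spans`, that shows one way to let a backslash escape the next character. It only handles inline `$...$` pairs and ignores `$$` blocks and the legacy delimiters.

```rust
/// Hypothetical helper (not part of the PR): returns the byte ranges of
/// inline `$...$` spans, treating `\$` as an escaped dollar sign instead of
/// a delimiter. Block (`$$`) and legacy delimiters are out of scope here.
fn find_inline_spans(content: &str) -> Vec<(usize, usize)> {
    let bytes = content.as_bytes();
    let mut spans = Vec::new();
    // Byte index of the currently open `$`, if any.
    let mut open: Option<usize> = None;
    let mut i = 0;

    while i < bytes.len() {
        match bytes[i] {
            // A backslash escapes the following byte, so skip both.
            b'\\' => {
                i += 2;
                continue;
            }
            b'$' => match open {
                // A closing delimiter with at least one byte in between.
                Some(start) if i > start + 1 => {
                    spans.push((start, i + 1));
                    open = None;
                }
                // No span is open, or the open one is empty: (re)start here.
                _ => open = Some(i),
            },
            _ => {}
        }
        i += 1;
    }

    // An unmatched opening `$` is simply dropped.
    spans
}
```

For the input `Cost: $a \$ b$ done` this scanner reports the single span `$a \$ b$`. Within the regex itself, putting the escape alternative first and excluding the backslash from the character class (for example `\\.|[^$\\]`) might achieve a similar effect, though that has not been checked against the PR's tests.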
3 changes: 2 additions & 1 deletion src/preprocess/mod.rs
@@ -3,6 +3,7 @@
pub use self::links::LinkPreprocessor;

mod links;
mod mathjax;

use book::Book;
use config::Config;
@@ -35,4 +36,4 @@ pub trait Preprocessor {
/// Run this `Preprocessor`, allowing it to update the book before it is
/// given to a renderer.
fn run(&self, ctx: &PreprocessorContext, book: &mut Book) -> Result<()>;
}
}