Fixed issue 980 #1

Merged Feb 23, 2017 (1 commit)


@joined commented Feb 23, 2017

I wrote a bugfix for issue jupyter#980. Description follows.

Problem

The problem concerns the inline LaTeX rendering of Markdown/text cells. The creator of the issue provided the following example to illustrate it:

$M_{RED\rightarrow RGB}$, $M_{XYZ\rightarrow RGB}$ # Fine

\\(M_{RED\rightarrow RGB}\\), $M_{XYZ\rightarrow RGB}$ # Fine

\\(M_{RED\rightarrow RGB}\\), \\(M_{XYZ\rightarrow RGB}\\) # Broken

This results in the following:

Background

One of the cell types supported by Jupyter is the text/Markdown cell. Using this type of cell we can write Markdown which gets processed when the cell is executed. Moreover, it's possible to write inline LaTeX which is rendered using MathJax.

Combining Markdown and LaTeX is a delicate operation, since some special characters are shared by the two languages, leading to conflicts.

The LaTeX code that needs to be rendered is specified in the text cell using delimiters. In particular, the following delimiters are supported:

  1. $
  2. $$
  3. \begin and \end
  4. \( and \)

A note on the last one: it is necessary to write \\( instead of \( because, in the latter case, Markdown interprets the backslash as an escape character for the parenthesis.

There is a mechanism in place in order for the LaTeX code not to get interpreted as Markdown. It works as follows:

  • The user enters the Markdown with inline LaTeX in the text cell, for example *abc* $x_1 = 1, x_2 = 2$ _def_, and clicks execute cell.
  • The function remove_math in notebook/static/notebook/js/mathjaxutils.js is used to extract the LaTeX groups from the text, put them in a separate array, and replace the groups in the text with placeholders. In this case, for example, the function will return ['*abc* @@0@@ _def_', ['$x_1 = 1, x_2 = 2$']].
  • The text gets rendered as Markdown.
  • The placeholders get replaced with the LaTeX groups that were backed up.
  • The rendered Markdown is processed by MathJax to render the LaTeX.

This procedure is necessary since, otherwise, the underscores in the LaTeX group would be interpreted as italic delimiters, in this example.
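The steps above can be sketched roughly as follows. This is a simplified illustration, not the actual code from mathjaxutils.js: `removeMathSketch`, `replaceMathSketch` and `fakeMarkdown` are hypothetical names, only `$...$` groups are handled, and the Markdown renderer is replaced by a stand-in that only understands `_italic_`.

```javascript
// Simplified sketch: handles only $...$ groups, unlike the real remove_math.
function removeMathSketch(text) {
  const math = [];
  const stripped = text.replace(/\$[^$]+\$/g, (group) => {
    math.push(group);                        // back up the LaTeX group
    return "@@" + (math.length - 1) + "@@";  // leave a placeholder behind
  });
  return [stripped, math];
}

// Put the saved groups back in place of their placeholders.
function replaceMathSketch(text, math) {
  return text.replace(/@@(\d+)@@/g, (_, n) => math[Number(n)]);
}

// Stand-in for the Markdown renderer: only converts _x_ to <em>x</em>.
function fakeMarkdown(text) {
  return text.replace(/_([^_]+)_/g, "<em>$1</em>");
}

const source = "*abc* $x_1 = 1, x_2 = 2$ _def_";
const [stripped, math] = removeMathSketch(source);

// With protection, the LaTeX group survives the Markdown step untouched.
const rendered = replaceMathSketch(fakeMarkdown(stripped), math);

// Without protection, the underscores inside the group become italics.
const unprotected = fakeMarkdown(source);
```

In this sketch `rendered` still contains the original `$x_1 = 1, x_2 = 2$`, while `unprotected` has `<em>` tags inserted in the middle of the formula, which is the kind of damage the placeholder mechanism is meant to prevent.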

The core of the problem is that remove_math extracts only the groups delimited by 1, 2 and 3 in the list above, but not those delimited by 4.

In fact, it is easy to see that in the third line of the example Markdown interpreted the underscores as italic delimiters.

Fix

The changes are made in the notebook/static/notebook/js/mathjaxutils.js file.

The remove_math function contains some logic to identify the LaTeX blocks and extract them. The first step in this procedure is to split the text on all the possible group delimiters. This is done using the MATHSPLIT regular expression defined on line 62. As the comments in the code say, it's a bit "magical" in the sense that its workings are not crystal clear.

The \\( and \\) delimiters were missing from the regular expression, so I added them by appending \\\\(?:\(|\))) to it. Moreover, since the regular expression was already matching the text \\ as a delimiter, I had to remove that alternative (otherwise the block \\( would not be kept together and we would not be able to identify it as a group delimiter). The purpose of splitting on \\ is not clear to me, since we are only looking for LaTeX group delimiters and \\ is not one. I suspect it is the result of a blind copy & paste from somewhere else, since the comments cite different sources for the code.
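To illustrate why the standalone \\ alternative had to go, here is a small sketch using two reduced patterns. They are hypothetical stand-ins, not the real MATHSPLIT, which contains more alternatives; `oldSplit`, `newSplit` and `cell` are names invented for this example.

```javascript
// Hypothetical, reduced patterns; the real MATHSPLIT has more alternatives.
const oldSplit = /(\$\$?|\\\\)/;           // "\\" is a delimiter on its own
const newSplit = /(\$\$?|\\\\(?:\(|\)))/;  // "\\(" and "\\)" are delimiters

// The cell source contains two literal backslashes before each parenthesis.
const cell = "a \\\\(x_1\\\\) b";

// Splitting on a capturing group keeps the delimiters in the result array.
const oldBlocks = cell.split(oldSplit);
const newBlocks = cell.split(newSplit);

console.log(oldBlocks); // the backslashes are separated from the parentheses
console.log(newBlocks); // "\\(" and "\\)" each stay together as one block
```

With the old pattern the parenthesis ends up glued to the math content, so a later pass has no single block it can recognize as the opening or closing delimiter.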

After being split, the text is processed by iterating over the blocks and looking for start and end LaTeX delimiters. On line 181 I added the missing logic to handle the case in which \\( is the start delimiter and \\) is the end delimiter.
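A minimal sketch of that pairing step, assuming the text has already been split so that \\( and \\) are blocks of their own. The function `collectInlineGroups` and its variables are illustrative and do not come from mathjaxutils.js, whose loop also tracks the other delimiter types and brace nesting.

```javascript
// Sketch of scanning split blocks and pairing "\\(" with "\\)".
function collectInlineGroups(blocks) {
  const groups = [];
  let start = null; // index of the pending "\\(" block, if any
  for (let i = 0; i < blocks.length; i++) {
    if (start === null && blocks[i] === "\\\\(") {
      start = i; // found a start delimiter
    } else if (start !== null && blocks[i] === "\\\\)") {
      // found the matching end delimiter: join everything in between
      groups.push(blocks.slice(start, i + 1).join(""));
      start = null;
    }
  }
  return groups;
}

// The "Broken" line from the issue, reduced to short subscripts.
const line = "\\\\(M_{A}\\\\), \\\\(M_{B}\\\\)";
const lineBlocks = line.split(/(\\\\(?:\(|\)))/);
const inlineGroups = collectInlineGroups(lineBlocks);
```

Each entry of `inlineGroups` is a complete group including its delimiters, ready to be swapped for a placeholder before the Markdown step.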

The last change is on line 208. It is necessary because the LaTeX code is extracted, backed up, and reinserted in the text only after the Markdown is rendered, so the \\( that was required for Markdown is never processed by it (which would have turned it into \(). We therefore have to manually replace the instances of \\( and \\) with \( and \) respectively, which are the delimiters recognized by MathJax.
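The replacement itself could look something like the sketch below. `toMathJaxDelimiters` is a hypothetical helper written for this description; the actual change on line 208 may be structured differently.

```javascript
// Sketch: convert the Markdown-escaped delimiters to the ones MathJax reads.
function toMathJaxDelimiters(group) {
  return group
    .replace(/\\\\\(/g, "\\(")  // two backslashes + "(" become "\("
    .replace(/\\\\\)/g, "\\)"); // two backslashes + ")" become "\)"
}

// A backed-up group as it appears in the cell source.
const saved = "\\\\(M_{XYZ\\rightarrow RGB}\\\\)";
const restored = toMathJaxDelimiters(saved);

console.log(restored); // the group now starts with "\(" and ends with "\)"
```

Single-backslash commands inside the group, such as \rightarrow, are left alone because the patterns only match a double backslash immediately followed by a parenthesis.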

Thoughts?

@John-Pap

You are absolutely right Lorenzo! Thanks to your detailed description I was able to also spot the issue. Excellent work!

@John-Pap merged commit 06f2a72 into master Feb 23, 2017
@adimitrova

Fast and accurate solution! Well done! :octocat: :octocat: :octocat:
