-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed issue #980: incorrect LaTeX capturing and MathJax rendering #2220
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested locally against the test cases in #980, looks good!
@michaelpacer could you have a look at this?
|
Hi all, a few thoughts (took a couple days to get my bearings on this one as there are a few related issues with which I was getting it confused). My unraveling this confusion led to these different points.
2.1. This should probably have some accompanying tests to check for the previously aberrant behaviour (in #759 and #980), but that assumes that we're still expanding our test suite. In other cases (e.g., #2048) it was decided that tests weren't required. This seems like a particularly precarious case (as it's a "bugfix" rather than a "feature request") and so if there is any planned further expansion of our test suite, then this would merit inclusion.
I think this observation is unrelated to this PR beyond from clarifying a point about when certain syntaxes are valid or invalid. However, if we wanted to allow |
Simpler comment: this should also cover |
@@ -178,6 +178,11 @@ define([ | |||
end = block; | |||
braces = 0; | |||
} | |||
else if (block === "\\\\\(") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This modification should also cover the case where block === "\\\\\["
and end = "\\\\\]"
.
@@ -200,7 +205,9 @@ define([ | |||
// | |||
var replace_math = function (text, math) { | |||
text = text.replace(/@@(\d+)@@/g, function (match, n) { | |||
return math[n]; | |||
return math[n] | |||
.replace("\\\\\(", "\\\(") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe (following this pattern) this should also cover .replace("\\\\\[", "\\\[")
and .replace("\\\\\]", "\\\]")
.
However, I have a code smell around this pattern of sequentially applied string replacement…
Shouldn't it be sufficient to exclusively looking for these as beginning and ending elements of the string passed into the second function call (since there shouldn't be more than instance of the each to every call to the anonymous function with the match, n
as an signature). Are there cases where this wouldn't be the prefix and the suffix that are passed into the function that provides the the second part of the text
return value?
I should mention, that I would prefer – given that no this=that
referencing needs to occur inside these anonymous functions – that the second anonymous function were separately, defined with a proper name. Then the second part of the assignment to the text
variable would be directly call that function. This would be a lot easier to read.
@joined - thank you for taking the time to open a PR and write out a very complete description of the problem. Thank you @michaelpacer for the incredible amount of detail and care you're putting into standardization on the format. Happy you're handling that. I can't speak much to the changes here today, so I'll have to come back to this. I can't be a blocker though, I live it in yours and @blink1073's hands. |
@rgbkrk's comment that you quoted makes sense to me (I don't remember the earlier PRs on the same subject), and thus I don't think that we should support |
@takluyver, sounds fair to me. |
I incorporated your feedback, I hope I got it right. Thanks for clarifying the confusion with And since you are discussing about it, no, I never attempted to add the support to |
@takluyver are you ok merging this without tests like before? If so this looks good to me, though a comment explaining the last replace call & the new function would be useful for future debugging. Not a blocker though. |
This goes beyond just fixing #980, and so is in no way blocking… @joined can you explain why |
K nvm, I just made that change. It's small enough of a difference that, if the tests pass, I'll merge, as it would seem to leave everything else untouched and that LGTM. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I raised this in #2048, and I want to reiterate that I think it is a mistake to add new functionality and provide fixes (especially around tricky issues) without tests. This is not conducive to writing robust software.
I want to add that I am grateful for the work everyone has put in here, thank you @joined for providing this fix, and for everyone else (especially @michaelpacer) for the reviews. So I'm not trying to pick on you, I just don't know how else to push back against what I think is a bad policy of merging features or fixes which do not have tests. |
@ivanov I understand your concerns — would you be able to put some cycles into fixing the js testing suite? That could include us pair coding our way through it since you're in the area. I'm not currently equipped to tackle the general notebook js testing problem in an efficient manner (and don't have the free cycles to teach myself about js testing from scratch). However, I think that I could pick up a lot once I got some intellectual momentum by working with someone who knows the ins and outs of the problem space. |
As far as I know, the JS test suite mostly works. It breaks because either some tests or some of the utility functions that tests uses end up having race conditions. I've created the Flaky JavaScript tests omnibus ticket #2243 where we can note down tests which occasionally fail (with any relevant details) so we can address this when it comes up. You can read about writing and running JS tests for the notebook here. The starting point for this specific PR would probably be to look at and add to notebook/tests/notebook/markdown.js I'm happy to meet with you @michaelpacer, but I hope you can start at the two links above so you can figure out what's missing and improve the docs and have a place to direct others for how to write and run these tests. |
So I looked into the testing for the rendered LaTeX code, it seems quite messy. For example rendering a cell with the content <p><span class="MathJax_Preview" style="color: inherit;"></span><span class="MathJax" id="MathJax-Element-7-Frame" tabindex="0" data-mathml="<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>a</mi><mo>&#x2217;</mo><mi>b</mi></math>" role="presentation" style="position: relative;"><nobr aria-hidden="true"><span class="math" id="MathJax-Span-40" role="math" style="width: 2.443em; display: inline-block;"><span style="display: inline-block; position: relative; width: 2.027em; height: 0px; font-size: 120%;"><span style="position: absolute; clip: rect(1.67em 1002.03em 2.682em -999.997em); top: -2.497em; left: 0.003em;"><span class="mrow" id="MathJax-Span-41"><span class="mi" id="MathJax-Span-42" style="font-family: STIXMathJax_Main-italic;">a</span><span class="mo" id="MathJax-Span-43" style="font-family: STIXMathJax_Main; padding-left: 0.241em;">∗</span><span class="mi" id="MathJax-Span-44" style="font-family: STIXMathJax_Main-italic; padding-left: 0.241em;">b</span></span><span style="display: inline-block; width: 0px; height: 2.503em;"></span></span>
</span><span style="display: inline-block; overflow: hidden; vertical-align: -0.068em; border-left: 0px solid; width: 0px; height: 1.004em;"></span></span>
</nobr><span class="MJX_Assistive_MathML" role="presentation"><math xmlns="http://www.w3.org/1998/Math/MathML"><mi>a</mi><mo>∗</mo><mi>b</mi></math></span></span>
<script type="math/tex" id="MathJax-Element-7"> a*b </script>, <span class="MathJax_Preview" style="color: inherit;"></span><span class="MathJax" id="MathJax-Element-8-Frame" tabindex="0" data-mathml="<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>c</mi><mo>&#x2217;</mo><mi>d</mi></math>" role="presentation" style="position: relative;"><nobr aria-hidden="true"><span class="math" id="MathJax-Span-45" role="math" style="width: 2.384em; display: inline-block;"><span style="display: inline-block; position: relative; width: 1.967em; height: 0px; font-size: 120%;"><span style="position: absolute; clip: rect(1.67em 1001.97em 2.682em -999.997em); top: -2.497em; left: 0.003em;"><span class="mrow" id="MathJax-Span-46"><span class="mi" id="MathJax-Span-47" style="font-family: STIXMathJax_Main-italic;">c</span><span class="mo" id="MathJax-Span-48" style="font-family: STIXMathJax_Main; padding-left: 0.241em;">∗</span><span class="mi" id="MathJax-Span-49" style="font-family: STIXMathJax_Main-italic; padding-left: 0.241em;">d<span style="display: inline-block; overflow: hidden; height: 1px; width: 0.003em;"></span></span>
</span><span style="display: inline-block; width: 0px; height: 2.503em;"></span></span>
</span><span style="display: inline-block; overflow: hidden; vertical-align: -0.068em; border-left: 0px solid; width: 0px; height: 1.004em;"></span></span>
</nobr><span class="MJX_Assistive_MathML" role="presentation"><math xmlns="http://www.w3.org/1998/Math/MathML"><mi>c</mi><mo>∗</mo><mi>d</mi></math></span></span>
<script type="math/tex" id="MathJax-Element-8"> c*d </script>
</p> I see many things that could go wrong when making an assertion on the rendered HTML output. For example MathJax attributes incremental ids to each rendered LaTeX group, and the numbering is global to the notebook. It seems also that the inline CSS that MathJax output could change basing on the enviroment making the test fail. Any ideas? |
I agree that trying to assert the whole MathJax generated DOM subtree would be difficult. What about drilling down into parts of it? Or asserting how many mo and mi elements there are in a given test case? (I also just updated the title of the PR to be more descriptive) |
So I wrote the following code in // Test LaTeX rendering with delimiter $
output = this.evaluate(function () {
IPython.notebook.to_markdown();
var cell = IPython.notebook.get_selected_cell();
cell.set_text('$ a*b $, $ c*d $');
cell.render();
var mi = cell.element[0].getElementsByTagName('mi'),
mo = cell.element[0].getElementsByTagName('mo');
return {
'n_mi': mi.length > 0 ? mi.length : 0,
'n_mo': mo.length > 0 ? mo.length : 0
};
});
this.test.assertEquals(output, {'n_mi': 4, 'n_mo': 2}, 'LaTeX rendering with delimiter $ works'); Unfortunately the test fails with the following output:
By checking the HTML contents of the rendered cell as seen by CasperJS with <div class="text_cell_render rendered_html" tabindex="-1"><p>$ a*b $, $ c*d $</p>\n</div> So it looks like MathJax doesn't render the cell, even if calling Any ideas? UpdateWell, I now realise that the problem is that MathJax rendering is asynchronous. I guess it's necessary to trigger an event when the MathJax rendering is completed (there doesn't seem to be already any event triggered) and then use CasperJS's It seems that the way to go to trigger an event after the cell has rendered is to push the trigger function to the MathJax Queue just after having queued the rendering. Maybe here instead of passing only the element I can pass the entire var typeset = function(cell) {
var $el = cell.element.jquery ? cell.element : $(cell.element);
if(!window.MathJax){
return;
}
$el.each(function(){
// MathJax takes a DOM node: $.each makes `this` the context
MathJax.Hub.Queue(["Typeset", MathJax.Hub, this]);
MathJax.Hub.Queue(cell.events.trigger("rendered.MarkdownCell", {'cell': cell}));
});
}; (I don't see the reason for returning the Thoughts? To be honest I feel like this is getting a bit bigger than expected and I think this kind of changes are better made by someone that has worked before with the codebase. |
Thank you a bunch for looking more into this, @joined, this will help us make progress. I met with @michaelpacer to get him up to speed on the javascript testing stuff earlier this week. We'll get there. |
Fix on jupyterlab with tests: jupyterlab/jupyterlab#1846 |
@michaelpacer Let me know what I can do to help get this merged! |
Test added and passing for a case confirmed to fail on master. Thanks, @michaelpacer! |
Also, this closes #759. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the tests @michaelpacer and @minrk 😄 ! Beautiful.
Are we ready to merge!? |
Thank you @joined, @michaelpacer, and all the reviewers |
I just happened to notice that now This goes very much against the spirit of Markdown and is very inconsistent from a Markdown standpoint. A backslash is normally used to avoid something being interpreted as formatting command, here it is used for the exact opposite! It also increases the already bad fragmentation of Markdown extensions. What about just not supporting this style of separators? |
@mgeier many people prefer to use this style of equation formatting command (even if I do not) Technically, the LaTeX community discourages the use of $$ as it is a TeX command that has different spacing rules than the LaTeX command (in which case the one backslash version Probably most importantly though, these delimiters have been supported in the notebook for a while now, they are not new. Now, they are being parsed correctly when rendered (making previous issues with subscript rendering vs. italics rendering disappear). |
Of note: nteract only supports the |
Thanks for the information, I wasn't aware of that. I still think it was an unfortunate decision to support them. I understand that it may be hard to get rid of them now, but I would still discourage their use in the future.
That's fair enough, I can totally live without those. I wouldn't even mention them in the documentation (and I don't mention them when I write documentation).
Exactly. |
I wrote a bugfix for issue #980.
Background
The LaTeX code that needs to be rendered inline in the text cells is specified using delimiters. In particular, the following delimiters are supported:
$
$$
\begin
and\end
\(
and\)
(A note on the last one. It's necessary to write
\\(
instead of\(
because in the latter case the backslash gets interpreted by Markdown as an escaper for the parenthesis.)There is a mechanism in place in order for the LaTeX code not to get interpreted as Markdown. It works as follows:
*abc* $x_1 = 1, x_2 = 2$ _def_
, and clicks execute cell.remove_math
innotebook/static/notebook/js/mathjaxutils.js
is used to extract the LaTeX groups from the text, put them in a separate array, and replace the groups in the text with placeholders. In this case, for example, the function will return['*abc* @@0@@ _def_', ['$x_1 = 1, x_2 = 2$']]
.This procedure is necessary since, otherwise, the underscores in the LaTeX group would be interpreted as italic delimiters, in this example.
The core of the problem is that the function
remove_math
extracts from the text only the groups delimited by 1, 2 and 3, but not 4.It's easy in fact to see that in the third line in the example in the issue the underscores are interpreted as italic delimiters by Markdown.
Fix
The changes are made on the
notebook/static/notebook/js/mathjaxutils.js
file.The
remove_math
function contains some logic to identify the LaTeX blocks and extract them. The first step in this procedure is to split the text on all the possible group delimiters. This is done using theMATHSPLIT
regular expression defined on line 62. As the comments in the code say, it's a bit "magical" in the sense that its workings are not crystal clear.The
\\(
and\\)
delimiters were missing from the regular expression, so I added them appending\\\\(?:\(|\))
to it. Moreover, since the regular expression was already matching the text\\
as a delimiter, I had to remove it (otherwise the block\\(
would not be grouped together and we would not be able to identify it as a group delimiter). It is not clear to me the purpose of splitting on the\\
s since we're only looking for LaTeX group delimiters and they are not. I suspect it a result of a copy & paste from somewhere else, since the comments cite different sources for the code.After being split, the text is processed by running over each of the blocks and looking for start and end LaTeX delimiters. On line 181 I added the missing logic to handle the case in which the text
\\(
is the start delimiter and the text\\)
is the end delimiter.The last change is in the line 208. It is necessary because since the LaTeX code is extracted, backed up, and reinserted in the text after the Markdown is rendered, the
\\(
that was necessary for Markdown is not interpreted by it (resulting in\(
), so we have to manually replace the instances of\\(
and\\)
to\(
and\)
respectively, which are the delimiters that are recognized by MathJax.