MathML 4 extensions for alignment and possible deprecation of <maligngroup/> and <malignmark/> #181

Open
NSoiffer opened this issue Jan 2, 2020 · 21 comments
Labels
compatibility (issues affecting backward compatibility), MathML 4 (issues affecting the MathML 4 specification), need polyfill (issues requiring implementation changes), need specification update (issues requiring specification changes)

Comments

@NSoiffer
Contributor

NSoiffer commented Jan 2, 2020

Having just implemented a polyfill for elementary math, that got me thinking about some related ideas:

  1. The most obvious concept related to long division is synthetic division. It is basically the same idea as long division except that you are dividing polynomials. With synthetic division, the columns contain numbers (the coefficients of the polynomial), not just digits. As a refresher, see this page and the example taken from it below:
[Images: polynomial division (left) and synthetic division (right)]
  2. Synthetic division is a shorthand for long division of polynomials (left example above). Long division of polynomials is basically the same idea as long division of numbers except that instead of digits, you have monomials that need to go into their own column. Doing that automatically requires knowing the variable you want to "sort" on so that each monomial goes into the proper column.
  3. A very similar property is needed when displaying systems of equations -- each monomial wants to be in its own column (in this case, the top level element would not be mlongdiv, but mstack).

There are a few complications such as decimal alignment of the coefficients:

      8.44x + 55  y =  0
      3.1 x -  0.7y = -1.1

Note that alignment requires knowing which characters/operators act as column separators (e.g., + and -, along with = and a few other relational operators). These would be inside mo elements, so potentially any mo element could be a separator, or maybe an attribute specifies what the separators are (something to think about/discuss).
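
To make the separator idea concrete, here is a minimal sketch of splitting one row into columns. Everything in it is an assumption for illustration: the `{ tag, text }` token objects stand in for MathML leaf elements, `DEFAULT_SEPARATORS` is a hypothetical default set, and the function name `splitIntoColumns` is made up. It also assumes that a separator with no term in front of it (such as the minus sign in "= -1.1") is a unary sign that stays with its term.

```javascript
// Hypothetical default; the discussion leaves open whether separators come
// from an attribute or whether any <mo> could act as one.
const DEFAULT_SEPARATORS = new Set(["+", "-", "=", "<", ">", "≤", "≥"]);

// tokens: array of { tag, text } objects standing in for MathML leaf elements.
// Returns an array of columns; each column is an array of token texts.
function splitIntoColumns(tokens, separators = DEFAULT_SEPARATORS) {
  const columns = [];
  let term = [];
  for (const token of tokens) {
    const isSeparator = token.tag === "mo" && separators.has(token.text);
    if (isSeparator && term.length > 0) {
      // A separator that follows a term closes that term and gets its own column.
      columns.push(term, [token.text]);
      term = [];
    } else {
      // Ordinary tokens, and separators with no term before them (treated as
      // unary signs), are collected into the current term.
      term.push(token.text);
    }
  }
  if (term.length > 0) columns.push(term);
  return columns;
}

// Second row of the example above: 3.1x - 0.7y = -1.1
const secondRow = splitIntoColumns([
  { tag: "mn", text: "3.1" }, { tag: "mi", text: "x" },
  { tag: "mo", text: "-" },
  { tag: "mn", text: "0.7" }, { tag: "mi", text: "y" },
  { tag: "mo", text: "=" },
  { tag: "mo", text: "-" }, { tag: "mn", text: "1.1" },
]);
console.log(secondRow);
// [["3.1","x"],["-"],["0.7","y"],["="],["-","1.1"]]
```

Both rows of the example come out with five columns under these assumptions, which is what a column-based layout would need. Decimal alignment within a column is a separate step.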

The above example is taken from the MathML 3 spec for maligngroup and malignmark. I think only MathPlayer ever implemented those elements, and I suspect that you can count on your fingers the number of times they have been used. It is a very complicated feature to implement and to use. In contrast, I think the above features are an incremental extension to elementary math layout, so implementing them (especially via an extension to the polyfill I wrote) means that support for these features would be universal (assuming I or someone else extended the polyfill). Just as important, using this extension would be simple, as it is a declarative notation that doesn't require modifying the generated layout other than at a high level (wrapping with mstack or mlongdiv). It would be less powerful though.

I suspect that this proposed extension to elementary math handles the large majority of cases where people play games with tables to achieve alignment, both in MathML and in TeX. @davidcarlisle: do you have any estimate of how many uses of table for alignment in TeX would be covered by this proposed extension? What are some of the cases that are missed by it?

@NSoiffer added the compatibility, MathML 4, need polyfill, need resolution, and need specification update labels on Jan 2, 2020
@fred-wang removed the need resolution label on Aug 12, 2020
@dginev
Contributor

dginev commented Oct 9, 2020

Hello. I was looking for an appropriate issue to attach a recent piece of news I spotted, and since this issue discusses malignmark, it seems appropriate. There is a recent post about bypassing the sanitization of DOMPurify through an abuse of parsing MathML in HTML, details here:
https://portswigger.net/daily-swig/dompurify-mutation-xss-bypass-achieved-through-mathml-namespace-confusion

Summarized as:

In the MathML namespace, two special elements – mglyph and malignmark – allow the creation of a markup that is “in HTML namespace, but on reparsing it is in MathML namespace, [meaning that] the subsequent style tag [is] parsed differently and leading to XSS,” the researcher explained.

This might be relevant if you're searching for additional reasons for deprecation.

@NSoiffer
Contributor Author

NSoiffer commented Oct 15, 2020 via email

@davidcarlisle
Collaborator

davidcarlisle commented Oct 15, 2020 via email

@NSoiffer
Contributor Author

NSoiffer commented Oct 17, 2020 via email

@davidcarlisle
Collaborator

The schema has been updated to restrict use of malignmark, and to remove the groupalign attribute except in the legacy schema:

w3c/mathml-schema@4e897dc

@NSoiffer
Contributor Author

The group tentatively agreed that these are no longer needed: they aren't implemented in browsers and are rarely generated. The main use is to get alignment right at the character level (e.g., decimal alignment). @davidcarlisle pointed out that there is a Unicode space character (U+2007) that is meant to be the width of a digit and that this can be used as padding to get decimal alignment to work.
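
A minimal sketch of the U+2007 padding idea, under several assumptions: the inputs are unsigned decimal strings, the font gives all digits the same advance width, and a number without a decimal point gets a U+2008 PUNCTUATION SPACE in place of the point (that last choice is an addition here, not something raised in the discussion). The function name `padForDecimalAlignment` is hypothetical.

```javascript
const FIGURE_SPACE = "\u2007";      // as wide as a digit in fonts with equal-width digits
const PUNCTUATION_SPACE = "\u2008"; // as wide as a period; an assumption of this sketch

// Pads a non-empty list of unsigned decimal strings so that every result has
// the same number of digit-width slots on each side of the decimal point.
function padForDecimalAlignment(numbers) {
  const parts = numbers.map((n) => {
    const [intPart, fracPart = ""] = n.split(".");
    return { intPart, fracPart, hasPoint: n.includes(".") };
  });
  const maxInt = Math.max(...parts.map((p) => p.intPart.length));
  const maxFrac = Math.max(...parts.map((p) => p.fracPart.length));
  const anyPoint = parts.some((p) => p.hasPoint);

  return parts.map(({ intPart, fracPart, hasPoint }) => {
    const left = FIGURE_SPACE.repeat(maxInt - intPart.length);
    const right = FIGURE_SPACE.repeat(maxFrac - fracPart.length);
    // Keep a slot for the point even when this number has none.
    const point = hasPoint ? "." : anyPoint ? PUNCTUATION_SPACE : "";
    return left + intPart + point + fracPart + right;
  });
}

// Coefficients from the system-of-equations example earlier in this issue.
const paddedCoefficients = padForDecimalAlignment(["8.44", "55", "3.1", "0.7"]);
console.log(paddedCoefficients.map((s) => JSON.stringify(s)));
```

Each padded string could then go into a single mn, so the decimal points line up without any alignment elements.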

I pointed out that with intent table properties, the accessibility problem of splitting equations across columns goes away.

This section (which is somewhat simplified from MathML 3) is still quite large, so removing it would be a good simplification to the spec and help align it with MathML Core.

@MurrayIII currently uses malignmark in his UnicodeMath implementation. He will investigate whether this is really needed. If not, then we can remove these from the spec.

@MurrayIII

MurrayIII commented Nov 14, 2024

For the equation

image

Word copies the following MathML

<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:maligngroup/><mml:mn>10</mml:mn><mml:malignmark/><mml:mi>x</mml:mi><mml:mi> </mml:mi><mml:mo>+</mml:mo><mml:maligngroup/><mml:mn>3</mml:mn><mml:malignmark/><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:maligngroup/><mml:mn>3</mml:mn><mml:malignmark/><mml:mi>x</mml:mi><mml:mi> </mml:mi><mml:mo>+</mml:mo><mml:maligngroup/><mml:mn>13</mml:mn><mml:malignmark/><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math>

@NSoiffer
Contributor Author

Reformatted the above MathML to be more readable:

<math><mtable>
  <mtr><mtd><mrow>
        <maligngroup/><mn>10</mn><malignmark/><mi>x</mi><mi/><mo>+</mo>
        <maligngroup/><mn>3</mn><malignmark/><mi>y</mi><mo>=</mo><mn>2</mn>
   </mrow></mtd></mtr>
   <mtr><mtd><mrow>
        <maligngroup/><mn>3</mn><malignmark/><mi>x</mi><mi/><mo>+</mo>
        <maligngroup/><mn>13</mn><malignmark/><mi>y</mi><mo>=</mo><mn>4</mn>
   </mrow></mtd></mtr>
</mtable></math>

This makes it easier to see the maligngroup and malignmark elements.

@davidcarlisle
Collaborator

If I change that example to 103 rather than 13, so the expressions have different lengths

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8"/>
    <title>malign</title>
  </head>
  <body>
   <math><mtable>
  <mtr><mtd><mrow>
        <maligngroup/><mn>10</mn><malignmark/><mi>x</mi><mi/><mo>+</mo>
        <maligngroup/><mn>3</mn><malignmark/><mi>y</mi><mo>=</mo><mn>2</mn>
   </mrow></mtd></mtr>
   <mtr><mtd><mrow>
        <maligngroup/><mn>3</mn><malignmark/><mi>x</mi><mi/><mo>+</mo>
        <maligngroup/><mn>103</mn><malignmark/><mi>y</mi><mo>=</mo><mn>4</mn>
   </mrow></mtd></mtr>
</mtable></math>
  </body>
</html>

that renders (in Edge here) as

image

So there is no visual alignment at all (similarly in Firefox).

If I add

<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_SVG"></script>

to get MathJax rendering, it still doesn't align:

image

@dginev
Contributor

dginev commented Nov 14, 2024

Here is a Chrome workaround (as of Nov 2024) for that example, using float: right; and extra mtd elements, for archival purposes:

https://codepen.io/dginev/pen/XWvGEEZ

image

@davidcarlisle
Collaborator

@dginev ooh scary, I can confirm that works (and essentially works the same way in HTML table markup)

So the containing box for a table cell for the purpose of float positioning is the implicit column box?

I tried to navigate MDN and the CSS specs to find a clear statement on what's supposed to happen if you apply float:... to a table cell, but failed to find anything definitive. Did you find something, or did you just find this works?

@dginev
Contributor

dginev commented Nov 15, 2024

or did you just find this works?

Indeed, just finding things that work by analogy with HTML.
I believe I found float:right; in a demonstration of how to "right-align a <div> element with CSS".

@davidcarlisle
Collaborator

Yes, right-aligning a div in its container with float:right is clear enough, but it had never occurred to me you could apply it to a table cell, or where it would float to if you did. It's clearly legal (as I found documentation that using float implicitly changes the table-cell display property to block), but I couldn't find a clear description of what happens. Nice example in any case.

@davidcarlisle
Collaborator

With Firefox it seems I just need text-align; with Chrome-based browsers float:right is needed, but float:left in left-aligned columns doesn't work at all. So this document renders Murray's example in both.

<!DOCTYPE html>
<html>
  <head>
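    <script>
      // After load, move each alignment group into its own table cell:
      // <maligngroup/> starts a right-aligned cell, <malignmark/> a left-aligned one.
      // The HTML parser serializes the empty elements as <x></x>, hence the patterns below.
      window.addEventListener("load", function () {
        document.querySelectorAll("mtable").forEach(m => {
          m.innerHTML = m.innerHTML
            .replace(/<maligngroup><\/maligngroup>/g,
                     "</mrow></mtd><mtd style='padding:0pt;text-align:right;float:right'><mrow>")
            .replace(/<malignmark><\/malignmark>/g,
                     "</mrow></mtd><mtd style='padding:0pt;text-align:left;'><mrow>");
        });
      }, false);
    </script>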
    <meta charset="UTF-8"/>
    <title>malign</title>
  </head>
  <body>
   <math><mtable>
  <mtr><mtd><mrow>
        <maligngroup/><mn>1000</mn><malignmark/><mi>x</mi><mi/><mo>+</mo>
        <maligngroup/><mn>3</mn><malignmark/><mi>y</mi><mo>=</mo><mn>2</mn>
   </mrow></mtd></mtr>
   <mtr><mtd><mrow>
        <maligngroup/><mn>3</mn><malignmark/><mi>x</mi><mi/><mo>+</mo>
        <maligngroup/><mn>103</mn><malignmark/><mi>y</mi><mo>=</mo><mn>4444</mn>
   </mrow></mtd></mtr>
</mtable></math>
  </body>
</html>

firefox

image

edge

image

@NSoiffer
Contributor Author

Can someone with Safari verify @davidcarlisle's solution works on Safari?

If so, it seems like there is a reasonable solution for dealing with Word's output, and we can simplify the full MathML spec and bring it into closer agreement with Core. Given this solution, it seems highly unlikely that even this scaled-down version of maligngroup/malignmark would ever get added to Core.

@FrankMittelbach

Here is the output from firefox (left) and safari on the right:

Screenshot 2024-11-16 at 23 43 57

@davidcarlisle
Collaborator

Here is the output from firefox (left) and safari on the right:
Screenshot 2024-11-16 at 23 43 57

thanks

@davidcarlisle
Collaborator

@NSoiffer so it looks like it basically works in Chrome-based browsers, Firefox and Safari.

@MurrayIII

MurrayIII commented Dec 18, 2024

Small addition: since mtd is mrow-like, the polyfill above also needs to handle an mtd without an mrow enclosing the mtd contents.
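
One way to cover both cases, sketched on serialized markup in the same string-replacement spirit as the polyfill above. The helper names `unwrapSingleMrow` and `splitAlignedCell` are hypothetical, and the sketch assumes the markers sit either directly in the mtd or inside one attribute-less mrow that wraps the whole cell. It returns replacement cells for one mtd and does not carry over attributes of the original mtd.

```javascript
const RIGHT_STYLE = "padding:0pt;text-align:right;float:right";
const LEFT_STYLE = "padding:0pt;text-align:left;";
const OPEN = "<mrow>";
const CLOSE = "</mrow>";

// Strips one <mrow> that wraps the entire cell content, if there is one.
// Content such as <mrow>a</mrow><mo>+</mo><mrow>b</mrow> is left untouched.
function unwrapSingleMrow(markup) {
  const trimmed = markup.trim();
  if (!trimmed.startsWith(OPEN) || !trimmed.endsWith(CLOSE)) return trimmed;
  const tagPattern = /<(\/?)mrow>/g;
  let depth = 0;
  let match;
  while ((match = tagPattern.exec(trimmed)) !== null) {
    depth += match[1] === "/" ? -1 : 1;
    if (depth === 0) {
      // The first mrow is a wrapper only if it closes at the very end.
      const closesAtEnd = tagPattern.lastIndex === trimmed.length;
      return closesAtEnd ? trimmed.slice(OPEN.length, -CLOSE.length) : trimmed;
    }
  }
  return trimmed;
}

// cellMarkup: serialized children of one <mtd>, with or without an mrow wrapper.
// Returns markup for the cells that replace that <mtd>.
function splitAlignedCell(cellMarkup) {
  const content = unwrapSingleMrow(cellMarkup);
  // The capture group keeps the markers in the result at odd indices.
  const parts = content.split(/(<maligngroup><\/maligngroup>|<malignmark><\/malignmark>)/);
  let cells = `<mtd><mrow>${parts[0]}</mrow></mtd>`;
  for (let i = 1; i < parts.length; i += 2) {
    const style = parts[i].startsWith("<maligngroup") ? RIGHT_STYLE : LEFT_STYLE;
    cells += `<mtd style='${style}'><mrow>${parts[i + 1]}</mrow></mtd>`;
  }
  return cells;
}

const withMrow = "<mrow><maligngroup></maligngroup><mn>10</mn><malignmark></malignmark><mi>x</mi></mrow>";
const withoutMrow = "<maligngroup></maligngroup><mn>10</mn><malignmark></malignmark><mi>x</mi>";
console.log(splitAlignedCell(withMrow) === splitAlignedCell(withoutMrow)); // true
```

Wrapping every piece in its own mrow is harmless precisely because mtd is mrow-like, so the output has the same shape whether or not the input had a wrapper.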

@MurrayIII

The float:right displays a column with a single character correctly, but it stacks multiple characters above one another as in

image

Without the float:right, this displays as

image

Perhaps the polyfill should only emit float:right for single-character maligngroups.
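
A minimal sketch of that rule, assuming the polyfill has the serialized content of a group at hand when it picks the cell style. The names `visibleLength` and `rightCellStyle` are hypothetical, and the character count is naive: it strips tags and counts code points, so character references such as `&amp;` would be miscounted.

```javascript
const RIGHT_BASE = "padding:0pt;text-align:right;";

// Counts the characters left after removing tags and surrounding whitespace.
// Array.from counts code points rather than UTF-16 code units.
function visibleLength(markup) {
  const text = markup.replace(/<[^>]*>/g, "").trim();
  return Array.from(text).length;
}

// Style for the right-aligned cell created for one alignment group:
// float:right is added only when the group holds a single character.
function rightCellStyle(groupMarkup) {
  return visibleLength(groupMarkup) === 1 ? RIGHT_BASE + "float:right" : RIGHT_BASE;
}

console.log(rightCellStyle("<mn>3</mn>"));   // padding:0pt;text-align:right;float:right
console.log(rightCellStyle("<mn>103</mn>")); // padding:0pt;text-align:right;
```

Whether dropping float:right for longer groups still right-aligns them in Chrome-based browsers would need checking, given the earlier observation that those browsers needed the float.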

@davidcarlisle
Collaborator

I took an action item on the WG call on 2024-12-19 to suggest a specification update to resolve this issue. The suggestion below is not what I had in mind yesterday, but having looked at this issue again and at the current text, I think it is perhaps the most viable option.

@NSoiffer @MurrayIII I think even if we cut down malignmark and maligngroup to what is needed to cover MS Word output, plus whatever makes sense to be allowed in a schema that targets those cases, the alignment spec in chapter 3 will end up being quite complicated.

As the elements are not in Core, doing this won't directly help compatibility of Office-produced documents. As now, and as shown in the comments above, they will require additional javascript and CSS to make the alignments work.

If, however, we remove them from MathML 4 completely, the Office-generated HTML+MathML would be flagged as invalid by validators, which is not a desirable outcome.

Given that the desired end result is that Office generated output works in current browsers (via a Math WG supplied polyfill) and that that output is considered valid, I think a simpler approach to the specification would be:

  • Declare maligngroup and malignmark as legacy compatibility elements that are valid in (any) mtd and mrow.
  • Specify in the full spec that they are valid but have no defined behaviour.
  • Provide a polyfill that handles as much of the MathML 3 alignment as reasonable, covering at minimum all output from the Office suite.

This would simplify (remove) almost all the current text while keeping Office output valid and improving the actual rendering of these alignments in current browsers by providing code so that the alignment works (whereas it typically does not work currently).

It allows flexibility in the polyfill to adapt to cover any existing cases "in the wild" (not just Office output) without having to formally specify the behaviour to the extent that would be required if the alignment were specified in the full spec.
