MathML copied using MathJax does not paste into Word successfully #21

Open
ways2read opened this issue Sep 9, 2024 · 2 comments

@ways2read
Member

ways2read commented Sep 9, 2024

Related to #4

Steps to replicate:

  1. Visit the page The Lorentz Transformations
  2. Explore the page to find some math. For example, the first expression after the heading "The Light Cone".
  3. Invoke the MathJax menu, then Show math as, then MathML code. You can also choose Original MathML code.
  4. A new window opens displaying the MathML. Copy the content of this window to the clipboard.
  5. Switch to Word and paste the math. It is not converted into an Office Math expression.
  6. Switch to Notepad and paste the clipboard into a new document. Prepend http: to the value of the xmlns attribute on the first line, so that <math xmlns="//www.w3.org/1998/Math/MathML" becomes <math xmlns="http://www.w3.org/1998/Math/MathML"
  7. Copy the text to the clipboard. Switch to the Word document and paste the content of the clipboard. The math is detected and converted to Office Math.

Note that the same expressions, when copied as MathML with MathCAT, paste into Word successfully.
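The manual fix in step 6 can be sketched as a small string transformation. This is only an illustration of the edit made by hand in Notepad: the function name `fixProtocolRelativeXmlns` is hypothetical and is not part of MathJax, MathCAT, or Word, and the regular expression assumes the xmlns attribute sits on the opening math tag as it does on the page above.

```typescript
// Hypothetical helper: mirrors the manual Notepad edit from step 6.
const MATHML_NS = "http://www.w3.org/1998/Math/MathML";

function fixProtocolRelativeXmlns(mathml: string): string {
  // Look for xmlns="//www.w3.org/1998/Math/MathML" on the opening <math> tag
  // and rewrite it with the full http: URL. Other input is returned unchanged.
  return mathml.replace(
    /(<math\b[^>]*?\bxmlns\s*=\s*)(["'])\/\/www\.w3\.org\/1998\/Math\/MathML\2/,
    (_match: string, prefix: string, quote: string) =>
      `${prefix}${quote}${MATHML_NS}${quote}`
  );
}
```

MathML that already carries the full namespace URL passes through untouched, so the function is safe to apply to any copied string.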

@brichwin
Collaborator

brichwin commented Sep 12, 2024

I verified that this happens as described above, but with an added layer of complexity.

Word will not recognize MathML with the xmlns as provided on that page. However, even on a page that uses the correct xmlns, copying from MathJax fails. It appears that Word will not recognize even well-formed MathML when the clipboard provides both the text/plain and text/html DataTransfer MIME types for the MathML string. In my tests, successful conversion to Office Math only occurred when the clipboard object had only the text/plain DataTransfer type.

Discovery Process

The version of MathJax on The Lorentz Transformations page (MathJax 2.7.1) simply passes through the xmlns value from the page's original MathML source. After discovering that the xmlns value in the page's MathML omits the protocol from the URL, I decided to try a page that provides the fully formed (recommended) xmlns value described in section 2.1.2, "MathML and Namespaces", of the MathML3 spec, which is:

<math xmlns="http://www.w3.org/1998/Math/MathML">

In my tests, even MathML with a fully formed, correct xmlns value fails to convert into a built-up Office Math expression when copied from MathJax. Instead, the MathML is rendered as text in the Word document. After some experimentation, and after pasting the exact same MathML from MathJax and from Notepad into a clipboard inspector, I discovered the difference in the clipboard's DataTransfer MIME types.

If I strip the text/html DataTransfer type from the clipboard by copying from MathJax, pasting into Notepad, recopying the unchanged text from Notepad, and then pasting it into Word, it successfully converts to a built-up Office Math object. Alternatively, content copied from MathJax's "Show Math as" dialogs will convert directly if the Word Paste > Keep Text Only option is used.
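On the web page side, the same "text/plain only" clipboard state can be produced with a copy event handler. The following is a minimal sketch, not MathJax's actual code: `makePlainTextCopyHandler` and `CopyEventLike` are hypothetical names, and the structural type only lists the members of the browser's ClipboardEvent that the handler touches. It relies on the standard behavior that cancelling a copy event leaves on the clipboard only the formats written through `clipboardData.setData`.

```typescript
// Minimal structural type covering only the ClipboardEvent members used here,
// so the sketch does not depend on DOM typings.
interface CopyEventLike {
  clipboardData: { setData(format: string, data: string): void } | null;
  preventDefault(): void;
}

// Hypothetical helper: builds a "copy" handler that writes only text/plain.
function makePlainTextCopyHandler(getMathML: () => string) {
  return (event: CopyEventLike): void => {
    if (event.clipboardData === null) {
      return; // no clipboard payload available; fall back to the default copy
    }
    event.clipboardData.setData("text/plain", getMathML());
    // Cancel the default copy so the browser does not add a text/html flavor
    // built from the current selection.
    event.preventDefault();
  };
}
```

In a browser, the handler could be registered with something like `document.addEventListener("copy", makePlainTextCopyHandler(() => mathmlSource))`, where `mathmlSource` stands for whatever MathML string the page wants to expose. Whether Word then converts the paste is based only on the observations above.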

Thoughts

When MathML is in an IDE like the Visual Studio Code editor, the editor will syntax highlight the MathML (XML) by placing tag names, attributes, attribute values, child text, etc. in various colors. That "beautified" MathML code, when copied from Visual Studio Code and pasted into MS Word, retains the background and foreground colors. This would be considered by many to be a valuable result, and it appears to depend upon the text/html DataTransfer type.

Does being able to paste in pretty MathML outweigh making copying and pasting math into Office Math objects easier?

Instead of asking Microsoft to always convert well-formed MathML content into an Office Math object, I wonder if it is best to:

  1. File bug reports on MathJax to have them place only the text/plain DataType of the MathML string onto the clipboard (this is easily doable). Note: This is unlikely to fix the vast number of pages already using MathJax in the wild. That Lorentz site uses MathJax 2.7.1 which was released in April 2017. Few sites have moved past the 2.7 versions of MathJax.
  2. Have MathJax consider always adding the proper xmlns to the element in the copied MathML string.
  3. Document that users can use the Paste Special or Paste > Keep Text Only options as a workaround when pasting MathML copied from "rich text or html" sources.
  4. Ask Microsoft to ignore the xmlns attribute on well-formed text/plain MathML clipboard content altogether, and to attempt converting it into Office Math when only the text/plain version is supplied.
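Option 2 could look roughly like the sketch below, which forces the recommended namespace onto the opening math tag before the string is placed on the clipboard. The name `ensureMathMLNamespace` is hypothetical and this is not how MathJax currently behaves. The regular expressions are a simplification that assumes no attribute value on the math tag contains a ">" character; a real implementation would more likely operate on the parsed element.

```typescript
const MATHML_NAMESPACE = "http://www.w3.org/1998/Math/MathML";

// Hypothetical helper: guarantees that the first <math> tag declares the
// recommended default namespace, whether xmlns was missing or malformed.
function ensureMathMLNamespace(mathml: string): string {
  return mathml.replace(/<math\b([^>]*)>/, (_tag: string, attrs: string) => {
    // Drop any existing default-namespace declaration, whatever its value.
    const rest = attrs.replace(/\s+xmlns\s*=\s*(?:"[^"]*"|'[^']*')/, "");
    // Re-emit the tag with the full namespace first, other attributes kept.
    return `<math xmlns="${MATHML_NAMESPACE}"${rest}>`;
  });
}
```

Only the first math tag in the string is rewritten, which matches the single-expression content shown in MathJax's "Show Math as" window.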

It's frustrating this is so complex. I suspect many would not realize that copying from a code editor like Visual Studio Code isn't copying plain text, but a "rich text" HTML version instead.

@ways2read
Member Author

Colleagues at MSFT responded:

We are still trying to narrow this down. Is this more a dupe of "MathML Copied from JAWS 2024's Math Viewer with MathCAT Will Not Auto Convert Into Math Objects in Microsoft Word"?
If the MathML has all the formal markup, Word seems to respond correctly. The issue seems to lie with the source that copies the MathML content to the clipboard.
