U+002D HYPHEN-MINUS in <mo> operators #70

Open
fred-wang opened this issue Sep 23, 2019 · 20 comments

Comments

@fred-wang
Contributor

U+002D HYPHEN-MINUS is too short, so MathML browser implementations render it as U+2212 MINUS SIGN.

Do we want to standardize this workaround?

I'd prefer not, but I guess the proper way to do it would be via a new text-transform value for mo.

If not, tools should generate the proper code point instead and we can write a polyfill for that.

@davidcarlisle
Collaborator

The Unicode name suggests that this character is usable (on input) as a minus, and I would guess that a very large percentage of existing MathML uses - rather than U+2212 for minus (including the examples of subtraction in the MathML3 spec).

So if it does not break the code design too much, I think it would be good to support this in core, although, as you say, a polyfill could do the replacement if that is really needed.

@NSoiffer
Contributor

There are a number of characters that should render the same. These are listed in chapter 7. For this case, it says "MathML renderers should treat U+002D [HYPHEN-MINUS] as equivalent to U+2212 [MINUS SIGN] in formula contexts such as mo, and as equivalent to U+2010 [HYPHEN] in text contexts such as mtext."

Some equivalents not covered in chapter 7 (and maybe something we should add to the spec) are things like - and _ being rendered the same as (stretchy) lines in mover, etc.
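The chapter 7 rule quoted above could be sketched as a preprocessing step outside the browser. This is only an illustration under stated assumptions: the function name `normalize_hyphen_minus` and the `REPLACEMENTS` table are hypothetical, and only the direct text of `mo` and `mtext` elements is considered.

```python
import xml.etree.ElementTree as ET

HYPHEN_MINUS = "\u002D"
MINUS_SIGN = "\u2212"
HYPHEN = "\u2010"

# Replacement for U+002D keyed by element local name (hypothetical table
# encoding the rule quoted from chapter 7).
REPLACEMENTS = {"mo": MINUS_SIGN, "mtext": HYPHEN}


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def normalize_hyphen_minus(root: ET.Element) -> ET.Element:
    """Replace U+002D in the direct text of mo and mtext elements, in place."""
    for element in root.iter():
        replacement = REPLACEMENTS.get(local_name(element.tag))
        if replacement is not None and element.text:
            element.text = element.text.replace(HYPHEN_MINUS, replacement)
    return root


source = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mi>a</mi><mo>-</mo><mi>b</mi><mtext>well-defined</mtext>"
    "</math>"
)
root = normalize_hyphen_minus(ET.fromstring(source))
operators = [e.text for e in root.iter() if local_name(e.tag) == "mo"]
texts = [e.text for e in root.iter() if local_name(e.tag) == "mtext"]
```

Here `operators` holds a single U+2212 and `texts` holds the word with U+2010 in place of the ASCII hyphen. A renderer would apply the same mapping at display time rather than rewriting the document.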

@MurraySargent

Math instances of U+002D should be displayed as U+2212. This should be done by converting U+002D to U+2212 when reading in MathML or another file format. Similarly math instances of U+0027 (' apostrophe) should be converted to U+2032 (′ prime). The OpenType ssty feature should not be required for these changes. At least that's how it works in OfficeMath (Word, PowerPoint, OneNote, etc.)

@fred-wang
Contributor Author

I agree with Murray that people / authoring tools should use the proper glyph (so U+2212 instead of U+002D, or U+2032 instead of U+0027). The question is whether we want to handle backward compatibility for this kind of "bad markup", as there is clearly existing content doing it. My feeling is that we don't want to add this ugly hack in level 1, since the goal is to have a clean spec as a starting point. Maybe that can encourage people to migrate their pages / tools. If we do this in the future, I'd prefer to standardize it at the CSS level, like text-transform.

ssty is meant to handle script things at the font level, but this is not about scripts and existing fonts don't provide these transforms, so it's irrelevant here. See https://github.com/mathml-refresh/mathml/issues/19 for a separate discussion.

@faceless2

faceless2 commented Jun 24, 2020

While I understand the sentiment and the desire for a clean spec, I think conversion from U+002D to U+2212 in particular is fairly critical. Of the testcases we've been working from I don't think a single one uses U+2212, and before we added this substitution, the results were noticeably incorrect.

If MathML3 specified a lot of these types of substitutions I would certainly back moving it to something like text-transform. But for this single substitution (or both if the less common U+0027 to U+2032 is included as well) the pragmatic - if not the cleanest - approach is to specify this conversion takes place explicitly.

The alternative is that you either define this behaviour somewhere else (i.e. CSS or a polyfill) or have close to 100% of legacy MathML content render incorrectly.

@NSoiffer
Contributor

Other characters that have similar issues:

  • As already mentioned, the apostrophe for U+2032 [PRIME], along with 2, 3, or 4 apostrophes for double prime, etc. However, an apostrophe can also be used for minutes or feet, so this may be problematic
  • ASCII | for divides (other vertical lines?)

In addition, there are a number of characters that occur in under/overscripts that currently aren't specified but need to be:

  • ‘_’(other chars?) for over/under bars?
  • What char to use for hat, etc? Probably '^' and not U+0302 (Combining Circumflex Accent).
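The apostrophe case from the first list could be sketched as follows. This is a hedged illustration, not a proposal for spec text: the function name `apostrophes_to_primes` is hypothetical, and the sketch deliberately ignores the minutes/feet ambiguity raised above.

```python
import re

# Unicode prime characters indexed by the number of consecutive apostrophes:
# PRIME, DOUBLE PRIME, TRIPLE PRIME, QUADRUPLE PRIME.
PRIMES = {1: "\u2032", 2: "\u2033", 3: "\u2034", 4: "\u2057"}


def apostrophes_to_primes(operator_text: str) -> str:
    """Map each run of 1 to 4 ASCII apostrophes to one prime character.

    Longer runs are left untouched, since no single code point covers them.
    """
    def replace(match):
        run = match.group(0)
        return PRIMES.get(len(run), run)

    return re.sub(r"'+", replace, operator_text)


single = apostrophes_to_primes("'")
double = apostrophes_to_primes("''")
too_long = apostrophes_to_primes("'''''")
```

Whether such a mapping is safe depends on knowing that the apostrophe really denotes a prime, which is exactly the concern noted in the list.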

@NSoiffer
Contributor

@fred-wang: you removed the 'need resolution' tag without specifying a resolution. What is the resolution?

@fred-wang
Contributor Author

This is very low priority, so I removed the label, as I thought you wanted to use it to prioritize what needs to be discussed in meetings. As said above, we are definitely not going to do this hack for a first implementation, so, as agreed by our process, this shouldn't go into MathML Core level 1.

The cases mentioned on https://github.com/mathml-refresh/mathml/issues/146#issuecomment-661241874 are even less important (and separate from this bug report). No browsers do that kind of substitution, so there is no backward compatibility risk. I would personally oppose doing this for any version of MathML Core.

I think the right thing to do for now is to write a polyfill for minus and to urge people to update their tools/documents to use the proper code point.

@davidcarlisle
Collaborator

It is really baffling why you see this as low priority. Not supporting it means that essentially no existing MathML will work unchanged as MathML Core.

It is not at all clear that U+002D is not "the proper code point". It is HYPHEN-MINUS: its intended use is as a hyphen in text and a minus in math, which is how it has always been treated in MathML so far.

@fred-wang
Contributor Author

@davidcarlisle I just tried a basic testcase $$-$$ in LaTeX (hyphen-minus) and the character in the PDF output is U+2212 MINUS SIGN, so the hyphen is not used as a minus sign. Ideally tools generating MathML content (converters, WYSIWYG editors, etc.) should do the same. If you are talking about how people type math with the keyboard, then that's not a topic for MathML Core, which focuses on browser rendering. Changing a character between DOM and rendering (and so possibly its semantics) was already controversial for text-transform / mathvariant / single-char-mi and caused heated debates in the initial CSSWG discussion last year. Since level 1 is focusing on a clean spec, introducing another hack does not seem appropriate at all.

I understand people can disagree on what is important for MathML, but I really wish we agreed on the principle followed for the development and implementation of MathML Core. I'm really disappointed that some people still seem to follow MathML3's approach "put something in the spec so that it gets magically implemented in browsers" (and, even worse, putting pressure on others to do the job). Anyway, I'm tired of repeating the same thing again and again and I don't want to waste time arguing about this, so I'll stop here.

@davidcarlisle
Collaborator

davidcarlisle commented Aug 21, 2020

In classic LaTeX you certainly wouldn't get U+2212, but that misunderstands my comment. Even without math, Unicode input is expected to go through all kinds of font shaping, so the glyphs in the output don't match the input. A Unicode input of - is expected to produce a minus sign if used in math, most likely rendered using a glyph at position 2212. This has been supported by every MathML system so far, including the one in Office, MathJax, and existing browser implementations.

Why is this different from <mi>x</mi> rendering as U+1D465?
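For comparison, the single-character mi substitution referred to here can be sketched as a plain code point mapping. This is a simplified illustration covering basic Latin letters only; the function name `math_italic` is hypothetical, and Greek and other ranges are omitted.

```python
def math_italic(char: str) -> str:
    """Return the Mathematical Italic counterpart of a basic Latin letter."""
    if "A" <= char <= "Z":
        # MATHEMATICAL ITALIC CAPITAL A starts at U+1D434.
        return chr(0x1D434 + ord(char) - ord("A"))
    if char == "h":
        # U+1D455 is reserved; italic h is U+210E PLANCK CONSTANT.
        return "\u210E"
    if "a" <= char <= "z":
        # MATHEMATICAL ITALIC SMALL A starts at U+1D44E.
        return chr(0x1D44E + ord(char) - ord("a"))
    return char  # Anything else is left as typed.


italic_x = math_italic("x")
```

In both cases the character stored in the document differs from the one that is displayed, which is the parallel being drawn.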

> I understand people can disagree on what is important for MathML, but I really wish we agree on the principle followed for the development and implementation of MathML Core. I'm really disappointed that some people still seem to follow MathML3's approach "put something in the spec so that it get magically implemented in browsers" (and even worse putting pressure on others to do the job). Anyway, I'm tired of repeating the same thing again and again and I don't want to waste time arguing about this, so I'll stop here.

That completely misrepresents the discussion.

Using - is interoperably supported by existing MathML systems and used in the overwhelming majority of existing MathML content. It clearly meets the criterion for being included in MathML Core.

@MurraySargent

MurraySargent commented Aug 21, 2020 via email

@NSoiffer
Contributor

I did a check of some popular TeX-to-MathML converters: tex4ht, ltlatex, latexml, and even @fred-wang's own TeXZilla produce the ASCII minus. Even if they were all changed, that leaves all the MathML they have produced over the years with the ASCII minus in it.
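A check like this could be automated over converter output or a corpus of existing documents. The sketch below is only an assumption about how one might do it: the function name `count_minus_code_points` and the two sample documents are made up for illustration and are not output from any of the tools named above.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_minus_code_points(mathml_documents):
    """Count mo elements whose whole content is U+002D or U+2212."""
    counts = Counter()
    for source in mathml_documents:
        for element in ET.fromstring(source).iter():
            # Compare on the local name so namespaced documents also work.
            if element.tag.rsplit("}", 1)[-1] != "mo" or not element.text:
                continue
            text = element.text.strip()
            if text == "\u002D":
                counts["U+002D"] += 1
            elif text == "\u2212":
                counts["U+2212"] += 1
    return counts


# Hypothetical sample input, not real converter output.
samples = [
    "<math><mi>a</mi><mo>-</mo><mi>b</mi><mo>-</mo><mi>c</mi></math>",
    "<math><mi>x</mi><mo>\u2212</mo><mi>y</mi></math>",
]
counts = count_minus_code_points(samples)
```

Running this over real content would give a concrete ratio of U+002D to U+2212 in operators, rather than an estimate.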

@NSoiffer NSoiffer transferred this issue from w3c/mathml Jun 29, 2021
@MurraySargent

MurraySargent commented Jun 29, 2021 via email

@fred-wang
Contributor Author

I think there is some confusion in this thread. Editors are free to (and probably should) replace any typed U+002D (-) with U+2212 (−); that's out of the scope of MathML Core. The question is whether we want to introduce a hack (e.g. based on text-transform) in browsers, with all the extra issues it opens (more exceptions in the code, more tests needed, text mismatch between DOM / rendered / ATs / copy & paste, etc).

In any case, there is no plan to integrate such a hack in Chromium's initial implementation so I guess this should be level 2.

@NSoiffer
Contributor

There is no confusion in my mind -- this is a requirement for the spec, not editors. Existing MathML and existing MathML producers mostly use U+002D and expect it to render with the U+2212 glyph.

Here's the difference illustrated with a trivial codepen rendered by Chrome with math support:
[Image: the codepen as rendered by Chrome with math support]

I don't think the spec should say how to implement this equivalence. It should merely say that U+002D should be rendered as U+2212. I'm not convinced that text-transform is the only way this can be done. What I am convinced of is that not implementing this equivalence is a significant change from people's expectations and current use.

@faceless2

I completely agree with Neil. At the absolute least you need some sort of statement of intent, such as "it is expected that user-agents will convert U+002D to U+2212 for both rendering and the AT tree via some undefined mechanism". Even if marked as optional, that would go some way to limiting the inevitable divergence when implementers are given existing MathML docs, the existing MathML users that go with them, and a spec that's silent on what to do about it.

(I say inevitable with some confidence, because we've implemented this substitution).

@davidcarlisle
Collaborator

I think (whether or not you can implement this in the first implementation) that core should say that - should render as minus. The majority of existing MathML assumes this, including all instances of subtraction in the MathML3 spec and almost all existing generators, e.g. TeX-to-MathML converters. I don't think you can brush this off as "confusion" on the part of the commenters, or as a "hack". The - symbol in Unicode is explicitly dual-use, HYPHEN-MINUS, and should act as a hyphen in text and a minus in math.

@bkardell
Collaborator

Discussed in the meeting today, no resolutions. We'll circle back on this next month after we have some more async discussions.

@NSoiffer
Contributor

Adding the needs-spec change label, as the agreement in the meeting was that this must be done because of the vast amount of legacy MathML that assumes this equivalence. Both Firefox and Safari support this. If Chrome can't handle this, it should fail a test.
