Adding semantics to presentation MathML using symbol names #141

Closed
samdooley opened this issue Sep 19, 2019 · 12 comments
Assignees
Labels
accessibility Issues related to improving accessibility intent Issues involving the proposed "intent" attr MathML 4 Issues affecting the MathML 4 specification

Comments

@samdooley
Contributor

Several options for adding semantics to presentation markup were discussed on the Sep 10 MathML General call. A common thread seems to be a need for a shared vocabulary of mathematical symbols/operators/names.

https://docs.google.com/spreadsheets/d/1ebOkl7Gckfk5g6Dc4C8bpGZtSxLnGwpOHqAwwON0-nI/edit?usp=sharing

I have collected 1749 symbols into a Google sheet as an initial starting point for such a list. The list still needs a lot of work, but enough is there to illustrate how one could add semantic information via a role attribute to encode content markup within the presentation markup:

<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mrow>
      <msup role="power">
        <mi role="ci">a</mi>
        <mn role="cn">2</mn>
      </msup>
      <mo role="plus">+</mo>
      <msup role="power">
        <mi role="ci">b</mi>
        <mn role="cn">2</mn>
      </msup>
    </mrow>
    <mo role="eq">=</mo>
    <msup role="power">
      <mi role="ci">c</mi>
      <mn role="cn">2</mn>
    </msup>
  </mrow>
</math>

The goal for this list is to define unique short identifiers for as many mathematical symbols as possible from widely used sources, including Unicode, Content MathML, LaTeX, Nemeth Braille, and SI units.

These identifiers are intended to be suitable for use in as many markup contexts as possible, including presentation MathML role attributes, content MathML element names, LaTeX macro names, and JSON property names/values.

Each row in the table defines a single symbol, with its unique identifier (ID), a short description (Symbol), a mnemonic example (Example), and a Unicode character (Unicode).

The symbols are listed by type, which gives a rough classification of the symbols according to their syntactic form: symbol, operator, unit, function, large operator, special forms, fences, and scripts.

While the universe of math symbols is necessarily unbounded, this list should include the more common Unicode math symbols, the Content MathML 3.0 element names, the Nemeth braille patterns, and the more common SI units.

This first version is missing lots of symbols, and I know I need to check coverage of the Content MathML elements and the braille patterns. But let me know if there are vocabularies that deserve special attention. Statistics, chemistry, and multi-variable calculus, for example, could clearly use some work, among others.

@samdooley samdooley self-assigned this Sep 19, 2019
@fred-wang

Just FYI, I think this is very similar to issue #64 (there is also #9 for a more generic native a11y implementation issue).

@davidcarlisle
Collaborator

It might be worth cross referencing against the OpenMath list especially as all the Content MathML element names are already cross referenced to OM.

https://www.openmath.org/symbols/

@davidfarmer
Contributor

davidfarmer commented Oct 7, 2019 via email

@NSoiffer
Contributor

NSoiffer commented Oct 8, 2019

@samdooley: that's a long list, so thanks for all the effort and getting the ball moving along!!!

My comments:

  • When used as a role, I think the current (ARIA) plan requires a prefix. Maybe we go with "math" or maybe "mml"? So in your example, it would be something like <msup role="math-power">.
  • When using roles in ARIA, one should only use them when the natural semantics need changing. For example, you don't write <h1 role='h1'>. Along these lines, I don't think adding roles to mi and mn adds anything in the following:
    <mi role="ci">b</mi>
    <mn role="cn">2</mn>
  • Taking a step further along the same thought process, I don't think adding a role to a token element that is a single character adds anything unless there are multiple interpretations for that element. E.g., what value is there in using role in <mi role='math-alpha'>α</mi>? You don't list any other meanings for α, so U+03B1 already uniquely determines its meaning.
  • Your table does have some characters that have multiple meanings, but those are hard to find. Since I don't feel it is useful to give ids to characters with only one meaning, I think it would be useful to delete all of those. That would make the other ones obvious.
  • For some with multiple ids, I'm a little dubious of the current list. An example I spotted is !. It has ids bang and factorial. I'm dubious about bang. What is a math example that uses it? (If it is in mtext, it shouldn't be interpreted.) Potentially I could see "not" as a meaning in a prefix setting.
  • I'm dubious about the need for four different ids for integrals (222C) and similar large ops. In presentation MathML they are easily distinguished because they are in msub, msup, or msubsup. If different forms are needed, don't you also need them for munder, mover, and munderover? Maybe even mmultiscripts...

I hope these comments are helpful to start a discussion on your list.
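
The first two points above can be sketched as a small clean-up pass over Sam's example. This is only an illustration, not anything the group has agreed on: the "math-" prefix, the REDUNDANT_ROLES table, and the function name normalize_roles are all assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
ET.register_namespace("", MATHML_NS)  # serialize without an "ns0:" prefix

# Assumption: roles that only restate what the token element already says.
REDUNDANT_ROLES = {"mi": {"ci"}, "mn": {"cn"}}
PREFIX = "math-"  # assumed prefix; "mml" was the other candidate mentioned


def normalize_roles(root):
    """Drop redundant token roles and prefix the remaining roles, in place."""
    for elem in root.iter():
        role = elem.get("role")
        if role is None:
            continue
        local_name = elem.tag.split("}")[-1]  # strip the "{namespace}" part
        if role in REDUNDANT_ROLES.get(local_name, set()):
            del elem.attrib["role"]
        elif not role.startswith(PREFIX):
            elem.set("role", PREFIX + role)
    return root


source = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<msup role="power"><mi role="ci">a</mi><mn role="cn">2</mn></msup>'
    "</math>"
)
root = normalize_roles(ET.fromstring(source))
print(ET.tostring(root, encoding="unicode"))
```

After the pass, only the msup keeps a role (now "math-power"); the mi and mn tokens carry none.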

@NSoiffer
Contributor

NSoiffer commented Oct 8, 2019

@davidfarmer -- thanks for the list. It seems there are a few things that break "the Soiffer hypothesis", but not many. If the computer could know what the functions are, then distinguishing between function application and multiplication would not be needed. But doing that requires reading the text and can't really be known by just knowing the subject area. So putting that aside (which I don't really think breaks my hypothesis), I see the following as problematic:

  • ( ... ) either point in plane or open interval.
  • {1, 2, 3, ...} either sequence or set
  • I don't think you included f^(4)(x), which could potentially be confused with power, especially if you wrote f^(n+1)(x). I suppose knowing that it is function application would be a good clue that it wasn't power, so maybe this isn't problematic...

Is there more to add to that list?

I don't think it is hard to distinguish between definite and indefinite derivatives, but maybe that's just me as @samdooley makes a distinction in his list (but he also calls out more distinctions).

Note: probably most people in this project can read TeX, but I strongly suspect some people have trouble reading it. For your examples, it would probably be helpful if you included images showing the notation in 2D form.

@davidcarlisle
Collaborator

I think the lists are a useful starting point for assigning roles, although I'm a bit confused about the TeX-centred description. I don't think we should be specifying a TeX syntax in this group. We should be assigning roles to use on mathml elements. Individual systems or individual users can define tex macros to produce that markup, but as that is just surface syntax that's expanded out by tex or javascript or whatever, I'm not sure it need be standardised.

I'd agree with Neil that integral forms can be distinguished by the presence of limits (that is, the integral operator is wrapped in msubsup or msub) so I'm not sure that more specific roles are needed for integrals.
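
As a rough sketch of that point, a converter can tell the two integral forms apart from structure alone, with no role on the operator. The helper name integral_limit_flags and the SCRIPT_WRAPPERS set are assumptions for this example, and only U+222B is checked to keep it short.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/1998/Math/MathML}"
INTEGRAL = "\u222B"  # U+222B INTEGRAL
# Script elements whose first child is the base, i.e. the operator itself.
SCRIPT_WRAPPERS = {"msub", "msup", "msubsup", "munder", "mover", "munderover"}


def integral_limit_flags(mathml_source):
    """Return one boolean per integral <mo>: True if it carries limits."""
    root = ET.fromstring(mathml_source)
    parent_of = {child: parent for parent in root.iter() for child in parent}
    flags = []
    for mo in root.iter(NS + "mo"):
        if mo.text != INTEGRAL:
            continue
        parent = parent_of.get(mo)
        flags.append(
            parent is not None
            and parent.tag.split("}")[-1] in SCRIPT_WRAPPERS
            and parent[0] is mo  # the operator must be the script base
        )
    return flags


DEFINITE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mrow>'
    "<msubsup><mo>&#x222B;</mo><mn>0</mn><mn>1</mn></msubsup><mi>f</mi>"
    "</mrow></math>"
)
INDEFINITE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mrow>'
    "<mo>&#x222B;</mo><mi>f</mi>"
    "</mrow></math>"
)
print(integral_limit_flags(DEFINITE), integral_limit_flags(INDEFINITE))
```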

Also, in @davidfarmer's list, I'm slightly sceptical that authors will want to use prefix forms for invisible times and function application (Content MathML as an author format suffers from this). Since the presentation forms are infix, I think a TeX infix markup like 3 \invisibletimes x is easier to map to <mn>3</mn><mo>&InvisibleTimes;</mo><mi>x</mi>.
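
To illustrate why the infix form maps so directly, here is a toy token-by-token translation. It is a sketch only: the MACROS table, the \applyfunction macro name, and the whitespace tokenization are assumptions for this example, not a proposed TeX syntax.

```python
# Assumed macro names; only \invisibletimes appears in the discussion above.
MACROS = {
    r"\invisibletimes": "&InvisibleTimes;",
    r"\applyfunction": "&ApplyFunction;",
}


def infix_to_mathml(tex):
    """Translate whitespace-separated infix tokens into MathML token elements."""
    parts = []
    for token in tex.split():
        if token in MACROS:
            parts.append(f"<mo>{MACROS[token]}</mo>")
        elif token.isdigit():
            parts.append(f"<mn>{token}</mn>")
        else:
            parts.append(f"<mi>{token}</mi>")
    return "".join(parts)


print(infix_to_mathml(r"3 \invisibletimes x"))
```

Each input token becomes exactly one output element in the same order, which is the property a prefix form would lose.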

If we do use TeX markup for symbols in any descriptions, I think we should use the unicode-math markup (as that works in TeX). These are all listed in unicode.xml in our git repository, for instance

<mathlatex set="unicode-math">\oint</mathlatex> \oint for ∮

and

<mathlatex set="unicode-math">\mbffrakA</mathlatex> \mbffrakA for 𝕬

In particular, we shouldn't use commands like \bf (which is not defined by default in LaTeX).

@davidfarmer
Contributor

davidfarmer commented Oct 8, 2019 via email

@davidfarmer
Contributor

Below is a proposal for how to reorganize Sam's tables.

The underlying problem is similar to what one encounters when designing a database.

"multiplication" can be represented by \cdot, \times, or [space]

\times can mean "multiplication" or "cross product"

Thus, we have a many-to-many relationship.

An additional complication is that Sam wants to encode both the form and
the meaning, so that he can convert between different representations of the
content (presentation MathML, content MathML, and his editing program).

The conclusion I reach is that two attributes are needed: one that encodes meaning,
and one that encodes presentation. Note that some may argue that in many cases
it is not necessary to encode presentation (because the content is the representation),
but recording the presentation with an ASCII id can be useful.

So we need both the current "ID" column in Sam's table, and also another new
column for "Meaning". I propose that specific column name so that it is clear
to us what should go there. (Probably we need two tables:
one which has the ID column, and another, much smaller table which has the
Meaning column.)

In the HTML, the Meaning will be recorded in the 'role' or 'math-role' or some other
attribute to be determined later. We don't need to know that in order to develop
the table. The ID can go in the HTML as a 'data-id' attribute. In HTML it is always
legal to have an attribute beginning with "data-". It is okay if Sam uses the data-id
attribute and others do not.
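
A minimal sketch of the two-attribute idea, treating the ID/Meaning relationship as a join table. The ID strings ("cdot", "times", "space"), the meaning names, and the helper attributes() are placeholders invented for this example; only the 'role' and 'data-id' attribute names come from the proposal above.

```python
from collections import defaultdict

# Hypothetical join table: one row per (presentation ID, meaning) pair.
PAIRS = [
    ("cdot", "multiplication"),
    ("times", "multiplication"),
    ("space", "multiplication"),
    ("times", "cross-product"),
]

meanings_of = defaultdict(set)  # presentation ID -> possible meanings
ids_for = defaultdict(set)      # meaning -> possible presentation IDs
for pid, meaning in PAIRS:
    meanings_of[pid].add(meaning)
    ids_for[meaning].add(pid)


def attributes(pid, meaning, role_attr="role"):
    """Build the attribute pair for one element, rejecting unknown pairings."""
    if meaning not in meanings_of.get(pid, set()):
        raise ValueError(f"{pid!r} is not listed with meaning {meaning!r}")
    return {role_attr: meaning, "data-id": pid}


print(sorted(meanings_of["times"]))
print(attributes("times", "cross-product"))
```

The name of the meaning attribute is left as a parameter, since the proposal defers that choice.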

@samdooley : I tried to email you, but it bounced. Has your email address changed?

@davidcarlisle
Collaborator

I have been trying to understand @samdooley's spreadsheet before the call. We seem to have been talking round it for a while, with (I think) two viewpoints leading to a certain amount of disconnect, so I tried refactoring it.

I started by removing all rows that did not share an entry in the Unicode column (F), as they are uniquely identified by their presentation MathML markup. I then removed any remaining rows that did not have multiple meanings once synonyms such as ngt ; notgt were merged.

I then did a bit of hand cleanup and ended up with the 29 entries in the attached table (html but attached as .txt for this site)

sym2.txt
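
The filtering steps described above can be sketched roughly as follows. The rows here are made-up stand-ins, not data from the spreadsheet or from sym2.txt (only the ids bang, factorial, ngt, and notgt are mentioned in this thread), and the function name ambiguous_characters is an assumption.

```python
from collections import defaultdict

# Stand-in rows of (ID, Unicode code point); not the real spreadsheet data.
ROWS = [
    ("factorial", "U+0021"),
    ("bang", "U+0021"),
    ("ngt", "U+226F"),
    ("notgt", "U+226F"),
    ("alpha", "U+03B1"),
]
SYNONYMS = {"notgt": "ngt"}  # synonym -> canonical ID


def ambiguous_characters(rows, synonyms):
    """Keep only code points that still have several IDs after merging synonyms."""
    ids_by_char = defaultdict(set)
    for symbol_id, codepoint in rows:
        ids_by_char[codepoint].add(synonyms.get(symbol_id, symbol_id))
    return {cp: sorted(ids) for cp, ids in ids_by_char.items() if len(ids) > 1}


print(ambiguous_characters(ROWS, SYNONYMS))
```

With these rows, only U+0021 survives: alpha has a single ID, and ngt/notgt collapse into one.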

Note I blanked out all the units rows to a single UNITS row as I think we do need to specify some markup for (any) unit use.

Note I think the data in the original table is needed, just not in the MathML spec. That is, if you are inferring semantics (or Content MathML or OpenMath) from presentation and hit a U+222D, then you need to know that's a triple integral. Sam's spreadsheet has that data, and any converter (in either direction) needs that information, which is where it came from :-) But I would argue that it should not be in the MathML: you should only put a (math-)role="wibble" on an <mo>&#x222d;</mo> if it is not standing for a triple integral.
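
A small sketch of that rule from the converter's side: the default-meaning table lives in the tool, and the markup carries a role only when the meaning departs from the default. The meaning name "triple-integral", the "math-role" attribute name, and the function mo_markup are assumptions for illustration.

```python
# Hypothetical converter-side table of default meanings per character.
DEFAULT_MEANING = {"\u222D": "triple-integral"}  # U+222D TRIPLE INTEGRAL


def mo_markup(char, meaning, role_attr="math-role"):
    """Emit an <mo>, adding a role only when the meaning is not the default."""
    reference = f"&#x{ord(char):X};"
    if DEFAULT_MEANING.get(char) == meaning:
        return f"<mo>{reference}</mo>"
    return f'<mo {role_attr}="{meaning}">{reference}</mo>'


print(mo_markup("\u222D", "triple-integral"))
print(mo_markup("\u222D", "wibble"))
```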

@fred-wang fred-wang added accessibility Issues related to improving accessibility MathML 4 Issues affecting the MathML 4 specification labels May 22, 2020
@dginev
Contributor

dginev commented May 8, 2022

I just discovered this issue today and am particularly interested in @samdooley's list:

I have collected 1749 symbols into a Google sheet as initial starting point for such a list.

It appears that Google doc has disappeared. Is there a new location?

@davidcarlisle davidcarlisle added the intent Issues involving the proposed "intent" attr label Jun 17, 2022
@NSoiffer
Contributor

@samdooley: is the list still around so @dginev can look at it? If it isn't, please close this issue.

@NSoiffer
Contributor

No action, so closing issue.


6 participants