Path for adding semantics #84

Closed
davidfarmer opened this issue Mar 29, 2019 · 11 comments
Labels
accessibility Issues related to improving accessibility

Comments

@davidfarmer
Contributor

This is related to issue #64 but is more about the pipeline
from source to MathML in the browser.

I believe it is possible to have many authors use a standard
set of semantic LaTeX macros. Well, maybe not most authors,
but a significant subset of those who write in PreTeXt
(https://pretextbook.org/). This includes some popular
textbooks.

For example, they would use \abs{x} for the absolute value
of x, instead of writing |x|.

There would be a repository which contains information about
that macro. For example, "in LaTeX the macro has the definition
\newcommand{\abs}[1]{|#1|} "

But more significantly (this is the point of this issue)
the repository also has information such as:

Pronounce this as: "absolute value of #1" or "begin absolute
value #1 end absolute value" or whatever is the right thing
to say, possibly with variations.

Write this in braille as: [whatever it should be]

What I would like to see happen is: author writes their source
using the standard macros. Once converted to HTML (say, using
MathJax to convert the math), a screen reader makes the correct
pronunciation of $|x|$ without the need for any guessing
about what "vertical line x vertical line" means. The key to
making this happen is the repository associated with the macro.

Is it reasonable to hope that things will be able to work
that way? I am pretty sure that authors can be persuaded to
write their books with standard macros, since it is not really
any extra work, and the benefits would be significant.
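
A minimal sketch of what one entry in such a repository might hold, written in Python. The `MacroEntry` class, its field names, and the style labels "brief" and "verbose" are hypothetical; only the LaTeX definition and the two pronunciations come from the text above, and the braille field is left empty because the issue does not specify it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MacroEntry:
    """One record in a hypothetical shared repository of semantic macros."""
    name: str                      # macro name without the backslash
    arity: int                     # number of arguments
    latex: str                     # body of the LaTeX definition
    speech: Dict[str, str] = field(default_factory=dict)  # style -> template
    braille: Optional[str] = None  # not specified in the issue

    def newcommand(self) -> str:
        """Return the LaTeX definition line for this macro."""
        return f"\\newcommand{{\\{self.name}}}[{self.arity}]{{{self.latex}}}"

    def speak(self, *args: str, style: str = "brief") -> str:
        """Fill the #1, #2, ... slots of the chosen pronunciation template."""
        text = self.speech[style]
        for index, arg in enumerate(args, start=1):
            text = text.replace(f"#{index}", arg)
        return text


ABS = MacroEntry(
    name="abs",
    arity=1,
    latex="|#1|",
    speech={
        "brief": "absolute value of #1",
        "verbose": "begin absolute value #1 end absolute value",
    },
)

print(ABS.newcommand())
print(ABS.speak("x"))
print(ABS.speak("x", style="verbose"))
```

Keeping the visual definition and the pronunciations in one record means a converter and a speech generator can both read from the same source.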

@davidcarlisle davidcarlisle added the accessibility Issues related to improving accessibility label Mar 30, 2019
@bkardell
Collaborator

I'm still a little new to this, so I apologize if this seems remedial, but let me ask anyway since it is in line with some things I want to ask in the next meeting...

> Once converted to HTML (say, using MathJax to convert the math)

This seems to imply a number of kind of key things -- MathJax, as you say, ultimately converts to HTML. Lots of things could potentially create 'mathy' HTML that involves no actual <math> element. Part of what we are doing is trying to describe the underlying 'plumbing' of MathML and the platform so that we solve both ends of that problem.

So... It seems then that, to some extent, what you want here is "new magic" that isn't about MathML as much as the platform capabilities/architectures themselves? That's not a statement as much as a question to help me understand: Where specifically does this come into play? If there was no MathML, where would this 'fit'?

@davidfarmer
Contributor Author

davidfarmer commented Apr 22, 2019 via email

@davidcarlisle
Collaborator

I think that this is essentially a duplicate of #64.

Of course, if we define a system of roles, there has to be a pipeline from some authoring environment (including conversions from LaTeX) that adds the roles in the right places. In general, though, the MathML spec can't say much about how the MathML is created. So a TeX fragment such as \abs{x} might end up as <mrow role="absolute value"><mo>|</mo><mi>x</mi><mo>|</mo></mrow>, or whatever the final specification of the role= syntax turns out to be, but MathML can only specify the role (and its effect on screen readers), not the conversion from TeX.
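
To make the conversion step concrete, here is a small Python sketch that emits the markup quoted above. The function name `abs_to_mathml` is hypothetical, and the `role` attribute and its value are the placeholder syntax from this comment, not a finalized part of any MathML specification.

```python
import xml.etree.ElementTree as ET


def abs_to_mathml(variable: str) -> str:
    """Build presentation MathML for |variable| carrying a role annotation."""
    # "role" and "absolute value" are placeholders taken from the comment above.
    mrow = ET.Element("mrow", {"role": "absolute value"})
    ET.SubElement(mrow, "mo").text = "|"
    ET.SubElement(mrow, "mi").text = variable
    ET.SubElement(mrow, "mo").text = "|"
    return ET.tostring(mrow, encoding="unicode")


print(abs_to_mathml("x"))
```

A real converter would of course handle arbitrary arguments rather than a single identifier; the point is only that the role is attached at conversion time, outside the scope of the MathML spec.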

@bkardell
Collaborator

Sorry if I am only adding confusion rather than help here, but are there efforts in ARIA to define the roles we would need? If so, is it worth linking up here?

@fred-wang

@davidfarmer I think Neil or others can better comment on the status and plan to add semantic information. As @davidcarlisle noted, there are discussions about adding more semantics via roles. However, I think your comment is interesting and actually goes further. In any case, I think this discussion has a place in this CG and probably the ARIA WG should be involved too.

Quoting Abraham Nemeth from the book "Braille into the Next Millennium":

The Principle of Meaning Versus Notation: In my view, it is the transcriber's function to supply only notation, not meaning in an accessible form (speech or braille). It is the reader's function to extract the meaning from the notation the transcriber supplies. Consider the common notation (x,y). That notation can mean many things: the ordered pair whose first component is x and second component is y ; the point in the cartesian coordinate with abscissa x and ordinate y; the open interval on the real line with left endpoint x and right endpoint y; or the greatest common divisor of x and y. The transcriber's function, however, is only to convey this five-symbol expression to the reader. It is the reader's function to extract whatever meaning his experience and the context of the text permit.

Many people disagree with Abraham Nemeth here and want to add information describing the different meanings (pair, point, interval, gcd...) in order to make formulas less ambiguous and more "readable". However, math notations are really open-ended, and it's quite common for authors to introduce their own very field-specialized notations at the beginning of papers. As I see it, your proposal has the advantage of letting authors explicitly specify how they want their notations to be read, instead of relying on a fixed set of known definitions (if I understand #64 correctly).

One possible limitation though: Are people really going to define macros for pair, point, interval, gcd, etc.? I suspect authors would just write $(x, y)$. To take something even more basic, consider exponents: In calculus, $\sin^n$ and $\sin^{(n)}$ have different meanings. In set theory, $\mathbb N = \omega = \aleph_0$, but $2^{\mathbb N}$ is a set of functions, $2^\omega = \omega$ (ordinal exponentiation), and $2^{\aleph_0} > \aleph_0$ (cardinal exponentiation). And obviously, there are tons of other notations with superscripts and subscripts. If introducing macros does not make things more concise, I believe people would just write ^ and _.

Anyway, this is just my two cents on this... I think I would be happy if we had at least a standard, cross-platform way to read presentation MathML that addresses Nemeth's minimal need. Currently we don't even have this minimal support AFAIK.

@davidfarmer
Contributor Author

I am hopeful of implementing these ideas for the math that
is commonly taught up through the first or second year of
college.

The ingredients are:

  1. A modified form of LaTeX, which is meant to be human-readable,
    human-writable, and semantic. (Example below.)

  2. A script that converts 1) to another form of LaTeX which
    is not intended to be written or read by humans, but which
    preserves the meaning of the original source. This form
    uses what below are called "semantic macros".

  3. Explicit rules for how to pronounce the semantic macros.
    (There are multiple options for each macro, ranging from
    verbose and extremely precise, to brief. People familiar
    with the subject want the brief version. Experts provide
    these rules. The examples below (made up by me) are not good,
    but I hope they illustrate the point.)

  4. LaTeX definitions of the semantic macros, which describe
    the visual appearance of the output. (Individual users are
    free to change these definitions. You want \transpose{A}
    to put the "T" on the left instead of the right? No problem:
    just change the macro.)

  5. What is missing, and I think it is the point of this issue,
    is whether all this information can be accommodated in
    MathML. The alternative is that MathML just shows the
    appearance, and pronunciation is handled in another way.

Note that this is about pronunciation, not Braille. We could
also make a version of the semantic macros that outputs Braille.

The context in which I am sure I can accomplish 1)-4) is
open source textbooks, particularly those written in PreTeXt.
If the author has been reasonably consistent, then it is
possible to write a throw-away script that converts their
source to a structured form. There are not many good
open source math textbooks, and I could imagine myself
converting all of them to semantic form. Those books would
reach many students.

  1. Here is an example of structured LaTeX markup:

    If $f:\R \to \R$ is (strictly) decreasing then
    $$
    \sum_{n=1}^A f(n) \ge \int_1^{A+1} f(x) dx
    $$
    and if $x \in \interval[-10, -3]$ then $f(\abs{x}) < f(x)$.

Differences you might notice from typical LaTeX:
the \interval tag to indicate that [-10,-3] represents
an interval, the (unambiguous!) macro \abs{x} instead
of |x|, the macro \R for the real numbers, and good use of
white space for clarity.

  2. Here is the same text, now using semantic macros:

If $\functionDomainCodomain{f}{\reals}{\reals}$ is (strictly) decreasing then
$$
\sumLimits{n=1}{A}{\functionApply{f}{n}} \ge \definiteIntegralLimits{1}{A+1}{\functionApply{f}{x}}{x}
$$
and if $x \in \intervalCC{\minus{10}}{\minus{3}}$ then $\functionApply{f}{\absoluteValue{x}} < \functionApply{f}{x}$

  3. Here is the pronunciation of those semantic macros
    (note that I am new to this area, and surely an expert
    could improve these. I am just trying to convey the idea.
    Also, there need to be multiple pronunciations, from
    verbose to concise. Examples below are verbose.)

\functionDomainCodomain[3] function #1 from #2 to #3
\reals the real numbers
\sumLimits[3] the sum from #1 to #2 of #3
\functionApply[2] #1 of arg #2 end arg
\definiteIntegralLimits[4] the integral from #1 to #2 of #3 dee #4
\intervalCC[2] interval including #1 to including #2
\minus[1] negative #1
\absoluteValue[1] absolute value of arg #1 end arg

Unfolding all of those pronunciations gives the
pronunciation of the expression. My main question:
can such a pronunciation be obtained from attributes
on the MathML?
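
The unfolding described above can be sketched in Python. This is only an illustration under stated assumptions: the table copies the verbose pronunciations listed above, while the function names `read_group` and `pronounce` and the simple brace-matching approach are hypothetical and not part of any existing tool. Anything that is not a known semantic macro is passed through unchanged.

```python
import re

# Verbose pronunciations copied from the list above: name -> (arity, template).
SPEECH = {
    "functionDomainCodomain": (3, "function #1 from #2 to #3"),
    "reals": (0, "the real numbers"),
    "sumLimits": (3, "the sum from #1 to #2 of #3"),
    "functionApply": (2, "#1 of arg #2 end arg"),
    "definiteIntegralLimits": (4, "the integral from #1 to #2 of #3 dee #4"),
    "intervalCC": (2, "interval including #1 to including #2"),
    "minus": (1, "negative #1"),
    "absoluteValue": (1, "absolute value of arg #1 end arg"),
}


def read_group(src, pos):
    """Return (content, next_pos) for the brace group that starts at src[pos]."""
    if pos >= len(src) or src[pos] != "{":
        raise ValueError("expected '{' at position %d" % pos)
    depth = 0
    for i in range(pos, len(src)):
        if src[i] == "{":
            depth += 1
        elif src[i] == "}":
            depth -= 1
            if depth == 0:
                return src[pos + 1:i], i + 1
    raise ValueError("unbalanced braces")


def pronounce(src):
    """Recursively replace known semantic macros by their pronunciation."""
    out, pos = [], 0
    while pos < len(src):
        if src[pos] != "\\":
            out.append(src[pos])
            pos += 1
            continue
        end = pos + 1
        while end < len(src) and src[end].isalpha():
            end += 1
        name = src[pos + 1:end]
        if name not in SPEECH:
            out.append(src[pos:end])  # unknown macro: leave it as written
            pos = end
            continue
        arity, template = SPEECH[name]
        args, pos = [], end
        for _ in range(arity):
            arg, pos = read_group(src, pos)
            args.append(pronounce(arg))  # unfold inner macros first
        out.append(re.sub(r"#(\d)", lambda m: args[int(m.group(1)) - 1], template))
    return "".join(out)


print(pronounce(r"\functionApply{f}{\absoluteValue{x}}"))
print(pronounce(r"\intervalCC{\minus{10}}{\minus{3}}"))
```

Under these assumptions the first call yields "f of arg absolute value of arg x end arg end arg", which shows why a terser variant for simple arguments is desirable.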

An enhancement I plan is a "simple argument" version of
\functionApply: If the argument is "x" then the arg ... end arg
is not necessary. The script which converts structured LaTeX
to semantic LaTeX can determine which one is appropriate.
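
A sketch of how a script might choose between the two forms, in Python. The function name and the rule for what counts as a "simple" argument (a single letter or an unsigned integer) are assumptions made for illustration; the text above only gives "x" as an example.

```python
import re


def speak_function_apply(function: str, argument: str) -> str:
    """Use the short form for simple arguments, the bracketed form otherwise."""
    # Assumed rule: one letter or a run of digits counts as "simple".
    if re.fullmatch(r"[A-Za-z]|\d+", argument):
        return f"{function} of {argument}"
    return f"{function} of arg {argument} end arg"


print(speak_function_apply("f", "x"))
print(speak_function_apply("f", "A+1"))
```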

  4. The LaTeX definitions of the semantic macros are what
    are used to produce the visual output, either by MathJax
    or by some other program that converts to MathML
    (or whatever other method is used to make the visual display).

One example of the semantic macros:

\definiteIntegralLimits[4] \int_{#1}^{#2} #3 \measureD #4

\measureD \,d

Note that those macros address the LaTeX shortcoming that
authors need to micromanage the layout. In the source,
the "\int" and the "dx" are recognized as parts of one
object, which is then interpreted appropriately.

The publisher is free to redefine the semantic macros.
For example, if you want the "d" of "dx" to be upright,
then change the definition of \measureD. The semantic
input, and the pronunciation, will not change.
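
The separation between semantic input and visual definition can be illustrated with a small Python macro expander. This is a sketch, not an existing tool: `read_group` and `expand` are hypothetical names, the definition of \measureD is assumed to be a thin space followed by "d", and the upright variant using \mathrm is one possible publisher override.

```python
import re

# Visual definitions: name -> (arity, LaTeX template).
# The thin space in measureD is an assumption about the intended definition.
VISUAL = {
    "definiteIntegralLimits": (4, r"\int_{#1}^{#2} #3 \measureD #4"),
    "measureD": (0, r"\,d"),
}


def read_group(src, pos):
    """Return (content, next_pos) for the brace group that starts at src[pos]."""
    if pos >= len(src) or src[pos] != "{":
        raise ValueError("expected '{' at position %d" % pos)
    depth = 0
    for i in range(pos, len(src)):
        if src[i] == "{":
            depth += 1
        elif src[i] == "}":
            depth -= 1
            if depth == 0:
                return src[pos + 1:i], i + 1
    raise ValueError("unbalanced braces")


def expand(src, macros):
    """Expand semantic macros into ordinary LaTeX using the given definitions."""
    out, pos = [], 0
    while pos < len(src):
        if src[pos] != "\\":
            out.append(src[pos])
            pos += 1
            continue
        end = pos + 1
        while end < len(src) and src[end].isalpha():
            end += 1
        name = src[pos + 1:end]
        if name not in macros:
            out.append(src[pos:end])  # ordinary LaTeX command: keep as is
            pos = end
            continue
        arity, template = macros[name]
        args, pos = [], end
        for _ in range(arity):
            arg, pos = read_group(src, pos)
            args.append(expand(arg, macros))
        body = re.sub(r"#(\d)", lambda m: args[int(m.group(1)) - 1], template)
        out.append(expand(body, macros))  # templates may use other macros
    return "".join(out)


source = r"\definiteIntegralLimits{1}{A+1}{f(x)}{x}"
print(expand(source, VISUAL))

# A publisher override: same source, upright "d".
upright = dict(VISUAL, measureD=(0, r"\,\mathrm{d}"))
print(expand(source, upright))
```

Only the table entry for \measureD differs between the two runs; the semantic source string is identical, which is the property the comment describes.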

Another example is \functionApply. I will not give
the definition (which I learned from Alex Jordan), but it
addresses the issue that $f\left(\frac12\right)$ does not
look good, because there is too much space after the "f".

The above is supposed to illustrate the possibility of
writing semantic source, and not losing that information
along the pathway to displaying that material in the
browser, enabling screen readers to pronounce that
material without any heuristics or guesswork.

@fred-wang

Thanks for the detailed explanation. I think restricting to a subset of math taught at college, plus asking authors to always use explicit macros, addresses my concerns. As for the missing bit, I personally don't know if there is any standard way to tell screen readers how to pronounce the text, so I'll let others reply.

@NSoiffer
Contributor

Sorry for the slow response -- I'm catching up after a month long (great) vacation... Also, this is a very long comment that has taken a while to write

For me, a goal of MathML 4 is to allow, but not require, some semantic enrichment of presentation MathML. The primary use case I see is for accessibility, but I'm sure there are others, such as computation. I like @davidfarmer's ideas for using more semantic LaTeX, and they fit in well with my goal of finding a way of putting that into the MathML. We might want to allow for explicit text or braille, but doing so has a number of drawbacks:

  • @davidfarmer already alluded to a few problems such as wanting a terse reading or a verbose reading.
  • people with different disabilities need to hear different things. Someone who is blind needs to know where a fraction starts and ends, e.g., "fraction, a plus b, over 2, end fraction". However, for someone who is dyslexic or has ADHD, "fraction" and "end fraction" are likely distractions and make it harder to understand the math.
  • you can't add prosody, in particular pauses, to make the math speech more understandable. Nor can you force the long "a" sound in English to make the variable "a" be pronounced properly (text to speech engines are tuned for English, not math).
  • synchronized highlighting of speech and math is very useful to people with dyslexia. You lose that with plain text; the generated output could embed some way to indicate what part corresponds to what child. I did this with MathType + MathPlayer for the ClearSpeak style, but it adds complexity for both the text author and the program reading the text. Another alternative is to put the text on every descendant, but I'm not sure that really solves the synchronized highlighting problem; it does help for navigation.
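
The second point can be made concrete with a tiny Python sketch: the same fraction is spoken differently depending on a user preference, which a single author-supplied string cannot do. The function name, the `mark_boundaries` flag, and the wording of the unbracketed form are assumptions for illustration; only the bracketed wording is taken from the example above.

```python
def speak_fraction(numerator: str, denominator: str, mark_boundaries: bool) -> str:
    """Speak a fraction with or without explicit start and end markers."""
    if mark_boundaries:
        # Form quoted above for listeners who need to hear the structure.
        return f"fraction, {numerator}, over {denominator}, end fraction"
    # Assumed terser form for listeners who find the markers distracting.
    return f"{numerator} over {denominator}"


print(speak_fraction("a plus b", "2", mark_boundaries=True))
print(speak_fraction("a plus b", "2", mark_boundaries=False))
```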

IMHO, providing explicit text should be done only in exceptional cases (e.g., a test where you don't want to hint at the answer). Instead, it would be better to embed the meaning and let the renderer choose the proper speech based on user preferences. This is similar to saying the visual renderer shouldn't use an image but instead should render the math to match the font size (etc.) that the user has chosen. MathML provided a fallback altimg for math renderers that couldn't render MathML, but that never got implemented by browsers, so it was rarely added to the MathML. MathML also allows alttext on the math element. Potentially this could be expanded to be allowed on all MathML elements. However, that means there is only one allowed text: no alternative ways to speak the math, no other languages allowed.

Note that some of these problems exist for braille also. Nemeth was the standard for braille in the US and some other countries for many years. Recently, Unified English Braille (UEB) has come along, which defines its own math codes. Most English-speaking countries have adopted UEB; the US allows both. It's a very contentious topic because UEB uses many more characters than Nemeth to encode math. If the braille is author generated, both would need to be included in the MathML somehow. Because braille is syntax-based, I believe it can be generated from presentation MathML for both codes. I wouldn't be surprised if there were one or two problem areas that should be fixed for good braille generation from MathML, but I haven't heard of them yet. MathPlayer uses liblouis for its MathML to Nemeth conversion and although there are bugs in the conversion, AFAIK, none are due to a problem with MathML. So far, MathML to UEB in liblouis is very incomplete.

What I like more (or in addition to text) is providing a way to embed semantics. Some options are:

  • An existing way to do enrichment is to use parallel content markup. A big positive is that this already exists. The problem is either the limited number of known math contents with "full content MathML" or the very big complication of having to be able to read and produce speech from OpenMath/"strict content MathML". @AdamSobieski in Adding "semantics" to presentation MathML via "roles" #64 has some related ideas.
  • Add an attribute such as mathrole as in Adding "semantics" to presentation MathML via "roles" #64. This would be an open-ended set of values, but potentially a core set of several hundred could be agreed upon and updated periodically. Those apps that care about semantic enrichment would know what to do when they encountered a known name, whether that be generating speech for someone who is blind or converting it to an appropriate command for computation in their system. If we went this route, coordination with PreTeXt in terms of at least names would be good. I'm not sure that would work with my suggestion in Adding "semantics" to presentation MathML via "roles" #64 of potentially using WikiData.org as the source for the names though. Maybe the PreTeXt macros could reference the WikiData.org keys and whatever translator converted the PreTeXt math to MathML would use that.
  • Same as above but have a way of looking up the meaning/text. This is trickier and conceptually duplicates some of the efforts of OpenMath. Wikidata.org keys might be useful as an alternative to OpenMath.
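
As a rough illustration of the second option, the Python sketch below reads presentation MathML, speaks a subtree semantically when its mathrole value is in an agreed table, and falls back to a purely syntactic reading otherwise. The attribute name mathrole is the one floated above; the role value "absolute-value", the two lookup tables, and the function `speak` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical core set of agreed role names and their speech templates.
KNOWN_ROLES = {"absolute-value": "absolute value of {0}"}
# Syntactic names for symbols, used when no known role is present.
SYMBOLS = {"|": "vertical line"}


def speak(node):
    """Speak a MathML subtree, preferring a known mathrole over raw syntax."""
    role = node.get("mathrole")
    if role in KNOWN_ROLES:
        # Treat the non-operator children as the arguments of the role.
        args = [speak(child) for child in node if child.tag != "mo"]
        return KNOWN_ROLES[role].format(*args)
    if len(node) == 0:
        text = node.text or ""
        return SYMBOLS.get(text, text)
    return " ".join(speak(child) for child in node)


tagged = ET.fromstring(
    '<mrow mathrole="absolute-value"><mo>|</mo><mi>x</mi><mo>|</mo></mrow>'
)
plain = ET.fromstring("<mrow><mo>|</mo><mi>x</mi><mo>|</mo></mrow>")
print(speak(tagged))
print(speak(plain))
```

An unknown role value simply takes the syntactic path, which matches the idea that the set of names is open-ended while only a core set is widely understood.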

As mentioned in #64, we should talk to the ARIA WG after we have some proposal or set of questions to ask them. I hope I'm correctly representing ARIA in the following simplified description of what it is...

When a web page is read, the DOM is created, and from the DOM a simplified view called the accessibility tree is created. HTML elements map to various things in the accessibility tree; ARIA provides a way to override those mappings. Those overrides are particularly useful for divs and spans since they don't have mappings to the accessibility tree. Perhaps the most important mapping is to the accessible name of an element in the accessibility tree. Screen readers typically use that name (which might be the concatenation of various other names of DOM elements) as the text that is spoken. The names are plain strings.

If we are thinking of having screen readers directly read MathML, it would be by providing a means for MathML elements to set the accessible names in the accessibility tree. In some sense, that's already possible because you can add aria-label to an element and that will override/set any text for the name. Just as I'm not a fan of authors setting the text, I'm also not a fan of this, although potentially the text could be generated by client software based on user preferences. Math has its own braille code, so braille is lost. There is a suggestion to add a new ARIA feature, aria-label-braille, that would provide braille, so that would remedy that problem. Synchronized highlighting remains a problem though. Perhaps we should be thinking about what to add to ARIA to solve that?
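
A deliberately simplified Python model of the naming behaviour described above: an aria-label on an element overrides whatever name would otherwise be built by concatenating the names of its descendants. This is not the real accessible-name algorithm used by browsers, which has many more rules; the function `accessible_name` is a hypothetical stand-in meant only to show the override and the fact that the result is a plain string.

```python
import xml.etree.ElementTree as ET


def accessible_name(node):
    """Return aria-label if present, else the concatenated names of the content."""
    label = node.get("aria-label")
    if label is not None:
        return label  # the author-supplied text wins outright
    parts = []
    if node.text and node.text.strip():
        parts.append(node.text.strip())
    parts.extend(accessible_name(child) for child in node)
    return " ".join(part for part in parts if part)


labelled = ET.fromstring(
    '<mrow aria-label="absolute value of x"><mo>|</mo><mi>x</mi><mo>|</mo></mrow>'
)
unlabelled = ET.fromstring("<mrow><mo>|</mo><mi>x</mi><mo>|</mo></mrow>")
print(accessible_name(labelled))
print(accessible_name(unlabelled))
```

Because the label replaces the whole subtree with one string, the correspondence between spoken words and individual child elements is gone, which is the synchronized-highlighting problem raised in this comment.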

Alternatively, maybe we should be looking at other ways to allow screen readers, or more likely third-party libraries that screen readers can call, such as Volker Sorge's SRE, to produce better speech. Enhancing ARIA's role is one possibility, but to me, adding a bunch of math-related things "pollutes" that attribute. As I mentioned earlier, mrole or mathrole on MathML elements makes more sense to me.

Whatever we come up with (potentially more than one idea), once we have discussed it and come to some conclusions, we should bring those to the ARIA WG for their feedback.

Finally, I want to digress a little and explain why Dr. Nemeth wanted syntax, not semantics when he heard speech. I had the privilege of meeting him when he was 92. Although blind (and 92), he traveled by himself via plane to a workshop I was at. He was as sharp as anyone in their 20s and had a keen sense of humor also. However, the technology he used was not modern. As he explained, he developed MathSpeak as a one-to-one way of speaking his Nemeth braille code (which encodes syntax, just like sighted math notation). There were two reasons for his design of MathSpeak:

  1. he couldn't trust his readers as they were not necessarily math-literate, and
  2. having it in braille made it much easier for him to think about and remember what was said. He used a braille writer for that, so if the person spoke MathSpeak to him, almost every word corresponded to a braille symbol. This made it easy to type what he heard. Back then, and until a few years ago, there was no option of math automatically showing up on a refreshable braille display, so he needed to type what he heard.
    At the workshop, he said that he would still type it out even if he had a refreshable display because it helped him learn it. I'm dubious that's a good path for most students who are blind that are trying to learn math these days. It would definitely slow them down.

Most work on speech has been trying to have math speak like someone would normally speak it -- i.e, have it spoken semantically. I tried to find a study that compared syntactic speech versus semantic speech but couldn't find any. It would be good to validate whether all the work being done for semantic speech actually does aid understanding. Anecdotally, several students have said they want math to be read the same way their teacher reads it, so the reason for generating semantic speech isn't completely made up.

Apologies for the extremely long comment. There was a lot to respond to and to explain.

@fred-wang

Thanks for the detailed reply, @NSoiffer.

Just to add some comments regarding how browsers pass info to assistive technologies:

  • One option is to define some default roles/attributes for MathML and have some MATH-AAM spec (similar to other Accessibility API Mappings specs) to describe how browsers should map things to native platform APIs.
  • On iOS/macOS, Apple has already designed a (limited) platform API for math accessibility and this is implemented by VoiceOver/Safari/Firefox.
  • On Windows, Chromium and Firefox currently expose the MathML markup via ISimpleDOMNode::innerHTML (MSAA interface) and this is used by NVDA and JAWS according to https://bugs.chromium.org/p/chromium/issues/detail?id=426650 ; IIUC Microsoft wants to introduce something similar to UI Automation.

Of course whatever the method used by browsers to pass the information, work is still needed on the author side and on the assistive technology side to get good rendering of the math...

Should we close this issue and mark it as a duplicate of #64 ?

@dginev
Contributor

dginev commented Oct 6, 2022

> Should we close this issue and mark it as a duplicate of #64 ?

I think I want to second Fred's proposal, given the three-year silence here.

@NSoiffer
Contributor

NSoiffer commented Jan 5, 2023

Solved by using intent.
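
For readers who have not seen it, intent is an attribute introduced in the MathML 4 drafts that lets markup name the concept it denotes. Below is a rough Python sketch of what the running absolute-value example might look like with it. The concept name absolute-value and the argument label x are used for illustration, and the function name is hypothetical, so the current MathML 4 draft should be consulted for the exact vocabulary and syntax.

```python
import xml.etree.ElementTree as ET


def abs_with_intent(variable: str) -> str:
    """Build |variable| with an intent annotation (illustrative syntax)."""
    # The intent value refers to the child labelled arg="x" via $x.
    mrow = ET.Element("mrow", {"intent": "absolute-value($x)"})
    ET.SubElement(mrow, "mo").text = "|"
    ET.SubElement(mrow, "mi", {"arg": "x"}).text = variable
    ET.SubElement(mrow, "mo").text = "|"
    return ET.tostring(mrow, encoding="unicode")


print(abs_with_intent("x"))
```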

@NSoiffer NSoiffer closed this as completed Jan 5, 2023

6 participants