include (or not) a sample-set of default conversion from plain-MathML to MathML-with-intent #433

Open
polx opened this issue Feb 2, 2023 · 22 comments
Labels
intent Issues involving the proposed "intent" attr

Comments

@polx

polx commented Feb 2, 2023

We should discuss whether the charter of the next WG should promise a sample set that spells out a default conversion from MathML (without intent) to MathML (with intent), so that legacy MathML expressions can be enriched (at least partially).

In the group's last call:

@dginev indicated that we should not make this a visible deliverable, as it will not be in any way complete.

@polx suggested that we should promise it, as such a promise says nothing about completeness and the sample set would help hint at a first intent-enrichment process.

Let's discuss this on this issue.

@polx polx added the intent Issues involving the proposed "intent" attr label Feb 2, 2023
@dginev
Contributor

dginev commented Feb 7, 2023

My position on the call was focused on the claim:

A limited set of examples will be misleading to spec adopters.

Such examples will create an unrealistic (and underspecified) expectation of what may or may not be possible with simple defaulting rules. In reality, very little is inferred reliably with simple rules, and even for K12 materials one needs to consider the full presentation tree, matching on both XML structure and text content of each node.
(+ in harder cases, surrounding expression context)

Since we do not have the resources to develop this kind of mechanism in full, my preference is to defer further work to MathML 5. For the current iteration, I would be more interested in investigating domain-specific rule-sets anchored around the "isa" ( #426 ) capability. Simple defaulting examples may be realistic within very concrete "isa" values, such as chemical-formula, arithmetic-expression, system-of-equations, diagonal-matrix, ...


For a specific illustration of my point, consider the intuition:

"Most uses of <msup> have intent power."

If one applies such a rule over a larger text, it is bound to mispronounce a wide variety of different scripted constructs.

Here is an excerpt from my survey of Khan Academy's K12 materials: 15 notations relying on a simple <msup> superscript notation, only one of which is "power".

intent | MathML example
power
<msup>
  <mi>x</mi>
  <mn>2</mn>
</msup>
foot
<msup>
  <mn>5</mn>
  <mo>'</mo>
</msup>
inch
<msup>
  <mn>10</mn>
  <mo>''</mo>
</msup>
ordinal-mark
<msup>
  <mn>10</mn>
  <mtext>th</mtext>
</msup>
degrees
<msup>
  <mn>10</mn>
  <mo>°</mo>
</msup>
inverse-function
<msup>
  <mi>sin</mi>
  <mn>-1</mn>
</msup>
embellished-name
<msup>
  <mi>A</mi>
  <mo>'</mo>
</msup>
direction-of-approach
<msup>
  <mn>0</mn>
  <mo>+</mo>
</msup>
conjugate
<msup>
  <mi>A</mi>
  <mo>*</mo>
</msup>
transpose
<msup>
  <mi>A</mi>
  <mo>T</mo>
</msup>
inverse-matrix
<msup>
  <mi>A</mi>
  <mn>-1</mn>
</msup>
absolute-complement
<msup>
  <mi>A</mi>
  <mo>C</mo>
</msup>
first-derivative
<msup>
  <mi>f</mi>
  <mo>′</mo>
</msup>
nth-derivative
<msup>
  <mi>f</mi>
  <mrow>
    <mo>(</mo><mi>n</mi><mo>)</mo>
  </mrow>
</msup>
positive-ion
<msup>
  <mi>Na</mi>
  <mo>+</mo>
</msup>
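
To make the point concrete, here is a minimal sketch (not part of the original survey) that runs a deliberately naive rule over the 15 examples above. The function name `naive_intent` and the grouping step are illustrative assumptions; only the example markup and intent names come from the excerpt.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# (intended reading, MathML) pairs copied from the excerpt above.
EXAMPLES = [
    ("power", "<msup><mi>x</mi><mn>2</mn></msup>"),
    ("foot", "<msup><mn>5</mn><mo>'</mo></msup>"),
    ("inch", "<msup><mn>10</mn><mo>''</mo></msup>"),
    ("ordinal-mark", "<msup><mn>10</mn><mtext>th</mtext></msup>"),
    ("degrees", "<msup><mn>10</mn><mo>°</mo></msup>"),
    ("inverse-function", "<msup><mi>sin</mi><mn>-1</mn></msup>"),
    ("embellished-name", "<msup><mi>A</mi><mo>'</mo></msup>"),
    ("direction-of-approach", "<msup><mn>0</mn><mo>+</mo></msup>"),
    ("conjugate", "<msup><mi>A</mi><mo>*</mo></msup>"),
    ("transpose", "<msup><mi>A</mi><mo>T</mo></msup>"),
    ("inverse-matrix", "<msup><mi>A</mi><mn>-1</mn></msup>"),
    ("absolute-complement", "<msup><mi>A</mi><mo>C</mo></msup>"),
    ("first-derivative", "<msup><mi>f</mi><mo>′</mo></msup>"),
    ("nth-derivative",
     "<msup><mi>f</mi><mrow><mo>(</mo><mi>n</mi><mo>)</mo></mrow></msup>"),
    ("positive-ion", "<msup><mi>Na</mi><mo>+</mo></msup>"),
]


def naive_intent(msup_xml: str) -> str:
    """Hypothetical defaulting rule: every <msup> is read as a power."""
    root = ET.fromstring(msup_xml)
    return "power" if root.tag == "msup" else "unknown"


# How many examples does the naive rule read as intended?
correct = [name for name, xml in EXAMPLES if naive_intent(xml) == name]
print(f"{len(correct)} of {len(EXAMPLES)} read as intended: {correct}")

# Matching on the superscript's text alone is still ambiguous:
# group the intended readings by the text content of the second child.
by_script = defaultdict(list)
for name, xml in EXAMPLES:
    script = ET.fromstring(xml)[1]
    by_script["".join(script.itertext())].append(name)
ambiguous = {s: names for s, names in by_script.items() if len(names) > 1}
print(ambiguous)
```

The second step shows why the base (and sometimes the surrounding context) has to be inspected too: the same superscript text maps to more than one intended reading in this small sample.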

@NSoiffer
Contributor

NSoiffer commented Feb 7, 2023

I think the issue is not whether the charter says we should have a sample set of mappings of defaults to intent, but:

  1. are there defaults so that authors know what AT will do without intent or should authors always use intent if they care about how something is spoken. As a basic example, if no intent is given on mfrac, should authors expect that it will be spoken as a fraction?
  2. if there are defaults, how basic or complicated should they be?

Background

I tried to answer that as part of a position paper a few months back. See that paper for more details. Here I will just list the defaults so they can be discussed individually or as a group via comments.

First off though, note that I strongly feel there needs to be a math level (or higher) attribute that controls defaulting behavior so that legacy documents can be interpreted with the knowledge that the authors were not able to make use of intent (hence, AT can infer whatever it feels is appropriate). I proposed an attribute intent-default with the following values:

  • legacy (default) -- AT is free to apply any heuristics they want
  • structure -- speak the structure (see details below)
  • common -- speak the common interpretation in lower level math (see details below)

Note: in all cases, if intent is given, it should be used (if appropriate). Even for legacy, it is possible remediation may have added an intent value.

Proposed defaults:

AT should have a specified default interpretation for every MathML Element. That doesn't mean that the exact words are specified, only that AT chooses words that convey the default meaning. For example: msup is spoken as "super" or "superscript" if intent-default = "structure" and is spoken as a power ("x squared", "x raised to the n minus 1 power", etc) if intent-default = "common". The exact words may depend upon both the audience and the arguments.

"structure"

The goal of the default is to avoid any inference of semantics. The meanings and special cases for all the MathML elements are:

  • leaf tags speak their contents. Exceptions are:
    • ms likely indicates it is a string or speaks its open/close delimiters in addition to its contents.
    • mglyph speaks the alt text
    • mspace, malignmark, maligngroup, and none are either silent or generate pauses
    • msline, indicates that it is a line
  • mrow -- speaks the children
  • mfrac -- arg1 "over" arg2 (might need bracketing words -- start over/end over?)
  • msqrt -- "radical symbol" contents???
  • mroot -- "radical symbol" with index and contents???
  • merror -- indicates there is an error and speaks the contents
  • mfenced -- should speak the same as the equivalent mrow notation
  • menclose -- should indicate the notation attributes along with the contents.
  • msup -- should indicate that it is a superscript, although maybe there should be exceptions for the pseudo-script characters, in which case the superscript is not spoken (e.g., $x'$ is spoken "x prime")
  • msub -- indicates a subscript
  • msubsup -- indicates a subscripted variable raised to a power with the same special cases as msup
  • mover -- indicates that the second argument is over the first, although the words need to clearly distinguish this from mfrac which is proposed to use the word "over". Maybe "base with 'over' above"?
    • Special cases: bar, hat, caret, tilde, dot (1-4 of them). Maybe acute and grave. Probably not overbrace and overparen because those likely need grouping words.
  • munder -- indicates that the second argument is under the first ("base with under below"?)
  • munderover -- indicates there is content above and below the base ("base with under below and over above"?). Uses the special cases of mover.
  • mmultiscripts -- indicates the scripts and their position in some way. E.g., "start-scripted ... pre-subscript ... pre-superscript ... base ... post-subscript ... post-superscript ... end-scripted"
  • mtable/mtr/mlabeledtr/mtd -- say something appropriate for tables (no recognition of determinants, matrices, vectors, etc)
  • elementary math elements (mstack/mlongdiv/msgroup/msrow/mscarries/mscarry) -- say something about the layout, but not that it is addition, long division, repeated decimals, etc.
  • maction -- speaks the selected child with maybe some indication of the action
  • semantics -- speaks the presentation child

"common"

The goal is to use common "K-14" meanings so the need to use intent is minimized. The default meanings and special cases for all the MathML elements are:

  • leaf tags speak their contents. Exceptions are:

    • ms likely indicates it is a string or speaks its open/close delimiters in addition to its contents.
    • mglyph speaks the alt text
    • mspace, malignmark, maligngroup, and none are either silent or generate pauses
    • msline, indicates that it is a line
  • mrow -- speaks the children

  • mfrac -- indicates it is division, but might have a number of special case rules depending on the arguments

  • msqrt -- indicates it is a square root

  • mroot -- indicates it is a root with an index. There should be special cases for at least '2' and '3' as the index

  • merror -- indicates there is an error and speaks the contents

  • mfenced -- should speak the same as the equivalent mrow notation

  • menclose -- should indicate the notation attributes along with the contents. Special case speech might be appropriate when menclose looks like a similar notation that has special cases (e.g., notation="top" looks the same as mover with a "_" (or equivalent) second child).

  • msup -- should assume that the notation is a power with the following special cases

    • the power is '2' or '3'
    • the power is '-1' and this is a trig function (see below)
    • the power is one of the pseudo-script characters, in which case the superscript is not spoken (e.g., $x'$ is spoken "x prime")
    • the power is an mo (and not one of the pseudo-script characters), use "superscript" or maybe "embellished with" instead of "power"
    • the base is one of the named sets (see below)
  • msub -- indicates a subscript. Special cases:

    • the base is "log"
    • the base is one of the named sets (see here)
    • the base is a large operator
    • others???
  • msubsup -- indicates a subscripted variable raised to a power with the same special cases as msup and msub. This includes (read the same as for munderover)

    • the base is a large operator
  • mover -- indicates that the second argument is over the first. Special cases:

    • bar, hat, caret, tilde, dot (1-4 of them). Maybe acute and grave. Probably not overbrace and overparen because those likely need grouping words.
    • the base is a large operator
  • munder -- indicates that the second argument is under the first. Special cases:

    • the base is a large operator
    • the base is "lim" or "limit" (FIX: does this need to be language agnostic?)
  • munderover -- indicates there is content above and below the base. Special cases:

    • those listed for mover
    • the base is a large operator (speak using "from" and "to" -- see here)
  • mmultiscripts -- indicates the scripts. Special cases???

  • mtable/mtr/mlabeledtr/mtd -- say something appropriate for tables. Special cases:

    • row and column tables might have specialized speech
    • small tables with simple entries might have specialized speech
  • elementary math elements (mstack/mlongdiv/msgroup/msrow/mscarries/mscarry) -- say something appropriate

  • maction -- speaks the selected child with maybe some indication of the action

  • semantics -- speaks the presentation child
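
As an illustration of how the two proposed values differ, here is a minimal sketch covering only msup. It is an assumption-laden toy, not a specification: the function name `speak_msup`, the `PSEUDO_SCRIPTS` subset, and the exact words are all made up for the example, and the trig-function and named-set special cases from the list above are left out.

```python
import xml.etree.ElementTree as ET

# Assumed subset of pseudo-script characters and spoken names (illustrative only).
PSEUDO_SCRIPTS = {"'": "prime", "\u2032": "prime", "*": "star", "\u00b0": "degrees"}


def speak_msup(msup_xml: str, mode: str) -> str:
    """Return illustrative speech for one <msup> under 'structure' or 'common'."""
    base, script = list(ET.fromstring(msup_xml))
    b = "".join(base.itertext())
    s = "".join(script.itertext())

    # Pseudo-scripts: the word "superscript" is not spoken in either mode.
    if s in PSEUDO_SCRIPTS:
        return f"{b} {PSEUDO_SCRIPTS[s]}"

    if mode == "structure":
        # No semantic inference: only the layout is described.
        return f"{b} superscript {s}"

    if mode == "common":
        if script.tag == "mo":
            # An operator that is not a pseudo-script is not called a power.
            return f"{b} superscript {s}"
        if s in ("2", "3"):
            return f"{b} {'squared' if s == '2' else 'cubed'}"
        return f"{b} raised to the {s} power"

    # "legacy" leaves the reading to the AT, so this sketch does not model it.
    raise ValueError(f"mode not handled in this sketch: {mode!r}")


print(speak_msup("<msup><mi>x</mi><mn>2</mn></msup>", "structure"))
print(speak_msup("<msup><mi>x</mi><mn>2</mn></msup>", "common"))
print(speak_msup("<msup><mi>A</mi><mo>T</mo></msup>", "common"))
print(speak_msup("<msup><mi>f</mi><mo>'</mo></msup>", "common"))
```

Even this small rule set already has to look at both the element name of the superscript and its text content, which is the trade-off between coverage and implementability discussed in the summary below.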

Summary

As stated at the start, we need to answer the questions of whether there should be defaults (given my long post, it should be clear what my position is). If others agree, then we need to come to an agreement on what the defaults should be. The above list is a first cut.

There is a trade-off that should be considered. If the rules/special cases become too numerous, then AT is less likely to implement them. On the other hand, if special cases aren't listed, then authors/authoring software needs to go to extra work to generate them which makes them less likely to use intent because it becomes burdensome to do so. I think the above list is implementable by AT because it is not that much larger than what more simplistic AT does now. I also think it probably captures a large majority of what authors want said by default, although I may have missed a few special cases.

If you feel a default/special case is missing or if some default is wrong/has too many special cases, that's what comments are for...

@NSoiffer
Contributor

NSoiffer commented Feb 7, 2023

Responding to @dginev's comment...

While I completely agree that simple rules will fail to capture a significant number of special cases, I disagree that they are not useful. I would love to gather some data, but my guess based on looking at a lot of math textbooks and tutorials over the years is that they will capture over 95% of the cases, maybe even over 99%. These numbers don't reflect "good" speech, just "not wrong" speech. I know that is pretty bold, but simply put, there are a lot of mfrac, msqrt, and mroots out there and they are almost always what their names imply in K12 math. Furthermore, when munder, mover, and munderover have a large op as the base, the rules are going to be correct 99+% of the time for K12, and the rules for them when they aren't large ops will never be wrong since they just describe the structure (which is what has to happen without defaults anyway).

Moving on to mrow, since it just speaks the children, it is structural. It may not be optimal (e.g., it will miss absolute value and lots of other notation), but it won't be wrong, just ugly -- no less ugly than if the rule wasn't there. So again, 100% correct.

The least accurate rule is the one for msup. There, it is likely there will be cases where, if it is not a special case, "power" won't be the correct reading. However, if you flip through most K12 textbooks, power is what is very commonly meant except for the pseudo scripts I listed. So maybe msup is only right 90% of the time (still just a guess). The one place where it would fail more often is a chemistry book, where scripts mean something else. Given the specialized notation (including non-italic element names), I would hope macros get used to produce that markup, and that they add intent and isa so that the defaults don't get used.

Based on your examples, I updated the msup rule to add a special case to use "superscript"/"embellished with" for mo when not a pseudo-script. That still leaves cases where it will get it wrong (' is foot or minute, and not prime; $(n)$ is nth derivative). In fact, half of the cases you listed would be spoken wrong, but I think those examples probably come up less than 1% of the time.

I'll be the first to admit that I don't have statistics to back up my claims. It would be great to go through a dozen textbooks and do counts, but I don't think that anyone has the time/stamina to do that. At best, we could find the number of msubs and the ones among them that match a set of special cases that are potentially spoken wrong (they would have to be examined to determine that). But even without looking at those cases, it would give a rough lower bound on how many are spoken wrong (especially if we included all the cases you found earlier, not just the ones listed above which I think you edited for brevity).

@brucemiller
Contributor

My first comment regards the "default default", ie. what defaulting rule set (if any) applies when there is no intent-default (although possibly explicit intent): the "legacy" case. I think the specification should not require superscripts to be treated as powers. I'd probably prefer the structure rule set as a default-default, but not common. However, I could live with the behavior in your current description: the AT is free to do as it wants.

@brucemiller
Contributor

My second comment is that I believe we should express the defaulting rule sets in terms of intent, rather than free text. For example, in the common set, msup has intent="power".
This will have several benefits:

  • more concise rule sets, clearer and avoids repetition
  • guides (part of) what needs to be in the core dictionary
  • tests/verifies the expressiveness of intent and demonstrates its use.
  • suggests (but doesn't require) an implementation strategy: (1) apply default rules, (2) speak the result

And the fact that it would force us to put some non-semantic items (eg superscript) in the core dictionary can be seen as a benefit. It would allow authors to override bad assumptions made by the defaults (eg common) on specific sub-expressions.
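
A minimal sketch of that two-step strategy, limited to step (1), might look like the following. The `RULE_SETS` table and the function `apply_defaults` are hypothetical; the intent names other than "power" and "superscript" are placeholders, and real intent values would probably also need to reference their arguments.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule sets expressed as element name -> default intent.
RULE_SETS = {
    "common": {"msup": "power", "mfrac": "divide", "msqrt": "square-root"},
    "structure": {"msup": "superscript"},
}


def apply_defaults(root: ET.Element, rule_set: str) -> ET.Element:
    """Step 1: add a default intent wherever the author gave none."""
    rules = RULE_SETS[rule_set]
    for el in root.iter():
        if el.tag in rules and "intent" not in el.attrib:
            el.set("intent", rules[el.tag])
    return root


doc = ET.fromstring(
    "<math>"
    "<msup><mi>x</mi><mn>2</mn></msup>"
    '<msup intent="transpose"><mi>A</mi><mo>T</mo></msup>'
    '<msup intent="superscript"><mi>y</mi><mi>i</mi></msup>'
    "</math>"
)
apply_defaults(doc, "common")

# Step 2 (speaking the enriched tree) is then the same code path
# that already handles author-supplied intent.
print(ET.tostring(doc, encoding="unicode"))
```

In this sketch an author-supplied intent always wins, which is what lets an author write "superscript" on one sub-expression to override the "power" assumption of the common set.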

@polx
Author

polx commented Feb 9, 2023

are there defaults so that authors know what AT will do without intent or should authors always use intent if they care about how something is spoken.

I think we agreed that this is likely too big an enterprise: any default set we can recommend will be frustrating. Leave this to the field of brave experimenting implementors.

Deyan: We should remove: A limited set of examples will be misleading to spec adopters.

I agree. Let us not promise such a set of rules.

I strongly suggest starting with a vocabulary clarification:

  • enrichment suggestions: (default default or anything such): a set of rules that may be recommended to inject intents into an expression (these are symbolic names of intents)
  • core intents: a small set of intent names which we consider useful, for which we recommend a pronunciation, and possibly translations for the languages we know, with more added as we get contributions for new languages.

The easy bits in there are the function-names at the bottom of David F's list (map word to intents then to pronunciation) and the unicode characters (map character codes to intents and pronunciation possibly differing from Unicode). Both imply translation as well.

@dginev
Contributor

dginev commented May 18, 2023

@NSoiffer the original reason to open this issue, the way I remember it, was more specific than the general topic of "default rule sets for intent", which may fit better in a new dedicated issue (especially if we are closer to consensus to add some recommended markup).

To summarize my comments from the discussion in today's meeting:

  • Having 3 initial named directives: "legacy", "common" and "structure", as you suggested is a good starting point.
  • The actual default behavior should continue to be "legacy", so that all existing MathML on the web is processed with the same assumptions as it is being processed today.
  • As a consequence, the "common" and "structure" directives should be viewed as progressive enhancement, and be active only when explicitly requested in the markup.
  • It would be more economical if we had a vehicle to mark defaults globally, such as the <meta> element, but in the absence of that, the <math> element is the next reasonable carrier.
  • It would also be more economical if we repurpose intent properties here, and provide 3 Core property values for the directives.

So a first suggestion would be:

<math intent=":default-common">...</math>
<math intent=":default-structure">...</math>
<math intent=":default-legacy">...</math> <!-- identical to <math>...</math> -->
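
For illustration only, here is how a processor might read such a directive off the <math> element, falling back to "legacy" when nothing is requested. The function name `default_directive` and the simple colon-splitting are assumptions of this sketch, not part of any agreed intent grammar.

```python
import xml.etree.ElementTree as ET

DIRECTIVES = ("legacy", "common", "structure")
PREFIX = "default-"


def default_directive(math_xml: str) -> str:
    """Return the defaulting directive requested on <math>, or 'legacy'."""
    intent = ET.fromstring(math_xml).get("intent", "")
    # Properties are assumed to be the ':'-prefixed parts of the intent value.
    for prop in intent.split(":")[1:]:
        if prop.startswith(PREFIX) and prop[len(PREFIX):] in DIRECTIVES:
            return prop[len(PREFIX):]
    # No directive given: existing content keeps today's behavior.
    return "legacy"


print(default_directive('<math intent=":default-common"><mi>x</mi></math>'))
print(default_directive("<math><mi>x</mi></math>"))
```

The fallback branch is what makes "common" and "structure" a progressive enhancement in this reading: markup that says nothing is treated exactly as before.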

Separately, @davidfarmer expressed his hope that we won't have a prolonged discussion on what exact behaviors go into the "common" defaults. I certainly understand the sentiment. However, unless we specify a clear and fixed set of rules, it is reasonable to expect that different AT systems will implement different behaviors. Maybe that is an acceptable outcome, but we should be aware that we are making that choice.

Moreover, it is good if we can now bundle this discussion together with discussing properties, as I can make this "AT alignment" point in general:

Unless we clearly enumerate the exact effects each "behavioral property" is expected to enrich, we will have differences in behavior/coverage between AT systems. As an example, :chemical-formula may lead AT system 1 to understand <mi>C</mi> as Carbon, while AT system 2 will (wrongly) understand it as Celsius, while AT system 3 will treat it as self-voicing/unrecognized (as it wouldn't be "common enough" for it).

To bring this example back to defaults, if common is left unspecified then AT system 1 may treat <mo>→</mo> as maps-to (mathematics), AT system 2 may treat it as yields (chemistry), while AT system 3 may treat it as unknown and self-voicing, as it wouldn't consider it "common enough".

Neil's "proposed defaults" are a start, but they are incomplete from a Western K12/K14 education standpoint. Should we try to make them complete?

@davidcarlisle
Collaborator

I'd be tempted to drop legacy, given that there has been no cross-browser MathML prior to this year. I don't see a large corpus of documents with a usably definable legacy behaviour. Existing content is almost always in closed systems that can work as before, or guarded by JavaScript such as MathJax.

Committing forever to an "undefined default behaviour" and forcing opt-in to get a defined behaviour seems a high price to pay to get unchanged behaviour for a possibly nonexistent set of documents.
If you are using Chrome there are no old documents with an existing MathML behaviour.

@dginev
Contributor

dginev commented May 26, 2023

Adding a couple of my comments from the meeting on May 25th:

  • The default rule sets should list out which of the available attributes are to be used when building accessible narration (e.g. linethickness from mfrac or notation from menclose), with the assumption that unmentioned attributes can be ignored. Motivated by examining the :default-structure behavior during the call.
  • Whether the :default-common rule set is recommendable as the main behavior on raw MathML 4 will depend on how good its rules are. If it is too narrow in coverage, or too strong in assumptions, the :default-structure set could be a safer main behavior from a user experience standpoint.

@davidcarlisle
Collaborator

The names of properties seem to have changed and stabilised a bit since this discussion was last active. In the current core properties list we just have :literal that is relevant. I don't think we need to have explicit :common or :legacy, as in any case I think inferring intent (or speech hints generally) from MathML without intent should be implementation-defined, although we can give strong hints by way of examples.

I don't think the list of <msup> readings in @dginev's comment above is a problem; they are more or less the standard justification for introducing intent. If the assumed default reading of msup is power, apart from some special cases such as prime, then that will be wrong sometimes, but intent can be added. It would be possible to imagine a setting which read msup as power without the special casing of prime, so somewhere between the default and the ":literal" reading, but it would be tricky to specify and I don't really see the use case for it.

So, of the three cases in @NSoiffer's comment #433 (comment): structure is now literal; I would make common the default (so it could be nameless, unless we want to always have an explicit markup equivalent of the default); and I don't think legacy is needed, as that is always available as an option: a processor may simply ignore the intent, either because it's not implemented or at user option. That seems better as a reader choice than giving the author an option to use intent markup to specify that unspecified heuristics should be used.

@NSoiffer
Contributor

In the Jan 9 meeting, we started to discuss the "common" idea. One suggestion from that meeting by @MurrayIII and @polx is to have a JS library that inserts the intents based on a set of rules rather than AT doing that. After the meeting, I realized that while that may be viable for a web page, it isn't a solution for Word or PowerPoint documents or other non-web documents. Because of this, I don't think JS is a viable general purpose solution.

Although it is in the minutes, to make it more obvious in this issue, @dginev proposed that rather than formalizing the common rules now, we allow "vendor extensions" à la the web, so maybe a value such as :mathcat-common.

@davidcarlisle
Collaborator

I don't think we should have a vendor-specific version, as that would suggest other systems don't support it, which would hamper rather than aid portability. The situation is rather different in CSS, with existing rules for systems to ignore unknown properties, and even there the vendor extensions are introduced by the vendors, not something promoted by the working groups.

As there is already a lot of flexibility on the exact words systems use, they don't have to follow the exact speech hints for core concepts or properties, I don't think having a common list of rules unduly constrains system innovation.

As I note above, I'm not convinced by the need to have a "legacy" default: while there are existing JavaScript-based solutions, there simply isn't a large corpus of cross-browser documents using native MathML with AT readings for which some legacy compatibility is needed. As far as it exists at all, the "legacy" behaviour is likely to be closer to common than anything else (e.g. MathCat's current behaviour isn't so far from that). So having that as the default, with structure (now called literal) as a specified option to opt out of it, seems preferable.

@dginev
Contributor

dginev commented Jan 13, 2025

The reason to reach for a vendored prefix is exactly so that the working group does not recommend an unfinished / potentially broken behavior for :common. I remain quite skeptical that the general design goal for such a property is achievable without major damage in producing the accessible outcomes (due to the intent heuristics applying at unintended notations, or simply guessing wrong, let alone different AT engines implementing partially incompatible rulesets).

Making an unfinished/unproven design portable should not be a goal of the group.

Starting with a MathML 4 in which :mathcat-common can compete with :unicodemath-common, :sre-common, etc. would be an interesting experiment. Note that just as with the web use of -webkit- and similar vendor prefixes, one can specify multiple vendored properties at the same time, so that a healthy experimental use is possible via:

<math intent=":mathcat-common:unicodemath-common:sre-common">

I quite like that syntax, as it clearly marks the experimental state of the feature, follows a proven web platform precedent, and also allows for multiple vendors to be tested on the same document.
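
As a rough sketch of how an AT engine could consume such a list, the snippet below picks the first vendored property the engine recognizes and ignores the rest. The function `pick_ruleset`, the "first listed wins" policy, and the fallback to `None` are assumptions made for this example; the thread does not specify any resolution order.

```python
from typing import Iterable, Optional


def pick_ruleset(intent: str, supported: Iterable[str]) -> Optional[str]:
    """Return the first listed property this engine supports, else None."""
    known = set(supported)
    # Properties are assumed to be the ':'-prefixed parts of the intent value.
    for prop in intent.split(":")[1:]:
        if prop in known:
            return prop
    # Nothing recognized: the engine falls back to its own default behavior.
    return None


INTENT = ":mathcat-common:unicodemath-common:sre-common"

print(pick_ruleset(INTENT, {"sre-common"}))
print(pick_ruleset(INTENT, {"unicodemath-common", "mathcat-common"}))
print(pick_ruleset(INTENT, {"some-other-engine-common"}))
```

Unknown properties are simply skipped here, mirroring how browsers ignore vendor-prefixed CSS they do not understand.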

My proposal would be that only after one of these experiments has some successes under its belt do we move to standardize a clearly specified :common, ideally during a freshly started group charter.

@davidcarlisle
Collaborator

Starting with a MathML 4 in which :mathcat-common can compete with :unicodemath-common, :sre-common, etc. would be an interesting experiment.

Explicitly targeting a document at one system (or worse, a list of systems) with <math intent=":mathcat-common:unicodemath-common:sre-common"> should be a last resort that we should avoid.

If the proposal stays with the default as an unspecified legacy then the end result will be that most documents will get read with some approximation to the common version but in a completely unspecified way. I can't see how that is a good outcome.

Having some rules (any rules) can not be harmful in the way you suggest as once they are specified authors can use intent to over-ride them where needed. If the default is simply "unspecified behaviour" then that is really a failure of the group to specify something more usable and gives authors no guidance at all.

@dginev
Contributor

dginev commented Jan 13, 2025

Having some rules (any rules) can not be harmful in the way you suggest as once they are specified authors can use intent to over-ride them where needed.

"Broken by default" is not a design philosophy I subscribe to, and I am somewhat surprised you appear to be advocating for that.

What legacy systems did is not affected by any decision made in this issue, and that part of the conversation appears to be a distraction to me.

@davidcarlisle
Collaborator

"Broken by default" is not a design philosophy I subscribe to, and I am somewhat surprised you appear to be advocating for that.

That's not what I'm advocating.

What legacy systems did is not affected by any decision made in this issue, and that part of the conversation appears to be a distraction to me.

The issue is all about what the default should be. The proposal to make that "legacy" aka completely undefined is what I don't like. I think the specification should specify a usable default. legacy systems that do not change will do what they do in any case but they do not claim conformance to MathML 4 and I don't see the need to water down the rules so that they can be retrospectively considered conforming.

@dginev
Contributor

dginev commented Jan 13, 2025

And yet the only reliably usable default is :literal. A "legacy" default allows AT systems to attempt heuristics at their discretion, while a "common" default is broken by default, in my view.

@davidcarlisle
Collaborator

If you need to specify :literal then it's not a default. The question is what to do if there is no intent. The proposal here is to let the system do whatever it wants. I see very little benefit in that over specifying something close to what current systems currently do with the intention of making systems use the same rules going forward.

I don't agree with your definition of "usable", it is obviously less ambiguous, but that is solving the wrong problem. The problem is how to produce the best readings from the existing intent-free corpus of MathML. For a large class of documents and large class of readers, the readings inferred by existing systems such as mathcat or mathjax/sre are preferable to a :literal reading, so :literal really has to be opt-in, not the default. So the choices are to leave the default completely unspecified (which I think would be a missed opportunity) or to specify it, which would be something like :common.

A "legacy" default allows AT systems to attempt heuristics at their discretion, while a "common" default is broken by default, in my view.

I honestly can not understand that comment at all. The readings may be wrong in some cases, but if it is just unspecified heuristics the author has no way of knowing what will happen. If there is a specification such that conforming systems use the same heuristics, the readings are no more broken, but the author has the benefit of knowing in advance whether they need to add extra markup to guide the speech.

@dginev
Contributor

dginev commented Jan 13, 2025

If you need to specify :literal then it's not a default.

What? I am suggesting that specifying that the default readout of <math> (without intent) would be as if <math intent=":literal"> was used is the only reliable ruleset we have at the moment. It won't create false readouts, even if some may be overly verbose.

I don't agree with your definition of "usable", it is obviously less ambiguous, but that is solving the wrong problem.

Your take on "usable" will make arXiv readouts broken by default, so I am happy to disagree with it.

For a large class of documents and large class of readers, the readings inferred by existing systems such as mathcat or mathjax/sre are preferable to a :literal reading

First, you are asserting that without any validation. You may be right, you may be wrong or worse - it may depend on the document, on the AT system and on the reader.

The readings may be wrong in some cases, but...

That level of design should be unspecified/vendor-specific behavior. We shouldn't standardize "sometimes wrong, but..." as a global MathML default.

I see creating a dedicated vocabulary (such as :mathcat-common) and setting it on <math intent=":mathcat-common"> then overriding any pieces which are not covered (or are contradicted) as healthy. Specifying that should be the default behavior for every <math> element is neither necessary, nor helpful.

@davidcarlisle
Collaborator

davidcarlisle commented Jan 13, 2025

What? I am suggesting that specifying that the default readout of <math> (without intent) would be as if <math intent=":literal"> was used is the only reliable ruleset we have at the moment. It won't create false readouts, even if some may be overly verbose.

Oh sorry, I misunderstood. OK, but I can not see that being a workable proposal. :literal has to be opt-in; you can not degrade the default reading in that way. A lot of generated MathML (including all current MathML) has no intent, and systems that can (and will) generate reasonable readings for it can not be expected to drop down to a literal reading without author or reader opt-in.

Your take on "usable" will make arXiv readouts broken by default, so I am happy to disagree with it.

I do not agree it is broken but your proposal would not change that, just leave the state as undefined, which is not an improvement.

I do not think reading conventional notation using conventional terminology is wrong even if it's mathematically inaccurate.
I may use (and have used) $P^n$ to denote something that is not a power, but that notation was chosen (not by me) by analogy with powers, in the knowledge that any reasonably mathematically literate reader who picks up a paper and sees the expression out of context will see $P^n$ as an n-th power. Notation is given a precise meaning but is not chosen arbitrarily; it is chosen to reflect some association to a reader.

@davidcarlisle
Collaborator

and setting it on <math intent=":mathcat-common">

This issue is about what to do when there is no intent.

The proposal above from Neil is to leave that as ":legacy" (ie unspecified)

I don't think the group has seriously considered making :literal the default (and would be surprised if there is agreement on that but you can always ask.)

So it seems to me the only real option other than :legacy is something close to what current systems do now, but written down so that systems can converge on a common default, and that is the :common proposal.

@dginev
Contributor

dginev commented Jan 13, 2025

To me the workable options are :legacy or :literal, as :common as presented so far to the group would break too much in higher mathematics texts. But I also think we can remain silent on the question entirely, as MathML 1, 2 and 3 have been in the past.


5 participants