Do the # of arguments define a core "intent"? #435
Comments
My suggestion would be to stay consistent with the Open realm and to not mix the exact argument lists with the intent value considerations. I like thinking of each Open intent value as an "encyclopedic concept", i.e. a well-known name in the communities familiar with the subject matter. Using an intent with a concrete number of (and order of) arguments, (with a concrete operator fixity), falls closer to describing a specific notation. To me, the same intent value can be used in a variety of notations, at least as far as the Open terrain is concerned. And which notation was used in any particular case can be inferred from examining the presentation tree. Ideally the Core and Open sides will try to be mostly consistent, but this remains to be worked out. If you'd like this to be a core-only issue, maybe we should indicate this in the title? Currently it's stated quite generally. P.S. As an implementation note for MathCAT, I think there is opportunity for "graceful degradation" where any unexpected notation can degrade to the baseline readouts expected for an Open intent. In the ray |
I do not think this should be an error; otherwise making any later changes to the core list will be difficult (or impossible, depending on how we handle errors). I think it is OK for the list to say which arities it handles in each case, and just treat others as you would any other identifier. I think our motivating examples have always been somewhat fluid on arity,
and
should both be legal, and the second hasn't obviously got arity 2. Deyan suggested the presentation tree could be used to disambiguate, but I don't think that's an option. The intent is there to over-ride the default reading, if there are no There is a more general question about error handling, Deyan alluded to this as well. It's useful to have a mode which flags an error on invalid intents (this caught lots of errors in the initial, hand-written intent examples that had not previously had machine validation) but, like XML parsing, aggressive error reporting is not so useful on a web page where the person getting the error message has no way to correct the source. I should open another issue but I don't think
should have a reading equivalent to
or
not (by default) reject the entire expression. I added this example to https://mathml-refresh.github.io/intent-lists/intent3.html#IDUnexpectedarity |
We can certainly answer what a baseline read of That should not lead us to forbid implementers from using all available context (such as the presentation tree) when trying to build superior systems. Consider <mfrac intent="divide(1, 2)">
<mtext>wibble</mtext>
<mtext>squibble</mtext>
</mfrac> <mrow intent="divide(1, 2)">
<mtext>wibble</mtext>
<mo>÷</mo>
<mtext>squibble</mtext>
</mrow> If AT wanted to emit the speech "one half" for both, it certainly could. But if a system considered it more appropriate to emit "1 divided by 2" for the horizontal notation (by checking the presentation tree to see there was an infix It is a big design motivation for intent to keep it a "partial annotation" scheme. The flip side of that coin is that AT systems should be allowed (but not required) to collect any missing information from the other available facets of MathML. I have tried to make a distinction between the two by calling the simpler approach a "baseline readout", but I am still fishing for the right words for the "perfect readout" considerations. I agree with David C's comments on error handling. We likely need some text on suggested recovery strategies. So far I am fond of
So for: <mfrac><mi intent="hmm)(">x</mi><mi>y</mi></mfrac> I'd revert to: <mfrac><mi>x</mi><mi>y</mi></mfrac> We could do better if our discussions on introducing a self-reference marker move forward. If we had a
for any node. |
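The recovery strategy described above (drop an intent that cannot be parsed and fall back to the plain presentation tree) can be sketched in Python. This is only an illustration under stated assumptions: `is_valid_intent` and `drop_invalid_intents` are hypothetical names, and the check implements a deliberately simplified grammar (names, numbers, `$` references, nested argument lists), not the grammar of the spec draft.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a simplified atom is a name, a $reference, or a number.
# The real intent grammar is richer than this.
_ATOM = re.compile(r"\s*(\$?[A-Za-z_][\w.-]*|-?\d+(?:\.\d+)?)\s*")


def _parse(text, pos):
    """Return the index just past one intent expression starting at pos, or -1."""
    match = _ATOM.match(text, pos)
    if not match:
        return -1
    pos = match.end()
    # Zero or more parenthesised, comma-separated argument lists.
    while pos < len(text) and text[pos] == "(":
        pos += 1
        while True:
            pos = _parse(text, pos)
            if pos == -1:
                return -1
            if pos < len(text) and text[pos] == ",":
                pos += 1
                continue
            break
        if pos >= len(text) or text[pos] != ")":
            return -1
        pos += 1
        while pos < len(text) and text[pos].isspace():
            pos += 1
    return pos


def is_valid_intent(text):
    """True if the whole string parses under the simplified grammar."""
    return _parse(text, 0) == len(text)


def drop_invalid_intents(root):
    """Remove intent attributes that fail the check; return how many were dropped."""
    dropped = 0
    for node in root.iter():
        intent = node.attrib.get("intent")
        if intent is not None and not is_valid_intent(intent):
            del node.attrib["intent"]
            dropped += 1
    return dropped


tree = ET.fromstring('<mfrac><mi intent="hmm)(">x</mi><mi>y</mi></mfrac>')
drop_invalid_intents(tree)
print(ET.tostring(tree, encoding="unicode"))
```

A validating mode, as preferred earlier in the thread for authoring, could report the dropped count instead of silently continuing.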
@dginev wrote
I think implementations should be free to implement a larger list of "known intents" than the core list, and so build superior systems in that way. But I don't think the presentation tree can be used here. Authors should be able to specify an intent that differs from the intent guessed from the layout without the system deciding to be superior and re-using the layout.
yes agreed, something like that. (if we don't drop it entirely) |
I agree with that as well.
Right, but that mixes two aspects. How the pieces of an expression fit together (i.e. the notation) is a separate aspect from which concept we are narrating (i.e. the intent value). To me the use of a certain Starting with the annotated concept and then using the presentation tree to infer how the pieces of its notation can be spoken together does not void the original annotation. Rather, it helps make the chosen speech for it more natural (as the annotation was partial, by design). The group has chosen to invest in |
Yes. I think if for whatever reason I want
|
although If I understand you correctly, I suspect I agree with this requirement. examples from another thread, I really want to annotate mtable as "matrix" or "system of equations" or... Currently if you have
it gets read as "matrix" and the array content is lost. I think you are suggesting that would only happen for
|
That could lead to trouble. Flashback to some pivotal point in mathematics history where a brilliant PhD is being written and it used That leads me to an idea that @dginev mentioned, but didn't suggest as a formalization: "require" (via convention) open intent names to begin with |
I'm not keen on distinguishing the core list, I don't think it can be stable, just like the operator dictionary. Like the operator dictionary you can go We need to make up an initial core list to get things started, but we can't be complete. |
Some replies:
|
My proposal is one of conventions (as is true in HTML). You can generate Note to @dginev: if we leave off hints, then there is a big difference between being supported or not supported for common cases: So there is certainly no harm in doing what you are doing now using a core-like name syntax (under my proposal). You might need to update your names as we come up with specific core names, but that's true whether we go with my suggestion or not. |
On Wed, 22 Feb 2023 at 00:56, NSoiffer wrote:
Some replies:
1. This is probably "core-only" as it involves a conflict with an
existing definition. I've updated the title.
2. Re "graceful degradation": I like seeing errors upfront rather than
trying to spot them in the output. But I think I'm a minority, so I agree
that the right thing is probably just to ignore an intent anytime there is
an error: bad syntax, wrong number of args, reference to a non existing
arg, etc(?). I could maybe add the words "intent error" to help
authors, but that's likely less desirable for users.
Yes I prefer xml's fatal error handling when processing my own doc, and I
appreciated mathcat catching multiple errors in my hand crafted intents, so
I wouldn't want to lose that possibility, but once it gets in to an html
page with an html parser that accepts literally any string and parses it
somehow, it's going to seem odd to have errors flagged reading intent.
David
|
On Wed, 22 Feb 2023 at 01:09, NSoiffer wrote:
I'm not keen on distinguishing the core list, I don't think it can be
stable, just like the operator dictionary.
My proposal is one of conventions (as is true in HTML). You can generate
***@***.***($arg)' and no error will happen. It may or may not
speak as you intended because there may be a definition in core and AT does
its thing. If you generate ***@***.***($arg)' , then you can be
assured it will speak as "arg foo".
yes OK I think we more or less say that already, that `_` names won't be in
core (or the system list) and so will be spoken as-is.
|
Good point -- I had forgotten it says that. But the spec doesn't say the opposite: that "concept names" should be only for core concepts that are defined (elsewhere?). Maybe that is implied, but it would be good to be explicit. I am a little concerned about something like:
which I think has been proposed a number of times. It doesn't end up doing what you might think... I think @dginev had a clever algorithm to try and reconstruct the concept/meaning for this kind of string-like notation where it avoids the connective words and comes up with A simple solution is to modify what's used by DG's algorithm. Maybe |
@NSoiffer I don't quite see how requiring a
|
I don't think it should do that: the core list should include arity and use of the name with unexpected arity should be like any other non-core use, not an error. |
@NSoiffer If you are attempting to completely axe the Open list, why not be explicit about your intention? That is the ultimate effect of such a proposal, where the underscore-based speech overrides (akin to Even if you personally are not interested in implementing support for the Open terrain, I again remind that it is exactly that terrain that motivates my participation in the group. Why take away my opportunity to do meaningful work with higher mathematics? What distinguishes Core from Open is the inclusion of a vetted list of values that will be endorsed by the MathML 4 specification, which will define Core. Open is any concept that is not included in that list. Is that not sufficient? |
I'm trying to make using open names safe so they don't collide now or in the future with core names. By "requiring" (by convention) that they start with
What I was referring to is this part of the spec:
I take "Intent Concept Dictionary" to mean the core intent names. Maybe that distinction needs to be explicit because I do think it makes sense to have an unofficial open dictionary that might change every day. But that dictionary doesn't belong in the spec other than maybe as informative text that points to a W3C note that explains the goals of an open list. That might then point to some suggested location to gather the list. By "opposite" I meant names that don't start with "_" should only be names that are in a core list for the reasons listed above. To be absolutely clear: the names chosen for open intents are the ones that will be spoken unless To emphasize my support for the open list... I want the core list to be small to ease implementation burden. That means I strongly support an open list that avoids using literal text as much as possible. That's why I feel that hints are essential. Without them, either the open list intents read poorly in many cases or people have to write out what they want spoken using literal text. The hints don't solve reading all notations appropriately (e.g., "...to...from...of"), but they go a long way in that direction. |
I'm not sure that is possible or desirable. If I use
in html, data- attributes are supposed to be document local with no external def (so like
I think the intention of the current wording is that
Core is too restrictive, there is no way to get a reasonable list at first iteration, we should aim for a conservative core list which is a baseline "systems should support at least these names".
I don't think that should be the case. I think names not in the system's concept list should be spoken as-is, and that list should include all of core. For example mathCAT has rules handling
agreed with that. |
Great to have this as a consensus point, thanks. I am aligned with @davidcarlisle's reply on why we still need Open and Core values to look the same.
To me hints are just one approach to provide fixity information. Some seem to like them, others want to abstain from using them, in favor of other techniques. |
As @dginev will no doubt have guessed I agree with his comments apart from
Using the presentation tree is viable by pushing the intents down as you show with infix |
Right. So we can restrict what I stated with "on a case-by-case basis" - that seems sensible. Namely:
|
I don't think it should ever consult the layout. A system can read core |
So when you said "it can of course read it any way it specifies", you were exempting the specification of "narrate the intent expression following the fixity observed in the presentation tree of the node holding the intent attribute"? If I were writing an AT (or were forking an existing open-source AT system to enhance it for arXiv), I would be partial to that approach, on a case-by-case basis, for Open values I have vetted. Edit: improved technical wording |
Given that ultimately we want the system to be free to use any words in any language, we obviously can't have any enforceable rules that prevent that. But I would say that we should make it clear that systems "SHOULD NOT" (https://www.rfc-editor.org/rfc/rfc8174) do that. If the system can read Actually that may be too strong, a motivation for adding |
I don't think the strict "SHOULD NOT" is consistent with the goals of a "partial annotation". I would prefer a "SHOULD" that specifies a |
I think it is reasonable to have a design aim that an author has the expectation that elements with the same intent are read the same way. You have suggested before that we change that, which is not impossible but I haven't seen much consensus for such a change. (Actually the current spec is sufficiently sketchy that anything is possible, but that's just the state of the text). But sure, saying a system SHOULD use the intent alone is almost the same as saying it SHOULD NOT not do that. Either would work, when we come to fine editing of the spec wording we need to worry about any edge cases where they differ. We are nowhere near that yet, just setting out the basic models how this is intended to work. |
A specific instance I am thinking about looks like
a × b
This could mean multiplication, or cross product, or direct product, or
...
My first question: should the intent go on the ×, or should it go
on the mrow, or is either acceptable?
I like the idea of putting it on the ×, and more generally I
like the idea of putting the intent as deep in the tree as possible.
But suppose for one meaning of ×, the standard pronunciation is
something like "the direct product of a and b".
My second question: when words are supposed to be spoken before
the "a", is it expected that the AT looks past the "a" to see
the intent on the × so that the whole expression is pronounced
correctly?
It is better for me if I know that it is okay to put the
intent on the × in all cases.
|
either acceptable but not equivalent.
for a concept name in core (so definitely https://mathml-refresh.github.io/intent-lists/intent2.html#id5 which have both forms |
@davidfarmer better example extracted from your list https://mathml-refresh.github.io/intent-lists/intent1.html#id5 showing mathcat's reading of cross product with intent on the mrow or on the mo |
Given
we might reasonably expect AT to simply read off the mrow children as "a foo b"; it never really needs to consider whether foo is infix or not. Given
we might also expect "a foo b" because AT was told to treat foo as infix. Finally, given
we probably should expect "foo of a and b" (or similar), unless it knows something about "foo" from outside sources. I think it is slightly absurd to expect AT to guess that the If there's anyone I failed to insult, I apologize. |
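The three readings discussed above can be put side by side in a short Python sketch. It assumes the fixity hint has already been extracted from the intent and is passed as a plain string. The function name `read_intent`, the set of hint values, and the wording of the default functional reading are illustrative assumptions, not spec text or MathCAT behaviour.

```python
def read_intent(name, args, hint=None):
    """Baseline readout of a concept the system knows nothing about.

    hint is an optional fixity ("infix", "prefix" or "postfix");
    with no hint the functional form "name of a and b" is used.
    """
    if hint == "infix":
        return f" {name} ".join(args)
    if hint == "prefix":
        return " ".join([name, *args])
    if hint == "postfix":
        return " ".join([*args, name])
    if not args:
        return name
    return f"{name} of " + " and ".join(args)


print(read_intent("foo", ["a", "b"], hint="infix"))
print(read_intent("foo", ["a", "b"]))
```

The point of the sketch is that the infix reading comes from the author's explicit hint, not from inspecting the surrounding presentation tree.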
If you think inferring (raw screenshot, I typed this in as an experiment for this reply just now) Note that |
@dginev: I'm a little surprised that with the vast arXiv archive that has so many "interesting" notations, you are advocating that AT can figure out that an author wants a reading different from what is implied by the intent. Maybe bra-ket notation is an example. The Given that authors have a simple method for specifying they want an infix reading, I agree with @davidcarlisle that a "SHOULD NOT" is appropriate. |
@davidcarlisle wrote:
You could also show it with a hint to get the infix reading:
I know you have shied away from using hints, but given the current discussion, I think showing hints would be useful. They work in MathCAT. I know you |
@NSoiffer I am advocating for the spec allowing to use the available information in the markup (the presentation tree is already there). If any individual AT vendor does not want to do so, that is fine. The spec saying that all vendors "SHOULD NOT" use the available information goes against the "partial annotation" aspect of Intent. P.S. ChatGPT isn't as robust as symbolic approaches, clearly. I am not saying it can solve everything over arXiv today. It also wasn't the point, as matching the That said: |
I did not say it was absurd to infer, I said it was absurd to expect AT to infer. |
Yes I'm gradually starting to add hint examples in list3 where I have a dedicated column for this, I'll try to add more |
It's not that it's absurd or impossible, it just would be wrong to over-ride the supplied intent in that way. If I have
then it really does not matter that a system could infer this is an infix operation between 3 and 4, it should read it as rabbit of 1 and 2 as the intent is there to provide a reading apparently unrelated to the content. Suggesting that the intent should use the layout seems completely at odds with the design of the system and I can't see how it could be added in any reasonable way. What is of course possible, and close to your chatgpt examples (and current mathcat behaviour), is to use the layout to infer default intents, so default readings, when none is given.
The system is of course now free to infer this is an infix operation and effectively default to
|
@davidcarlisle If we assume you are right about your examples, and I am right about my examples, the next question is whether intent is a spec that is more appropriate for annotating If your point hinges on the intent value being completely unrelated to the underlying presentation, I don't think we are working on the same problem domain. My premise is a "partial annotation" for disambiguating a (mathematically meaningful) presentation MathML tree. Not an escape hatch for hiding unrelated concepts over those trees. I liked the middle ground we reached a few comments earlier. If the spec text allows AT to have freedom to do case-by-case enhancements for the Open values, beyond the "baseline narration", that will be sufficient. Each system can exercise discretion as to when it wants to do so (e.g. some may decide to check if |
Unless you write some AT, I don't think any AT will do the kind of checking you want... nor do I think it should for any beast not in core (rabbit, dog, or strict-equality) because (sorry for repeating what has been said many times) the author might not want that. I know you don't like hints, but the simple solution (again, apologies for repeating what has been said many times) is to write it as I have a feeling that David C and I have something in mind and you have something else in mind so that the solutions we are each proposing are based on these different ideas. Hence, we are each missing the other's point. Hopefully that will get cleared up at the meeting when there is much higher bandwidth. |
No, that is not the point at all.
we agreed that the system "should not" over-ride the intent, or as you preferred to say it "should" use the intent. Either way it means that while we can't have a testable requirement, the intention of the spec is that it works as in my example.
Exactly that yes. Partial annotation means that you only need to annotate parts of the tree. Not that the annotations may be ignored in the parts that are annotated. |
To me using the fixity information in the presentation tree "supplements" the intent annotation, rather than "overriding" or "ignoring" it, especially for Open values. But of course that is impossible to reconcile with a world view where every |
Done, see column 6 (dark green) at https://mathml-refresh.github.io/intent-lists/intent3.html#IDvectimeshints column 2 shows it with no intent, column 4 with intent but no hint, column 6 shows mathcat readings with three different hints applied |
section 5.1.2 currently says
I think we could expand something like.
Apart from the middle paragraph saying the existing |
Returning to the question asked in the issue description
I think a name known to the system (which should include all core names) should be handled for all arities. So mathcat may know Which basically is a suggestion that the draft spec is modified as suggested in the previous comment. |
I think the basic question has been resolved (as @davidcarlisle suggests) that a core concept is treated specially only when it has the expected number of arguments; otherwise it's just an unknown concept. Thus this issue is closable. |
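The resolution above amounts to keying the core list on name plus argument count. A minimal Python sketch follows, assuming a made-up three-entry table: `CORE_READINGS`, `speak`, and the speech templates are all hypothetical and do not reflect the actual core list or MathCAT's rules.

```python
# Hypothetical core table keyed on (name, arity); templates are invented.
CORE_READINGS = {
    ("minus", 1): "negative {0}",
    ("minus", 2): "{0} minus {1}",
    ("ray", 2): "ray {0} {1}",
}


def speak(name, args):
    """Use the core reading only when the arity matches; otherwise
    treat the name like any unknown concept instead of raising an error."""
    template = CORE_READINGS.get((name, len(args)))
    if template is not None:
        return template.format(*args)
    if not args:
        return name
    return f"{name} of " + " and ".join(args)


print(speak("minus", ["x"]))
print(speak("ray", ["A", "B"]))
print(speak("ray", []))
```

Under this scheme the zero-argument `ray` from the issue description is simply spoken by name, so adding or changing arities in the core list later cannot turn existing documents into errors.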
Agree. |
The guts of this question is: what is the type signature of an `intent`. Is there one `intent` named `minus` or are there two: one with a single argument and one that has two (or more) arguments?
This came up when @davidcarlisle added speech output to a version of the initial core list initially created by @dginev. In that initial version, there is
However, MathCAT defined a `ray` intent as a two argument format based on the two points that define the ray: $\vec{AB}$. Because the example had no arguments, MathCAT generated an error message (akin to an `mfrac` having no children).
Is a core `intent` name defined by just the name? the name + #args? name + something more (isa?)? Is a no-argument form special?
If the number of arguments matters, how do we define an nary intent name? Just in prose?