Intent for large operators #482
Comments
I'm generally skeptical that custom names offer a significant advantage (e.g. reduced list length) over consistently following a uniform naming convention. Uniform naming keeps the learning curve as small as possible, and aids adoption. To an adopter, a new In the absence of some consistent rule for choosing argument order, we'd need to document each case separately, which is why I had previously raised #478 . |
I'm not advocating adding a "large-operator" concept name -- I'm merely advocating for an organizational arrangement of the names that groups the large operators together to avoid a lot of repetition. My goal is to reduce the size and apparent complexity of the spec. I do think we need to add a few more "fixity" options, but that's not what this thread is about. This thread is about where the |
I'll make a fork with an experimental rendering of the condensed list |
At the last WG meeting I took an action item to look again at this. Unicode 16 (and MathClass-15) have 66 characters classified as largeop ( The full list is at the end of this post. This shows several categories of concept/common character that could potentially be compressed
We could have one entry in the current style for each of these groups and then, in each case, list the concept names and default characters for the other entries in the group. But split this way there are not really many in each group, and I wonder whether the indirection really helps or whether it would just be simpler to list each of them separately in the main list. That list would not be that long, as several of these characters probably do not correspond to any concept that we would have in the core list.
mathclass L 66
n-ary 17
|
The size of the lists is actually smaller than those lists, because they would be split across core and open lists. So that might argue against having a special category that consolidates the lists. However, as mentioned in the initial comment, each operator has three variants: unadorned, adorned with just a subscript/underscript, adorned with two scripts. All of those need to be listed. So that makes the lists 3 times larger than the number of characters, or maybe 5 times larger if we break out On top of that, we still (I think) need to decide whether the core concept for the intent goes on the adorned large operator or on mrow for the entire concept, or both. If both, that's two times more listings on top of the other multipliers. That's a lot of spec space for essentially identical prose. Based on a philosophy that we've agreed on over the years but that I don't think is written down, the intent should be as low as possible in the MathML tree. So I'm in favor of it only being shown on a potentially adorned script. Because of this multiplicative effect, I'm in favor of a condensed list. If we agree on the 11 large operators mentioned in the first comment as what goes into core, that's potentially one extended listing versus 33 or maybe 55 individual listings. That's a lot of space savings. For the open list (assuming we add all or most of the large operators to that list), the space savings are huge. |
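For a quick check of the multiplication above, here is a small Python sketch. The function name `listing_count` and its parameters are made up for illustration; the only inputs taken from this thread are the 11 candidate core operators, the 3 (or maybe 5) script variants, and the possible doubling if the intent is listed on both the scripted operator and the containing mrow.

```python
def listing_count(operators, script_variants, placements=1):
    """Number of separate spec entries if every combination is listed individually.

    operators:       how many large operators are in the list
    script_variants: unadorned / one script / two scripts (3), or 5 if broken out further
    placements:      1 if intent goes in one place, 2 if listed on both script and mrow
    """
    return operators * script_variants * placements


CORE_OPERATORS = 11  # count taken from the first comment of this issue

print(listing_count(CORE_OPERATORS, 3))                # 33
print(listing_count(CORE_OPERATORS, 5))                # 55
print(listing_count(CORE_OPERATORS, 5, placements=2))  # 110
```

A condensed listing replaces each of these totals with a single extended entry, which is where the space savings come from.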
OK I'll experiment on my fork, see what it looks like... |
Practical suggestion: for very related cases, such as some integral signs, develop one concept fully and for the others just add 1 row with:
Is this an organizational question for the HTML concept list pages? There are standard approaches to manage length, for example pagination (e.g. max 100 concepts per page) and sub-pages (e.g. the different arity rows could be subpages linked from the outer page that has 1 row per concept). If the Open list grows as much as it should, these techniques may become necessary on the frontend side. There are also js frameworks capable of navigating extremely large lists (100,000 rows with 22 columns in that example). At least for the Open side of the question, it would be better to prepare for healthy growth, rather than try to constrain the space with custom conventions. The condensed version of the table @NSoiffer is suggesting is starting to read like a math grammar to me. I hope that remains out of scope for the list pages, as it changes their character and makes it harder to contribute new concepts, as we no longer have a uniform organization. Btw, one design philosophy we have written down is the Guidelines for core list curation. Note item 4. |
yes some possibly slightly more formalised version of that is the plan (I think). Currently the issue is, I think, mostly about re-organising the yaml input (to make it possibly easier for implementations to deal with similar concepts with shared code). However, you are correct that the html display may also become an issue.
Yes the current display is very minimalist. On the other hand pagination is possibly less needed than it was, eg in previous iterations we always had the mathml spec split by chapter as the whole thing was too big to load in practice, but these days loading the whole spec isn't really an issue at all. But jekyll does have some built in pagination features we could invoke without having to change the build too much if that does prove to be an issue in the open list (I can't see it being needed for core list) |
@NSoiffer no PR yet but made a start https://davidcarlisle.github.io/mathml-docs/intent-core-concepts/#default-large-operator-concepts source diff w3c/mathml-docs@main...davidcarlisle:mathml-docs:main Currently it pulls the not sure yet how best to resolve conflicts with infix ops eg https://davidcarlisle.github.io/mathml-docs/intent-core-concepts/#union is currently double defined, the one that works actually goes to the default infix fixity list, not the largeop. We could have the default infix line for |
the existing comments in the you could use the
0-arity form on the mo: sum
1-arity form for implicit sum: sum of f of x
2-arity form for sum over a range: sum over R of f of ..
3-arity form for sum between limits: sum from a to b of f of ..
But that doesn't really leave any good way to mark up the summation without the summand expression. Perhaps this?
sum from a to b or in principle we could give a different interpretation for the arguments in the 2-arity form with a different property so
sum from a to b |
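To make the arity-based readings above concrete, here is a minimal Python sketch of how a consumer might pick a speech template from the number of arguments. The function name `speak_large_op` and the exact template wording are assumptions for illustration only; nothing here is taken from the core list or from any AT implementation.

```python
def speak_large_op(name, args):
    """Return a spoken form for a large operator, chosen only by argument count.

    Hypothetical templates, mirroring the readings in the comment above:
      0 args: bare operator            -> "sum"
      1 arg:  implicit sum             -> "sum of f of x"
      2 args: sum over a range         -> "sum over R of f of x"
      3 args: sum between limits       -> "sum from a to b of f of x"
    """
    if len(args) == 0:
        return name
    if len(args) == 1:
        (body,) = args
        return f"{name} of {body}"
    if len(args) == 2:
        domain, body = args
        return f"{name} over {domain} of {body}"
    if len(args) == 3:
        lower, upper, body = args
        return f"{name} from {lower} to {upper} of {body}"
    raise ValueError(f"no template for {len(args)} arguments")


print(speak_large_op("sum", []))
print(speak_large_op("sum", ["f of x"]))
print(speak_large_op("sum", ["R", "f of x"]))
print(speak_large_op("sum", ["a", "b", "f of x"]))
```

The sketch also shows the gap mentioned above: with arity alone there is no slot for "sum from a to b" without a summand, since two arguments are already read as range plus body.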
One additional markup variant that has been discussed in general and fits with David's sum examples is using a "higher-order" application. There the summation is first attached to its indexing signature. And then, a level up, is applied to an argument. In the process of writing my example I also noticed that summations typically have an explicit indexing variable, which is also used in the argument being summed over. And wondered if that is motivation to reuse the same intent concept
<mrow intent="$sum_op($arg)">
<munderover arg="sum_op" intent="index($op, in($index_var, interval($from,$to)))">
<mo intent="sum" arg="op">∑</mo>
<mrow>
<mi arg="index_var">i</mi>
<mo>=</mo>
<mi arg="from">a</mi>
</mrow>
<mi arg="to">b</mi>
</munderover>
<mrow arg="arg">
<mi>f</mi><mo>(</mo>
<msub intent="index(x,$index_var)">
<mi>x</mi>
<mi arg="index_var">i</mi>
</msub>
<mo>)</mo>
</mrow>
</mrow>
Some of that was discussed in #454. This is one of the cases where one can really feel that intent has a compact syntax. There are various extensions that jump to mind to tidy up and clearly mark the different kinds of arguments. The way I wrote the intent expressions above backs out of any assumptions about "sum" being a known operator, and ought to be usable even without custom conventions. But it gets verbose and functional, much more so than a convention-based Here are two additional examples, showing how the markup changes when
<mrow intent="$sum_op($arg)">
<munder arg="sum_op" intent="index($op, $index_var)">
<mo intent="sum" arg="op">∑</mo>
<mi arg="index_var">i</mi>
</munder>
<mrow arg="arg">
<mi>f</mi><mo>(</mo>
<msub intent="index(x,$index_var)">
<mi>x</mi>
<mi arg="index_var">i</mi>
</msub>
<mo>)</mo>
</mrow>
</mrow>
<mrow intent="$sum_op($arg)">
<mo intent="sum" arg="sum_op">∑</mo>
<mrow arg="arg">
<mi>f</mi><mo>(</mo>
<msub intent="index(x,$index_var)">
<mi>x</mi>
<mi arg="index_var">i</mi>
</msub>
<mo>)</mo>
</mrow>
</mrow>
To summarize: A sum may be indexed or bare, and its indexing variable may be constrained by a range or bare. And each of those cases may benefit from specialized speech. |
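For anyone wanting to sanity-check markup like the examples above, here is a minimal Python sketch. It only verifies that every `$name` used in an intent has some descendant carrying `arg="name"`. This is a deliberate simplification and should not be read as the reference-resolution rule from the spec; the function name `unresolved_references` is made up for illustration. The markup is the underscript-only example.

```python
import re
import xml.etree.ElementTree as ET

MATHML = """
<mrow intent="$sum_op($arg)">
  <munder arg="sum_op" intent="index($op, $index_var)">
    <mo intent="sum" arg="op">∑</mo>
    <mi arg="index_var">i</mi>
  </munder>
  <mrow arg="arg">
    <mi>f</mi><mo>(</mo>
    <msub intent="index(x,$index_var)">
      <mi>x</mi>
      <mi arg="index_var">i</mi>
    </msub>
    <mo>)</mo>
  </mrow>
</mrow>
"""


def unresolved_references(root):
    """List (tag, name) pairs where an intent uses $name but no descendant has arg="name".

    Simplified check (assumption): any descendant with a matching arg counts,
    regardless of how deeply it is nested.
    """
    missing = []
    for element in root.iter():
        intent = element.get("intent")
        if intent is None:
            continue
        # Collect arg values on strict descendants of this element.
        descendant_args = {
            d.get("arg") for d in element.iter() if d is not element and d.get("arg")
        }
        for name in re.findall(r"\$([A-Za-z_][\w.-]*)", intent):
            if name not in descendant_args:
                missing.append((element.tag, name))
    return missing


root = ET.fromstring(MATHML.strip())
print(unresolved_references(root))  # an empty list means every reference has a target
```

A check like this also fails early on unbalanced markup, since `ET.fromstring` raises a parse error for XML that is not well-formed.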
@davidcarlisle : what you did is not bad. I started to reply with a suggestion and then realized that this is not needed at all: we have Doing this is a much simpler approach than the other suggestions in this issue. And the only changes that need to be made to the core concept list are to remove sum, product, and integral and any other large op that we have in the list (from the distant past). |
Indeed, one choice here is deciding whether we want "hopefully" (Core list) or "certainly" (Open list) for large operators. |
@NSoiffer oh |
Apologies. I'm missing your point other than my use of the word "hopefully" (AT SHOULD use the core list, but it doesn't have to, hence "hopefully"). I don't understand '"certainly" (Open list) for large operators'. |
@NSoiffer given we have We could delete the "full" entries and compress to a |
Sure, I can elaborate: We expanded on this during the meeting yesterday. There is a large variability on how the indexing variable is presented, or omitted. Aspirational language I heard during the meeting described the In contrast - and this is generally true in the concept vs property approach - a concept-based intent expression adds more certainty that
(Of course at the cost of more verbose markup, as well as a successful parse of the argument structure already at the annotation stage.) |
@dginev I have some sympathy with your functional intent version that is basically identifying But actually we found that most of the time the extra mathematical precision didn't really help (except for mapping to computational systems). You normally want to read the upper and lower limit form just as it appears, so "sum from i = a; to b", not as the mathematically equivalent "sum over i in the closed interval a b", which is what I think you'd get from
|
@davidcarlisle My example has three mistakes if it was aiming for Core actually:
And the baseline readout for Another point is that we have freedom to choose concrete readouts when a "concept-based" approach feels too artificial (which is what I think your worry is with "interval" ?). As with:
|
@dginev my main worry is that I think it highly unlikely that anyone seeing This used to come up quite often trying to typeset OpenMath where the underlying markup just had the sum over a set, but to get reasonable readings we had to "spot" common cases such as the set being an interval and express it as an upper and lower limit. So while I think the functional intent that you showed is correct, I think it's hard to generate and produces a less natural reading than the default. |
I am trying to follow this thread with an eye to a key purpose of intent: disambiguating how to Can someone please provide examples that are claimed to be ambiguous? I understand that the author may wish to suggest a particular pronunciation among |
@davidfarmer I don't think there is any question of ambiguity here, just giving the system a hint to use |
@davidfarmer for ambiguity it is probably quickest to reach towards Type theory. An example is the sigma type, where they write I am not at all certain what is the preferred pronunciation, but I could explain it as "the sigma type from A to B, dependent on x". I would have to reach out to a type theorist to find a preferred readout. I have seen capital sigmas used for various other purposes in arXiv - such as a type name variable, or a group invariant. I'm sure there are others... |
Shouldn't AT know that ∑i is not pronounced "summation symbol with subscript i", I still don't get the need to put the |
@davidfarmer The need I see is that if I have materials where I want a sigma pronounced as a summation, materials where I want it pronounced as "sigma type" and materials where they are pronounced as simply "sigma" (e.g. for the group invariants), then I should be able to guide AT to produce the correct outcome, by supplying the intended concept name. |
well it won't unless someone specifies it should. But it's the same as The proposal is that we have the https://w3c.github.io/mathml-docs/intent-core-properties/#prop-largeop It is defined by example but the basic idea of the property is "read it like summation" so naturally if you use |
I think that the dependent types usually use a summation rather than a sigma (even when called sigma types); the arXiv pdf you linked to seemed to be doing that. Of course you may still want to use intent to force correct speech even when the "wrong" Unicode character is used. |
Maybe I understand now: The proposal is that we can put But I don't need to put If I have that correct now, it seems reasonable to me. |
Disambiguation isn't the only (or even main) use of |
@davidfarmer: this ties in to #433, which we briefly discussed on the call last week as my "remaining big issue". My proposal is that if someone uses the I think the way to flesh out my proposal is to explicitly say notation X maps to concept or property Y. In this case, it should say those constructs should act as if the There is also |
I was reading arXiv:2408.17016 today, and realized we can still benefit from establishing the concept names for marking an operation that "acts on a range of values", beyond the single large characters. Maybe aim to make a better version of my first attempt above. The example in that paper (in Section 2.1) is: Which I would informally describe as "the set of values x sub i, for i ranging from 1 to n". Which is a set constructor operation over a sequence of samples. Should I open a new issue for "operators acting on a range of values" or is there an easy connected resolution as we close here? Edit: The paper also has a similar example for defining a matrix, in Section 4.2, written: |
On the one hand, these notations don't seem like they need a solution in core. On the other hand, it seems like the existing core solution of
This should get a reading like "set x sub i from i = 1 to n" which is close to what you think is appropriate. A similar thing would be true for the second example if you labeled the bracketed quantity with "matrix". Note that |
That does appear to be a nice fit. Seen with these fenced It would be similar to having had the Since we are discussing a general form of indexing, I searched around, and found a well-structured writeup on "index notation" as also applied to the large operators here. Its formal definition in 2.2 discusses a generalized "index operator", of which summation is one example. Index Notation in Mathematics and Modeling I wonder if |
I (think) this is also intended for The relevant TeX names are On balance I'd stick with |
@davidcarlisle I would also be happy with settling on some concept names to annotate an indexed expression, which can be associated with the from/to speech, as with |
This was discussed at the meeting on 19-12-2024. The group's feeling is to leave the name as
However, other suggested names like |
This scratched at an old memory, since of course you are classically correct, but I had seen some uncountable use in the past. Not my field, but PlanetMath has one example definition in uncountable sums of positive numbers. So at least there is some precedent that the "index notation" is in use for uncountable sets. But that's a bit of an aside, since the "accessible outcome" for speaking the notation wouldn't necessarily depend on the mathematical structure of the index set. |
There are about 10 large operators that probably make sense to go into core (maybe integral, double integral, triple integral, contour integral, surface integral, volume integral, sum, product, coproduct, union, intersection). Whatever is decided for core should likely be extended to open for the other large operators (e.g., ⊍).
These are all very similar in structure in that intent potentially goes on msub/munder with one argument (typically specifying a domain for the "index") and msubsup/munderover with two arguments ("... from xxx to yyy"). Or they go on some containing mrow with an additional argument (e.g., "... from xxx to yyy of zzz"). If they go on one of the scripting elements, then there is no need for intents for indefinite integration or sums that don't have limits. If they go on an mrow, then maybe it makes sense to have an intent for them, although Neil felt the speech needs no intent because there is no other sensible speech for "integral", "sum", etc.
In the Dec 21 meeting, no one stood up for the "dx" being part of the argument for integral, as it would be spoken "dx" wherever it was and didn't need help from an intent.
In the meeting, Neil felt that listing these all out both uses up a lot of space (and hence appears complicated) and, more importantly, obscures their similarity, making it harder on both generators and consumers of the spec. His suggestion is to create another list between the "Core Concept Default Fixity properties" and the "Core Concept Templates". Others were not enthusiastic about that idea.
This issue provides a place to discuss the pros and cons of how intents for large operators should be handled.