Adding semantics to presentation MathML using symbol names #141
It might be worth cross referencing against the OpenMath list especially as all the Content MathML element names are already cross referenced to OM. |
I have examined an introductory calculus textbook (Active Calculus,
by Matt Boelkins) with the goal of determining what is required to
make all the math markup unambiguous and semantic.
A draft of my report is here:
https://docs.google.com/document/d/1cZnff5_fi_ucNyZ1ex2msmJLE55FAZD-QInkLYe8xiE
|
@samdooley: that's a long list, so thanks for all the effort and getting the ball moving along!!! My comments:
I hope these comments are helpful to start a discussion on your list. |
@davidfarmer -- thanks for the list. It seems there are a few things that break "the Soiffer hypothesis", but not many. If the computer could know what the functions are, then distinguishing between function application and multiplication would not be needed. But doing that requires reading the text and can't really be known by just knowing the subject area. So putting that aside (which I don't really think breaks my hypothesis), I see the following as problematic:
Is there more to add to that list? I don't think it is hard to distinguish between definite and indefinite derivatives, but maybe that's just me as @samdooley makes a distinction in his list (but he also calls out more distinctions). Note: probably most people in this project can read TeX, but I strongly suspect some people have trouble reading it. For your examples, it would probably be helpful if you included images showing the notation in 2D form. |
I think the lists are a useful starting point for assigning roles, although I'm a bit confused about the TeX-centred description. I don't think we should be specifying a TeX syntax in this group. We should be assigning roles to use on MathML elements. Individual systems or individual users can define TeX macros to produce that markup, but as that is just surface syntax that's expanded out by TeX or JavaScript or whatever, I'm not sure it need be standardised. I'd agree with Neil that integral forms can be distinguished by the presence of limits (that is, the integral operator is wrapped in msubsup or msub), so I'm not sure that more specific roles are needed for integrals. Also, in @davidfarmer's list I'm slightly sceptical that authors will want to use prefix forms for invisible times and function application (Content MathML as an author format suffers from this). Since the presentation forms are infix, I think a TeX infix markup like If we do use TeX markup for symbols in any descriptions, I think we should use the unicode-math markup (as that works in TeX); these are all listed in
and
in particular we shouldn't use commands like |
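A minimal sketch of the "presence of limits" test mentioned above, assuming un-namespaced presentation MathML and treating only msub/msubsup as wrappers (as in the comment). The function name `integral_forms` and the labels `with-limits` / `no-limits` are made up for illustration; they are not proposed role values.

```python
import xml.etree.ElementTree as ET

INTEGRAL = "\u222B"  # the integral sign

def integral_forms(mathml):
    """Return 'with-limits' or 'no-limits' for each integral sign found."""
    root = ET.fromstring(mathml)
    forms = []
    # ElementTree keeps no parent pointers, so visit each element as a
    # potential parent and inspect its direct children.
    for parent in root.iter():
        for index, child in enumerate(parent):
            if child.tag == "mo" and (child.text or "").strip() == INTEGRAL:
                # Limits are present when the operator is the base (first
                # child) of an msub or msubsup element.
                wrapped = parent.tag in ("msub", "msubsup") and index == 0
                forms.append("with-limits" if wrapped else "no-limits")
    return forms

WITH_LIMITS = (
    "<math><mrow><msubsup><mo>\u222B</mo><mn>0</mn><mn>1</mn></msubsup>"
    "<mi>f</mi></mrow></math>"
)
WITHOUT_LIMITS = "<math><mrow><mo>\u222B</mo><mi>f</mi></mrow></math>"

print(integral_forms(WITH_LIMITS))
print(integral_forms(WITHOUT_LIMITS))
```

The point of the sketch is only that the distinction is recoverable from the presentation markup itself, so a separate role may not be needed for it.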
* I don't think you included f^(4)(x) which could potentially be confused with power. Especially if you wrote
f^(n+1)(x). I suppose knowing that it is functional application would be a good clue that it wasn't power,
so maybe this isn't problematic...
There are macros
\nthDerivative
and
\functionPower .
You are correct that f^{(n+1)} could be ambiguous. I'll add it to
the writeup. (In Active Calculus, it always means derivative.)
|
Below is a proposal for how to reorganize Sam's tables. The underlying problem is similar to what one encounters when designing a database: "multiplication" can be represented by \cdot, \times, or [space], and \times can mean "multiplication" or "cross product". Thus, we have a many-to-many relationship. An additional complication is that Sam wants to encode both the form and The conclusion I reach is that two attributes are needed: one that encodes meaning, So we need both the current "ID" column in Sam's table, and also another new In the HTML, the Meaning will be recorded in the 'role' or 'math-role' or some other @samdooley: I tried to email you, but it bounced. Has your email address changed? |
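To make the many-to-many relationship concrete, here is a small sketch using only the \cdot / \times / [space] example from the comment above. The variable and function names are illustrative assumptions, not part of any proposal.

```python
from collections import defaultdict

# (notation, meaning) pairs taken from the example in the comment above.
PAIRS = [
    (r"\cdot", "multiplication"),
    (r"\times", "multiplication"),
    ("[space]", "multiplication"),
    (r"\times", "cross product"),
]

def build_indexes(pairs):
    """Index the pairs in both directions, like a database join table."""
    by_notation = defaultdict(set)
    by_meaning = defaultdict(set)
    for notation, meaning in pairs:
        by_notation[notation].add(meaning)
        by_meaning[meaning].add(notation)
    return by_notation, by_meaning

by_notation, by_meaning = build_indexes(PAIRS)

# Notations that carry more than one meaning are the ones that cannot be
# resolved from the presentation form alone.
ambiguous = sorted(n for n, meanings in by_notation.items() if len(meanings) > 1)
print(ambiguous)
print(sorted(by_meaning["multiplication"]))
```

Neither index alone captures the table: one meaning maps to several notations and one notation maps to several meanings, which is why a single column cannot record both.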
Trying to understand @samdooley's spreadsheet before the call: we seem to have been talking round it for a while with, I think, two viewpoints, leading to a certain amount of disconnect, so I tried refactoring it. I started by removing all rows that did not share an entry in the Unicode column (F), as they are uniquely identified by their presentation MathML markup, then further removed any rows if there were not multiple meanings after removal of synonyms such as ngt; notgt. I then did a bit of hand cleanup and ended up with the 29 entries in the attached table (HTML, but attached as .txt for this site). Note I blanked out all the units rows to a single UNITS row, as I think we do need to specify some markup for (any) unit use. Note I think the data in the original table is needed, just not in the MathML spec; that is, if you are inferring semantics (or Content MathML or OpenMath) from presentation and hit a U+222D, then you need to know that's a triple integral. Sam's spreadsheet has that data, and any converter (in either direction) needs that information, which is where it came from :-) but I would argue that it should not be in the MathML; you should only put a (math)-role="wibble" on an |
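The filtering described above can be sketched roughly as follows. The rows and IDs are hypothetical stand-ins, not copied from the spreadsheet; only the ngt/notgt synonym pair and U+222D come from the comment, and the sketch folds the two removal steps into one pass.

```python
from collections import defaultdict

# Hypothetical (ID, Unicode character) rows, for illustration only.
ROWS = [
    ("times", "\u00D7"),
    ("cross-product", "\u00D7"),
    ("ngt", "\u226F"),
    ("notgt", "\u226F"),
    ("triple-integral", "\u222D"),
]

# Synonym -> canonical ID, so synonyms do not count as separate meanings.
SYNONYMS = {"notgt": "ngt"}

def ambiguous_entries(rows, synonyms):
    """Keep only characters that still have several meanings after
    synonyms are merged; everything else is identified by its markup."""
    meanings = defaultdict(set)
    for ident, char in rows:
        meanings[char].add(synonyms.get(ident, ident))
    return {char: sorted(ids) for char, ids in meanings.items() if len(ids) > 1}

remaining = ambiguous_entries(ROWS, SYNONYMS)
print(remaining)
```

With these sample rows only the multiplication sign survives: the triple integral has a single meaning, and ngt/notgt collapse into one.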
I just discovered this issue today and am particularly interested in @samdooley's list:
It appears that Google doc has disappeared. Is there a new location? |
@samdooley: is the list still around so @dginev can look at it? If it isn't, please close this issue. |
No action, so closing issue. |
Several options for adding semantics to presentation markup were discussed on the Sep 10 MathML General call. A common thread seems to be a need for a shared vocabulary of mathematical symbols/operators/names.
https://docs.google.com/spreadsheets/d/1ebOkl7Gckfk5g6Dc4C8bpGZtSxLnGwpOHqAwwON0-nI/edit?usp=sharing
I have collected 1749 symbols into a Google sheet as an initial starting point for such a list. The list still needs lots of work, but enough is there to illustrate how one could add semantic information via a role attribute to encode content markup within the presentation markup:
The goal for this list is to define unique short identifiers for as many mathematical symbols as possible from widely used sources, including Unicode, Content MathML, LaTeX, Nemeth Braille, and SI units.
These identifiers are intended to be suitable for use in as many markup contexts as possible, including presentation MathML role attributes, content MathML element names, LaTeX macro names, and JSON property names/values.
Each row in the table defines a single symbol, with its unique identifier (ID), a short description (Symbol), a mnemonic example (Example), and a Unicode character (Unicode).
The symbols are listed by type, which gives a rough classification of the symbols according to their syntactic form: symbol, operator, unit, function, large operator, special forms, fences, and scripts.
While the universe of math symbols is necessarily unbounded, this list should include the more common Unicode math symbols, the Content MathML 3.0 element names, the Nemeth braille patterns, and the more common SI units.
This first version is missing lots of symbols, and I know I need to check coverage of Content MathML elements and braille patterns. But let me know if there are vocabularies that deserve special attention. Statistics, chemistry, and multi-variable calculus, for example, could clearly use some work, among others.
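As a rough illustration of how one row of such a table could drive a role attribute, here is a sketch. The record fields mirror the four columns named above (ID, Symbol, Example, Unicode); the sample row, the class name `SymbolRow`, and the helper `mo_with_role` are hypothetical, and placing the attribute on an mo element is an assumption made for this example.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr

@dataclass(frozen=True)
class SymbolRow:
    id: str        # unique short identifier (ID column)
    symbol: str    # short description (Symbol column)
    example: str   # mnemonic example (Example column)
    unicode: str   # Unicode character (Unicode column)

def mo_with_role(row):
    """Emit a presentation MathML operator carrying the row's ID as its role."""
    return f"<mo role={quoteattr(row.id)}>{escape(row.unicode)}</mo>"

# Hypothetical row, not taken from the spreadsheet.
cross = SymbolRow("cross-product", "cross product", "a \u00D7 b", "\u00D7")
print(mo_with_role(cross))
```

Because the identifier is a plain short string, the same value could in principle be reused as a LaTeX macro name or a JSON property value, which is the portability goal stated above.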