Textual/TeX representations of Math Nodes #876

Open
rgieseke opened this issue Mar 24, 2021 · 1 comment

Comments

@rgieseke
Contributor

Following up from #872

I like the idea of capturing the alttext and agree that meta.altText is the best place for this for now. However, in the longer term it should probably live in a specific property (meta is really just a temporary dumping ground for properties that we don't have specific properties for). Some options could be:

https://schema.org/speakable : although that seems to be intended for a URL or a 'content-locator' rather than the text itself
https://schema.org/alternativeHeadline: although not really an alternative title
or maybe creating a new alternativeText property which could also be used on CodeExpression, ImageObject etc

Happy to consider alternatives. If you would like to progress this further a PR to stencila/schema would be appreciated.

I think there are different representations to consider. In many workflows the MathML comes from TeX. The TeX source might be worth keeping around, as the final rendering might happen with KaTeX/MathJax from the TeX (even though they might also use/create MathML).

LaTeXML includes alttext as an attribute with the original TeX like this:

<disp-formula id="S0.Ex1">
  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" alttext="a=b^{2}" display="block">
    <m:mrow>
      <m:mi>a</m:mi>
      <m:mo>=</m:mo>
      <m:msup>
        <m:mi>b</m:mi>
        <m:mn>2</m:mn>
      </m:msup>
    </m:mrow>
  </m:math>
</disp-formula>

Pandoc creates JATS XML and includes a separate tex-math element inside alternatives:

<disp-formula>
<alternatives>
<tex-math><![CDATA[a = b^2]]></tex-math>
<mml:math display="block" xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:mrow><mml:mi>a</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi>b</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</alternatives></disp-formula>
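
To make the difference between the two outputs concrete, here is a minimal sketch of recovering the TeX source from either form. The function names `extractTex` and `unescapeXml` are made up for illustration, and the regex matching is an assumption that only holds for simple, well-formed input like the two examples above; a real codec would use a proper XML parser.

```typescript
// Hypothetical helper: undo the basic XML entity escapes.
// `&amp;` is handled last so that `&amp;lt;` is not decoded twice.
function unescapeXml(text: string): string {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&')
}

// Hypothetical helper: return the TeX carried by a JATS formula, or
// undefined if there is none. Regex-based, so only a rough sketch.
function extractTex(formulaXml: string): string | undefined {
  // Pandoc style: a dedicated <tex-math> element, usually wrapping CDATA.
  const texMath = /<tex-math[^>]*>([\s\S]*?)<\/tex-math>/.exec(formulaXml)
  if (texMath) {
    const body = (texMath[1] ?? '').trim()
    const cdata = /^<!\[CDATA\[([\s\S]*?)\]\]>$/.exec(body)
    return cdata ? cdata[1] : unescapeXml(body)
  }
  // LaTeXML style: the TeX sits in the `alttext` attribute of the math element.
  const alttext = /<(?:\w+:)?math\b[^>]*\balttext="([^"]*)"/.exec(formulaXml)
  return alttext ? unescapeXml(alttext[1] ?? '') : undefined
}
```

Checking `<tex-math>` first reflects that it is an explicit TeX container, whereas `alttext` only happens to hold TeX when the document was produced by LaTeXML.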

In the JATS tag browser alt-text is described like this:

Accessibility: The short <alt-text> can be used for special accessibility display or presentation on graphic-limited websites or devices, as an alternative to providing the full graphic. (For example, the <alt-text> element is typically read by screen readers, and may also be used to display a few words “behind” a figure or graphic for devices with limited graphics capacity.)
Please reserve this tag for accessibility uses such as pronouncing screen readers. This element should not be used as a replacement for <caption>, which is a visual element typically displayed alongside a figure, table, etc. The <alt-text> is not a visual element, unless the figure, caption, or other major element that holds the <alt-text> is not available or cannot be processed by the person or device-type being addressed. Since it is not visual, <alt-text> does not allow face markup inside it; a simplified textual alternative for a graphic object (including face markup) can be created using the <long-desc> element.

https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/alt-text.html

Maybe it's a possible approach for Stencila to include both representations as in the Pandoc output?
While TeX is probably quite readable for many people, I guess alt-text is not quite the right place for it.

In general, I think there is also a need for markup-free representations of elements containing math, e.g. in titles or abstracts that are shown on the web in places where no math rendering is done.
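
As a rough illustration of what a markup-free representation could look like, the sketch below turns very simple TeX into plain Unicode text. The function `texToPlainText` is hypothetical and only handles numeric superscripts and grouping braces; anything beyond that (commands, fractions, subscripts) would need a real TeX parser.

```typescript
// Unicode superscript digits, used to flatten exponents like `b^{2}`.
const SUPERSCRIPT_DIGITS: Record<string, string> = {
  '0': '⁰', '1': '¹', '2': '²', '3': '³', '4': '⁴',
  '5': '⁵', '6': '⁶', '7': '⁷', '8': '⁸', '9': '⁹',
}

// Hypothetical helper: a best-effort plain-text rendering of simple TeX.
function texToPlainText(tex: string): string {
  return tex
    // `^{12}` or `^2` become superscript digits.
    .replace(
      /\^\{(\d+)\}|\^(\d)/g,
      (_match: string, braced?: string, single?: string) =>
        (braced ?? single ?? '')
          .split('')
          .map((digit) => SUPERSCRIPT_DIGITS[digit] ?? digit)
          .join('')
    )
    // Drop any remaining grouping braces and normalise whitespace.
    .replace(/[{}]/g, '')
    .replace(/\s+/g, ' ')
    .trim()
}
```

With the formulas from the examples above, both `a=b^{2}` and `a = b^2` keep their spacing and end in `b²`.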

@nokome
Member

nokome commented Mar 25, 2021

Maybe it's a possible approach for Stencila to include both representations as in the Pandoc output?

Yes, this seems the most appropriate, and it would not require any schema changes: in the jats codec we are already encoding a <tex-math> element if the mathLanguage is tex.

/**
 * Encode a Stencila `Math` node as a JATS `<inline-formula>` or
 * `<display-formula>` element.
 */
function encodeMath(math: stencila.Math): xml.Element[] {
  const { mathLanguage, text = '' } = math
  let inner: xml.Element | undefined
  if (mathLanguage === 'tex') {
    inner = elem('tex-math', text)
  } else {
    try {
      const root = xml.load(text)
      if (root?.elements?.length) inner = root.elements[0]
    } catch (error) {
      log.error(`Error parsing MathML:\n${error.message}\n${text}`)
    }
  }
  return [
    elem(
      math.type === 'MathFragment' ? 'inline-formula' : 'display-formula',
      inner
    ),
  ]
}

So this issue would just require us to always encode both a <tex-math> and a <mml:math> element (via conversion from the source representation to the other).
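
A minimal sketch of what "always encode both" could look like, producing the same `<alternatives>` shape as the Pandoc output. Everything here is an assumption for illustration: `MathNode` is a stand-in for the Stencila type, the `Converters` functions are hypothetical placeholders for whichever TeX/MathML library does the conversion, and the output is built as a string rather than with the codec's `elem` helper. It uses `disp-formula`, the element name that appears in the LaTeXML and Pandoc examples.

```typescript
// Stand-in for the Stencila node type (an assumption, not the real definition).
interface MathNode {
  type: 'MathFragment' | 'MathBlock'
  mathLanguage?: string
  text?: string
}

// Hypothetical converters; the actual conversion is left to a library.
interface Converters {
  texToMathml: (tex: string) => string
  mathmlToTex: (mathml: string) => string
}

// Wrap text in CDATA, splitting any `]]>` so the section stays well-formed.
function cdata(text: string): string {
  return `<![CDATA[${text.split(']]>').join(']]]]><![CDATA[>')}]]>`
}

// Encode both representations, converting from whichever one is the source.
function encodeMathAlternatives(math: MathNode, convert: Converters): string {
  const { mathLanguage, text = '' } = math
  const isTex = mathLanguage === 'tex'
  const tex = isTex ? text : convert.mathmlToTex(text)
  const mathml = isTex ? convert.texToMathml(text) : text
  const tag = math.type === 'MathFragment' ? 'inline-formula' : 'disp-formula'
  return (
    `<${tag}><alternatives>` +
    `<tex-math>${cdata(tex)}</tex-math>${mathml}` +
    `</alternatives></${tag}>`
  )
}
```

Passing the converters in as parameters keeps the sketch independent of any particular library; the codec could equally hard-wire one.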
