Searching or matching text #220

Open
dginev opened this issue Feb 7, 2024 · 3 comments
Labels
i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response.

Comments

@dginev

dginev commented Feb 7, 2024

In #219, the reply states:

If the spec (or its implementation) allows searching or matching of text, including syntax and identifiers, understand the implications of normalisation, case folding, etc. Also check the detailed guidance for Text-processing.

The spec does not address the searching or matching of text.

But maybe one day it should? Consider questions such as:

  • Should the ASCII query "-x2" match against
      <mrow>
        <mo>&minus;</mo>
        <msup><mi>x</mi><mn>2</mn></msup>
      </mrow>
  • Should the ASCII query "1.2" match against a <mn>1,2</mn> in a page whose lang attribute indicates a country that uses a decimal comma? And vice versa, for the query "1,2" against <mn>1.2</mn> in a page whose lang attribute suggests a decimal point? Or neither, or both?
  • Should searching for a fenced expression, as with the ASCII query "(x,y)", match against equivalent notations, say ]x,y[ with lang="fr"? Probably a good place to say "no" and draw the line of what is within reasonable expectations...

Would adding some text discussing these aspects be generally helpful for better math i18n in the Ctrl+F "page search" of browsers?
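To make the second question concrete, here is one way a locale-aware number match could be sketched. It is purely illustrative: the `DECIMAL_COMMA_LANGS` set is a small hypothetical sample rather than real locale data, `canonical_number` and `number_matches` are names invented for this example, and giving the query its own language tag is an extra assumption. Grouping separators are ignored entirely.

```python
# Hypothetical sample of primary language subtags that use a decimal comma.
# Not real locale data; a real tool would consult a locale database.
DECIMAL_COMMA_LANGS = {"de", "fr", "bg"}

def canonical_number(token: str, lang: str) -> str:
    """Rewrite a decimal comma to '.' when `lang` suggests comma decimals."""
    primary = lang.split("-")[0].lower()
    if primary in DECIMAL_COMMA_LANGS:
        return token.replace(",", ".")
    return token

def number_matches(query: str, query_lang: str, mn_text: str, page_lang: str) -> bool:
    """Compare a query against <mn> text after per-language canonicalization."""
    return canonical_number(query, query_lang) == canonical_number(mn_text, page_lang)

print(number_matches("1.2", "en", "1,2", "de"))  # True
print(number_matches("1,2", "de", "1.2", "en"))  # True
print(number_matches("1.2", "en", "1,2", "en"))  # False
```

Whether any of these should count as a match is exactly the open question; the sketch only shows what "both" would involve.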

@NSoiffer
Contributor

NSoiffer commented Feb 8, 2024

I think these are good questions, but ones that are outside of MathML. Taking each bullet point in turn:

  • If someone types "-x2", then some processor will need to turn that into MathML. That means there is some grammar it uses, so whether it matches what you wrote, or whether the '2' is a subscript, or whether it is part of the 'x' (i.e., the identifier x2), would be based on the grammar.
  • I could be wrong, but I'd be surprised if searching for "1.2" in a text webpage matches "1,2". I don't see why math would be different.
  • Here, I would say "no". If you are doing a syntactic search (which is what happens if you do a browser search now), it would be wrong to do the match. If one had a semantic search, then the answer would be yes. But as with the first answer, that's not in the hands of MathML but in the hands of the search tools.

aphillips added the i18n-tracker label Feb 12, 2024
@aphillips

(This issue was linked from your I18N self-review. Adding a tracker label so that the I18N WG can track this discussion if it evolves. This in no way implies anything about the I18N WG's opinion about the issue.)

@dginev
Author

dginev commented Feb 12, 2024

If someone types "-x2", then some processor will need to turn that into MathML.

I think it is the other way around: current Ctrl+F implementations map the HTML document to some plain-text representation and do a literal string search over that (I've seen mentions of using Boyer-Moore). But maybe that is non-standard per the HTML find-in-page section, which seems to only loosely mention "page contents":

The user-agent processes page contents for a given query, and identifies zero or more matches, which are content ranges that satisfy the user query.

I would be surprised if there are plans to implement tree search for MathML in browsers. Do we have reasons to expect that?
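As a rough illustration of the plain-text model described above, the sketch below flattens the MathML fragment from the first comment to its text content and runs a literal substring search. This is an assumption about how find-in-page behaves, not any browser's actual code; `flatten` is a name made up for the example, and `&minus;` is written as the numeric reference `&#x2212;` because Python's stdlib XML parser does not know HTML entity names.

```python
import xml.etree.ElementTree as ET

# The fragment from the first comment, with U+2212 MINUS SIGN in the <mo>.
MATHML = """
<mrow>
  <mo>&#x2212;</mo>
  <msup><mi>x</mi><mn>2</mn></msup>
</mrow>
"""

def flatten(markup: str) -> str:
    """Concatenate all text nodes, dropping surrounding whitespace."""
    root = ET.fromstring(markup.strip())
    return "".join(t.strip() for t in root.itertext())

text = flatten(MATHML)

# The superscript structure is gone: the haystack is just three characters.
print(text == "\u2212x2")  # True
print("x2" in text)        # True: literal match, blind to <msup>
print("-x2" in text)       # False: U+002D is not U+2212
```

Under this model the tree structure never enters the comparison, so any answer to the "-x2" question comes down to which characters the search tool treats as equivalent.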


I was inspired to think of -x2 by our issue #70, where U+002D (-) is proposed to be rendered as U+2212 (−).

If that is standardized, it seems very sensible to then expect that searching for U+002D will match both U+002D and U+2212. FWIW, that currently appears to be the behavior in Firefox, but not in Chrome (where searching for U+002D matches itself, but not U+2212).
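A minimal sketch of the folding such a match would need, assuming the search tool normalizes both the query and the page text before comparing. The `FOLD` table and the `fold` and `matches` helpers are hypothetical, not taken from Firefox, Chrome, or any spec. Note that Unicode NFKC normalization alone does not unify the two characters, so an explicit mapping is required.

```python
import unicodedata

# Hypothetical fold table: treat MINUS SIGN as HYPHEN-MINUS for matching only.
FOLD = str.maketrans({"\u2212": "-"})

def fold(s: str) -> str:
    """Apply NFKC, then the search-only minus folding."""
    return unicodedata.normalize("NFKC", s).translate(FOLD)

def matches(query: str, text: str) -> bool:
    """Substring match on the folded forms of query and text."""
    return fold(query) in fold(text)

page = "\u2212x2"  # flattened text of the <mrow> from the first comment

print("-x2" in page)                                   # False
print(unicodedata.normalize("NFKC", "\u2212") == "-")  # False
print(matches("-x2", page))                            # True
print(matches("\u2212x2", "-x2"))                      # True
```

Folding both sides makes the match symmetric, which corresponds to the "match both" expectation stated above.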
