Support non-breaking-space in subscripts #1688

Open
chtenb opened this issue Mar 5, 2023 · 13 comments

Comments

@chtenb

chtenb commented Mar 5, 2023

Origin: asciidoctor/asciidoctor#3951

Would it be possible to support the non-breaking space character in a subscript? I'm in a situation where I make heavy use of subscripts with spaces, and I want my source text to look as close to the intended rendering as possible. Having {sp} littered all over the text conflicts with this idea.

However, I have no problem using Unicode in my text, and since a non-breaking space looks fine in plain text too, I figure that would be a nice feature to have, and it would solve my use case.

Asciidoctor seems to already support this, but asciidoctor.js does not.

@mojavelinux
Member

Please provide an example of what does not work. Asciidoctor and Asciidoctor.js use exactly the same code. Unless there is somehow a problem with the transpiler (which I doubt), I expect it to work the same way in both.

@chtenb
Author

chtenb commented Mar 5, 2023

Here is an example

Test~a1 a2~

Note that the space between a1 and a2 is a non-breaking space.

@mojavelinux
Member

mojavelinux commented Mar 5, 2023

Much to my surprise, there does seem to be a difference in behavior between Asciidoctor and Asciidoctor.js here. It looks like there's a mismatch between the interpretation of the regular expression character class \S.

Ruby:

"\u00a0".match?(/\S/)
# => true

JavaScript:

/\S/.test('\u00a0')
// => false

Thus, Asciidoctor.js would need to add a check for this character to match the behavior of Asciidoctor.

In AsciiDoc, no-break space should not be considered a space character. (In Ruby, \S is defined as [^ \t\r\n\f\v]. Though we don't expect to find \r, \f, and \v in a document—at least not where markup is interpreted—so it's effectively [^ \t\r\n]).
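The mismatch described above can be reproduced in plain JavaScript. The following is a minimal sketch, not code from Asciidoctor.js: the names `builtinNonSpace`, `asciiNonSpace`, and `classify` are made up for illustration, and the explicit class simply mirrors the Ruby definition of \S quoted above.

```javascript
// No-break space (U+00A0) counts as whitespace for JavaScript's \s,
// so the built-in \S class does not match it.
const NBSP = '\u00a0';

// Built-in class: follows the ECMAScript whitespace definition.
const builtinNonSpace = /\S/;

// Explicit ASCII-only class, mirroring the Ruby definition quoted above.
const asciiNonSpace = /[^ \t\r\n\f\v]/;

// Hypothetical helper: reports how each class treats a single character.
function classify(ch) {
  return {
    builtin: builtinNonSpace.test(ch),
    ascii: asciiNonSpace.test(ch),
  };
}

console.log(classify(NBSP)); // { builtin: false, ascii: true }
console.log(classify(' '));  // { builtin: false, ascii: false }
console.log(classify('a'));  // { builtin: true, ascii: true }
```

Only the no-break space is treated differently by the two classes; an ordinary space and an ordinary letter behave the same under both.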

@chtenb
Author

chtenb commented Mar 6, 2023

According to the JS documentation at

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes

\S is defined as [^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].

@chtenb
Author

chtenb commented Mar 6, 2023

Perhaps the asciidoc code could refrain from using the \S and \s classes at all, and instead use explicit character classes?

@mojavelinux
Member

We plan on defining spacing characters more clearly in the specification. To solve the issue at hand, I would focus on just the regexp in question.

@chtenb
Author

chtenb commented Mar 6, 2023

I'm happy to provide a PR, but I'm unfamiliar with this codebase. Do you have any pointers to where this regexp is located?

@mojavelinux
Member

It's a complex issue because the regexp is defined in Asciidoctor Ruby. Anytime there are differences in the interpretation of a regexp, it requires adding a condition in Asciidoctor Ruby that tells the transpiler which alternative to use. It's not a small change and requires coordination between Guillaume and me. And right now, we're both very busy with other things.

You say you have to litter your document with {sp} without this. Can you actually provide a defensible use case for when subscript and superscript require spaces? One of the key reasons the language doesn't allow spaces (or some other way to express it) is because I've never been able to find a common case when it's truly needed.

@chtenb
Author

chtenb commented Mar 6, 2023

It's a complex issue because the regexp is defined in Asciidoctor Ruby.

Wouldn't it simply be a matter of replacing \S with [^ \t\r\n] in the regexp in question in the Ruby source? That seems like a portable way to do it.
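To illustrate the suggestion, here is a small sketch comparing the two character classes on the example from earlier in the thread. Both patterns are simplified stand-ins written for this illustration and are an assumption, not the regexp actually used in the Asciidoctor source; `subscriptContent` is a hypothetical helper.

```javascript
// Simplified, hypothetical subscript patterns (assumptions for illustration only).
const subscriptBuiltin = /~(\S+?)~/;           // relies on JavaScript's \S
const subscriptExplicit = /~([^ \t\r\n]+?)~/;  // explicit class proposed above

// The example from the thread, with a no-break space between a1 and a2.
const source = 'Test~a1\u00a0a2~';

// Returns the text captured between the tildes, or null if nothing matches.
function subscriptContent(pattern, text) {
  const match = pattern.exec(text);
  return match ? match[1] : null;
}

// \S treats the no-break space as whitespace in JavaScript, so there is no match.
console.log(subscriptContent(subscriptBuiltin, source)); // null

// The explicit class accepts the no-break space as subscript content.
console.log(subscriptContent(subscriptExplicit, source) === 'a1\u00a0a2'); // true

// A regular space is still rejected by the explicit class.
console.log(subscriptContent(subscriptExplicit, 'Test~a1 a2~')); // null
```

Because the explicit class lists its characters one by one, it should behave the same way in Ruby and in JavaScript, which is the portability argument being made here.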

Can you actually provide a defensible use case for when subscript and superscript require spaces?

I've started using AsciiDoc for some simple mathematics that I want to write in my text editor. I explicitly do not want to use LaTeX (or the stem block, for that matter), because I want the notation to work well in plain text, so that it is easily readable in a text editor and shareable across text-only channels without looking clunky. Hence I use Unicode in combination with some table blocks and super/subscript.

See this for an example: https://chtenb.dev/?page=cat
With the plain text source being: https://raw.githubusercontent.com/chtenb/chtenb.github.io/master/blog/cat.adoc

You could argue that AsciiDoc is not the right tool for the job, but it seems to cover my needs quite well at the moment, except for this little issue. Moreover, there does not seem to be an alternative that suits this use case better.

@mojavelinux
Member

mojavelinux commented Mar 6, 2023 via email

@chtenb
Author

chtenb commented Mar 6, 2023

This is asking the syntax to do what it was not designed to do.

I'm not sure what you mean by this, since you stated that the current behavior of asciidoctor.js regarding non-breaking spaces and subscripts is in fact a bug. It's hard to defend against authors/maintainers saying you're using their tool wrong. But it's also not very useful if there is no obvious alternative :)

I completely understand other people don't have time to solve my issues, which is why I offered to provide a PR. But if the codebase/project infrastructure is too complex to handle for a layman this is not a viable course of action.

Maintaining a separate fork is too much of a hassle for me. I will implement a workaround that replaces non-breaking spaces with some uncommon Unicode character before passing the AsciiDoc source to asciidoctor.js, and reverses the replacement in the generated HTML. That seems to work well enough.

Thanks so far for the quick replies and the swift diagnosis!
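The workaround described in this comment (swap each no-break space for a placeholder before conversion, then swap it back in the output) could be sketched roughly as follows. Everything here is an assumption made for illustration: the placeholder U+E000 (a private-use code point), the helper names, and `fakeConvert`, which is a stand-in for the real converter so that the example runs on its own.

```javascript
const NBSP = '\u00a0';
// Assumed placeholder: a private-use code point that should not occur in the document.
const PLACEHOLDER = '\ue000';

// Replace every no-break space with the placeholder before conversion.
function protectNbsp(source) {
  return source.split(NBSP).join(PLACEHOLDER);
}

// Put the no-break spaces back into the generated HTML.
function restoreNbsp(html) {
  return html.split(PLACEHOLDER).join(NBSP);
}

// Wraps any converter function (string in, string out) with the two steps above.
function convertWithNbsp(source, convert) {
  if (source.includes(PLACEHOLDER)) {
    throw new Error('Placeholder character already present in source');
  }
  return restoreNbsp(convert(protectNbsp(source)));
}

// Stand-in converter for illustration only; it handles nothing but a
// simplified subscript pattern and is not how Asciidoctor.js works.
const fakeConvert = (text) => text.replace(/~(\S+?)~/g, '<sub>$1</sub>');

const html = convertWithNbsp('Test~a1\u00a0a2~', fakeConvert);
console.log(html === 'Test<sub>a1\u00a0a2</sub>'); // true
```

In practice the `convert` argument would presumably wrap the Asciidoctor.js conversion call instead of `fakeConvert`. The guard against a pre-existing placeholder is there because the reverse replacement would otherwise turn unrelated characters into no-break spaces.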

@mojavelinux
Member

since you stated that the current behavior of asciidoctor.js regarding non-breaking spaces and subscripts is in fact a bug.

First of all, I stated that it's a difference from Asciidoctor in regard to whether you can make use of a workaround. Since AsciiDoc is ambiguous about what counts as a spacing character right now, it doesn't warrant calling it a bug. It's an idiosyncrasy at best. (The very type of idiosyncrasy we are working to address in the specification).

Now that I understand what you're trying to use the workaround for, I don't agree it's the right thing to do. Non-break space has a different meaning than space in the layout and it thus changes how the text is arranged. I had only suggested it as a quick workaround that you could use in Asciidoctor Ruby, but since there's an unspecified difference in Asciidoctor.js, that workaround is not available there. If you need a space in superscript or subscript, it must be written as {sp} to be portable.

In AsciiDoc, superscript and subscript are not designed to be used for elaborate STEM expressions. That's not what the language was designed to do. They are for simple uses of superscript and subscript (such as H~2~O) and for notes (such as a superscript "citation needed" after some text). Anything beyond that warrants the use of the STEM support / macro.

I will keep this discussion in mind when working on the specification, but I'm not going to spend any more time on this issue now because, as it stands, it's unspecified behavior. Asciidoctor Ruby and Asciidoctor.js are the way they are and are passing the tests we wrote for the specified behavior.

@mojavelinux
Member

Maintaining a separate fork is too much of a hassle for me. I will implement a workaround that replaces non-breaking spaces with some uncommon Unicode character before passing the AsciiDoc source to asciidoctor.js, and reverses the replacement in the generated HTML. That seems to work well enough.

👍
