Consider ignoring HTML (inline) in alt on image #716
Comments
Note: the spec doesn't mandate any particular treatment of the "image description." It only recommends:
So a fully compliant implementation is free to make a different decision.
cmark, the reference implementation, renders your example thus: <p><img src="#" alt="a b c d <e>f</e> & - g" /></p>
I'm not sure what implementation preserved the
Everyone that follows CommonMark:
Could we improve the recommendation to explain the term “plain string” in a way that represents how several markdown parsers work?
OK, I see your issue is not really about whether the
As I mentioned, the spec is really silent on this -- the examples in this section just embody a recommendation -- so you're free to omit the raw HTML when generating the
I'd be okay with changes to the spec along these lines, I think, as long as the reference implementations were also updated to match. Note, however, that such a change would be an annoyance to all current implementers, who would find their spec tests failing and need to adjust things for a case that is probably pretty rare.
This is coming up several times with several people. One idea to clarify this is to mark test cases as illustrative/optional/recommendations with some attribute.
This to me sounds like the reason for CommonMark to exist and have a set of test cases that people pull in to test their parsers with. Much of markdown is edge cases.
If you're up for making changes to spec, commonmark.js, and cmark, then you've got the green light!
While I understand the question, I am hoping to contribute solely to the spec.
The thing is, if you contribute in this way to the spec, then I have to modify the reference implementations, and that takes time. I don't like them to get out of sync, and I don't have time to spend on this right now. You can keep this issue open if you like.
Right, that’s quite fair.
As someone who maintains a compliant parser and doesn't count C as a language I understand well, I personally find the JavaScript implementation to be extremely helpful in understanding the impact of spec changes. But I also know the burden of maintaining multiple projects too, so while I'd be bummed to lose the reference parser, it would be understandable.
Are there any updates on this? If not, can we at least get a clarification on what the expected behavior should be?
The result of updating the spec is a one-time change for some implementors. The result of not updating it is a continued stream of issues about the inconsistency between parsers. The latter is arguably more annoying.
What is the precise change to the spec you think should be made?
Just a general overview here. I'll try to suggest precise changes in the next post.
Option 1 - user friendly
Make it so image
This seems to be what users expect. This also has usability issues, because literal
From an implementation point of view it's also unclear:
Option 2 - keep existing behavior
We now have 3 choices of what to do with
Need to decide which one is correct. Note that some implementations have an option to disable the html rule entirely (at least we do). Having different results based on whether the parser is able to parse html might be undesirable.
In the OP I discussed these choices too, and I advised going with existing behavior, but not emitting html tags. Which is like opt 2 haskell
@jgm, now actually answering your question:
Maybe the spec should specify how to transform the AST into "plain string content" for the purposes of forming image alt:
The last rule makes sense to me personally (since it leads to the same behavior whether the html rule exists or not). But maybe serializing htmls into an empty string is more "correct" in theory.
Also, I'd like to add a test to the spec along these lines: If you use special symbols in image alt, you can wrap them into code span:
![`*em* <link>`]()
It's not going to fail anywhere (all parsers keep contents of code span as is hopefully), but it may be a useful suggestion for markdown writers (prettier/prettier#15140) on how to deal with special characters. (or mention in any other way that automated software should escape user content inside image alt when auto-generating markdown)
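To make the "AST into plain string content" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Node` class, the node kind names, and the `keep_html` flag are hypothetical and do not come from the spec or from any reference implementation. It only contrasts the two treatments of raw inline HTML mentioned in this thread: keeping it literally in the alt text, or serializing it into an empty string.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical inline AST node; kind names are illustrative only."""
    kind: str                      # e.g. "text", "code", "emph", "html_inline"
    literal: str = ""              # content for leaf nodes
    children: List["Node"] = field(default_factory=list)


def alt_text(nodes: List[Node], keep_html: bool = True) -> str:
    """Flatten inline nodes into a plain string for an image alt attribute."""
    out = []
    for node in nodes:
        if node.kind in ("text", "code"):
            # Text and code span contents are kept as-is.
            out.append(node.literal)
        elif node.kind == "html_inline":
            # The contested case: keep the raw HTML literally, or drop it.
            if keep_html:
                out.append(node.literal)
        else:
            # Containers (emphasis, links, ...) contribute only their children.
            out.append(alt_text(node.children, keep_html))
    return "".join(out)


# Inline content roughly corresponding to: a *b* c <e>f</e>
nodes = [
    Node("text", "a "),
    Node("emph", children=[Node("text", "b")]),
    Node("text", " c "),
    Node("html_inline", "<e>"),
    Node("text", "f"),
    Node("html_inline", "</e>"),
]

print(alt_text(nodes, keep_html=True))   # a b c <e>f</e>
print(alt_text(nodes, keep_html=False))  # a b c f
```

With `keep_html=False` the result no longer depends on whether the surrounding markup was written as markdown emphasis or as raw HTML tags. The string returned here would still need attribute escaping before being written into an `<img>` tag.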
CommonMark prescribes that markdown is interpreted, but the corresponding tags are not output, in `alt` on `<img>`:
(see also some more info in that section).
To paint a more illustrative picture of this, and introduce the problem:
->
I see no good reason that actual HTML is used, while html-from-markdown is ignored.
I find that there is something to say for not doing this at all: for `a *b* c`, maybe the user actually wanted the asterisks in the alt.
However, that’s probably too much of a breaking change, and maybe this is fine.
And there is something to say for doing it for everything (including actual HTML), that `a <em>b</em> c` is consistent to `a *b* c` and turns into `a b c`. I believe this to be the right call, and hence this issue.
Perhaps of note: HTML does not work in `alt`. Neither tags, nor comments, nor instructions, nothing. https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(double-quoted)-state.
I can do the work.
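The point that HTML does not work in `alt` can be checked with a small sketch. It uses Python's standard-library `html.parser`, which is not a full WHATWG-conformant parser, so treat it as an approximation of browser behavior; the `TagLogger` class name is made up for this example. It records every start tag the parser reports and the value of any `alt` attribute.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records the start tags seen and the value of the alt attribute."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.alt = None

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        for name, value in attrs:
            if name == "alt":
                self.alt = value


parser = TagLogger()
parser.feed('<img src="#" alt="a <em>b</em> c">')
parser.close()

print(parser.tags)  # ['img']
print(parser.alt)   # a <em>b</em> c
```

Only the `img` tag is reported; the `<em>` inside the quoted attribute value is never treated as an element and ends up as literal characters in the alt text.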