-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Parser] Handle ambiguous ampersands of arbitrary length (closes #1257) #2731
Conversation
source
Outdated
that for each <span>code point</span> in the <var data-x="temporary buffer">temporary | ||
buffer</var> (in the order they were added to the buffer) user agent must append the code point | ||
from the buffer to the current attribute's value if the character reference was <span | ||
data-x="charref-in-attribute">consumed as part of an attribute</span>, or emit a character token |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"or emit the code point as a character token otherwise."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % nit
Fixed |
Also added note that ambiguous ampersand state is optional for implementations that don't report errors. |
source
Outdated
@@ -104257,6 +104257,11 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> { | |||
|
|||
</dl> | |||
|
|||
<p class="note">For performance reasons, an implementation that does not report errors and that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have this for any other states that are optional for no-parse-error implementations. I'm not sure it's a good idea to have them; it could add confusion and in itself be a source of bugs (e.g. if the note says to switch to the wrong state). Better to make sure behavior is correct by writing tests in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have such note in tree construction stage about switch to RCDATA as far as I remember.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, it's about RAWTEXT and PLAINTEXT in https://html.spec.whatwg.org/multipage/syntax.html#html-fragment-parsing-algorithm
However, I agree with your argument, so I guess I'll better remove it.
Tests at html5lib/html5lib-tests#94 |
@zcorpan I've removed the note. |
As noted in html5lib/html5lib-tests#94 (comment) I think there's a spec bug in attribute values also. We could either fix it in this PR or do that separately in a followup. |
Testing |
@zcorpan Not quite sure about it to be honest, because by definition:
and we have a match here, but it's omitted due to the attribute quirks. |
Let's move discussion about ambiguous ampersand in attribute value to #2776 . This PR is good as-is, just need to fix the test to match. |
@zcorpan Tests were merged, so I guess we're good to go. |
Simon has gone on vacation, but it does look like everything is in order, so let me help merge this. |
Shame on me, how come I didn't see this PR. Looks good though! |
This implementation avoids infinite buffering of a possible ambiguous ampersand as discussed in #1257.
Preview: https://inikulin.github.io/html-build/output/multipage/syntax.html
POC: HTMLParseErrorWG/parse5#25