HTML parser: don't alter raw HTML #10551
Aside from proposed workaround, this feels like it could be classified as an upstream issue of
This solution looks like it goes some way towards achieving my requirements in #5123
Circling back, I looked at both the suggestions above. Using However, it does mean the same effect is applied to the Putting the change in I'm not sure if That is, if I could expect:
To return Then I could equally expect:
To return Changing |
I think it would be fine; preferred even. I don't love that the behavior becomes a bit less predictable in having the condition atop delegating to
I wouldn't say this is explicitly intentional, more an artifact of how it's internally implemented. |
7d46c89 to 7502650
Based on the feedback I've updated this PR to just check for As mentioned this does affect If this PR still makes sense to include I can also create a separate issue (here or in |
@@ -226,6 +226,10 @@ export function matcherFromSource( sourceConfig ) {
 * @return {*} Attribute value.
 */
export function parseWithAttributeSchema( innerHTML, attributeSchema ) {
	if ( attributeSchema.source === 'html' && ! attributeSchema.selector ) {
Minor: this seems better in the matcherFromSource function because there's already a switch case there.
matcherFromSource always gets fed into hpq and will then get modified. The idea with the change is to bypass hpq entirely, for which parseWithAttributeSchema seems a good place:
export function parseWithAttributeSchema( innerHTML, attributeSchema ) {
	if ( attributeSchema.source === 'html' && ! attributeSchema.selector ) {
		return innerHTML;
	}
	return hpqParse( innerHTML, matcherFromSource( attributeSchema ) );
}
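To make the effect of that early return easier to see outside the codebase, here is a minimal, self-contained sketch. `fakeHpqParse` is a hypothetical stand-in (not part of hpq or Gutenberg) that only imitates one effect of the DOM round trip, expanding self-closing tags, and it ignores the selector entirely. The real `hpqParse` and `matcherFromSource` behave differently in detail.

```javascript
// Hypothetical stand-in for hpq's parse(): it imitates a single effect of the
// browser round trip by expanding self-closing tags. It is an assumption made
// for illustration only and does not evaluate matchers or selectors.
function fakeHpqParse( innerHTML ) {
	return innerHTML.replace( /<([a-zA-Z][\w-]*)([^<>]*?)\s*\/>/g, '<$1$2></$1>' );
}

// Same shape as the proposed change: an 'html' source without a selector
// skips the parser and returns the raw string untouched.
function parseWithAttributeSchema( innerHTML, attributeSchema ) {
	if ( attributeSchema.source === 'html' && ! attributeSchema.selector ) {
		return innerHTML;
	}
	return fakeHpqParse( innerHTML );
}

const raw = '<path d="M0,0h24v24H0V0z" fill="none" />';

// No selector: the raw HTML is preserved.
const untouched = parseWithAttributeSchema( raw, { source: 'html' } );

// With a selector (here an arbitrary example value): the parser path is still
// taken, so the markup is still rewritten.
const normalized = parseWithAttributeSchema( raw, { source: 'html', selector: 'svg' } );

console.log( untouched );
console.log( normalized );
```

The second call is the case that stays affected: anything with a selector still goes through the parser, so only the selector-less `html` source keeps the user's markup.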
I'm still conflicted on this one, and am yet to reach some happy conclusion in my mind. The main hang-up for me would be the introduced unpredictability of the In order to keep this moving along, I think I'll be content with what's proposed here now. We should probably consider both of the prior feedback comments before approval though. |
This allows the HTML to be retained in the original format without processing and reformatting
7502650 to ba9fd5c
A bit delayed, but I've rebased and #12610 removes the need for changes to the integration tests. Agreed that there doesn't appear to be a way around this using DOM APIs, and building a parser for this doesn't seem like fun.
@aduth What are your current thoughts here? Trying to see what's the best path forward for this PR? close/merge? |
I think it comes down to the question of whether we're happy to take a fix for one case, at the expense of introducing some inconsistency where it remains an issue for anything not falling into that condition (i.e. Personally I tend to rather avoid those in favor of a universally consistent fix, depending on the severity of the issue. Given how long this problem has existed, I don't know that it's an especially urgent one. With that in mind, I'd be more inclined to close, at least as it stands today. There's no great alternative, though it doesn't seem like an impossible problem to solve (mentioned in #10551 (comment)). (Given how I've contradicted my previous assessment in #10551 (comment), it should be clear that I'm quite on the fence with this one!) |
Sounds reasonable, and yeah no easy decision here. Let's close for the moment. |
Currently if you use an HTML block it will be parsed and reformatted when saved. This has the effect of transforming valid HTML such as:
<path d="M0,0h24v24H0V0z M0,0h24v24H0V0z" fill="none" />
Into:
<path d="M0,0h24v24H0V0z M0,0h24v24H0V0z" fill="none"></path>
Although technically correct, it does mean the HTML changes on save, which is unexpected.
The reason for this is that the HTML goes into hpq, which uses the browser and createHTMLDocument. This converts the user HTML into a valid DOM tree, modifying as appropriate, expanding self-closing tags, and removing unnecessary whitespace.
When the HTML is then returned it is the browser's version, and the original version is discarded. For most blocks this is fine as the HTML is generated by Gutenberg, but for the HTML block the HTML is generated by the user.
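For anyone wanting to reproduce the round trip in a browser console, the following is a rough sketch of the mechanism described above. `roundTripThroughDom` is a hypothetical helper name, not something from hpq, and hpq's actual internals may differ. It returns `null` where no DOM is available (for example in plain Node) so the snippet does not throw there.

```javascript
// Hypothetical helper: parse an HTML string into a detached document and
// serialize it back, which is where the browser's normalization happens.
function roundTripThroughDom( html ) {
	// Guard for environments without a DOM, such as plain Node.
	if ( typeof document === 'undefined' ) {
		return null;
	}
	const doc = document.implementation.createHTMLDocument( '' );
	doc.body.innerHTML = html;
	return doc.body.innerHTML;
}

// In a browser, compare the input with what comes back.
const input = '<path d="M0,0h24v24H0V0z M0,0h24v24H0V0z" fill="none" />';
console.log( roundTripThroughDom( input ) );
```

Comparing the logged value with `input` shows what the browser changed; the original string is only recoverable if it is kept separately, which is what the early return in this PR does.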
This is another follow-on from #10066, and in conjunction with #10474.
How has this been tested?
Additional unit test added. Manually test by creating a custom HTML block and adding:
<path d="M0,0h24v24H0V0z M0,0h24v24H0V0z" fill="none" />
Verify that on saving the post and reloading the page the block HTML remains the same.