-
Notifications
You must be signed in to change notification settings - Fork 383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unescaped HTML entities within <pre>
tags break DOM parsing
#3492
Comments
Is this actually considered valid HTML? If you try putting it into a Gutenberg block will it not get the entities encoded when serialized? Related: perhaps the Masterminds HTML5 parser (#3293) doesn't have this bug. Perhaps another reason to consider it. Otherwise, the only thing I can think of to fix this is to have some regex replacement run on the HTML before parsing, similar to how we are handling the AMP bind bracketed attribute syntax. |
I don't think this is valid HTML, however:
Right now, our additional processing for the AMP transformations turns this "sketchy" HTML into broken HTML that renders the page unusable. So, while I'm not saying we should turn invalid HTML into valid HTML by default (out of scope for the plugin), I do think we shouldn't make bad things worse, as people will blame the plugin, not the HTML. |
If I try creating the markup in Gutenberg via the Preformatted block I get this saved in the DB, with HTML entities: <!-- wp:preformatted -->
<pre class="wp-block-preformatted">Two roads diverged in a yellow wood,
And sorry I could not travel both (_/)
And be one traveler, long I stood (='.'=)
And looked down one as far as I could (")_(")
To where it bent in the undergrowth;
Then took the other, as just as fair,
And having perhaps the better claim, |_/|
Because it was grassy and wanted wear; / @ @ \
Though as for that the passing there ( > º < ) Had worn them really about the same, `>>x<<´
/ O \
And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.
I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.
and here's a line of some really, really, really, really long text, just to see how it is handled and to find out how it overflows;</pre>
<!-- /wp:preformatted --> And the AMP and non-AMP versions are the same with no validation errors: So I think that this should be addressed, but since it is not something which can be reproduced with normal WordPress usage, I'd deem it low priority. I don't think the level of effort to hack support in PHP's |
Bug Description
As seen in #3403 (comment) , having
<
/>
characters inside of a<pre>
tag (which seems to be valid WordPress output) trips up the DOMDocument parser.I think this should probably be solved by escaping (at least some) HTML entities within
<pre>
tags.Expected Behaviour
The
<pre>
should work as expected and not result in broken markup after our transformations.Steps to reproduce
I could replicate it here: https://3v4l.org/kSM6Q
Screenshots
Markup:
Result:
Do not alter or remove anything below. The following sections will be managed by moderators only.
Acceptance criteria
Implementation brief
QA testing instructions
Demo
Changelog entry
The text was updated successfully, but these errors were encountered: