Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code in <pre> elements breaks things #613

Closed
miken32 opened this issue Jan 20, 2017 · 25 comments
Closed

Code in <pre> elements breaks things #613

miken32 opened this issue Jan 20, 2017 · 25 comments
Milestone

Comments

@miken32
Copy link

miken32 commented Jan 20, 2017

I have the following in a post:

Some plain text.
<pre lang="bash">
#!/bin/bash
tee /tmp/foo<<'EOF'
bar
EOF
</pre>
More plain text.

This breaks the resulting AMP HTML; which ends up looking like this:

Some plain text.

#!/bin/bash
tee /tmp/foo
More plain text.

You can see this demonstrated here: https://mike.eire.ca/2016/07/08/running-freepbx-with-nginx/amp/

@miken32
Copy link
Author

miken32 commented Jan 20, 2017

Note I am using the WP-Syntax plugin. I tried disabling it, but it seems to have no bearing on this problem.

@amedina
Copy link
Member

amedina commented Jul 6, 2017

The linked post is validating properly with the <pre> tags.

@amedina amedina closed this as completed Jul 6, 2017
@miken32
Copy link
Author

miken32 commented Jul 6, 2017

This is not about validation. Did you look at the post in AMP and regular versions?

@amedina
Copy link
Member

amedina commented Jul 6, 2017

Oh, I see; I thought that the "breaking HTML" was a validation issue. Will look into this in more detail.

@amedina amedina reopened this Jul 6, 2017
@miken32
Copy link
Author

miken32 commented Jul 6, 2017

Sorry I didn’t describe it better. Seems like the shell redirect << is treated like an element opening.

@amedina
Copy link
Member

amedina commented Jul 6, 2017

Yeah; something is weird. I spent some time looking at the issue and the '<<' was indeed breaking the HTML as you observed. It seemed the problem is with DOMDocument; when we do get_content_from_dom(), the call to $dom->saveXML() seems to be the one choking on the '<<'. The weird part is that I switched to work on some theme-related stuff, and when I tried this again it seems to work (see attachment). Looking into it more.

Are you using the plugin version from Github or the one from Wordpress.org?
screen shot 2017-07-06 at 2 21 59 pm

@miken32
Copy link
Author

miken32 commented Jul 6, 2017

I’m installed via Wordpress, but if you give me a patch I’m happy to give it a test if needed.

@amedina
Copy link
Member

amedina commented Jul 6, 2017

A quick check you could do is to download the plugin from GitHub and put it into your plugins directory; change the name of the plugin in amp.php so that it appears as a different plugin. Activate it and check your post. If that works, then the issue is the with the version of in Wordpress.org; we need to address this: #720

@amedina
Copy link
Member

amedina commented Aug 11, 2017

Plugin has been updated!

@amedina amedina closed this as completed Aug 11, 2017
@miken32
Copy link
Author

miken32 commented Aug 14, 2017

Just updated through Wordpress, still having the same problem. This is on version 0.5.

@amedina
Copy link
Member

amedina commented Aug 14, 2017

Hi @miken32, I am testing a post with that sample <pre>-formatted text and it seems to work: https://amptest.wordpress.com/2017/08/14/formatted-text/amp/
I am looking into it to see what else could be happening.

@miken32
Copy link
Author

miken32 commented Aug 14, 2017

Well I've confirmed it's not cached by updating the post, and syntax highlighting is still disabled.

I put up a short test post with this content:

Some plain text.
<pre lang="bash">
#!/bin/bash
tee /tmp/foo<<'EOF'
bar
EOF
</pre>
More plain text.

@amedina
Copy link
Member

amedina commented Aug 14, 2017

Got it; don't know what is happening. Re-opening the issue to track the problem.

@amedina amedina reopened this Aug 14, 2017
@westonruter
Copy link
Member

Without AMP it get's output as:

<div class="wp_syntax" style="position:relative;"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">#!/bin/bash</span>
<span style="color: #c20cb9; font-weight: bold;">tee</span> <span style="color: #000000; font-weight: bold;">/</span>tmp<span style="color: #000000; font-weight: bold;">/</span>foo<span style="color: #cc0000; font-style: italic;">&lt;&lt;'EOF'
bar
EOF</span></pre></td></tr></table><p class="theCode" style="display:none;">#!/bin/bash
tee /tmp/foo&lt;&lt;'EOF'
bar
EOF</p></div>

With AMP active, however, it is output as:

<div class="wp_syntax">
<table><tr><td class="code"><pre class="bash amp-wp-inline-6e55cd5874e142e710268c63a0bf1d5b"><span class="amp-wp-inline-cfd229d4d7837d4ff4f01c73f4645495">#!/bin/bash</span>
<span class="amp-wp-inline-91a3a813b7013bff7d5e41b8b31980af">tee</span> <span class="amp-wp-inline-1e2d3daaad6014ca7a9a9dedd183a25c">/</span>tmp<span class="amp-wp-inline-1e2d3daaad6014ca7a9a9dedd183a25c">/</span>foo<span class="amp-wp-inline-26fd8fecc327ebd0565339da3370cc99">&lt;&lt;'EOF'
bar
EOF</span></pre></td></tr></table>
<p class="theCode">#!/bin/bash
tee /tmp/foo&lt;&lt;'EOF'
bar
EOF</p>
</div>

Notice how the display:none; is getting stripped out and showing:

image

@westonruter
Copy link
Member

westonruter commented Mar 7, 2018

The problem is the use of safecss_filter_attr() in https://github.com/Automattic/amp-wp/blob/23eced628815e169ca9831b2ccb460b49adc53c1/includes/sanitizers/class-amp-style-sanitizer.php#L456

This is what is removing the display property. It was introduced in de2a303 with the comment:

Filter the styles using safecss_filter_attr() to make sure there
aren't any funky properties. Also alphabetize the valid properties so
that variable order doesn't cause unnecessary duplication.

The problem is that safecss_filter_attr() has a whitelist of the allowed properties, and this is the aspect of the function that needs to be disabled. Fortunately this is trivial to achieve by modifying:

https://github.com/Automattic/amp-wp/blob/23eced628815e169ca9831b2ccb460b49adc53c1/includes/sanitizers/class-amp-style-sanitizer.php#L456

To be like this:

add_filter( 'safe_style_css', '__return_empty_array', PHP_INT_MAX );
$string = safecss_filter_attr( esc_html( $string ) );
remove_filter( 'safe_style_css', '__return_empty_array', PHP_INT_MAX );

@westonruter
Copy link
Member

@coreymckrill Any objections?

Note that this would obsoleted by #610 and #930.

@miken32
Copy link
Author

miken32 commented Mar 7, 2018

My problem has nothing to do with CSS, unless you're talking about some internal intermediate rendering.

Without AMP:

<div class="entry-content">
		<p>Some plain text.</p>

<div class="wp_syntax" style="position:relative;"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">#!/bin/bash</span>
<span style="color: #c20cb9; font-weight: bold;">tee</span> <span style="color: #000000; font-weight: bold;">/</span>tmp<span style="color: #000000; font-weight: bold;">/</span>foo<span style="color: #cc0000; font-style: italic;">&lt;&lt;'EOF'
bar
EOF</span></pre></td></tr></table><p class="theCode" style="display:none;">#!/bin/bash
tee /tmp/foo&lt;&lt;'EOF'
bar
EOF</p></div>

<p>More plain text.</p>
	</div><!-- .entry-content -->

AMP:

<div class="amp-wp-article-content">
		<p>Some plain text.</p>
<pre lang="bash">&#13;
#!/bin/bash&#13;
tee /tmp/foo
<p>More plain text.</p>
</pre>	</div>

@westonruter
Copy link
Member

I was testing specifically in the 0.7-beta version with for a theme that has amp theme support, so I'm seeing a different issue than you are.

If I test in 0.6 with in the existing post templates I do see what you do:

image

The problem is that WP_Syntax::beforeFilter() has a somewhat strange condition in it to not do the replacement if did_action( 'wp_print_scripts' ) returns false. The result is a lack of any syntax highlighting, which I don't think is what you want.

If I change that line to:

if ( did_action( 'wp_print_scripts' ) || ( function_exists( 'is_amp_endpoint' ) && is_amp_endpoint() ) ) {

Then it outputs:

image

Here you get styling, but you can see it reproduces the issue I noted above with regard to the display property.

So I think there are two fixes: (1) WP Syntax plugin needs to fix the condition for whether to do the parsing, and (2) the AMP plugin needs to fix how it handles the display property.

In regards specifically to << being stripped, I think this is not actually a bug, but is to be expected. The content should be using &lt; instead of < in the markup. On a HTML5 document if you try validating the above text you get errors:

So it should be:

Some plain text.
<pre lang="bash">
#!/bin/bash
tee /tmp/foo&lt;&lt;'EOF'
bar
EOF
</pre>
More plain text.

It may be an accident that << come comes as << in your browser. In other browsers with different HTML parsers it may be interpreted as an < tag and be hidden entirely. It should be encoded in the post content so that if the WP Syntax plugin is disabled, or it isn't running (as in the case of AMP here) the content still displays properly but just without styling.

@miken32
Copy link
Author

miken32 commented Mar 7, 2018

There is no raw << in my content, as you can see in the HTML output posted above. I think it's in the initial report because GitHub's Markdown parsed &lt; to <

@westonruter
Copy link
Member

I see raw << in #613 (comment):

image

@westonruter
Copy link
Member

If I de-activate the WP Syntax plugin and I enter:

<pre lang="bash">
#!/bin/bash
tee /tmp/foo&lt;&lt;'EOF'
bar
EOF
</pre>

Then in AMP and non-AMP alike I see what is expected:

image

@coreymckrill
Copy link
Contributor

@westonruter Been a long time since I dug into this. What would be the security implications of not running inline style declarations through a safelist? It seems like that could lead to arbitrary strings being output in a style block. Could the bug be fixed by adding more allowed attributes instead of causing it to skip the process entirely?

@miken32
Copy link
Author

miken32 commented Mar 7, 2018

Yeah there is a raw < in the post; if I put in &lt; into the post then AMP works, but the regular post shows the entity. :|

@westonruter
Copy link
Member

@miken32 do you have the WP Syntax plugin active? It needs to be fixed if it is going to be used here.

What would be the security implications of not running inline style declarations through a safelist? It seems like that could lead to arbitrary strings being output in a style block. Could the bug be fixed by adding more allowed attributes instead of causing it to skip the process entirely?

@coreymckrill The logic in safecss_filter_attr() is already being run on post save to strip out display, if the user doesn't have unfiltered_html. So this is preventing safecss_filter_attr() from being called a second time when displaying the post. Otherwise, if a user does have unfiltered_html then they currently aren't able to do so in AMP even in spite of the sanitizer. So I don't see there being a security implication here.

@coreymckrill
Copy link
Contributor

@westonruter Ah, ok. No objections, then. 👍

@westonruter westonruter added this to the v0.7 milestone Mar 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants