Fix: Prevent infinite loop in Tag Processor in certain truncated documents #45537

dmsnell · 2022-11-04T05:44:02Z

What

Previously, the Tag Processor class performed unchecked arithmetic on the result of strpos() when looking for the end of HTML comments and other special sections (CDATA, doctype declaration). When strpos() found no match it returned false, and type coercion let us add an integer to that false as if it were 0. That set the cursor in the tag processor to a low index in the document and created an infinite loop.

In this patch we check the result of each strpos() call in those places so that a false never reaches the arithmetic, and we abort from the processor if we fail to find the end of the associated document section.
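
To make the coercion concrete, here is a minimal standalone sketch of the failure mode. It is not the Tag Processor's actual code: the variable names (`$at`, `$closer`, `$unchecked_cursor`) are made up for illustration, and only the input string comes from the report.

```php
<?php
// Illustrative only: these variables are hypothetical, not the class internals.
$html = "\n<div class=\"wp-block-group\"><!-- wp:quote ...";

// Position just past the "<!--" comment opener.
$at = strpos( $html, '<!--' ) + 4;

// The comment is never closed, so strpos() reports failure with false.
$closer = strpos( $html, '-->', $at );

// Unchecked arithmetic: false is coerced to 0, so the "new" cursor is 3.
$unchecked_cursor = $closer + 3;

var_dump( $closer );                 // bool(false)
var_dump( $unchecked_cursor );       // int(3)
var_dump( $unchecked_cursor < $at ); // bool(true): the cursor moved backwards
```

Because the cursor lands before the comment opener, a parser that resumes from it would find the same opener again, which matches the loop described above.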

Why?

We shouldn't trigger infinite loops 🙃

How?

Checking for proper types and error-return-values before assuming things worked the way we expect.
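
As a rough sketch of that kind of check (an assumption about the shape of the fix, not a copy of the patch), a hypothetical helper `skip_comment()` could compare strictly against false before doing any arithmetic:

```php
<?php
/**
 * Hypothetical helper, not part of WP_HTML_Tag_Processor.
 *
 * Returns the index just past the "-->" closer, or null when the
 * document ends before the comment is closed.
 */
function skip_comment( string $html, int $at ): ?int {
	// Never hand strpos() an offset beyond the end of the document.
	if ( $at < 0 || $at > strlen( $html ) ) {
		return null;
	}

	$closer = strpos( $html, '-->', $at );
	if ( false === $closer ) {
		return null; // Truncated document: signal the caller to stop.
	}

	return $closer + 3;
}

var_dump( skip_comment( '<!-- done --><p>', 4 ) );  // int(13)
var_dump( skip_comment( '<!-- wp:quote ...', 4 ) ); // NULL
```

Returning null rather than false keeps the failure value from being usable in arithmetic by accident on a strict-typed caller; any other sentinel would work as long as the caller checks it.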

Testing Instructions

The following input should trigger the infinite loop in trunk:

\n<div class=\"wp-block-group\"><!-- wp:quote ...

Note that the ... is part of the input that led to identifying this issue. This snippet was created when block-unaware code truncated post_content and cut inside the block comment delimiter, making it look like a normal HTML comment.

For a quick test, navigate to the Gutenberg directory and run the following script:

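<?php

// Load the experimental HTML API from the Gutenberg plugin directory.
require_once 'lib/experimental/html/index.php';

// Truncated input: the comment opener is never closed.
$p = new WP_HTML_Tag_Processor( "\n<div class=\"wp-block-group\"><!-- wp:quote ..." );
$p->next_tag();
$p->next_tag(); // Hangs in trunk; should return false in this branch.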

In trunk this should hang, which you can confirm by instrumenting parse_next_tag() just inside the while ( true ) loop. In this branch the code should immediately complete, returning false from the second call to $p->next_tag().

The unit tests should also continue passing.

cc: @griffbrad

georgeh

Looks good, thanks for getting this out so quickly. Should we flag this for a point release of Gutenberg too?

noahtallen · 2022-11-04T18:33:02Z

> Should we flag this for a point release of Gutenberg too?

I think it was only introduced in 14.5.0 RC (which is where we found it), so I don't think we need a point release of the main plugin.

noahtallen · 2022-11-04T18:33:43Z

Will this also fix the issue we had with error logs getting polluted?

dmsnell · 2022-11-04T19:30:11Z

> Will this also fix the issue we had with error logs getting polluted?

I'm still not sure how that one came up and am going to do more investigation. It's possible they are related and this issue caused the other, but I'd like to have a good model explaining it before saying one way or the other. My guess is that it was this problem that put the parser into a weird state which triggered the excessive warnings.

That is, I think the only way to produce the warnings we were seeing, about array_key_exists needing a string or int as the first argument, is if we're telling substr to start copying after the end of a string while running PHP < 8, where substr returns false in that case. Since this bug is messing up the parser's internal pointer, that seems plausible as a mechanism for the warning.
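
A small sketch of that suspected mechanism, under the assumption that the out-of-range substr() result ends up being used as an array key. The `$attributes` array and the offsets are invented for illustration and are not taken from the class:

```php
<?php
// Hypothetical reproduction of the suspected mechanism, not the class code.
$html = '<div class';

// The start offset is past the end of the string.
$name = substr( $html, strlen( $html ) + 5 );

// PHP < 8 yields bool(false) here; PHP 8+ yields an empty string.
var_dump( $name );

// Checking the type before using the value as an array key avoids
// the warning on either version.
$attributes = array( 'class' => true );
$found      = is_string( $name ) && '' !== $name && array_key_exists( $name, $attributes );

var_dump( $found ); // bool(false)
```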

@aidvu

Follows #45537.

When parsing truncated HTML, it was brought to our attention that passing out-of-bounds indices to `strspn()` and `strcspn()` behaves differently before and after PHP 8. We also realized that when we cleaned up the problems with `substr()` we left some indices without bounds checking, which led to a different flavor of the same problem.

When parsing the following HTML we run into warnings when calling `strspn()` and `strcspn()`. In pre-PHP 8 versions this also leads to an infinite loop, while in later versions it simply emits a warning.

```php
<figure class="wp-block-gallery has-nested-images ...
```

In this patch we're adding proper bounds checking wherever we update the internal pointer in the Tag Processor to avoid any further out-of-bounds issues.

While this patch fixes the core issue at stake, it's worth performing a more complete audit of the index usage throughout the class and considering internalizing the string methods, both to avoid version inconsistencies and to provide a more robust mechanism for aborting when passing the end of the provided input document.

Props to @aidvu for quickly identifying this issue.
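
The bounds checking described in that commit message could look roughly like the sketch below. `skip_whitespace()` is a hypothetical wrapper, not a method of the Tag Processor; the point is only that the offset is validated before `strspn()` sees it, so the result no longer depends on the PHP version.

```php
<?php
/**
 * Hypothetical wrapper, not part of WP_HTML_Tag_Processor.
 *
 * Advances past whitespace starting at $at and returns the new index,
 * or null instead of calling strspn() with an out-of-bounds offset.
 */
function skip_whitespace( string $html, int $at ): ?int {
	if ( $at < 0 || $at >= strlen( $html ) ) {
		return null; // Past the end of the input: abort rather than guess.
	}

	return $at + strspn( $html, " \t\f\r\n", $at );
}

var_dump( skip_whitespace( '<a   href', 2 ) ); // int(5)
var_dump( skip_whitespace( '<a', 2 ) );        // NULL
```

A caller would treat null as "document ended" and stop parsing, which is the same abort behavior the patch aims for wherever the internal pointer is updated.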

dmsnell requested review from georgeh and noahtallen November 4, 2022 05:44

dmsnell requested a review from spacedmonkey as a code owner November 4, 2022 05:44

dmsnell added the [Type] Bug label Nov 4, 2022

dmsnell added this to the Gutenberg 14.5 milestone Nov 4, 2022

georgeh approved these changes Nov 4, 2022

dmsnell force-pushed the fix/tag-processor-infinite-loop branch from cd626a9 to f921a4e November 4, 2022 19:17

dmsnell merged commit f921a4e into trunk Nov 4, 2022

dmsnell deleted the fix/tag-processor-infinite-loop branch November 4, 2022 20:04

dmsnell mentioned this pull request Nov 16, 2022

Tag Processor: Prevent bugs from pre-PHP8 strspn/strcspn behavior #45822

Merged
