Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Prevent infinite loop in Tag Processor in certain truncated documents #45537

Merged
merged 1 commit into from
Nov 4, 2022

Conversation

dmsnell
Copy link
Member

@dmsnell dmsnell commented Nov 4, 2022

What

Previously the Tag Processor class has performed unchecked arithmetic on the result of strpos() when looking to close out HTML comments and other special sections (CDATA, doctype-declaration). This led to a situation where by means of type-coercion we added an integer value to a false returned by the string function, setting the cursor in the tag processor to a low index in the document, and creating an infinite loop condition.

In this patch we're checking the results of calling strpos() in those places to avoid the type error and abort from the processor if we fail to find the end of those associated document sections.

Why?

We shouldn't trigger infinite loops 🙃

How?

Checking for proper types and error-return-values before assuming things worked the way we expect.

Testing Instructions

The following input should trigger the infinite loop in trunk

\n<div class=\"wp-block-group\"><!-- wp:quote ...

Note that the ... is part of the input that led to identifying this issue. This snippet was created when block-unaware code truncated post_content and cut inside the block comment delimiter, making it look like a normal HTML comment.

For a quick test, navigate to the Gutenberg directory and run the following script:

<?php

require_once 'lib/experimental/html/index.php';

$p = new WP_HTML_Tag_Processor( "\n<div class=\"wp-block-group\"><!-- wp:quote ..." );
$p->next_tag();
$p->next_tag();

In trunk this should hang, which you can confirm by instrumenting parse_next_tag() just inside the while ( true ) loop. In this branch the code should immediately complete, returning false from the second call to $p->next_tag().

The unit tests should also continue passing.

cc: @griffbrad

@codesandbox
Copy link

codesandbox bot commented Nov 4, 2022

CodeSandbox logoCodeSandbox logo  Open in CodeSandbox Web Editor | VS Code | VS Code Insiders

@dmsnell dmsnell added the [Type] Bug An existing feature does not function as intended label Nov 4, 2022
@dmsnell dmsnell added this to the Gutenberg 14.5 milestone Nov 4, 2022
Copy link
Contributor

@georgeh georgeh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for getting this out so quickly. Should we flag this for a point release of Gutenberg too?

@noahtallen
Copy link
Member

Should we flag this for a point release of Gutenberg too?

I think it was only introduced in 14.5.0 RC (which is where we found it), so I don't think we need a point release of the main plugin

@noahtallen
Copy link
Member

Will this also fix the issue we had with error logs getting polluted?

…ments.

Previously the Tag Processor class has performed unchecked arithmetic on
the result of `strpos()` when looking to close out HTML comments and other
special sections (CDATA, doctype-declaration). This led to a situation
where by means of type-coercion we added an integer value to a `false`
returned by the string function, setting the cursor in the tag processor
to a low index in the document, and creating an infinite loop condition.

In this patch we're checking the results of calling `strpos()` in those
places to avoid the type error and abort from the processor if we fail
to find the end of those associated document sections.
@dmsnell dmsnell force-pushed the fix/tag-processor-infinite-loop branch from cd626a9 to f921a4e Compare November 4, 2022 19:17
@dmsnell
Copy link
Member Author

dmsnell commented Nov 4, 2022

Will this also fix the issue we had with error logs getting polluted?

I'm still not sure how that one came up and am going to do more investigation. It's possible they are related and this issue caused the other, but I'd like to have a good model explaining it before saying one way or the other. My guess is that it was this problem that put the parser into a weird state which triggered the excessive warnings.

That is, the only way I think the warnings we were seeing about array_key_exists needing a string or int as the first argument could be created is if we're telling substr to start copying after the end of a string and running PHP<8. Since this bug is messing up the internal parser's pointer, that seems plausible as a mechanism for the warning.

@dmsnell dmsnell merged commit f921a4e into trunk Nov 4, 2022
@dmsnell dmsnell deleted the fix/tag-processor-infinite-loop branch November 4, 2022 20:04
dmsnell added a commit that referenced this pull request Nov 16, 2022
Follows #45537

When parsing truncated HTML it was brought to our attention that when
passing  out-of-bounds indices to `strspn()` and `strcspn()` that the
behavior is different before and after PHP8. We also realized that when
we cleaned up the problems with `substr()` we left some indices without
bounds checking and that led to a different flavor of the same problem.

When parsing the following HTML we run into warnings when calling `strspn()`
and `strcspn()`. For pre-PHP8 versions this also leads to an infinite loop
while in later versions it simply omits a warning.

```php
<!-- wp:gallery {"linkTo":"none"} -->
<figure class="wp-block-gallery has-nested-images ...
```

In this patch we're adding proper bounds checking wherever we update the
internal pointer in the Tag Processor to avoid any further out-of-bounds
issues.

While this patch fixes the core issue at stake, it's worth performing a
more complete audit of the index usage throughout the class and consider
internalizing the string methods to avoid version inconsistencies and
provide a more robust mechanism for aborting when passing the end of the
provided input document.

Props to @aidvu for quickly identifying this issue.
fullofcaffeine pushed a commit that referenced this pull request Nov 16, 2022
Follows #45537

When parsing truncated HTML it was brought to our attention that when
passing  out-of-bounds indices to `strspn()` and `strcspn()` that the
behavior is different before and after PHP8. We also realized that when
we cleaned up the problems with `substr()` we left some indices without
bounds checking and that led to a different flavor of the same problem.

When parsing the following HTML we run into warnings when calling `strspn()`
and `strcspn()`. For pre-PHP8 versions this also leads to an infinite loop
while in later versions it simply omits a warning.

```php
<!-- wp:gallery {"linkTo":"none"} -->
<figure class="wp-block-gallery has-nested-images ...
```

In this patch we're adding proper bounds checking wherever we update the
internal pointer in the Tag Processor to avoid any further out-of-bounds
issues.

While this patch fixes the core issue at stake, it's worth performing a
more complete audit of the index usage throughout the class and consider
internalizing the string methods to avoid version inconsistencies and
provide a more robust mechanism for aborting when passing the end of the
provided input document.

Props to @aidvu for quickly identifying this issue.
fullofcaffeine pushed a commit that referenced this pull request Nov 25, 2022
Follows #45537

When parsing truncated HTML it was brought to our attention that when
passing  out-of-bounds indices to `strspn()` and `strcspn()` that the
behavior is different before and after PHP8. We also realized that when
we cleaned up the problems with `substr()` we left some indices without
bounds checking and that led to a different flavor of the same problem.

When parsing the following HTML we run into warnings when calling `strspn()`
and `strcspn()`. For pre-PHP8 versions this also leads to an infinite loop
while in later versions it simply omits a warning.

```php
<!-- wp:gallery {"linkTo":"none"} -->
<figure class="wp-block-gallery has-nested-images ...
```

In this patch we're adding proper bounds checking wherever we update the
internal pointer in the Tag Processor to avoid any further out-of-bounds
issues.

While this patch fixes the core issue at stake, it's worth performing a
more complete audit of the index usage throughout the class and consider
internalizing the string methods to avoid version inconsistencies and
provide a more robust mechanism for aborting when passing the end of the
provided input document.

Props to @aidvu for quickly identifying this issue.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Type] Bug An existing feature does not function as intended
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants