Tag Processor: Prevent bugs from pre-PHP8 strspn/strcspn behavior #45822
Force-pushed from `da7fb86` to `08d8a50`.
Tested using the script provided and the changes here, and it did not enter a deadlock. LGTM!
thank you @fullofcaffeine! I had to push a small fix for breakage on …
Force-pushed from `08d8a50` to `9b27586`.
Follows #45537

When parsing truncated HTML, it was brought to our attention that `strspn()` and `strcspn()` behave differently before and after PHP 8 when passed out-of-bounds offsets. We also realized that when we cleaned up the problems with `substr()`, we left some indices without bounds checking, which led to a different flavor of the same problem.

When parsing the following HTML we run into warnings when calling `strspn()` and `strcspn()`. On PHP versions before 8 this also leads to an infinite loop, while later versions only emit a warning.

```php
<!-- wp:gallery {"linkTo":"none"} -->
<figure class="wp-block-gallery has-nested-images ...
```

In this patch we're adding proper bounds checking wherever we update the internal pointer in the Tag Processor to avoid any further out-of-bounds issues.

While this patch fixes the core issue at stake, it's worth performing a more complete audit of the index usage throughout the class and considering whether to internalize the string methods, both to avoid version inconsistencies and to provide a more robust mechanism for aborting when we reach the end of the input document.

Props to @aidvu for quickly identifying this issue.
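One way to make these calls version-independent is to check the offset before delegating to PHP. This is a minimal sketch, assuming hypothetical helpers named `bounded_strspn()` and `bounded_strcspn()` (they are not part of `WP_HTML_Tag_Processor`): any offset at or past the end of the input returns 0 without ever reaching `strspn()`/`strcspn()`, so the result no longer depends on how a given PHP version treats out-of-bounds offsets.

```php
<?php
/**
 * Hypothetical wrappers (not part of WP_HTML_Tag_Processor) that validate the
 * offset before calling strspn()/strcspn(). Offsets at or past the end of the
 * input return 0 on every PHP version. Negative offsets are also treated as
 * out of bounds here, since a forward-only scanner never needs them.
 */
function bounded_strspn( string $text, string $mask, int $offset ): int {
	if ( $offset < 0 || $offset >= strlen( $text ) ) {
		return 0;
	}
	return strspn( $text, $mask, $offset );
}

function bounded_strcspn( string $text, string $mask, int $offset ): int {
	if ( $offset < 0 || $offset >= strlen( $text ) ) {
		return 0;
	}
	return strcspn( $text, $mask, $offset );
}

// Example: skip whitespace after "<figure", then measure an attribute name.
$html = '<figure class="wp-block-gallery';
$at   = 7;
$at  += bounded_strspn( $html, " \t\f\r\n", $at );
$name = substr( $html, $at, bounded_strcspn( $html, " \t\f\r\n=/>", $at ) );
```

A caller still needs its own end-of-input check after advancing: a 0 result alone does not distinguish "no matching bytes here" from "nothing left to scan", and advancing by 0 forever is exactly how a scanner loop fails to terminate.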
Force-pushed from `9b27586` to `5f53eaf`.
`@@ -397,6 +397,10 @@ public function next_tag( $query = null ) {`

```php
$already_found = 0;

do {
	if ( $this->parsed_bytes >= strlen( $this->html ) ) {
```
Sorry for the drive-by review, but I just noticed that `strlen( $this->html )` is being called throughout this class. Would it be an optimization to store the length in an `$html_length` member variable, set in the constructor and in the `get_updated_html()` method, and then use that instead?
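For reference, the suggestion amounts to something like the following sketch. The class and method names (`Length_Caching_Scanner`, `replace_html()`, `advance()`, `at_end()`) are hypothetical stand-ins, not the real Tag Processor API; the point is that every code path that rewrites the HTML, such as `get_updated_html()` in the real class, would also have to refresh the cached length.

```php
<?php
/**
 * Hypothetical sketch: cache strlen( $this->html ) in $html_length and keep
 * it in sync whenever $this->html is replaced. Illustrative names only.
 */
class Length_Caching_Scanner {
	private $html;
	private $html_length;
	private $parsed_bytes = 0;

	public function __construct( string $html ) {
		$this->html        = $html;
		$this->html_length = strlen( $html );
	}

	// Any method that rewrites $this->html must also update the cached
	// length, or later bounds checks would compare against a stale value.
	public function replace_html( string $html ): void {
		$this->html         = $html;
		$this->html_length  = strlen( $html );
		$this->parsed_bytes = min( $this->parsed_bytes, $this->html_length );
	}

	// Moves the cursor forward, clamping it to the end of the input.
	public function advance( int $bytes ): void {
		$this->parsed_bytes = min( $this->parsed_bytes + $bytes, $this->html_length );
	}

	public function at_end(): bool {
		return $this->parsed_bytes >= $this->html_length;
	}
}
```

The cost of this design is an extra invariant to maintain: if any path updates `$html` without updating `$html_length`, the bounds checks silently use the wrong value.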
Thanks for the question, @westonruter. I believe the answer is no, because `strlen()` isn't like a normal PHP function call: it's a PHP opcode, and executing it is a direct lookup on the internal string object within the PHP runtime. Maybe caching saves one indirect memory access, but if that's true I'm not sure it would be measurable (whereas a function call often is measurable!).

I haven't confirmed this, but the docs inside PHP suggest this might even be special-cased so that there's truly no difference in the generated CPU instructions. The opcode output is different, but not in a way that sheds light on the CPU instructions (other than that it's not a function call with the characteristic overhead).

I think we have a lint rule floating around that discourages this style, but I believe it's based on a false premise.
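If someone wants to check this empirically, a rough micro-benchmark along these lines compares `strlen( $this->html )` against a cached `$html_length` property inside a method. It is a hypothetical sketch (`Strlen_Benchmark` and `time_ns()` are illustrative names), it needs PHP 7.3+ for `hrtime()`, and its timings will vary with PHP version, opcache, and JIT settings, so no particular result is implied. Dumping opcodes, for example via opcache's `opcache.opt_debug_level` INI setting, is a more direct way to see whether `strlen()` compiled to a dedicated opcode.

```php
<?php
/**
 * Hypothetical micro-benchmark: the same loop written two ways, once calling
 * strlen( $this->html ) on every check and once reading a cached property.
 */
final class Strlen_Benchmark {
	private $html;
	private $html_length;

	public function __construct( string $html ) {
		$this->html        = $html;
		$this->html_length = strlen( $html );
	}

	// Counts loop positions that fall past the end, calling strlen() each time.
	// $i % 97 keeps values small so both branches of the check are exercised.
	public function count_with_strlen( int $iterations ): int {
		$hits = 0;
		for ( $i = 0; $i < $iterations; $i++ ) {
			if ( $i % 97 >= strlen( $this->html ) ) {
				$hits++;
			}
		}
		return $hits;
	}

	// Identical loop, reading the cached length property instead.
	public function count_with_cache( int $iterations ): int {
		$hits = 0;
		for ( $i = 0; $i < $iterations; $i++ ) {
			if ( $i % 97 >= $this->html_length ) {
				$hits++;
			}
		}
		return $hits;
	}
}

// Returns the elapsed wall-clock time of $fn in nanoseconds.
function time_ns( callable $fn ) {
	$start = hrtime( true );
	$fn();
	return hrtime( true ) - $start;
}

$bench      = new Strlen_Benchmark( str_repeat( '<p>x</p>', 10 ) );
$iterations = 200000;
printf( "strlen():        %d ns\n", time_ns( function () use ( $bench, $iterations ) {
	$bench->count_with_strlen( $iterations );
} ) );
printf( "cached property: %d ns\n", time_ns( function () use ( $bench, $iterations ) {
	$bench->count_with_cache( $iterations );
} ) );
```

Both methods compute the same count, so any timing difference reflects only how the length is obtained; repeat the run several times before drawing conclusions.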
What?
Prevent an infinite loop in `WP_HTML_Tag_Processor` when parsing truncated HTML.
Why?
Follows #45537
Replaces #45803
When parsing truncated HTML, it was brought to our attention that `strspn()` and `strcspn()` behave differently before and after PHP 8 when passed out-of-bounds offsets. We also realized that when we cleaned up the problems with `substr()`, we left some indices without bounds checking, which led to a different flavor of the same problem.

When parsing truncated HTML such as the `wp:gallery` snippet quoted above, we run into warnings when calling `strspn()` and `strcspn()`. On PHP versions before 8 this also leads to an infinite loop, while later versions only emit a warning.

In this patch we're adding proper bounds checking wherever we update the internal pointer in the Tag Processor to avoid any further out-of-bounds issues.

While this patch fixes the core issue at stake, it's worth performing a more complete audit of the index usage throughout the class and considering whether to internalize the string methods, both to avoid version inconsistencies and to provide a more robust mechanism for aborting when we reach the end of the input document.
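To illustrate the "internalize the string methods" idea, here is a minimal sketch of a span function that never calls `strspn()`. The name `span_of()` and its contract are assumptions made for this example: it returns the number of bytes starting at `$offset` that appear in `$allowed`, or `null` when `$offset` is already at or past the end of the input, giving callers an explicit signal to abort rather than advancing by zero. It needs PHP 7.1+ for the nullable return type.

```php
<?php
/**
 * Hypothetical internalized replacement for strspn(): counts the bytes of
 * $text, starting at $offset, that appear in $allowed. Returns null when
 * $offset is outside the string so callers can abort on truncated input.
 */
function span_of( string $text, string $allowed, int $offset ): ?int {
	$length = strlen( $text );
	if ( $offset < 0 || $offset >= $length ) {
		return null; // Nothing left to scan: tell the caller to stop.
	}

	$end = $offset;
	// $text[ $end ] is always in bounds here because of the $end < $length check.
	while ( $end < $length && false !== strpos( $allowed, $text[ $end ] ) ) {
		$end++;
	}
	return $end - $offset;
}
```

Because `null` and `0` are distinct, a caller can write `if ( null === $n ) { return false; }` and cannot spin on truncated input. The trade-off is speed: a userland byte loop will generally be slower than the native `strspn()`.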
Props to @aidvu for quickly identifying this issue.
How?
Inserts bounds-checking wherever we read unchecked string indices.
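As a rough picture of what "bounds checking wherever we update the internal pointer" means, the following heavily simplified scanner (a hypothetical `scan_tag_names()`, not the real Tag Processor logic) re-checks the cursor against the input length after every advance and before every `$html[ $at ]` read. On truncated input such as `<figure class="wp-block-gallery` it stops and returns what it found instead of reading past the end.

```php
<?php
/**
 * Hypothetical, heavily simplified scanner (not the real Tag Processor).
 * It collects lowercase opening-tag names and checks $at against the input
 * length after every advance, so truncated input ends the scan cleanly.
 */
function scan_tag_names( string $html ): array {
	$letters    = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
	$name_chars = $letters . '0123456789-';
	$length     = strlen( $html );
	$at         = 0;
	$names      = array();

	while ( $at < $length ) {
		$lt = strpos( $html, '<', $at );
		if ( false === $lt ) {
			break; // No further "<" in the input.
		}

		$at = $lt + 1;
		if ( $at >= $length ) {
			break; // Input ends right after "<": truncated.
		}

		// Only look at "<" followed by a letter; skip "</", "<!--", etc.
		if ( false === strpos( $letters, $html[ $at ] ) ) {
			continue;
		}

		$name_length = strspn( $html, $name_chars, $at );
		$names[]     = strtolower( substr( $html, $at, $name_length ) );
		$at         += $name_length;
		if ( $at >= $length ) {
			break; // Tag name runs to the end of the input: truncated.
		}

		// Jump past this tag's ">"; if there is none, the tag is truncated.
		$gt = strpos( $html, '>', $at );
		if ( false === $gt ) {
			break;
		}
		$at = $gt + 1;
	}

	return $names;
}
```

A real tokenizer must also handle `>` inside quoted attribute values, comments, and so on; the sketch only shows where the checks go.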
Testing Instructions
You can use the following script to reproduce the behavior in `trunk` and verify the fix. Place the script in the root directory of your Gutenberg repo.

In `trunk` on PHP < 8 this will create an infinite loop, while in this branch it will terminate.

In `trunk` this should issue a `PHP Warning: Uninitialized string offset 92` warning, while in this branch there should be no warning.
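The absence of the warning can also be checked programmatically. This is a sketch of a hypothetical `collect_warnings()` test helper, built on PHP's standard `set_error_handler()` and `restore_error_handler()`, that runs a callable and returns the messages of any warnings or notices it raised. PHP 7 reports uninitialized string offsets as notices and PHP 8 as warnings, and the message text differs slightly between versions, so the sketch records messages without matching on them.

```php
<?php
/**
 * Hypothetical test helper: runs $fn and returns the messages of any PHP
 * warnings or notices it raised, instead of letting PHP print them.
 */
function collect_warnings( callable $fn ): array {
	$messages = array();
	set_error_handler(
		function ( $errno, $errstr ) use ( &$messages ) {
			$messages[] = $errstr;
			return true; // Mark the error as handled so PHP prints nothing.
		}
	);
	try {
		$fn();
	} finally {
		// Always restore the previous handler, even if $fn throws.
		restore_error_handler();
	}
	return $messages;
}

// Reading past the end of a string raises a notice (PHP 7) or warning (PHP 8).
$warnings = collect_warnings(
	function () {
		$truncated = '<p';
		return $truncated[5];
	}
);
printf( "Captured %d message(s)\n", count( $warnings ) );
```

An error handler alone cannot stop an infinite loop, so a test meant to run on PHP < 8 would also need an iteration cap or a time limit such as `set_time_limit()`.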