Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It takes more than 20 seconds to paste not-so-large HTML document. #41826

Closed
kohheepeace opened this issue Jun 21, 2022 · 6 comments · Fixed by #41907
Closed

It takes more than 20 seconds to paste not-so-large HTML document. #41826

kohheepeace opened this issue Jun 21, 2022 · 6 comments · Fixed by #41907
Assignees
Labels
[Feature] Paste [Type] Performance Related to performance efforts [Type] Regression Related to a regression in the latest release

Comments

@kohheepeace
Copy link

kohheepeace commented Jun 21, 2022

Description

Copy-pasting a not-so-large HTML document takes more than 20 seconds.

A same issue has already been reported, but I have created a new issue to isolate the problem.

Step-by-step reproduction instructions

  1. Create new wordpress site by Flywheel Local (wordpress version is 6.0)
  2. Install gutenberg plugin and activate it. (gutenberg version is 13.4.0)
  3. Goto https://docusaurus.io/docs and copy the document
  4. Create new post and paste the copied HTML document

Screenshots, screen recording, code snippet

The following gif is when copy-and-pasting https://docusaurus.io/docs content.

2022-06-21 09 54 06

Environment info

  • Wordpress version: 6.0
  • Gutenberg version: 13.4.0

Please confirm that you have searched existing issues in the repo.

Yes

Please confirm that you have tested with all plugins deactivated except Gutenberg.

Yes

@annezazu annezazu added [Feature] Paste [Type] Performance Related to performance efforts labels Jun 21, 2022
@annezazu
Copy link
Contributor

@aristath wonder if you have any insights here based on some of the performance work you've done!

@annezazu
Copy link
Contributor

I just replicated this on the Make Network (Make Core specifically). I'll try to flag this up in #core-editor to see if anyone can take a look.

@ntsekouras
Copy link
Contributor

The problem is in this line/function(removeWindowsFragments). You can test by commenting it out. This function was introduced here and passing the html string by two consecutive replace functions is quite expensive.

Also it's a great time to write a test for this.

In the snippet provided in the PR do we need to also remove the html and body tags? Does someone know? --cc @getdave

<html>
 <body>
   <!--StartFragment-->
   Copied text here
   <!--EndFragment-->
 </body>
</html>

IMO if we don't find a workaround quickly enough, it would be good to revert that PR, as this affects paste in every OS and is taking a really long time..

@mcsf
Copy link
Contributor

mcsf commented Jun 23, 2022

The problem is in this line/function(removeWindowsFragments). You can test by commenting it out.

Nice sleuthing! 🌟

This function was introduced here and passing the html string by two consecutive replace functions is quite expensive.

This surprised me, so I looked at the function, thinking that replace by itself shouldn't be that expensive. What follows are just my immediate reactions, and I haven't confirmed any of this with code. This is the body of the function:

const startReg = /.*<!--StartFragment-->/s;
const endReg = /<!--EndFragment-->.*/s;
return html.replace( startReg, '' ).replace( endReg, '' );

What stands out here are the .* patterns combined with the s flag of the regular expressions. I can expect this to cause the RegExp engine to start eating loads of memory as it consumes the matches, and maybe even to backtrack if there are multiple of these (Start|End)Fragment marks.

Since the goal of the function is to remove the marks as well as any preceding/following whitespace, I would rewrite it as:

const startStr = '<!--StartFragment-->';
const startIdx = html.indexOf( startStr );
if ( startIdx > -1 ) html = html.substring( startIdx + startStr.length );

// Possible optimisation: only run this second block if there was a match for `startStr`.
const endStr = '<!--EndFragment-->';
const endIdx = html.indexOf( endStr );
if ( endIdx > -1 ) html = html.substring( 0, endIdx );

@mtias
Copy link
Member

mtias commented Jun 23, 2022

Agreed on reverting if there is no quick fix. Also, the suggested test looks like it's just for pasting on Windows, but we should have performance tests for pasting larger chunks of content in general given this regression slipped through.

@Mamaduka
Copy link
Member

I think this is a good candidate for 6.0.1.

cc @adamziel

@mtias mtias added the [Type] Regression Related to a regression in the latest release label Jun 23, 2022
@github-actions github-actions bot added the [Status] In Progress Tracking issues with work in progress label Jun 23, 2022
@adamziel adamziel moved this from Triage to Todo in WordPress 6.0.x Editor Tasks Jun 23, 2022
mcsf added a commit that referenced this issue Jun 23, 2022
Fixes #41826.

removeWindowsFragments was running two String#replace with regular
expressions made computationally expensive due to the use of `.*` and
the `s / dotAll` flag, resulting in severe performance degradations when
handling larger strings of HTML.

The solution is to manually trim the strings via a combination of
String#indexOf and String#substring.
Repository owner moved this from Todo to Done in WordPress 6.0.x Editor Tasks Jun 23, 2022
mcsf added a commit that referenced this issue Jun 23, 2022
…1907)

* Pasting: Fix performance regression due to removeWindowsFragments

Fixes #41826.

removeWindowsFragments was running two String#replace with regular
expressions made computationally expensive due to the use of `.*` and
the `s / dotAll` flag, resulting in severe performance degradations when
handling larger strings of HTML.

The solution is to manually trim the strings via a combination of
String#indexOf and String#substring.

* removeWindowsFragments: bail early if no StartFragment found
adamziel pushed a commit that referenced this issue Jun 30, 2022
…1907)

* Pasting: Fix performance regression due to removeWindowsFragments

Fixes #41826.

removeWindowsFragments was running two String#replace with regular
expressions made computationally expensive due to the use of `.*` and
the `s / dotAll` flag, resulting in severe performance degradations when
handling larger strings of HTML.

The solution is to manually trim the strings via a combination of
String#indexOf and String#substring.

* removeWindowsFragments: bail early if no StartFragment found
@priethor priethor removed the [Status] In Progress Tracking issues with work in progress label May 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Paste [Type] Performance Related to performance efforts [Type] Regression Related to a regression in the latest release
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

7 participants