Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add enable-render-process-reuse flag #120952

Merged
merged 5 commits into from
Nov 16, 2021
Merged

Conversation

rzhao271
Copy link
Contributor

@rzhao271 rzhao271 commented Apr 9, 2021

Affects #120431

This PR adds an enable-render-process-reuse flag to the runtime arguments in argv.json (one can access that file by running the workbench.action.configureRuntimeArguments command. By default, the runtime argument is false, but one can set it to true in argv.json.

@rzhao271 rzhao271 requested review from deepak1556 and bpasero April 9, 2021 17:40
@rzhao271 rzhao271 self-assigned this Apr 9, 2021
@rzhao271 rzhao271 added this to the April 2021 milestone Apr 9, 2021
@rzhao271 rzhao271 added feature-request Request for new features or functionality workbench-electron Electron-VS Code issues labels Apr 9, 2021
@rzhao271 rzhao271 added the sandbox Running VSCode in a node-free environment label Apr 9, 2021
Copy link
Collaborator

@deepak1556 deepak1556 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you reuse the runtime arguments code path, check enable-browser-code-loading for example and add a new flag enable-render-process-reuse.

@rzhao271
Copy link
Contributor Author

rzhao271 commented Apr 9, 2021

The flag is now enable-render-process-reuse, which takes in either true or false. Its default value is true.
It's set in argv.json. I was having trouble testing it earlier, until I realized this. 😅

@rzhao271 rzhao271 requested a review from deepak1556 April 9, 2021 19:02
src/main.js Outdated Show resolved Hide resolved
@deepak1556
Copy link
Collaborator

deepak1556 commented Apr 9, 2021

Sanity Check before merging,

  • Workbench process ID does not change on reload
  • Swapping workspaces in a window also keeps the same process
  • Ensure shared process and its child process are not restarted
  • Ensure child process from workbench get restarted
    • Extension Host
    • Watcher Service
    • Search Service
  • Smoke test https://github.com/Microsoft/vscode/wiki/Smoke-Test

@rzhao271 rzhao271 force-pushed the rzhao271/process-reuse-flag branch from b788db5 to 89f7299 Compare April 9, 2021 20:08
@rzhao271
Copy link
Contributor Author

rzhao271 commented Apr 9, 2021

Current issues on Windows:

  1. The search service isn't reloading (edit: this is by design, see next comment for details)
  2. The smoke tests don't run

I confirmed these issues do not occur on the main branch.

@rzhao271
Copy link
Contributor Author

rzhao271 commented Apr 9, 2021

It turns out the search service not loading until a search is made is expected behaviour.
Rob points out that extension activation could cause a search to occur, which I confirmed on Insiders: I launched code-insiders with all extensions disabled on an untrusted workspace. Then, I opened the process explorer, and confirmed that searchService wasn't there.

@rzhao271 rzhao271 force-pushed the rzhao271/process-reuse-flag branch from 89f7299 to 7d62d6a Compare April 9, 2021 21:50
@rzhao271
Copy link
Contributor Author

Current progress:

On Mac, sometimes a hang occurs when restarting a workspace. When there are no hangs, the smoke tests pass, too. This is with process reuse enabled on Electron 11. At least there's no more process leaks.

On Windows, Electron 12 needs to be merged in to bring in electron/electron#28175, as described here: #120431 (comment). Considering how it should be merged in relatively soon, I'll check whether it's been merged in tomorrow. If it hasn't, I'll just reapply these changes locally on the Electron 12 branch and run the sanity checks from there.

src/main.js Outdated Show resolved Hide resolved
src/main.js Outdated Show resolved Hide resolved
@bpasero bpasero requested a review from alexdima April 21, 2021 06:33
@bpasero
Copy link
Member

bpasero commented Apr 21, 2021

Adding Alex for the changes in extension host.

@rzhao271 rzhao271 force-pushed the rzhao271/process-reuse-flag branch from 916d63f to 0269c80 Compare April 21, 2021 17:44
@rzhao271
Copy link
Contributor Author

@alexdima I've adjusted the onWillShutdown() and onDidShutdown() methods within localProcessExtensionHost.ts, because when I put a breakpoint within the then clause within onDidShutdown(), the process crashes. @deepak1556 is already looking into this, but considering how it is a long-running task anyway, I have moved it to onWillShutdown().

I'm also wondering why there was a 10 second setTimeout used though? I'm not sure whether that can cause bugs considering how with process reuse, the workbench PID stays the same, but another extension host is launched.

@rzhao271 rzhao271 requested a review from bpasero April 21, 2021 19:07
Copy link
Member

@alexdima alexdima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rzhao271

The extension host process can live after the renderer process terminates. The idea is that the renderer process sends a Terminate message to the extension host. The extension host will then call deactivate on extensions, collect their promises and wait up to 5 seconds for extensions to finish cleaning up and only then exit. The renderer had this very old fallback to kill the extension host process after 10s. If I understand things correctly, the renderer process would be long gone before that timeout even executed.

I think the current proposed change kills the extension host process immediately from the onWillShutdown event which is not correct. I suggest to simply remove the 10s extra careful timeout (just delete original lines 632-634) and leave the rest of the terminate() method as it was.

So even after adjusting this, I still have some concerns about the change with renderers reusing the same pid. In case the renderer would quit unexpectedly (without sending a Terminate message), the extension host would run every 1s a piece of code which checks if the renderer process is still alive. If the renderer process is gone, the extension host will unilaterally begin to exit (just as if the renderer process would have sent it a Terminate message).

The other more complex case is protecting against run-away while(true) extension host processes. We have a native node module called native-watchdog which is not very smart, but does the following: it spawns a thread in C++ and checks every 1s if the renderer process is alive or not. If the renderer process is not alive, the C++ code will brutally exit the extension host process in 7s or 10s, not sure the exact times. This serves as a watchdog for cases where extensions enter by accident while(true) loops. When the user closes the window, within 7-10s the extension host process will exit, even if the JS event loop is kept busy in an endless loop.

I am concerned that reusing the same process for the renderer will break these scenarios, so IMHO we need to find alternative implementations that don't rely on the renderer pid for this.

@rzhao271 rzhao271 requested a review from deepak1556 April 22, 2021 01:11
@@ -664,5 +653,6 @@ export class LocalProcessExtensionHost implements IExtensionHost {
this._extensionHostDebugService.terminateSession(this._environmentService.debugExtensionHost.debugId);
event.join(timeout(100 /* wait a bit for IPC to get delivered */), 'join.extensionDevelopment');
}
this.terminate();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this terminate call join the onWillShutdown to ensure the message is delivered before the app goes down? I think terminate itself runs a promise.then but does not signal that to the outside:

It also seems to me that terminate is called from more locations:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locally, I created a new terminate function that returns a promise, and then joined it in onWillShutdown. I noticed that the reloading took noticeably longer, but I also managed to make it so that the extension host process isn't leaked. However, there's still some hangs that occur, so I need to look more into what's happening there.

@rzhao271
Copy link
Contributor Author

@alexdima I talked with Deepak about the case where the render process quits unexpectedly, or where the watchdog detects that the extension host is unresponsive. We're thinking that the main process can spawn a watchdog process of sorts, and that process can be used to monitor/kill the extension host in both cases. This way, we don't need the 1-second polling from the extension host, and we don't need the native watchdog to be loaded separately. Thoughts?

As for the case where the workbench is being reloaded, I've been trying a few implementations locally, but it seems that on Mac, unless I explicitly kill the extension host process using this._extensionHostProcess.kill(), the extension host always stays as a zombie process. I've tried adding a this._extensionHostProcess.unref() line right after the fork() call, so that the parent renderer process does not have to wait for the extension host to exit before proceeding with the reload, but that results in a zombie: on the next relaunch, the old extension host shows up in the process explorer as just (Code - OSS Helper), and that process doesn't show up in the Activity Monitor. If I do wait for the extension host to exit using an .on('exit', ...) handler, and by joining this with the onWillShutdown event, the reload now takes much longer (up to 5 seconds longer), due to having to wait for the extension host to cleanly exit, but it doesn't show up as a zombie anymore.

@rzhao271 rzhao271 force-pushed the rzhao271/process-reuse-flag branch from 2770847 to e17e944 Compare November 12, 2021 00:50
@rzhao271
Copy link
Contributor Author

rzhao271 commented Nov 12, 2021

I've rebased the PR.

TODO for myself:

  • Confirm whether zombie processes are still created on the Mac.
  • See whether terminate still has to be in the form of a promise, and whether we must still wait on it during _onWillShutdown.

@rzhao271 rzhao271 changed the title Add disable-process-reuse flag Add enable-render-process-reuse flag Nov 12, 2021
@rzhao271
Copy link
Contributor Author

I have confirmed that long-lasting zombie process are not created on macOS. Sometimes, a terminal process turns into a zombie, but it is reaped soon after.
However, on macOS, the main process seems to freeze very often, and its CPU usage goes up to around 100%. Moreover, the terminal often doesn't load with process reuse enabled, so I'm wondering whether there is something going on with the main-to-ptyHost communication path.

@rzhao271 rzhao271 modified the milestones: Backlog, November 2021 Nov 12, 2021
// Give the extension host 10s, after which we will
// try to kill the process and release any resources
setTimeout(() => this._cleanResources(), 10 * 1000);
public terminate(): Promise<void> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these changes specific to the renderer reuse flag or can they be ported to main independently if they are useful? Asking because now the extension host is not spawned from the workbench anymore so I wonder if they have purpose even now already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those changes were previously there to slow down onWillShutdown because otherwise,

  1. Adding a breakpoint to that function and stepping through it would result in a crash, because the environment was being destroyed/shut down anyway.
  2. If the resources weren't cleaned, a zombie process would occur on macOS.

Now with Alex's changes, point 2 is irrelevant. I believe point 1 still occurs, but considering how one has to add a breakpoint there to even repro the issue, it's unclear to me whether there is any problems with us leaving that code as-is. I have reverted that code, and now the PR only adds a runtime flag.

@deepak1556
Copy link
Collaborator

I would test process reuse with Electron 15 branch, I will push up the branch tomorrow. Electron 15 has far more fixes than 13 that cleans up the old process code path.

on macOS, the main process seems to freeze very often, and its CPU usage goes up to around 100%

Can you take a profile and see where the slowdown is.

Moreover, the terminal often doesn't load with process reuse enabled, so I'm wondering whether there is something going on with the main-to-ptyHost communication path.

Is it slow rendering or no rendering ? There is a know issue with the revive feature impacted by the shared process host #133964, profile would be good understand what you are seeing.

@rzhao271
Copy link
Contributor Author

rzhao271 commented Nov 15, 2021

I'm unable to repro the slowness issue or the terminal issue now. Considering how I was also seeing issues with the terminal, I think recent changes to the terminal fixed the issue. I'll also force-push so y'all can see which changes from main I pulled in.

@rzhao271 rzhao271 force-pushed the rzhao271/process-reuse-flag branch from e17e944 to 7683ee3 Compare November 15, 2021 20:13
@rzhao271 rzhao271 requested review from alexdima and bpasero November 15, 2021 20:23
Copy link
Collaborator

@deepak1556 deepak1556 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flag should be opt-out and not for opt-in, as the default will be permanently set in #137241

Also there will be spec failures related to node-pty, sqlite3 and spdlog. Currently tracking those work in the above PR.

@alexdima alexdima requested review from alexdima and removed request for alexdima November 16, 2021 09:24
Copy link
Member

@alexdima alexdima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This no longer contains any ext host changes, so 👍 from me.

@rzhao271
Copy link
Contributor Author

Ben reminded me that the flag was not changeable for Electron >=14 anyway.
To me, it makes more sense for the setting to be opt-in for Electron 13 for easier merging + testing. After we're on Electron 15 stable, we can remove the lines having to do with app.allowRendererProcessReuse altogether.

Ref electron/electron#18397

@rzhao271 rzhao271 merged commit db217ba into main Nov 16, 2021
@rzhao271 rzhao271 deleted the rzhao271/process-reuse-flag branch November 16, 2021 20:55
@github-actions github-actions bot locked and limited conversation to collaborators Dec 31, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature-request Request for new features or functionality sandbox Running VSCode in a node-free environment workbench-electron Electron-VS Code issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants