Add enable-render-process-reuse flag #120952
Can you reuse the runtime arguments code path (see `enable-browser-code-loading` for an example) and add a new flag, `enable-render-process-reuse`?
The flag is now |
Sanity Check before merging,
|
b788db5 to 89f7299
Current issues on Windows:
I confirmed these issues do not occur on the main branch.
It turns out that the search service not loading until a search is made is expected behaviour.
89f7299 to 7d62d6a
Current progress: On Mac, a hang sometimes occurs when restarting a workspace. When there are no hangs, the smoke tests pass. This is with process reuse enabled on Electron 11. At least there are no more process leaks. On Windows, Electron 12 needs to be merged in to bring in electron/electron#28175, as described here: #120431 (comment). Since it should be merged relatively soon, I'll check tomorrow whether that has happened. If it hasn't, I'll reapply these changes locally on the Electron 12 branch and run the sanity checks from there.
src/vs/workbench/services/search/electron-browser/searchService.ts
Adding Alex for the changes in the extension host.
916d63f to 0269c80
@alexdima I've adjusted the I'm also wondering why there was a 10 second |
The extension host process can live after the renderer process terminates. The idea is that the renderer process sends a Terminate message to the extension host. The extension host will then call `deactivate` on extensions, collect their promises, wait up to 5 seconds for extensions to finish cleaning up, and only then exit. The renderer had this very old fallback to kill the extension host process after 10s. If I understand things correctly, the renderer process would be long gone before that timeout even executed.
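The shutdown sequence described above can be sketched roughly as follows. This is a minimal illustration, not the actual extension host code: `Deactivate`, `Outcome`, `withTimeout`, and `deactivateAll` are hypothetical names, and only the 5-second budget comes from the comment.

```typescript
// Hypothetical shape of an extension's cleanup hook (not a VS Code type).
type Deactivate = () => Promise<void> | void;
type Outcome = "clean" | "timed-out";

// Resolves with the value of `work`, or with `fallback` once `ms` have
// elapsed, whichever happens first. Assumes `work` never rejects.
function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    work.then((value) => {
      clearTimeout(timer);
      resolve(value);
    });
  });
}

// Calls every hook, collects the promises, and waits at most `maxWaitMs`
// before letting the caller exit the process.
function deactivateAll(hooks: Deactivate[], maxWaitMs: number = 5000): Promise<Outcome> {
  const settled = hooks.map((hook) =>
    // A hook that throws or rejects must not block the others.
    Promise.resolve().then(hook).catch(() => undefined)
  );
  const all = Promise.all(settled).then((): Outcome => "clean");
  return withTimeout(all, maxWaitMs, "timed-out");
}
```

In this sketch the caller would exit once the returned promise resolves, whether cleanup finished or the budget ran out.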
I think the current proposed change kills the extension host process immediately from the `onWillShutdown` event, which is not correct. I suggest simply removing the extra-careful 10s timeout (just delete original lines 632-634) and leaving the rest of the `terminate()` method as it was.
So even after adjusting this, I still have some concerns about the change with renderers reusing the same pid. In case the renderer quits unexpectedly (without sending a Terminate message), the extension host runs a check every 1s to see whether the renderer process is still alive. If the renderer process is gone, the extension host will unilaterally begin to exit (just as if the renderer process had sent it a Terminate message).
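A rough sketch of that polling check, and of why pid reuse is a concern, might look like the following. `ParentMonitor` and its callbacks are hypothetical names; the liveness probe is injected so the logic can run anywhere (in Node, one common probe is `process.kill(pid, 0)`, though whether the extension host uses exactly that is an assumption here).

```typescript
// Hypothetical monitor: polls a liveness probe and starts exiting once the
// parent (renderer) is reported gone.
class ParentMonitor {
  private exiting = false;
  private readonly isParentAlive: () => boolean;
  private readonly beginExit: () => void;

  constructor(isParentAlive: () => boolean, beginExit: () => void) {
    this.isParentAlive = isParentAlive;
    this.beginExit = beginExit;
  }

  // One polling step; a real host would call this from a 1s interval.
  tick(): void {
    if (this.exiting) {
      return; // exit already in progress, do not start it twice
    }
    if (!this.isParentAlive()) {
      this.exiting = true;
      this.beginExit();
    }
  }
}

// Demo with a fake renderer whose liveness we control.
let rendererAlive = true;
const log: string[] = [];
const monitor = new ParentMonitor(() => rendererAlive, () => log.push("exit"));

monitor.tick();        // renderer alive: nothing happens
rendererAlive = false; // renderer quit without sending Terminate
monitor.tick();        // begins exit
monitor.tick();        // no second exit
```

If the renderer process is reused, a pid-based probe keeps reporting "alive" after the window that owned the extension host is gone, so `tick()` would never start the exit. That is the scenario the comment worries about.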
The other, more complex case is protecting against runaway `while(true)` extension host processes. We have a native node module called `native-watchdog` which is not very smart, but does the following: it spawns a thread in C++ and checks every 1s whether the renderer process is alive. If the renderer process is not alive, the C++ code will brutally exit the extension host process in 7s or 10s (I'm not sure of the exact times). This serves as a watchdog for cases where extensions accidentally enter `while(true)` loops. When the user closes the window, within 7-10s the extension host process will exit, even if the JS event loop is kept busy in an endless loop.
I am concerned that reusing the same process for the renderer will break these scenarios, so IMHO we need to find alternative implementations that don't rely on the renderer pid for this.
@@ -664,5 +653,6 @@ export class LocalProcessExtensionHost implements IExtensionHost {
this._extensionHostDebugService.terminateSession(this._environmentService.debugExtensionHost.debugId);
event.join(timeout(100 /* wait a bit for IPC to get delivered */), 'join.extensionDevelopment');
}
this.terminate();
Should this `terminate` call `join` the `onWillShutdown` event to ensure the message is delivered before the app goes down? I think `terminate` itself runs a `promise.then` but does not signal that to the outside:
vscode/src/vs/workbench/services/extensions/electron-browser/localProcessExtensionHost.ts
Line 624 in f250472
this._messageProtocol.then((protocol) => {
It also seems to me that `terminate` is called from more locations:
vscode/src/vs/workbench/services/extensions/electron-browser/localProcessExtensionHost.ts
Lines 149 to 151 in f250472
public dispose(): void { this.terminate(); }
vscode/src/vs/workbench/services/extensions/electron-browser/localProcessExtensionHost.ts
Line 142 in f250472
const globalExitListener = () => this.terminate();
Locally, I created a new `terminate` function that returns a promise, and then joined it in `onWillShutdown`. Reloading took noticeably longer, but the extension host process is no longer leaked. However, there are still some hangs that occur, so I need to look more into what's happening there.
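The approach of returning a promise from `terminate` and joining it into the shutdown event could be sketched like this. `ShutdownEvent`, `FakeExtensionHost`, and the `"join.extensionHost"` id are hypothetical stand-ins for illustration; only the `event.join(promise, id)` call shape is taken from the diff above.

```typescript
// Hypothetical stand-in for the shutdown event: it collects promises that
// must settle before shutdown is allowed to proceed.
interface Joiner {
  id: string;
  promise: Promise<void>;
}

class ShutdownEvent {
  private readonly joiners: Joiner[] = [];

  join(promise: Promise<void>, id: string): void {
    // A failing joiner should delay shutdown, not abort it.
    this.joiners.push({ id, promise: promise.catch(() => undefined) });
  }

  // Resolves with the ids of all joiners once every one has settled.
  settled(): Promise<string[]> {
    return Promise.all(this.joiners.map((j) => j.promise.then(() => j.id)));
  }
}

// Hypothetical host whose terminate() reports completion to the caller
// instead of finishing silently inside a `.then`.
class FakeExtensionHost {
  terminated = false;

  terminate(): Promise<void> {
    return Promise.resolve().then(() => {
      this.terminated = true; // stands in for "Terminate message delivered"
    });
  }
}

function onWillShutdown(event: ShutdownEvent, host: FakeExtensionHost): void {
  event.join(host.terminate(), "join.extensionHost");
}
```

The trade-off noted in the comment shows up here too: shutdown now waits for `terminate()` to settle, which would explain a slower reload.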
@alexdima I talked with Deepak about the case where the render process quits unexpectedly, or where the watchdog detects that the extension host is unresponsive. We're thinking that the main process can spawn a watchdog process of sorts, and that process can be used to monitor/kill the extension host in both cases. This way, we don't need the 1-second polling from the extension host, and we don't need the native watchdog to be loaded separately. Thoughts? As for the case where the workbench is being reloaded, I've been trying a few implementations locally, but it seems that on Mac, unless I explicitly kill the extension host process using |
2770847 to e17e944
I've rebased the PR. TODO for myself:
|
I have confirmed that long-lasting zombie processes are not created on macOS. Sometimes, a terminal process turns into a zombie, but it is reaped soon after.
// Give the extension host 10s, after which we will
// try to kill the process and release any resources
setTimeout(() => this._cleanResources(), 10 * 1000);
public terminate(): Promise<void> {
Are these changes specific to the renderer reuse flag, or can they be ported to `main` independently if they are useful? I'm asking because the extension host is no longer spawned from the workbench, so I wonder whether they still serve a purpose.
Those changes were previously there to slow down `onWillShutdown` because otherwise:
1. Adding a breakpoint to that function and stepping through it would result in a crash, because the environment was being destroyed/shut down anyway.
2. If the resources weren't cleaned, a zombie process would occur on macOS.
Now with Alex's changes, point 2 is irrelevant. I believe point 1 still occurs, but since one has to add a breakpoint there to even repro the issue, it's unclear to me whether there are any problems with leaving that code as-is. I have reverted that code, and now the PR only adds a runtime flag.
I would test process reuse with the Electron 15 branch; I will push the branch tomorrow. Electron 15 has far more fixes than 13 that clean up the old process code path.
Can you take a profile and see where the slowdown is?
Is it slow rendering or no rendering? There is a known issue with the revive feature being impacted by the shared process host (#133964); a profile would help us understand what you are seeing.
I'm unable to repro the slowness issue or the terminal issue now. Considering how I was also seeing issues with the terminal, I think recent changes to the terminal fixed the issue. I'll also force-push so y'all can see which changes from main I pulled in.
e17e944 to 7683ee3
The flag should be opt-out rather than opt-in, as the default will be permanently set in #137241.
Also, there will be spec failures related to `node-pty`, `sqlite3`, and `spdlog`. That work is currently being tracked in the above PR.
This no longer contains any ext host changes, so 👍 from me.
Ben reminded me that the flag was not changeable for Electron >=14 anyway.
Affects #120431
This PR adds an `enable-render-process-reuse` flag to the runtime arguments in `argv.json` (one can access that file by running the `workbench.action.configureRuntimeArguments` command). By default, the runtime argument is `false`, but one can set it to `true` in `argv.json`.
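As a rough illustration of the default described above, the flag lookup could behave like the sketch below. `isRenderProcessReuseEnabled` is a hypothetical helper, not the code in this PR, and the inline JSON string stands in for the contents of `argv.json` (the real file may contain comments, which plain `JSON.parse` would reject).

```typescript
const FLAG = "enable-render-process-reuse";

// Hypothetical helper: the flag counts as enabled only when it is explicitly
// set to `true`; a missing or non-boolean value falls back to `false`.
function isRenderProcessReuseEnabled(argv: Record<string, unknown>): boolean {
  return argv[FLAG] === true;
}

// Stand-in for the parsed contents of argv.json.
const parsed = JSON.parse('{ "enable-render-process-reuse": true }') as Record<string, unknown>;

console.log(isRenderProcessReuseEnabled(parsed)); // true
console.log(isRenderProcessReuseEnabled({}));     // false
```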