Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(childprocess): spawned processes are tracked and monitored. #6304

Merged
merged 31 commits into from
Jan 17, 2025

Conversation

Hweinstock
Copy link
Contributor

@Hweinstock Hweinstock commented Jan 3, 2025

Problem

Toolkit/Q slow performance is consistently reported by customers: https://github.com/aws/aws-toolkit-vscode/issues?q=is%3Aopen+is%3Aissue+label%3Aperformance

Toolkit/Q features can spawn processes, included many long-lived processes such as LSP servers. The cost of these processes is hidden and hard to troubleshoot. Any misbehaving process will not be noticed except in a coarse "Toolkit/Q causes vscode to be slow" manner.

Solution

  • Track all running processes in a map-like data structure.
  • use a PollingSet to periodically "monitor" their system usage(%cpu, memory bytes), logging warning messages when it exceeds a threshold.
  • Add developer command to instantiate spawned processes via ChildProcess wrapper to make this easier to test.
image image

Future Work


  • Treat all work as PUBLIC. Private feature/x branches will not be squash-merged at release time.
  • Your code changes must meet the guidelines in CONTRIBUTING.md.
  • License: I confirm that my contribution is made under the terms of the Apache 2.0 license.

@Hweinstock Hweinstock changed the title perf(ChildProcess): spawned processes are tracked and monitored. perf(childprocess): spawned processes are tracked and monitored. Jan 3, 2025
@Hweinstock Hweinstock closed this Jan 3, 2025
@Hweinstock Hweinstock reopened this Jan 3, 2025
@Hweinstock Hweinstock marked this pull request as ready for review January 6, 2025 17:58
@Hweinstock Hweinstock requested a review from a team as a code owner January 6, 2025 17:58
@@ -3,7 +3,7 @@
* SPDX-License-Identifier: Apache-2.0
*/

import { globals } from '..'
import globals from '../extensionGlobals'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoids circular dependency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will probably want to add a lint warning.

But for context what is happening is that .. implicitly imports the index.ts module instead of the intended one and that causes the circular dependency issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there every a case where importing from index.ts is preferred? Or is this a pretty straight-forward "don't do this" pattern?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR for lint rule: #6372

@@ -44,9 +44,14 @@ export class PollingSet<T> extends Set<T> {
this.clearTimer()
}
}

// TODO(hkobew): Overwrite the add method instead of adding seperate method. If we add item to set, timer should always start.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plan to do as quick follow up, since this involves touching some other files unrelated to process work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

public async getUsage(pid: number): Promise<ProcessStats> {
const getWindowsUsage = () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Instead of const getWindowsUsage... you can do function getWindowsUsage() { ... } and move it to after the logic of getUsage(). I find it cleaner.

@@ -44,6 +45,7 @@ export type DevFunction =
| 'editAuthConnections'
| 'notificationsSend'
| 'forceIdeCrash'
| 'startChildProcess'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I love that we keep enhancing dev-mode.

// isWin() leads to circular dependency.
return process.platform === 'win32' ? getWindowsUsage() : getUnixUsage()
} catch (e) {
ChildProcessTracker.logger.warn(`Failed to get process stats for ${pid}: ${e}`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it ever fails we should probably not attempt on future invocations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really sure what could cause it to fail here. Do you think there is a risk in continually retrying? Maybe potential log spam? Could this be combatted by setting the polling interval longer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really sure what could cause it to fail here.

Permissions issues usually.

Maybe potential log spam?

Yeah that's at least my first thought. When a system is in a bad state, it can make things worse if logs "cascade" with a deluge of redundant warnings.

Copy link
Contributor

@justinmk3 justinmk3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great starting point. Merging it so it can start to "bake".

Followup ideas:

  • include perf results in telemetry (do we have an existing metric it can be attached to?)
  • dev-mode can show a "report" of process stats
  • we need some way for non-dev-mode users to get a report of the Toolkit "health", for troubleshooting and bug reports. this would be part of that. the current answer is the About command, but that's too limited.

@justinmk3 justinmk3 merged commit d55b8e1 into aws:master Jan 17, 2025
26 checks passed
@Hweinstock
Copy link
Contributor Author

Follow up to this work: #6394

karanA-aws pushed a commit to karanA-aws/aws-toolkit-vscode that referenced this pull request Jan 17, 2025
## Problem
Toolkit/Q slow performance is consistently reported by customers.
Toolkit/Q features can spawn processes, included many long-lived
processes such as LSP servers. The cost of these processes is hidden and
hard to troubleshoot. Any misbehaving process will not be noticed except
in a coarse "Toolkit/Q causes vscode to be slow" manner.

## Solution
- Track all running processes in a map-like data structure. 
- use a `PollingSet` to periodically "monitor" their system usage (%cpu,
memory bytes), logging warning messages when it exceeds a threshold.
- Add developer command to instantiate spawned processes via
`ChildProcess` wrapper to make this easier to test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants