[rush] Optimize the execution speed of Rush #5007

L-Qun · 2024-11-18T08:16:35Z

Summary

Recently, I noticed that when running Rush commands, Rush itself takes a considerable amount of time, even if I’m only building a single project.

Analyzing repo state... DONE (8.56 seconds)

After reviewing the Rush source code, I discovered that the cause of this issue is the execution of Git commands.

However, I suspect we only need to retrieve the hashes for the relevant projects rather than for every project in the monorepo. I made a small change to Rush that cuts more than 50% of Rush's own execution time:

Analyzing repo state... DONE (3.54 seconds)
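The idea can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `buildLsTreeArgs` and its parameter are hypothetical names, and the sketch assumes the repo state is read with `git ls-tree -z -r --full-name HEAD`, narrowing the listing by appending project folders as path arguments after `--`.

```typescript
// Hypothetical helper: builds the argument list for `git ls-tree`,
// optionally limited to a set of project folders.
function buildLsTreeArgs(projectFolders: readonly string[] = []): string[] {
  const args: string[] = ['ls-tree', '-z', '-r', '--full-name', 'HEAD', '--'];
  // With no paths after `--`, git lists every tracked file in the tree.
  // With paths, only entries under those folders are listed.
  return args.concat(projectFolders);
}

// Example folders are made up for illustration.
console.log(buildLsTreeArgs(['apps/rush', 'libraries/rush-lib']).join(' '));
```

Passing an empty list keeps the full scan, so the narrower listing is opt-in for the caller.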

How it was tested

Run the command rush build --to @microsoft/rush. You will notice the time has improved:

Before:

Analyzing repo state... DONE (0.32 seconds)

After:

Analyzing repo state... DONE (0.12 seconds)

Meanwhile, this does not affect the cache hits of the built packages.

Impacted documentation

None.

dmichon-msft · 2024-11-18T19:34:25Z

Conceptually this seems fine, but I'd want to do a lot of stress testing and verify the interaction with, e.g. the getAdditionalFilesFromRushProjectConfigurationAsync function. Adding the filter will cause non-project files to no longer be included in the initial hash set, which means they will require an additional round trip to get the hashes for, despite being included in the git index.

The other problem, though, is that the raw response of this call is potentially consumed by plugins that make use of the inputSnapshot object, so it would be unexpected for it to suddenly be missing large swathes of files.

I'm reasonably confident that the performance impact of filtering git ls-tree is negligible; in my experience most of the overhead comes from the git status call.
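To illustrate the round-trip concern, here is a minimal sketch. It assumes the initial hash set is a map from repo-relative path to git hash; `findFilesNeedingExtraLookup` and the sample paths are hypothetical and are not part of rush-lib.

```typescript
// Hypothetical helper: given the initial hash set (repo-relative path -> git hash)
// and extra files a project declares outside its own folder, return the files
// whose hashes are missing and would need a separate lookup.
function findFilesNeedingExtraLookup(
  initialHashes: ReadonlyMap<string, string>,
  additionalFiles: readonly string[]
): string[] {
  const missing: string[] = [];
  for (const file of additionalFiles) {
    if (!initialHashes.has(file)) {
      missing.push(file);
    }
  }
  return missing;
}

// A filtered scan: only files under the selected project folder are present.
const filteredHashes = new Map<string, string>([
  ['apps/demo/package.json', 'a1'],
  ['apps/demo/src/index.ts', 'b2']
]);

// The shared config file lives outside the project folder, so it is reported as missing.
const needed: string[] = findFilesNeedingExtraLookup(filteredHashes, [
  'apps/demo/package.json',
  'common/config/shared.json'
]);
console.log(needed);
```

With an unfiltered scan the same lookup would return an empty list, because every tracked file is already in the map.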

L-Qun · 2024-11-18T21:32:25Z

The main time consumption comes from git ls-tree based on my local testing:

statePromise: 5.403s
locallyModifiedPromise: 1.222s

dmichon-msft · 2024-11-18T21:46:27Z

During my typical testing:

time git ls-tree -z -r --full-name HEAD -- > /dev/null

real    0m0.100s
user    0m0.087s
sys     0m0.013s

time git status -z -u --no-renames --ignore-submodules --no-ahead-behind -- > /dev/null

real    0m0.648s
user    0m0.238s
sys     0m0.787s

The issue is that the call to git status scans for untracked files with changes, and that operation is expensive. If we definitively know that there are no unstaged changes, we could speed up evaluation by only dumping the files that are staged, but in typical local development we can't guarantee that.

I'd love to be able to reduce the scope, but it would be a breaking change, because we expose the full list of tracked files and their hashes here:

rushstack/libraries/rush-lib/src/logic/incremental/InputsSnapshot.ts

Line 127 in da48ac3

readonly hashes: ReadonlyMap<string, string>;

L-Qun · 2024-11-18T23:03:06Z

Yes, it would be a breaking change. I've updated this comment.

L-Qun · 2024-11-22T03:05:27Z

Hi @dmichon-msft, I hope you're doing well! Just checking in to see if there’s anything I can clarify or update in this PR to move it forward.

dmichon-msft · 2024-11-23T00:18:15Z

libraries/rush-lib/src/logic/incremental/InputsSnapshot.ts

-   * The raw hashes of all tracked files in the repository.
+   * The raw hashes of the files relevant to the projects we care about are stored.
+   * (e.g. when running `rush build`, the hashes of all tracked files in the repository are stored)
+   * (e.g. when running `rush build --only`, only the hashes of files under the specified project are stored)


This won't work. Computation of operation hashes depends on the entire tree of their dependencies, whether or not you are currently executing said dependencies. So at minimum we always need the expansion of --to, not just the values passed to --only to be able to determine build cache entry ids.
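A toy sketch of why this matters, under simplified assumptions: `IProjectNode` and `describeInputs` are hypothetical and do not reflect how Rush actually computes operation hashes. The point is only that a project's inputs include the inputs of its dependencies.

```typescript
// Hypothetical model of a project: its own file hashes plus its direct dependencies.
interface IProjectNode {
  name: string;
  fileHashes: string[];
  dependencies: IProjectNode[];
}

// Builds a canonical string describing the project's inputs. A real implementation
// would feed this into a hash function; the string is enough to show the dependency.
// Assumes the dependency graph is acyclic.
function describeInputs(project: IProjectNode): string {
  const parts: string[] = project.fileHashes.slice().sort();
  for (const dep of project.dependencies) {
    parts.push(`${dep.name}(${describeInputs(dep)})`);
  }
  return parts.join(';');
}

const lib: IProjectNode = { name: 'lib', fileHashes: ['h1'], dependencies: [] };
const app: IProjectNode = { name: 'app', fileHashes: ['h2'], dependencies: [lib] };

const before: string = describeInputs(app);
lib.fileHashes = ['h1-changed']; // only the dependency changes
const after: string = describeInputs(app);

// The description of `app` changes even though none of its own files did.
console.log(before !== after);
```

If the hashes for `lib` were never collected, the inputs for `app` could not be computed, even when only `app` is being built.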

I see. I just want to express that we are no longer storing all the hashes. I’ve removed that description.

dmichon-msft · 2024-11-23T00:19:04Z

libraries/rush-lib/src/logic/ProjectChangeAnalyzer.ts

@@ -295,10 +296,12 @@ export class ProjectChangeAnalyzer {
      const lookupByPath: IReadonlyLookupByPath<RushConfigurationProject> =
        this._rushConfiguration.getProjectLookupForRoot(rootDirectory);

+      const filterPath: string[] = Array.from(projectSelection ?? []).map((project) => project.projectFolder);


At minimum the choice to perform filtering needs to be behind a flag in experiments.json, because it will break things for consumers with custom plugins, and needs to be a choice whether or not to apply such logic.

I added a configuration to determine whether to enable this feature.
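A minimal sketch of the gating logic, assuming the flag is named `enableSubpathScan` as in the later revision of this PR. The interfaces and the `getFilterPaths` helper are hypothetical and reduced to what the sketch needs.

```typescript
// Hypothetical shapes, reduced to what this sketch needs.
interface IExperiments {
  enableSubpathScan?: boolean;
}
interface IProject {
  projectFolder: string;
}

// Returns the folders to scan, or undefined to scan the whole repository.
function getFilterPaths(
  experiments: IExperiments,
  projectSelection?: ReadonlySet<IProject>
): string[] | undefined {
  if (!projectSelection || !experiments.enableSubpathScan) {
    // Default behavior: full scan, so consumers still see every tracked file.
    return undefined;
  }
  return Array.from(projectSelection, ({ projectFolder }) => projectFolder);
}

const selection = new Set<IProject>([{ projectFolder: 'apps/a' }, { projectFolder: 'libs/b' }]);
console.log(getFilterPaths({}, selection)); // flag off: no filtering
console.log(getFilterPaths({ enableSubpathScan: true }, selection)); // flag on: project folders
```

Keeping the unfiltered scan as the default means existing plugins see no change unless a repository opts in.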

Hi @dmichon-msft, I've addressed the comments and made the updates. When you get a chance, could you please take another look? Thanks!

libraries/package-deps-hash/src/getRepoState.ts

dmichon-msft

Few minor things but otherwise looks good.

dmichon-msft · 2024-12-13T19:41:02Z

libraries/rush-lib/src/logic/ProjectChangeAnalyzer.ts

+        projectSelection &&
+        this._rushConfiguration.experimentsConfiguration.configuration.enableSubpathScan
+      ) {
+        filterPath = Array.from(projectSelection).map(({ projectFolder }) => projectFolder);


For a followup PR: this feature will 100% break the build cache unless we update this to Array.from(Selection.expandAllDependencies(projectSelection), ({ projectFolder }) => projectFolder);

File hashes for dependencies are absolutely necessary when calculating build cache entry ids, unless the only selected phases don't depend on upstream projects at all.

I suppose projectSelection already includes all the projects that need to be built?

Suppose the current dependency relationships are as follows:

When running rush build --to packageA, will projectSelection include all related packages (packageA to packageF)?

With --to, projectSelection includes all the dependencies; with --only, it does not. This was addressed by #5045 by expanding the project selection when invoking ProjectChangeAnalyzer.
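The expansion described above can be sketched like this. It is an illustration only: `IProjectLike` and `expandToAllDependencies` are hypothetical stand-ins and not the actual `Selection.expandAllDependencies` implementation in rush-lib.

```typescript
// Hypothetical project shape; the real RushConfigurationProject has many more members.
interface IProjectLike {
  projectFolder: string;
  dependencyProjects: ReadonlySet<IProjectLike>;
}

// Walks the dependency graph and returns the selection plus every transitive dependency.
function expandToAllDependencies(selection: readonly IProjectLike[]): Set<IProjectLike> {
  const result = new Set<IProjectLike>();
  const stack: IProjectLike[] = selection.slice();
  while (stack.length > 0) {
    const project = stack.pop()!;
    if (result.has(project)) {
      continue; // already visited via another path
    }
    result.add(project);
    project.dependencyProjects.forEach((dep) => stack.push(dep));
  }
  return result;
}

// Made-up graph: apps/a depends on libs/b, which depends on libs/c.
const c: IProjectLike = { projectFolder: 'libs/c', dependencyProjects: new Set<IProjectLike>() };
const b: IProjectLike = { projectFolder: 'libs/b', dependencyProjects: new Set<IProjectLike>([c]) };
const a: IProjectLike = { projectFolder: 'apps/a', dependencyProjects: new Set<IProjectLike>([b]) };

// An "--only apps/a" style selection still needs the folders of libs/b and libs/c scanned.
const folders: string[] = Array.from(expandToAllDependencies([a]), (p) => p.projectFolder).sort();
console.log(folders);
```

The filter paths passed to the scan would then be derived from the expanded set rather than from the raw selection.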

https://developer.microsoft.com/json-schemas/rush/v5/experiments.schema.json

@octogonz Could you help deploy a new schema endpoint that includes the enableSubpathScan field?

Optimize the execution speed of Rush

2eb0665

L-Qun requested review from iclanton, octogonz, apostolisms, D4N14L and dmichon-msft as code owners November 18, 2024 08:16

update documentation

1c4e825

Update documentation

12c0c46

dmichon-msft reviewed Nov 23, 2024

add enableSubpathScan to control if full scan the entire repository

493e2ab

L-Qun requested a review from dmichon-msft November 23, 2024 10:06

L-Qun changed the title ~~Optimize the execution speed of Rush~~ [rush] Optimize the execution speed of Rush Nov 25, 2024

Merge branch 'microsoft:main' into main

238d1f3

dmichon-msft reviewed Dec 11, 2024

libraries/package-deps-hash/src/getRepoState.ts

dmichon-msft reviewed Dec 11, 2024

libraries/package-deps-hash/src/getRepoState.ts

dmichon-msft approved these changes Dec 11, 2024

common/changes/@microsoft/rush/main_2024-11-18-08-13.json

common/changes/@rushstack/package-deps-hash/main_2024-11-18-08-13.json

L-Qun and others added 4 commits December 12, 2024 06:35

Update common/changes/@rushstack/package-deps-hash/main_2024-11-18-08-13.json

75a0ad0

Co-authored-by: David Michon <[email protected]>

Update libraries/package-deps-hash/src/getRepoState.ts

72940eb

Co-authored-by: David Michon <[email protected]>

Update libraries/package-deps-hash/src/getRepoState.ts

6236ccd

Co-authored-by: David Michon <[email protected]>

Update common/changes/@microsoft/rush/main_2024-11-18-08-13.json

4c094f8

Co-authored-by: David Michon <[email protected]>

dmichon-msft approved these changes Dec 11, 2024

octogonz approved these changes Dec 12, 2024

octogonz merged commit 9f4dfec into microsoft:main Dec 12, 2024
5 checks passed

dmichon-msft reviewed Dec 13, 2024
