Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rush] Optimize the execution speed of Rush #5007

Merged
merged 9 commits into from
Dec 12, 2024
Merged

Conversation

L-Qun
Copy link
Contributor

@L-Qun L-Qun commented Nov 18, 2024

Summary

Recently, I noticed that when running Rush commands, Rush itself takes a considerable amount of time, even if I’m only building a single project.

Analyzing repo state... DONE (8.56 seconds)

After reviewing the Rush source code, I discovered that the cause of this issue is the execution of Git commands.

image

However, I suppose that we only need to retrieve the hash of the relevant project rather than all projects in the monorepo. So I made small changes for Rush, which can save over 50% of Rush's own execution time

Analyzing repo state... DONE (3.54 seconds)

How it was tested

Run the command rush build --to @microsoft/rush. You will notice the time has improved:

Before:

Analyzing repo state... DONE (0.32 seconds)

After:

Analyzing repo state... DONE (0.12 seconds)

Meanwhile, does not affect the cache hits of the built packages.

Impacted documentation

None.

@dmichon-msft
Copy link
Contributor

Conceptually this seems fine, but I'd want to do a lot of stress testing and verify the interaction with, e.g. the getAdditionalFilesFromRushProjectConfigurationAsync function. Adding the filter will cause non-project files to no longer be included in the initial hash set, which means they will require an additional round trip to get the hashes for, despite being included in the git index.

The other problem, though, is that the raw response of this call is potentially consumed by plugins that make use of the inputSnapshot object, so if it is suddenly missing large swathes of files that would be unexpected.

I'm reasonably confident that the performance impact of filtering git ls-tree is negligible; in my experience most of the overhead comes from the git status call.

@L-Qun
Copy link
Contributor Author

L-Qun commented Nov 18, 2024

Conceptually this seems fine, but I'd want to do a lot of stress testing and verify the interaction with, e.g. the getAdditionalFilesFromRushProjectConfigurationAsync function. Adding the filter will cause non-project files to no longer be included in the initial hash set, which means they will require an additional round trip to get the hashes for, despite being included in the git index.

The other problem, though, is that the raw response of this call is potentially consumed by plugins that make use of the inputSnapshot object, so if it is suddenly missing large swathes of files that would be unexpected.

I'm reasonably confident that the performance impact of filtering git ls-tree is negligible; in my experience most of the overhead comes from the git status call.

The main time consumption comes from git ls-tree based on my local testing:

statePromise: 5.403s
locallyModifiedPromise: 1.222s

@dmichon-msft
Copy link
Contributor

During my typical testing:

time git ls-tree -z -r --full-name HEAD -- > /dev/null

real    0m0.100s
user    0m0.087s
sys     0m0.013s
time git status -z -u --no-renames --ignore-submodules --no-ahead-behind -- > /
dev/null

real    0m0.648s
user    0m0.238s
sys     0m0.787s

The issue is that the call to git status scans for untracked files with changes, and that operation is expensive. If we definitively know that there are no unstaged changes, we could speed up evaluation by only dumping the files that are staged, but in typical local development we can't guarantee that.

I'd love to be able to reduce the scope, but it would be a breaking change, because we expose the full list of tracked files and their hashes here:

readonly hashes: ReadonlyMap<string, string>;

@L-Qun
Copy link
Contributor Author

L-Qun commented Nov 18, 2024

I'd love to be able to reduce the scope, but it would be a breaking change, because we expose the full list of tracked files and their hashes here:

readonly hashes: ReadonlyMap<string, string>;

Yes, It would be a breaking change. I modified this comments.

@L-Qun
Copy link
Contributor Author

L-Qun commented Nov 22, 2024

Hi @dmichon-msft, I hope you're doing well! Just checking in to see if there’s anything I can clarify or update in this PR to move it forward

* The raw hashes of all tracked files in the repository.
* The raw hashes of the files relevant to the projects we care about are stored.
* (e.g. when running `rush build`, the hashes of all tracked files in the repository are stored)
* (e.g. when running `rush build --only`, only the hashes of files under the specified project are stored)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work. Computation of operation hashes depends on the entire tree of their dependencies, whether or not you are currently executing said dependencies. So at minimum we always need the expansion of --to, not just the values passed to --only to be able to determine build cache entry ids.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I just want to express that we are no longer storing all the hashes. I’ve removed that description.

@@ -295,10 +296,12 @@ export class ProjectChangeAnalyzer {
const lookupByPath: IReadonlyLookupByPath<RushConfigurationProject> =
this._rushConfiguration.getProjectLookupForRoot(rootDirectory);

const filterPath: string[] = Array.from(projectSelection ?? []).map((project) => project.projectFolder);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At minimum the choice to perform filtering needs to be behind a flag in experiments.json, because it will break things for consumers with custom plugins, and needs to be a choice whether or not to apply such logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a configuration to determine whether to enable this feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dmichon-msft, I've addressed the comments and made the updates. When you get a chance, could you please take another look? Thanks!

@L-Qun L-Qun requested a review from dmichon-msft November 23, 2024 10:06
@L-Qun L-Qun changed the title Optimize the execution speed of Rush [rush] Optimize the execution speed of Rush Nov 25, 2024
Copy link
Contributor

@dmichon-msft dmichon-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor things but otherwise looks good.

@octogonz octogonz merged commit 9f4dfec into microsoft:main Dec 12, 2024
5 checks passed
projectSelection &&
this._rushConfiguration.experimentsConfiguration.configuration.enableSubpathScan
) {
filterPath = Array.from(projectSelection).map(({ projectFolder }) => projectFolder);
Copy link
Contributor

@dmichon-msft dmichon-msft Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a followup PR: this feature will 100% break the build cache unless we update this to Array.from(Selection.expandAllDependencies(projectSelection), ({ projectFolder }) => projectFolder);

File hashes for dependencies are absolutely necessary when calculating build cache entry ids, unless the only selected phases don't depend on upstream projects at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose projectSelection already includes all the projects that need to be built?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose the current dependency relationships are as follows:
image
When running rush build --to packageA, the projectSelection will include all related packages (packageA to packageF)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With --to, projectSelection includes all the dependencies; with --only, it does not. This was addressed by #5045 by expanding the project selection when invoking ProjectChangeAnalyzer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://developer.microsoft.com/json-schemas/rush/v5/experiments.schema.json

@octogonz Could you help deploy a new schema endpoint that includes the enableSubpathScan field?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Closed
Development

Successfully merging this pull request may close these issues.

3 participants