Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: speed up PR unit-test run time #614

Merged
merged 18 commits into from
Aug 17, 2021

Conversation

blumamir
Copy link
Member

@blumamir blumamir commented Aug 11, 2021

Which problem is this PR solving?

According to my research, running CI unit tests on this repo currently takes ~20m, where a vast amount of the time (~12m) is spent on lerna bootstrap which downloads >1G of node_modules dependencies, and on "Post Cache Dependecies" which upload this >1G tp the cache.

This PR decreases CI run time from ~20m to ~10m when cache action is miss for fetching node_modules from previous ci run.
A quick look at our github action workflows run, shows this is a very common case. Long CIs spends a lot the opentelemetry github organization resources and decrease fast feedback for developer on there PRs (more time to understand tests are failing in CI and fix them). Since current workflow time is linera correlated to the number of packages in the monorepo, CI time is expect to get worst as more packages will be added in the future.

Running the tests only on changed packages will allow us to do more extensive testing in the future (like running test-all-versions), because we will not waste CI time running tests on packages with no changes. According to @dyladan, this is OK with the GC.

Before (cache action miss):
image

After (cache action miss):
image

Drawbacks:

Hoisting

as mentioned in lerna hoisting documentation:

As a result, your packages will be able to import or require any of the dependencies that have been hoisted, even if you forgot to specify that dependency in your package's package.json file.

This means that if package-foo depends on some other package-bar, but forgot to state it in the package.json, tests may still pass as it might get the dependency from the hoisted node_modules via some other package which depends on package-bar.

Since hoisting is only used for CI run, developer should catch this error on his local fork which is not hoisted, but of course this is not guaranteed to catch all errors.

lerna changed option

lerna since filter option will run tests on packages which have code changes from main branch. If package-foo depends on another package from the same repo package-bar, and a PR changes bar but not foo, then tests will only run for bar, although we might want to run them also on foo due to the dependency.

In contrib repo, there are no internal dependencies on other packages from the same monorepo, beside from test-utils, so I don't think this is an issue here, but look bellow for possible solution.

lerna parallel

when running tests with lerna parallel, the output of the test runs is not synchronous, making it a bit less readable. since PRs here usually address a single package, I do not anticipate this to be an issue in this repo

Possible solution

I suggest that we have a daily canary test run that will execute a full test on all the packages in the repo, with no hoisting and with full test-all-versions configuration to catch such issues, without exhausting the CI on each commit to each PR. I can take the task of creating such workflow if we agree to add it.

Short description of the changes

  • Use lerna hoisting to install each dependency only once to the root node_modules of the monorepo, instead of reinstalling it again and again into each of the node_moduless of each package in the repo. Size of the node_modules for each installation of contrib is >1G so the effect is dramatic.

  • Uses the --since filter option and --parallel option for lerna run to execute unit tests only for packages that are changed in this specific PR. We've been using this technique in ext-js repo for long time and it's been working good for us.

@blumamir blumamir requested a review from a team August 11, 2021 17:32
@blumamir
Copy link
Member Author

Actually, I see that running the tests only takes 1.5 minutes compared to 12 minutes of installing the dependencies and 8 minutes on "Post Cache Dependencies", so not sure this optimization is required ATM.

@codecov
Copy link

codecov bot commented Aug 11, 2021

Codecov Report

Merging #614 (990b7a8) into main (55301c9) will increase coverage by 1.64%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main     #614      +/-   ##
==========================================
+ Coverage   95.04%   96.68%   +1.64%     
==========================================
  Files         204       13     -191     
  Lines       12003      634   -11369     
  Branches     1137      124    -1013     
==========================================
- Hits        11408      613   -10795     
+ Misses        595       21     -574     
Impacted Files Coverage Δ
...extension-autoinjection/src/contentScript/index.ts
...opentelemetry-instrumentation-net/test/tls.test.ts
...ntelemetry-instrumentation-dns/test/utils/utils.ts
...njection/src/instrumentation/WebInstrumentation.ts
...emetry-instrumentation-knex/src/instrumentation.ts
...tor-grpc-census-binary/src/GrpcCensusPropagator.ts
...pentelemetry-instrumentation-knex/src/constants.ts
...elemetry-instrumentation-pg/src/instrumentation.ts
...ction/src/contentScript/InstrumentationInjector.ts
...etry-instrumentation-router/src/enums/LayerType.ts
... and 181 more

@dyladan
Copy link
Member

dyladan commented Aug 11, 2021

The same filtering flags can be used for the bootstrap stage which should greatly improve dependency installation speed. Any way you can apply that here?

@blumamir
Copy link
Member Author

The same filtering flags can be used for the bootstrap stage which should greatly improve dependency installation speed. Any way you can apply that here?

Sure, good idea. testing if it works.
Hope lerna knows how to handle intra-repo deps (most instrumentations depends on opentelemetry-test-utils which is not likely to change, but still needed for tests to run)

@blumamir blumamir marked this pull request as draft August 11, 2021 23:03
@blumamir
Copy link
Member Author

The same filtering flags can be used for the bootstrap stage which should greatly improve dependency installation speed. Any way you can apply that here?

Sure, good idea. testing if it works.
Hope lerna knows how to handle intra-repo deps (most instrumentations depends on opentelemetry-test-utils which is not likely to change, but still needed for tests to run)

@dyladan I tested it and seem like it does not work.
When I make a change to some instrumentation, let's say instrumentation-ioredis, then @opentelemetry/auto-instrumentations-node is also bootstraped since it depends on instrumentation-ioredis, but then it fails to compile since it has a dependency on all other instrumentations which are not compiled.

@blumamir
Copy link
Member Author

Tested the --hoist option of lerna bootstrap which seemed to improve dramatically the install time. It currently fails on this, since we reference a file directly from node_modules:

  "files": [ "node_modules/zone.js/dist/zone.js.d.ts"],

Any idea how we can remove this dependency and test the hoist install time?

@blumamir
Copy link
Member Author

Tested the --hoist option of lerna bootstrap which seemed to improve dramatically the install time. It currently fails on this, since we reference a file directly from node_modules:

  "files": [ "node_modules/zone.js/dist/zone.js.d.ts"],

Any idea how we can remove this dependency and test the hoist install time?

was able to fix it with --nohoist='zone.js' option to lerna boottstrap. Thanks @rauno56 for helping with this

@blumamir
Copy link
Member Author

The hoist option cut down ci time dramatically, from about ~20 minutes to about ~10 minutes.
It works for all node versions except for node v16, which complains about incorrect peer deps:

npm ERR! ERESOLVE unable to resolve dependency tree
npm ERR! 
npm ERR! While resolving: [email protected]
npm ERR! Found: [email protected]
npm ERR! node_modules/webpack
npm ERR!   webpack@"4.46.0" from the root project
npm ERR! 
npm ERR! Could not resolve dependency:
npm ERR! peer webpack@"^5.20.0" from [email protected]
npm ERR! node_modules/html-webpack-plugin
npm ERR!   html-webpack-plugin@"5.3.2" from the root project
npm ERR! 
npm ERR! Fix the upstream dependency conflict, or retry
npm ERR! this command with --force, or --legacy-peer-deps
npm ERR! to accept an incorrect (and potentially broken) dependency resolution.

Will appreciate help with this issue from someone with relevant experience with these dependencies spaghetti

@blumamir
Copy link
Member Author

The hoist option cut down ci time dramatically, from about ~20 minutes to about ~10 minutes.
It works for all node versions except for node v16, which complains about incorrect peer deps:

npm ERR! ERESOLVE unable to resolve dependency tree
npm ERR! 
npm ERR! While resolving: [email protected]
npm ERR! Found: [email protected]
npm ERR! node_modules/webpack
npm ERR!   webpack@"4.46.0" from the root project
npm ERR! 
npm ERR! Could not resolve dependency:
npm ERR! peer webpack@"^5.20.0" from [email protected]
npm ERR! node_modules/html-webpack-plugin
npm ERR!   html-webpack-plugin@"5.3.2" from the root project
npm ERR! 
npm ERR! Fix the upstream dependency conflict, or retry
npm ERR! this command with --force, or --legacy-peer-deps
npm ERR! to accept an incorrect (and potentially broken) dependency resolution.

Will appreciate help with this issue from someone with relevant experience with these dependencies spaghetti

Managed to fix this issue with the help of @dyladan and @rauno56 by setting legacy-peer-deps for npm 7.
Now I'm getting a new error in CI about permissions to access package-lock.json on node16/npm7.

Can someone help me take a look and figure out how to solve it?
Thanks

@blumamir
Copy link
Member Author

Managed to fix this issue with the help of @dyladan and @rauno56 by setting legacy-peer-deps for npm 7.
Now I'm getting a new error in CI about permissions to access package-lock.json on node16/npm7.

Can someone help me take a look and figure out how to solve it?
Thanks

Fixed, with the help of @YanivD .

This is ready for review

@blumamir blumamir marked this pull request as ready for review August 16, 2021 08:59
@blumamir blumamir marked this pull request as draft August 16, 2021 10:40
@dyladan
Copy link
Member

dyladan commented Aug 16, 2021

Think you don't need sudo

@dyladan
Copy link
Member

dyladan commented Aug 16, 2021

If the CI isn't running you can ping me on slack to approve the run

@YanivD YanivD force-pushed the ci-test-changed branch 2 times, most recently from 4f6ffa7 to ae2bd3f Compare August 16, 2021 14:10
@blumamir blumamir marked this pull request as ready for review August 17, 2021 07:16
@dyladan
Copy link
Member

dyladan commented Aug 17, 2021

Seems like this is stable now?

@blumamir
Copy link
Member Author

Seems like this is stable now?

Yes. stable, tested, and ready to review

Copy link
Member

@obecny obecny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the very needed and missing piece, thx !

package.json Show resolved Hide resolved
Copy link
Member

@johnbley johnbley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for undertaking this tedious but beneficial work!

@blumamir
Copy link
Member Author

The credit should also go to @YanivD who solved the permissions issue

@dyladan dyladan merged commit d7f458f into open-telemetry:main Aug 17, 2021
@dyladan
Copy link
Member

dyladan commented Aug 17, 2021

Gave him a Co-authored-by when I merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants