
rules_python is very very slow #1624

Closed
peakschris opened this issue Dec 18, 2023 · 4 comments


peakschris commented Dec 18, 2023

🐞 bug report

Affected Rule

py_binary

Is this a regression?

No

Description

There are significant performance issues with the Python tools used by pkg_zip in a Bazel environment (on Windows). When Bazel was packaging many zips simultaneously, each one could take 45s instead of the expected 2s. A single zip took 20s, and with several running in parallel each one slowed to 45s. We discovered that this is because Bazel's hermetic Python toolchain (rules_python) decompresses many files in preparation for every single Python invocation. More discussion here: https://bazelbuild.slack.com/archives/CA306CEV6/p1701253691249489

See the corresponding issue in rules_pkg: bazelbuild/rules_pkg#795. The workaround there is to rewrite the tooling in a different language.

However, the proper fix would be to resolve the problem in rules_python. Why can't this large set of files be prepared just once per workspace?

🔬 Minimal Reproduction

Compare the performance of build_zip.py with the Go version here: https://github.com/peakschris/build_zip_go. The Python version takes 45s when running in parallel; the Go version takes 2s.

🔥 Exception or Error





🌍 Your Environment

Operating System: Windows 10

Output of bazel version: 7.0.0

Rules_python version: 0.27.1

Anything else relevant?

@aignas
Collaborator

aignas commented Dec 21, 2023

I wonder if this is related to #311.

@martis42
Contributor

@peakschris are your numbers from use cases with a hermetic Python toolchain from rules_python? If so, could you repeat your measurements with a project using --nolegacy_external_runfiles?

I analyzed why rules_python with a hermetic toolchain has such a large overhead and why the above option helps. The analysis is based on Linux rather than Windows. See https://github.com/martis42/show_hermetic_python_overhead

@peakschris
Author

@martis42 Yes, I'm using the hermetic Python toolchain from rules_python.

Thanks for the idea. I tried setting --nolegacy_external_runfiles, and it made no difference. I also tried unsetting --enable_runfiles, and that made no difference either. With both changes applied:

PackageZip running a Go executable: 227ms
PackageZip running a Python script via the rules_python hermetic toolchain: 16s 24ms

This is for just a single zip. When I run tens of zip operations in parallel, these Python measurements get 2-3x worse.

Name        Category                           Start time          Duration        Thread                                Process
PackageZip  Local execution process wall time  00:00:18.291840000  16s 24ms 725us  skyframe-evaluator-execution-9 [513]  1
PackageZip  Local execution process wall time  00:00:19.024805000  227ms 19us      skyframe-evaluator-execution-2 [511]  1
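To compare the two PackageZip profile entries directly, their durations can be converted to seconds and divided. The following is a minimal Python sketch, assuming the profile viewer always formats durations as space-separated `<n>s`, `<n>ms`, and `<n>us` components as in the entries above; `parse_duration` is a hypothetical helper, not part of Bazel or rules_python.

```python
import re

# Seconds per unit for the suffixes seen in the profile entries
# ("16s 24ms 725us", "227ms 19us"). This format is an assumption.
_UNITS = {"s": 1.0, "ms": 1e-3, "us": 1e-6}

# "ms" and "us" are listed before "s" so that "24ms" is not read as "24m" + "s".
_TOKEN = re.compile(r"(\d+)\s*(ms|us|s)")


def parse_duration(text: str) -> float:
    """Hypothetical helper: convert a duration like '16s 24ms 725us' to seconds."""
    tokens = _TOKEN.findall(text)
    if not tokens:
        raise ValueError(f"no duration found in {text!r}")
    return sum(int(value) * _UNITS[unit] for value, unit in tokens)


if __name__ == "__main__":
    python_s = parse_duration("16s 24ms 725us")  # rules_python hermetic toolchain
    go_s = parse_duration("227ms 19us")          # Go executable
    print(f"Python: {python_s:.3f}s, Go: {go_s:.3f}s, slowdown: {python_s / go_s:.1f}x")
```

For the single-zip case reported here, this works out to the Python action being roughly 70 times slower than the Go one, before the additional 2-3x degradation seen with parallel zip operations.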

@rickeylev
Collaborator

I'm going to close this in favor of #1653.
