-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: use orjson for writing compile_commands #118
Conversation
I could not replace the line where |
Hey, @aminya! Thanks for looking to contribute, leaving things even better than you found them. I'm definitely open to big, easy performance wins :) The original json stuff had just been for simplicity, since it's built in, and I was thinking that the runtime was going to be dominated by header finding. Since then we've added the caching you're speeding up, so that's probably no longer true. Just to double check: Those benchmark numbers are for the overall runtime of this tool, right? (Hitting cache on a subsequent run.) If so, wow! What do you think about having bazel auto-install (with pip_parse, I think they call it now)? That way most people will actually get the benefits, and it'll be a little simpler to maintain things, since there'll be just one code path. If you think that's good, too, I'd love it if you'd switch us over to that. Couple other things:
Thanks! |
Thank you! I would be happy to contribute.
Yes, that was a refreshing benchmark after hitting the cache. Since I run this command as part of my build command every time, my whole build process is faster!
That would be great. I am not familiar with Bazel tooling for Python, but I am pretty sure it can be done if we want to. That could allow us to potentially make more optimizations as well.
Yeah, I did not know how I can replace this line of the code. If you know, it would be helpful.
I think pretty printing should be added behind an option as it has some performance penalties. I don't really look at |
Sweet! Love the wins. Thanks again for your willingness to contribute when you spotted an improvement opportunity and for benchmarking as you worked on perf. I'll knock out weaving in the import of rules_python and merge if you'll knock out the other items (above, and expanded below). (Configuring transitive deps in the WORKSPACE & setting up renovate auto update is a bit tricky if you haven't done it before, and I just did it for https://github.com/hedronvision/bazel-make-cc-https-easy, so I think I have comparative advantage there.) In the meantime, just import rules_python from their WORKSPACE snippet to unblock yourself, and then I'd love it if on your end you'd:
^ These are just suggestions. If you notice better ways of doing things or more things that need doing, please just do 'em. And let me know if you need setup help or unblocking of any other kind. Can't wait to get you on the contributors list :) |
Heads that I just landed some changes under you to fix a different issue. |
Hey, Amin! Just checking in on how things are going. If stuck, that's ok--just lmk. |
@cpsauer I tried using the Bazel stuff for pip and python. However, I was not successful. TBH, I haven't used Bazel for any Python projects, and I don't find it straightforward to implement. |
I spent about an hour trying to use rules_python to install orjson but I don't see how it's possible do it entirely in the i.e. where can we put the code that bottoms out in a call to
Maybe this can be worked around with bzlmod? Anyways here is my attempt, feel free to give me suggestions or use it: |
For now just assume you've got rules_python--just put it above hedron_compile_commands_setup in your workspace. (As above, I'm happy to take care of that setup wrappings problem; I've done it before. I'll unfortunately require a second tier of workspace macros (as in the linked example above). But again, happy to take care of the setup wrappings since I've done them before if y'all do the inner implementation. Thanks again! |
OK well it works if I can put arbitrary stuff into the WORKSPACE, see my PR: https://github.com/aminya/bazel-compile-commands-extractor/pull/1/files |
@garymm I merged your PR, however, I get this error when testing: ERROR: Failed to load Starlark extension '@rules_python//python/pip_install:pip_repository.bzl'.
Cycle in the workspace file detected. This indicates that a repository is used prior to being defined.
The following chain of repository dependencies lead to the missing definition.
- @rules_python
This could either mean you have to add the '@rules_python' repository with a statement like `http_archive` in your WORKSPACE file (note that transitive dependencies are not added automatically), or move an existing definition earlier in your WORKSPACE file.
ERROR: Error computing the main repository mapping: cycles detected during computation of main repo mapping To be honest, I find the Bazel setup overly complicated. Maybe we could just add a shell script that installs orjson globally for the user and be done with that. |
I realized I have to add the following to my project's Workspace before I can use BAZEL_SKYLIB_VERSION = "1.4.2"
http_archive(
name = "bazel_skylib",
sha256 = "66ffd9315665bfaafc96b52278f57c7e2dd09f5ede279ea6d39b2be471e7e3aa",
urls = [
"https://mirror.bazel.build/github.com/bazelbuild/bazel-skylib/releases/download/{0}/bazel-skylib-{0}.tar.gz".format(BAZEL_SKYLIB_VERSION),
"https://github.com/bazelbuild/bazel-skylib/releases/download/{0}/bazel-skylib-{0}.tar.gz".format(BAZEL_SKYLIB_VERSION),
],
)
http_archive(
name = "rules_python",
sha256 = "84aec9e21cc56fbc7f1335035a71c850d1b9b5cc6ff497306f84cced9a769841",
strip_prefix = "rules_python-0.23.1",
url = "https://github.com/bazelbuild/rules_python/releases/download/0.23.1/rules_python-0.23.1.tar.gz",
)
|
WORKSPACE
Outdated
|
||
python_register_toolchains( | ||
name = "python_toolchain", | ||
python_version = "3.11", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW if this works, then you can assume python 3.11 in refresh.template.py and simplify the script quite a bit. Not suggesting that should be in this PR, probably a later PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works, but the requirement is adding things to the workspace of the user, which I am not sure if it is desirable:
#118 (comment)
8d99c17
to
7a1f2b8
Compare
Since it was not possible to easily install |
@cpsauer A friendly reminder in case you missed my comment |
orjson is much faster than the standard library's json module (1.9 seconds vs 6.6 seconds for a ~140 MB file). Benchmarks: orjson: 67406 function calls (66880 primitive calls) in 1.919 seconds json: 21663390 function calls (18606459 primitive calls) in 6.588 seconds json no-indent: 21660892 function calls (18603961 primitive calls) in 6.437 seconds
This needs installing orjson!
This reverts commit 8d99c17.
I rebased this branch to be up to date with the recent changes. There's a backup of the old branch here: |
Hey guys! I've merged this in. We're now automatically using hermetic python and installing orjson! Anyone else seeing an error like: |
Thanks @cpsauer! I'll benchmark it again and will let you know. I added beautification back that could slow down the JSON dump. Didn't benchmark after that. |
As a heads, rules_python is causing enough issues that I think it's likely we'll likely need to revert. Was orjson indeed important to you, @aminya, showing up in benchmarking? By default would probably revert that too, since it didn't seem to speed things up in my tests. |
orjson is much faster than the standard library's json module (
1.9
seconds vs6.6
seconds for a ~140 MB file).Benchmarks:
orjson:
50415 function calls (50331 primitive calls) in
1.919 seconds
json:
21663390 function calls (18606459 primitive calls) in
6.588 seconds
json no-indent:
21660892 function calls (18603961 primitive calls) in
6.437 seconds