
[POC] Switch worker protocol from Protocol Buffers to JSON #107

Closed · wants to merge 2 commits

Conversation

@arturdryomov (Contributor) commented Feb 7, 2021

```shell
$ bazel clean && bazel build //detekt/wrapper:bin
```

| Source | Protocol Buffers, seconds | JSON, seconds |
| --- | --- | --- |
| MacBook Pro (1.3 GHz 4-core CPU, 8 GB RAM) | 150 | 5 |
| MacBook Pro (2.4 GHz 8-core CPU, 32 GB RAM) | 68 | 4 |
| GitHub Actions (unknown 2-core CPU, 7 GB RAM) | 270 | 17 |

@Bencodes (Contributor) commented Feb 7, 2021

That's a pretty significant difference. LGTM 👍

@Kernald (Contributor) commented Feb 7, 2021

Doesn't `--experimental_worker_allow_json_protocol` become mandatory for clients with this change? It sounds super interesting, but in practice, for most clients, the protobuf binary (as well as the Detekt wrapper, to be fair) is only built once and then cached, so the gains won't be that significant in the grand scheme of things. I'd be in favour of waiting for `--experimental_worker_allow_json_protocol` to be flipped in Bazel before merging this.
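For clients willing to opt in before the flip, the flag can be set once in a `.bazelrc` (a sketch; `--experimental_worker_allow_json_protocol` is the Bazel flag discussed above, everything else here is illustrative):

```shell
# .bazelrc — opt in to the experimental JSON worker protocol.
# Required until Bazel flips the flag to be on by default.
build --experimental_worker_allow_json_protocol
```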

@arturdryomov (Contributor, Author)

@Kernald, yes, I’m concerned about the flag requirement as well. However, not all setups use caching — the most popular might be a CI workflow without integrated caching. I imagine entire Kotlin projects might compile faster than the protobuf binary alone 🦀

@susinmotion

There was a concern that enabling JSON workers would negatively impact performance, but thanks to your investigation, I think we have what we need to flip the flag. Let me do that now!

@arturdryomov (Contributor, Author)

@susinmotion, thanks for dropping by! In a couple of days I’ll be able to run the rule with the JSON protocol on an internal monorepo (thousands of targets). If you are interested, I can share results here. As much as I would like to get rid of the flag, a simple compilation of an independent rule doesn’t sound like a useful metric 😜

@susinmotion

Yes, please, that'd be great! There's of course a longer conversation here, about the value of adding the functionality apart from performance, what further changes we expect to the protocol, etc., so don't worry! Your metrics aren't the be-all and end-all of the flag flip.

@arturdryomov (Contributor, Author)

Intermediate results from running on an internal monorepo on a MacBook Pro (2.4 GHz 8-core CPU, 32 GB RAM). Cache was disabled; both invocations are basically `bazel clean && bazel build //all_detekt_targets`.

PB:

```
(23:30:43) INFO: Found 3113 targets...
(23:33:05) INFO: Elapsed time: 152.428s, Critical Path: 29.01s
(23:33:05) INFO: 3348 processes: 29 internal, 198 darwin-sandbox, 3 local, 3118 worker.
(23:33:07) INFO: Build completed successfully, 3348 total actions
```

JSON:

```
(23:39:29) INFO: Found 3113 targets...
(23:41:12) INFO: Elapsed time: 106.590s, Critical Path: 9.37s
(23:41:12) INFO: 3114 processes: 1 internal, 3113 worker.
(23:41:14) INFO: Build completed successfully, 3114 total actions
```

Basically 152 vs. 106 seconds. I suspect the difference is the time spent on compiling protobuf. Let me know if there is a Bazel command available which would take the protobuf binary from the cache but ignore the cache for Detekt targets.

CC @susinmotion

@Kernald (Contributor) commented Feb 9, 2021

I guess one solution would be to build the protobuf binary manually right after your `bazel clean` invocation.
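A sketch of that approach; the protobuf label here is a hypothetical placeholder, since the real one depends on the workspace setup:

```shell
# Wipe all caches, then warm only the protobuf toolchain
# (hypothetical label) before timing the Detekt targets.
bazel clean
bazel build @com_google_protobuf//:protoc
bazel build //all_detekt_targets
```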

Another approach would be to use Bazel's profiling feature to get a sense of how long it takes exactly.
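For example, something along these lines (flags per Bazel's profiling feature; the output path is illustrative):

```shell
# Record a trace profile while the build runs...
bazel build //all_detekt_targets --profile=/tmp/detekt.profile.gz
# ...then summarize per-phase and per-action timings from it.
bazel analyze-profile /tmp/detekt.profile.gz
```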

@arturdryomov (Contributor, Author)

Thanks as always Marc!

According to the profile, pure Detekt invocations take 102.3 seconds with the PB protocol and 104.1 seconds with the JSON protocol. However, these are single-run numbers. Ideally it would be better to make multiple runs and calculate a median, but I'm not sure if it's worth it. Also, the current JSON parsing might be sub-optimal since it uses a bit of reflection, while it could use streaming APIs without reflection.
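To illustrate the reflection-free direction, here is a minimal Python stand-in (the actual wrapper is Kotlin, and this is a sketch, not its implementation). The field names follow Bazel's JSON worker protocol, where each `WorkRequest` carries `arguments` and `requestId` and the `WorkResponse` answers with `exitCode`, `output`, and the matching `requestId`; all fields are read explicitly instead of being bound via reflection:

```python
import json

def handle_request(line: str) -> str:
    """Parse one WorkRequest object and build its WorkResponse."""
    request = json.loads(line)
    # Explicit field access instead of reflection-based binding.
    arguments = request.get("arguments", [])
    request_id = request.get("requestId", 0)
    # A real worker would run the Detekt analysis here; we echo the args.
    output = "analyzed: " + " ".join(arguments)
    response = {"exitCode": 0, "output": output, "requestId": request_id}
    return json.dumps(response)

# Example: one request framed as a single JSON object on one line.
reply = handle_request('{"arguments": ["--input", "Main.kt"], "requestId": 1}')
```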

@arturdryomov (Contributor, Author)

FYI — blocked by bazelbuild/bazel#13599. As part of the discussion there is a proposal to change the format to NDJSON at bazelbuild/bazel#14219.
