Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Zinc + Persistent Bazel Worker Processes #12

Closed
wants to merge 2 commits into from

Conversation

ahirreddy
Copy link

This is a WIP PR that integrates Zinc with Bazel's support for persistent worker processes. The main advantage of running with persistent Zinc workers is taking advantage of warm Scala compilers across builds (this includes reusing warm compiler instances across different build targets). I've found significant compile time improvements on large builds (a 3x speed up for our internal build), although I've done no concrete bench marking.

This PR does not enable the incremental compiler. It is unclear if it makes sense to enable incremental re-compilation within Bazel, as I do not believe the results of incremental compilation are guaranteed to be reproducible.

To use the persistent worker use the scala_worker rule and run:
bazel build --strategy=Scala=worker --worker_max_instances=<n> <target>.

You can also add these default parameters to your .bazelrc (which can be global or per repo) so that you don't have to constantly type them:

build --strategy=Scala=worker --worker_max_instances=8

Notes:

  • It uses a modified version of Zinc at the moment, which can read arguments from a file instead of the command line. This was originally to deal with issues where we exceeded the maximum size argument list that could be passed as a command line parameter. It's unclear if this is still necessary since Bazel communicates with the worker process using protobuf over STDIN. The custom version can be found here.
  • I had to use the java_binary rule instead of the scala_binary because the scala rule changes the present working directory before invoking the JVM. The issue here is that Bazel provides input paths and arguments to the worker process as relative paths that only resolve in the initial working directory.
  • To test the changes against our internal repo, I had to add 2.10 support. Minus one hack that I will clean up, it is straight forward to support 2.10 and 2.11 with the same rules.
  • I pulled the Worker implementation from the ExampleWorker in Bazel, and converted it to support Scala.

@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

progress_message="scala ijar %s" % ctx.label,)


def _compile(ctx, jars, buildijar, usezinc):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored this a bit so that it can be reused for both scalac and zinc invocations

@@ -254,7 +348,8 @@ _implicit_deps = {
"_ijar": attr.label(executable=True, default=Label("//tools/defaults:ijar"), single_file=True, allow_files=True),
"_scalac": attr.label(executable=True, default=Label("@scala//:bin/scalac"), single_file=True, allow_files=True),
"_scalalib": attr.label(default=Label("@scala//:lib/scala-library.jar"), single_file=True, allow_files=True),
"_scalaxml": attr.label(default=Label("@scala//:lib/scala-xml_2.11-1.0.4.jar"), single_file=True, allow_files=True),
# "_scalaxml": attr.label(default=Label("@scala//:lib/scala-xml_2.11-1.0.4.jar"), single_file=True, allow_files=True),
"_scalaxml": attr.label(default=Label("@scala//:lib/scala-library.jar"), single_file=True, allow_files=True),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this change about?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, was this a hack to work around 2.11 being hardwired in here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. I need to find a better way to handle this.

@damienmg
Copy link
Contributor

Note that the author of this PR hasn't signed the CLA. I will meet with our open-source office on Friday and ask them if we need Google CLA for this repo. Until then we cannot accept this PR unless the author sign the CLA.

"main_class": attr.string(),
"exports": attr.label_list(allow_files=False),
# Worker Args
"worker": attr.label(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a private label? Should users ever change this? If private, can we use _worker?

@johnynek
Copy link
Member

related to #8

@johnynek
Copy link
Member

Thanks so much for working on this! Very exciting!

Open issues:

  1. how do we cleanly support scala 2.10 & 2.11 (and soon 2.12) without copying too much code?
  2. is zinc reliable (and faster) in this mode than just using a warm scalac? Could we just use scalac as a worker by calling the main method on that? I think in bazel we definitely don't want to break the invariant that identical code produces identical jars. How can we test that here? Can we make a stand alone test for the scala worker?
  3. Can we have just one mode? Ideally, we have a faster compile, and that is just the default. Maybe users can opt out in some memory constained cases with an option? Could we make such an option a WORKSPACE setting?
  4. Can we get the tests green on travis?

@damienmg
Copy link
Contributor

Btw, /cc @philwo FYI, a PR to use worker for Scala/Zinc :)


for (input <- request.getInputsList().asScala) {
inputs.put(input.getPath(), input.getDigest().toStringUtf8())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: idiomatic scala would be:

request.getInputsList.asScala.foreach { input =>
  inputs.put(input.getPath, input.getDigest.toStringUtf8)
}

@johnynek
Copy link
Member

@damienmg @philwo are there docs about such things as how many worker processes are started? I assume more than one. Maybe as many as there are CPUs?

@johnynek
Copy link
Member

related to #14

@ahirreddy
Copy link
Author

@johnynek Thanks for taking a look! I'll start addressing the comments in the next few days when I have some free time.

@damienmg Will sign the CLA once I get approval on my end.

@johnynek
Copy link
Member

johnynek commented Mar 3, 2016

@ahirreddy great!

I'd be interested to see if we can just jump in here:
https://github.com/scala/scala/blob/2.11.x/src/compiler/scala/tools/nsc/Main.scala

and make an invocation. For instance:

@annotation.tailrec
def loop(): Unit = {
  val args = readArgsFromProto
  val comp = new MainClass
  // see: https://github.com/scala/scala/blob/2.11.x/src/compiler/scala/tools/nsc/Driver.scala#L37
  comp.process(args)
  if (comp.reporter.hasErrors) reportErrors(comp.reporter) else reportSuccess()
  loop()
}

might do it, and as long as there are no static vars, that should be exactly like running scalac except we get the jit to warm up.

What do you think of that?

@damienmg
Copy link
Contributor

damienmg commented Mar 4, 2016

@ahirreddy Thanks

I have meet with our Open-source office and they confirmed that we need CLA for everything that is contributed to a bazelbuild repository

@philwo
Copy link
Member

philwo commented Mar 7, 2016

@johnynek Currently, Bazel starts up to 4 workers per "key", where a key is a unique combination of command-line to start the worker, environment variables, working directory and Spawn mnemonic. (In other words, if you use both the Javac persistent worker and the Scala persistent workers, there will be up to 4 workers spawned for each.)

You can tune that via the --worker_max_instances flag: https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/worker/WorkerOptions.java#L48

@philwo
Copy link
Member

philwo commented Mar 15, 2016

@ahirreddy I'd love to see this in Bazel - is there anything we could do to help you get it ready for merging?

@ahirreddy
Copy link
Author

Hey @philwo I'll try to get to it this weekend! I've been somewhat backed up with other tasks, but I really want to get this merged :)

@ulfjack
Copy link
Contributor

ulfjack commented May 11, 2016

Any updates on this?

@xynny
Copy link

xynny commented May 13, 2016

we are using this internally right now. I work with ahir. There is currently an issue with the caching as well because the zinc worker emits a temporary directory instead of a jar. This breaks caching because it can't generate a digest for a directory as an input. I'm working on this issue right now.

@johnynek
Copy link
Member

I'm pessimistic about the incremental compiler aspect of zinc. The pants folks have had a lot of correctness bugs around it. I think that should definitely be its own separate plugin for people who like to live on the edge.

That said, I think doing a worker process just using the normal compiler should be pretty easy and we can reinstantiate a new compiler on each call, and I think the JIT will give us a pretty good win.

We are using bazel in the mode where CI does not to a full rebuild, so we care a lot about the correctness part and would not want to trade that for a slightly faster compile.

@ittaiz
Copy link
Member

ittaiz commented May 15, 2016

+1 on correctness.
We've had zinc issues to caching which caused some people to lose faith.
On שבת, 14 במאי 2016 at 3:25 P. Oscar Boykin [email protected]
wrote:

I'm pessimistic about the incremental compiler aspect of zinc. The pants
folks have had a lot of correctness bugs around it. I think that should
definitely be its own separate plugin for people who like to live on the
edge.

That said, I think doing a worker process just using the normal compiler
should be pretty easy and we can reinstantiate a new compiler on each call,
and I think the JIT will give us a pretty good win.

We are using bazel in the mode where CI does not to a full rebuild, so we
care a lot about the correctness part and would not want to trade that for
a slightly faster compile.


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#12 (comment)

@philwo
Copy link
Member

philwo commented May 17, 2016

+1 on favoring correctness over speed and leaving out the incremental compilation. JIT and no more JVM start-up overhead should already give a pretty nice performance boost in itself.

@ittaiz
Copy link
Member

ittaiz commented May 17, 2016

@xynny @ahirreddy
Any chance of splitting the work to persistent worker with incremental compilation off and then another one in a plugin like @johnynek suggested?

@ahirreddy
Copy link
Author

This PR doesn't actually use the incremental compiler component of Zinc, so we have no issues around correctness or reproducibility. Zinc just happens to provide a nice persistent build server wrapper around the Scala compiler. Non reproducible builds on our end have usually been caused by timestamp issues in jars.

I've actually worked on this alot internally, adding code coverage support and support for Scala 2.10 and 2.11 (another Zinc benefit is that it can host Scala 2.10 and 2.11 compilers simultaneously). I'll try to push these upstream once I have some time.

@bazel-io
Copy link
Member

Can one of the admins verify this patch?

@ittaiz
Copy link
Member

ittaiz commented Aug 30, 2016

Hi @ahirreddy,
Any news?

@philwo
Copy link
Member

philwo commented Sep 5, 2016

I'd also be interested in how this project is going :) It would be an awesome and very welcome contribution to Bazel.

@johnynek
Copy link
Member

closed by #91

@johnynek johnynek closed this Sep 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants