[WIP] Zinc + Persistent Bazel Worker Processes #12

ahirreddy · 2016-02-28T10:24:01Z

This is a WIP PR that integrates Zinc with Bazel's support for persistent worker processes. The main advantage of running with persistent Zinc workers is taking advantage of warm Scala compilers across builds (this includes reusing warm compiler instances across different build targets). I've found significant compile time improvements on large builds (a 3x speed up for our internal build), although I've done no concrete bench marking.

This PR does not enable the incremental compiler. It is unclear if it makes sense to enable incremental re-compilation within Bazel, as I do not believe the results of incremental compilation are guaranteed to be reproducible.

To use the persistent worker use the scala_worker rule and run:
bazel build --strategy=Scala=worker --worker_max_instances=<n> <target>.

You can also add these default parameters to your .bazelrc (which can be global or per repo) so that you don't have to constantly type them:

build --strategy=Scala=worker --worker_max_instances=8

Notes:

It uses a modified version of Zinc at the moment, which can read arguments from a file instead of the command line. This was originally to deal with issues where we exceeded the maximum size argument list that could be passed as a command line parameter. It's unclear if this is still necessary since Bazel communicates with the worker process using protobuf over STDIN. The custom version can be found here.
I had to use the java_binary rule instead of the scala_binary because the scala rule changes the present working directory before invoking the JVM. The issue here is that Bazel provides input paths and arguments to the worker process as relative paths that only resolve in the initial working directory.
To test the changes against our internal repo, I had to add 2.10 support. Minus one hack that I will clean up, it is straight forward to support 2.10 and 2.11 with the same rules.
I pulled the Worker implementation from the ExampleWorker in Bazel, and converted it to support Scala.

googlebot · 2016-02-28T10:24:06Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If you signed the CLA as a corporation, please let us know the company's name.

ahirreddy · 2016-02-28T10:26:11Z

scala/scala.bzl

+    progress_message="scala ijar %s" % ctx.label,)
+
+
+def _compile(ctx, jars, buildijar, usezinc):


Refactored this a bit so that it can be reused for both scalac and zinc invocations

johnynek · 2016-02-28T22:29:19Z

scala/scala.bzl

@@ -254,7 +348,8 @@ _implicit_deps = {
  "_ijar": attr.label(executable=True, default=Label("//tools/defaults:ijar"), single_file=True, allow_files=True),
  "_scalac": attr.label(executable=True, default=Label("@scala//:bin/scalac"), single_file=True, allow_files=True),
  "_scalalib": attr.label(default=Label("@scala//:lib/scala-library.jar"), single_file=True, allow_files=True),
-  "_scalaxml": attr.label(default=Label("@scala//:lib/scala-xml_2.11-1.0.4.jar"), single_file=True, allow_files=True),
+  # "_scalaxml": attr.label(default=Label("@scala//:lib/scala-xml_2.11-1.0.4.jar"), single_file=True, allow_files=True),
+  "_scalaxml": attr.label(default=Label("@scala//:lib/scala-library.jar"), single_file=True, allow_files=True),


what is this change about?

ahh, was this a hack to work around 2.11 being hardwired in here?

Exactly. I need to find a better way to handle this.

damienmg · 2016-02-28T22:35:12Z

Note that the author of this PR hasn't signed the CLA. I will meet with our open-source office on Friday and ask them if we need Google CLA for this repo. Until then we cannot accept this PR unless the author sign the CLA.

johnynek · 2016-02-28T22:35:23Z

scala/scala.bzl

+      "main_class": attr.string(),
+      "exports": attr.label_list(allow_files=False),
+      # Worker Args
+      "worker": attr.label(


is this a private label? Should users ever change this? If private, can we use _worker?

johnynek · 2016-02-28T22:43:47Z

related to #8

johnynek · 2016-02-28T22:48:59Z

Thanks so much for working on this! Very exciting!

Open issues:

how do we cleanly support scala 2.10 & 2.11 (and soon 2.12) without copying too much code?
is zinc reliable (and faster) in this mode than just using a warm scalac? Could we just use scalac as a worker by calling the main method on that? I think in bazel we definitely don't want to break the invariant that identical code produces identical jars. How can we test that here? Can we make a stand alone test for the scala worker?
Can we have just one mode? Ideally, we have a faster compile, and that is just the default. Maybe users can opt out in some memory constained cases with an option? Could we make such an option a WORKSPACE setting?
Can we get the tests green on travis?

damienmg · 2016-02-28T23:17:27Z

Btw, /cc @philwo FYI, a PR to use worker for Scala/Zinc :)

johnynek · 2016-02-28T23:55:44Z

scala/ScalaWorker.scala

+
+        for (input <- request.getInputsList().asScala) {
+          inputs.put(input.getPath(), input.getDigest().toStringUtf8())
+        }


nit: idiomatic scala would be:

request.getInputsList.asScala.foreach { input => inputs.put(input.getPath, input.getDigest.toStringUtf8) }

johnynek · 2016-02-29T03:30:57Z

@damienmg @philwo are there docs about such things as how many worker processes are started? I assume more than one. Maybe as many as there are CPUs?

johnynek · 2016-02-29T21:17:46Z

related to #14

ahirreddy · 2016-03-01T21:27:08Z

@johnynek Thanks for taking a look! I'll start addressing the comments in the next few days when I have some free time.

@damienmg Will sign the CLA once I get approval on my end.

johnynek · 2016-03-03T02:45:02Z

@ahirreddy great!

I'd be interested to see if we can just jump in here:
https://github.com/scala/scala/blob/2.11.x/src/compiler/scala/tools/nsc/Main.scala

and make an invocation. For instance:

@annotation.tailrec
def loop(): Unit = {
  val args = readArgsFromProto
  val comp = new MainClass
  // see: https://github.com/scala/scala/blob/2.11.x/src/compiler/scala/tools/nsc/Driver.scala#L37
  comp.process(args)
  if (comp.reporter.hasErrors) reportErrors(comp.reporter) else reportSuccess()
  loop()
}

might do it, and as long as there are no static vars, that should be exactly like running scalac except we get the jit to warm up.

What do you think of that?

damienmg · 2016-03-04T21:55:54Z

@ahirreddy Thanks

I have meet with our Open-source office and they confirmed that we need CLA for everything that is contributed to a bazelbuild repository

philwo · 2016-03-07T22:08:18Z

@johnynek Currently, Bazel starts up to 4 workers per "key", where a key is a unique combination of command-line to start the worker, environment variables, working directory and Spawn mnemonic. (In other words, if you use both the Javac persistent worker and the Scala persistent workers, there will be up to 4 workers spawned for each.)

You can tune that via the --worker_max_instances flag: https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/worker/WorkerOptions.java#L48

philwo · 2016-03-15T21:05:56Z

@ahirreddy I'd love to see this in Bazel - is there anything we could do to help you get it ready for merging?

ahirreddy · 2016-03-23T02:53:42Z

Hey @philwo I'll try to get to it this weekend! I've been somewhat backed up with other tasks, but I really want to get this merged :)

ulfjack · 2016-05-11T11:26:11Z

Any updates on this?

xynny · 2016-05-13T23:23:38Z

we are using this internally right now. I work with ahir. There is currently an issue with the caching as well because the zinc worker emits a temporary directory instead of a jar. This breaks caching because it can't generate a digest for a directory as an input. I'm working on this issue right now.

johnynek · 2016-05-14T00:25:42Z

I'm pessimistic about the incremental compiler aspect of zinc. The pants folks have had a lot of correctness bugs around it. I think that should definitely be its own separate plugin for people who like to live on the edge.

That said, I think doing a worker process just using the normal compiler should be pretty easy and we can reinstantiate a new compiler on each call, and I think the JIT will give us a pretty good win.

We are using bazel in the mode where CI does not to a full rebuild, so we care a lot about the correctness part and would not want to trade that for a slightly faster compile.

ittaiz · 2016-05-15T05:31:57Z

+1 on correctness.
We've had zinc issues to caching which caused some people to lose faith.
On שבת, 14 במאי 2016 at 3:25 P. Oscar Boykin [email protected]
wrote:

I'm pessimistic about the incremental compiler aspect of zinc. The pants
folks have had a lot of correctness bugs around it. I think that should
definitely be its own separate plugin for people who like to live on the
edge.

That said, I think doing a worker process just using the normal compiler
should be pretty easy and we can reinstantiate a new compiler on each call,
and I think the JIT will give us a pretty good win.

We are using bazel in the mode where CI does not to a full rebuild, so we
care a lot about the correctness part and would not want to trade that for
a slightly faster compile.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#12 (comment)

philwo · 2016-05-17T11:15:25Z

+1 on favoring correctness over speed and leaving out the incremental compilation. JIT and no more JVM start-up overhead should already give a pretty nice performance boost in itself.

ittaiz · 2016-05-17T11:39:22Z

@xynny @ahirreddy
Any chance of splitting the work to persistent worker with incremental compilation off and then another one in a plugin like @johnynek suggested?

ahirreddy · 2016-07-17T05:53:17Z

This PR doesn't actually use the incremental compiler component of Zinc, so we have no issues around correctness or reproducibility. Zinc just happens to provide a nice persistent build server wrapper around the Scala compiler. Non reproducible builds on our end have usually been caused by timestamp issues in jars.

I've actually worked on this alot internally, adding code coverage support and support for Scala 2.10 and 2.11 (another Zinc benefit is that it can host Scala 2.10 and 2.11 compilers simultaneously). I'll try to push these upstream once I have some time.

bazel-io · 2016-07-17T05:53:19Z

Can one of the admins verify this patch?

ittaiz · 2016-08-30T18:04:26Z

Hi @ahirreddy,
Any news?

philwo · 2016-09-05T09:10:25Z

I'd also be interested in how this project is going :) It would be an awesome and very welcome contribution to Bazel.

johnynek · 2016-09-13T19:06:14Z

closed by #91

refactoring

63294ab

googlebot added the cla: no label Feb 28, 2016

ahirreddy reviewed Feb 28, 2016
View reviewed changes

scala 2.10 support

9ed7f70

ahirreddy force-pushed the zinc branch from 6ca02d0 to 9ed7f70 Compare February 28, 2016 10:35

johnynek reviewed Feb 28, 2016
View reviewed changes

johnynek mentioned this pull request May 19, 2016

Check that jar output is byte-for-byte identical #55

Merged

johnynek closed this Sep 13, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Zinc + Persistent Bazel Worker Processes #12

[WIP] Zinc + Persistent Bazel Worker Processes #12

ahirreddy commented Feb 28, 2016

googlebot commented Feb 28, 2016

ahirreddy Feb 28, 2016

johnynek Feb 28, 2016

johnynek Feb 29, 2016

ahirreddy Feb 29, 2016

damienmg commented Feb 28, 2016

johnynek Feb 28, 2016

johnynek commented Feb 28, 2016

johnynek commented Feb 28, 2016

damienmg commented Feb 28, 2016

johnynek Feb 28, 2016

johnynek commented Feb 29, 2016

johnynek commented Feb 29, 2016

ahirreddy commented Mar 1, 2016

johnynek commented Mar 3, 2016

damienmg commented Mar 4, 2016

philwo commented Mar 7, 2016

philwo commented Mar 15, 2016

ahirreddy commented Mar 23, 2016

ulfjack commented May 11, 2016

xynny commented May 13, 2016

johnynek commented May 14, 2016

ittaiz commented May 15, 2016

philwo commented May 17, 2016

ittaiz commented May 17, 2016

ahirreddy commented Jul 17, 2016

bazel-io commented Jul 17, 2016

ittaiz commented Aug 30, 2016

philwo commented Sep 5, 2016

johnynek commented Sep 13, 2016

		progress_message="scala ijar %s" % ctx.label,)


		def _compile(ctx, jars, buildijar, usezinc):

[WIP] Zinc + Persistent Bazel Worker Processes #12

[WIP] Zinc + Persistent Bazel Worker Processes #12

Conversation

ahirreddy commented Feb 28, 2016

googlebot commented Feb 28, 2016

ahirreddy Feb 28, 2016

Choose a reason for hiding this comment

johnynek Feb 28, 2016

Choose a reason for hiding this comment

johnynek Feb 29, 2016

Choose a reason for hiding this comment

ahirreddy Feb 29, 2016

Choose a reason for hiding this comment

damienmg commented Feb 28, 2016

johnynek Feb 28, 2016

Choose a reason for hiding this comment

johnynek commented Feb 28, 2016

johnynek commented Feb 28, 2016

damienmg commented Feb 28, 2016

johnynek Feb 28, 2016

Choose a reason for hiding this comment

johnynek commented Feb 29, 2016

johnynek commented Feb 29, 2016

ahirreddy commented Mar 1, 2016

johnynek commented Mar 3, 2016

damienmg commented Mar 4, 2016

philwo commented Mar 7, 2016

philwo commented Mar 15, 2016

ahirreddy commented Mar 23, 2016

ulfjack commented May 11, 2016

xynny commented May 13, 2016

johnynek commented May 14, 2016

ittaiz commented May 15, 2016

philwo commented May 17, 2016

ittaiz commented May 17, 2016

ahirreddy commented Jul 17, 2016

bazel-io commented Jul 17, 2016

ittaiz commented Aug 30, 2016

philwo commented Sep 5, 2016

johnynek commented Sep 13, 2016