Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds multiplex sandbox support and improves argument parsing #52

Merged
merged 8 commits into from
Aug 27, 2024

Conversation

jjudd
Copy link

@jjudd jjudd commented Aug 26, 2024

Primary reviewer here is @jadenPete. I added @gregghz in case they're interested in taking a look.

@jjudd jjudd requested review from gregghz and jadenPete August 26, 2024 19:52
@jjudd jjudd force-pushed the multiplex-sandbox branch 2 times, most recently from 0c9b83d to e3e76db Compare August 26, 2024 20:04
import protocbridge.{ProtocBridge, ProtocRunner}
import scala.jdk.CollectionConverters._
import scalapb.ScalaPbCodeGenerator

object ScalaProtoWorker extends WorkerMain[Unit] {

private class ScalaProtoRequest private (

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Should this be a case class (this applies to the other configuration classes created in this commit)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe so because you get this error message related to Scala 3 migration:

src/main/scala/higherkindness/rules_scala/workers/common/CommonArguments.scala:15: error: access modifiers for `copy` method are copied from the case class constructor under Scala 3 (or with -Xsource-features:case-apply-copy-access)
Scala 3 migration messages are issued as errors under -Xsource:3. Use -Wconf or @nowarn to demote them to warnings or suppress.
Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=scala3-migration, site=higherkindness.rules_scala.workers.common.CommonArguments
case class CommonArguments private (
           ^
1 error

Copy link

@jadenPete jadenPete Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think you need to remove the private constructor modifier too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I want the constructor to be private, though, which is why this is a class and not a case class.

I only want people to be able to construct these objects by passing in a namespace and a working directory, so someone doesn't accidentally construct one incorrectly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I think this is fine as-is, then.

def getSandboxFile(workDir: Path, file: File): File = {
workDir.resolve(file.toPath).toFile
}
def getSandboxPaths(workDir: Path, paths: JList[Path]): List[Path] = {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Do we need these getSandboxPaths methods? Writing a method for every collection seems a bit unnecessary when the caller could just as easily call getSandboxPath for every element.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally was doing that, but found it to be enough boiler plate that I wrote these as convenience methods.

@@ -14,20 +18,38 @@ import scala.io.Codec

object ScalafmtRunner extends WorkerMain[Unit] {

protected[this] def init(args: Option[Array[String]]): Unit = {}
private[this] class ScalafmtRequest private (

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think we should avoid using private[this] because it doesn't offer many guarantees over private and will be removed in Scala 3 (this applies to the other uses of private[this] and protected[this] in this commit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a performance advantage, however small, in Scala 2 to using private[this] vs private https://users.scala-lang.org/t/what-does-passing-this-after-access-modifier-mean/3838/5

Something that might have a performance impact (though I think the JIT should be pretty good at eliminating this anyway), is that a private[this] val gets compiled to a field only. While private val gets compiled to a field and an accessor method.

Copy link

@jadenPete jadenPete Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good to know. This was just a nit, so I'm fine with keeping it either way.

@@ -3,6 +3,7 @@ load(
"configure_bootstrap_scala",
"configure_zinc_scala",
"scala_library",
# "bar_toolchain",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this comment?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Nice catch.

@@ -67,17 +68,15 @@ configure_zinc_scala(

# Scala 3

# Adding this, so we make sure to have a Scala library in the
# IntelliJ libraries, so we can get a Scala SDK on sync.
# Adding this, so we make sure to have a Scala library in the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this comment was duplicated.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Fixing.

def pathsForLabel(depLabel: String): Seq[String] = {
val label = workRequest.label
val directDepLabels = workRequest.directDepLabels
val groupLabelToJarPaths = workRequest.groups.map { group =>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can we inline label and directDepLabels? Also, should groupLabelToJarPaths be a lazy val on DepsRunnerRequest?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've inlined those two. I did not not change that val to be lazy due to the performance penalty of lazy and the locking that comes with it.

.readAllLines(workRequest.usedDepsFile)
.asScala
.view
// Use the read mapper on the used classpath entries in order to keep

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, could you explain what the read mapper does?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read/write mapper is what we use to serialize and deserialize paths in a machine independent way.

If you didn't do mapping you'd end up with something like /home/foo/project/src/scala/bar/Qux.scala, but that's machine dependent. We only care about the src/scala/bar/Qux.scala bit.

The write mappers write things to disk in a machine independent way.

The read mappers know how to read those things from disk and translate them to actual paths on your machine.

Most of this is defined in src/main/scala/higherkindness/rules_scala/workers/common/AnnexMapper.scala

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That clarifies things. Thanks!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you fill me in on why this changes was necessary?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normalize does this:

Returns a path that is this path with redundant name elements eliminated.

I realized we weren't normalizing paths, so it would in theory be possible to get a path that points to the same spot, but the string doesn't match up. I wanted to eliminate that potential bit of non-determinism by normalizing the absolute paths.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I wasn't sure if it was necessary as a consequence of introducing multiplexed, sandboxed workers, or if it was just good practice.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. This is unrelated to multiplex sandboxing. Just something I encountered while working on the sandboxing and applied everywhere. It really should have been its own commit. My bad.

val parser = ArgumentParsers.newFor("zinc-worker").addHelp(true).build
parser.addArgument("--persistence_dir", /* deprecated */ "--persistenceDir").metavar("path")
parser.addArgument("--use_persistence").`type`(Arg.booleanType)
parser.addArgument("--extracted_file_cache").metavar("path")
// deprecated
parser.addArgument("--max_errors")
parser.parseArgsOrFail(args.getOrElse(Array.empty))
val namespace = parser.parseArgsOrFail(args.getOrElse(Array.empty))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use ArgsUtil.parseArgsOrFailSafe here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not here, no. This is called with the worker is being initialized. It's ok for the worker to die here. If the worker dies here it just fails to start. It's also preferable for it to die as I'm not sure what else to do here.

If that worker dies on a work request, that's bad. Recovery is just telling Bazel that work request failed and then waiting for the next request to come in.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense.

@@ -13,11 +14,14 @@ trait WorkerMain[S] {

protected[this] def init(args: Option[Array[String]]): S

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method seems rarely used in subclasses of WorkerMain because argument parsing (which calls ArgsUtil.parseArgsOrFailSafe) now requires access to the output stream, which this method doesn't provide. Should we refactor it to provide that and make more places use init in a meaningful way?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still useful for when you need to initialize the worker.

init happens once when the worker starts. work happens on every work request.

Agree that it is called in frequently and a good tradeoff may be providing a default implementation of init, but I'm not sure off the top of my head how to accomplish that because of the generics.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mainly concerned with us re-parsing arguments in work in a lot of places. I wonder if removing the args parameter in work would help to enforce that we're doing argument parsing in init.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argument parsing happens in work, though. In work, we parse the arguments for each work request.

In init we parse args for the worker as a whole. I think the only place we have a valid use case for this is the Zinc incremental compilation stuff, which I'd like to remove. Beyond that, I don't see a good reason to keep init around.

In another PR I can remove the incremental compilation piece and we should be able to remove init.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I was conflating command-line arguments vs. the arguments supplied with each worker task. Carry on.

jjudd added 8 commits August 26, 2024 16:43
…traction

These could/should probably be two separate commits, but the improvement
to argument parsing was inspired by the requirements of multiplex
sandboxing. The work was so intertwined that I'm leaving it as one PR.

Good news is that if we need to move to another parsing framework in the
future, it should be much easier because parsing is no longer spread
throughout the work functions.
This is useful for when workers fail to run and have an error message to
help with debugging. Otherwise you just have empty output.
@jjudd jjudd force-pushed the multiplex-sandbox branch from e3e76db to 80843d2 Compare August 26, 2024 23:52
@jjudd jjudd merged commit fee073f into lucid-master Aug 27, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants