Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update benchmarks to use Scala 2.12 or 2.13 #242

Merged
merged 125 commits into from
Apr 27, 2021
Merged

Conversation

lbulej
Copy link
Member

@lbulej lbulej commented Apr 20, 2021

While this branch started with a simple idea of updating Scala in various benchmarks to recent versions, it turned out to be a much more involved endeavor, resulting in a lot of fallout that needed handling. To make benchmarks compatible with newer Scala, their dependencies needed to be updated, sometimes substantially (e.g., Spark, Neo4J, Dotty). However, when the dust settles, the suite should be more compatible with modern JVMs. There are a few things that need still need to be done before release, but these should come through much smaller PRs.

Key changes

  • Suite uses Scala versions 2.12.13 and 2.13.5 only, Scala 2.11 is no longer used
  • Spark benchmarks now use Spark 3.0.1, Spark 2.0.0 is no longer used
  • Benchmarks can be in more than one benchmark group
  • Benchmarks can specify requirements on JVM versions
  • Harness avoids executing benchmarks on incompatible JVM
  • Harness provides better control over scratch directories
  • Build system uses SBT 1.4.9 (and some updated plugins)

Follow-up changes

  • Replace the contents of sources.zip in dotty benchmark with scalap sources
  • Clean-up scratch file and directory handling in all benchmarks
  • Capture requirements on JVM versions for the rest of the benchmarks
  • Update README.md generator to output benchmark JVM version requirements
  • Handle Spark parameters consistently in Spark benchmarks

Used Scala versions

The following benchmarks now use Scala 2.12:

  • reactors
  • apache-spark group (als, chi-square, dec-tree, gauss-mix, log-regression, movie-lens, naive-bayes, page-rank)
  • neo4j-analytics
  • scala-stm group (philosophers, scala-stm)
  • twitter-finagle group (finagle-chirper, finagle-http)

The following benchmarks use Scala 2.13:

  • akka-uct
  • dotty
  • scala-doku
  • scala-kmeans

The following are Java benchmarks with wrappers using Scala 2.13:

  • db-shootout
  • jdk-concurrent group (fj-kmeans, future-genetic)
  • jdk-streams group (mnemonics, par-mnemonics, scrabble)
  • rx-scrabble

Benchmark changes

  • Benchmark can be in more than one benchmark group. This can be achieved by providing the benchmark class with multiple @Group annotations.
  • Benchmarks can specify requirements on JVM versions. There are two new annotations for this purpose: the @JvmRequired annotation, which allows setting the minimum required JVM version a benchmark needs to run (inclusive, defaults to "1.8" if unset), and the @JvmSupported annotation, which allows setting the maximum supported JVM version a benchmark runs on (also inclusive, unset by default).

The following provides a more detailed summary of changes in individual benchmarks.

actors/akka-uct (Scala 2.13)

  • Updated Scala version (2.11.8 -> 2.13.5)
  • Split off into separate subproject to make it independent of reactors
  • Updated com.typesafe.actor.akka-actor dependency (2.3.11 -> 2.6.12)

actors/reactors (Scala 2.12)

apache-spark/* (Scala 2.12)

  • Updated Scala version (2.11.8 -> 2.12.13)
    • Spark 3.0 does not support Scala 2.13
  • Updated org.apache.spark.spark-core dependency (2.0.0 -> 3.0.1)
  • Updated org.apache.spark.spark-sql dependency (2.0.0 -> 3.0.1)
  • Updated org.apache.spark.spark-mllib dependency (2.0.0 -> 3.0.1)

database/db-shootout (Scala 2.13)

  • Scala wrapper over Java benchmark
  • Updated Scala version (2.11.8 -> 2.13.5)

dummy/* (Java only)

  • Internal benchmarks for harness testing
  • Made into pure Java subproject

jdk-concurrent/* (Scala 2.13)

  • Scala wrapper over Java benchmarks
  • Updated Scala version (2.12.8 -> 2.13.5)

jdk-streams/* (Scala 2.13)

  • Scala wrapper over Java benchmarks
  • Updated Scala version (2.12.8 -> 2.13.5)

neo4j/neo4j-analytics (Scala 2.12)

  • Updated Scala version (2.12.8 -> 2.12.13)
  • Benchmark ported to Neo4J 4.x API
  • Cleaned up scratch file handling
  • Updated net.liftweb.lift-json dependency (3.2.0 -> 3.4.3)
  • Updated org.neo4j.neo4j dependency (3.5.12 -> 4.2.4)
    • Neo4J 4.2 does not support Scala 2.13
  • Removed direct commons-io.commons-io dependency

rx/rx-scrabble (Scala 2.13)

  • Scala wrapper over Java benchmark
  • Updated Scala version (2.11.8 -> 2.13.5)

scala-dotty/dotty (Scala 2.13)

  • Updated Scala version (2.12.8 -> 2.13.5)
  • Updated contents of sources.zip to make it compatible with updated Dotty
    • The plan is to replace the contents with scalap sources
  • Added org.scala-lang.modules.scala-collection-compat dependency (2.4.2)
  • Added org.scala-lang.scala3-compiler_3.0.0-RC1 dependency (3.0.0-RC1)
  • Removed direct ch.epfl.lamp.dotty-compiler_0.12 dependency
  • Removed direct commons-io.commons-io dependency

scala-sat/scala-doku (Scala 2.13)

scala-stdlib/scala-kmeans (Scala 2.13)

  • Updated Scala version (2.12.8 -> 2.13.5)

scala-stm/* (Scala 2.12)

  • Updated Scala version (2.12.3 -> 2.12.13)
  • The bundled org.scala-stm.scala-stm-library dependency will need updating for Scala 2.13

twitter-finagle/* (Scala 2.12)

  • Updated Scala version (2.11.8 -> 2.12.13)
  • Removed direct com.twitter.twitter-server dependency (19.4.0)
  • Removed direct com.twitter.common.metrics dependency (0.0.39)
  • Removed direct com.twitter.common.io dependency (0.0.69)
  • Removed direct com.twitter.util-events dependency (7.0.0)
  • Added dependency com.google.guava.guava dependency (19.0)
  • Added dependency commons-io.commons-io dependency (2.4)

Externally visible core/harness changes

  • The BenchmarkContext has been changed to reduce clutter due to different methods for accessing benchmark parameters. The interface now provides only a single parameter() method returning an instance of BenchmarkParameter, which then provides the convenience methods parameter value conversion, allowing the two interfaces to evolve independently.
  • The handling of scratch directories has been cleaned up, triggering a number of changes:
    • The BenchmarkContext interface provides the scratchDirectory() method, which provides a benchmark-specific scratch directory, which is removed when JVM exits. The other methods related to creating temporary directories will be removed when all benchmarks are updated to use the new interface.
    • The DirUtils class provides methods to recursively clean or delete directories so that benchmarks don't need to depend on the Apache commons-io library just for that.
    • The suite accepts two new command-line options --scratch-base and --keep-scratch. The first allows setting a base directory for scratch directories created by the suite, and the second prevents removal of those directories on JVM exit (for testing/debugging purposes).
    • The JMH wrapper (JmhRenaissanceBenchmark) provides similar functionality through system settings org.renaissance.jmh.scratchBase (path) and org.renaissance.jmh.keepScratch (boolean).
  • The suite now checks the JVM version requirements selected for execution and excludes benchmarks that are incompatible with the current JVM from execution.
    • Consequently, the --raw-list option, which is intended for use in shell scripts, will only list benchmarks compatible with the JVM.
    • In both cases, the filtering can be disabled by using the --no-jvm-check option.
  • The JMH wrapper can substitute (and execute) the dummy-empty benchmark in place of an incompatible benchmark (after issuing a textual warning). This is only useful for CI automation, because JMH does not provide a way to report/skip an incompatible benchmark without failure, and (unlike the normal Renaissance bundle) the JMH-enabled bundle cannot provide a list of compatible benchmarks. This behavior must be explicitly enabled by setting the org.renaissance.jmh.fakeIncompatible system property to true.

Internal core/harness changes

  • The Launcher, RenaissanceSuite, and the JmhRenaissanceBenchmark classes now create independent scratch directories to reflect the different contexts in which they execute.
  • The ModuleLoader class is no longer a class-based (static) singleton, but is supposed to be instantiated and needs to be provided with a scratch directory into which it can write module libraries.
  • The BenchmarkInfo class now provides a method to load a particular benchmark using a given ModuleLoader instance. The method has been moved from the BenchmarkRegistry class.

Use default value for scalafmtConfig, the new version does not
like Some() as a value.
Also add local .scalafmt.conf to including configuration
from the root project to make the project work locally.
Includes minor changes to avoid using deprecated features.
Keeps the dependencies untouched (yet), but adds comments to
indicate potential upgrade paths. This may actually require
splitting the akka-uct and reactors benchmarks into two sub-projects.

AkkaUct depends on akka-actor (which is still being developed).
Versions 2.5.x and 2.6.x of akka-actor support Scala up to 2.13,
version 2.3.x supports Scala only up to 2.11.

Reactors depends on reactors-core, which (even in version 0.8) only
supports Scala up to 2.11. Version 0.9 is in development, but it is
not clear which version of Scala it supports.
Using Scala 2.13 will require using at least version 0.18.1 of the
dotty-compiler library, which will in turn require updating the input
sources, because since version 0.13.0, dotty starts to dislike them.
This time without breaking the loop by completely
removing the list of benchmarks.
Neo4j apparently needs a lot more work, because version 3.5.x do
not work with Scala 2.12.12 or 2.13.x. When compiled with Scala 2.12.12,
the benchmark crashes during initialization, failing to find class
definition for Scala.Product$class despite everything being on the
class path.
Finally a bit more civilized version to break down arguments
lbulej added 4 commits April 20, 2021 00:04
Avoids ugly dependency on Launcher scratch root directory field,
and allows fixing JMH wrappers.
This allows initializing the ModuleLoader using scratch directory
managed by the JMH wrapper, not by the Launcher class.
This is done by loading and executing the `dummy-empty` benchmark
instead of the incompatible benchmark. This must be enabled by
setting the `org.renaissance.jmh.fakeIncompatible` property to `true`.

This is now used in the Travis configuration.
Copy link
Collaborator

@farquet farquet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an incredible work on both the quality and quantity sides!

I have a few minor comments based on the code review. I will now test the branch (and its new options) locally before giving the approval.

lbulej added 12 commits April 21, 2021 20:49
This was triggered by the need to provide access to a boolean
parameter value. Instead of adding another method to BenchmarkContext,
I have modified BenchmarkContext to provide only a single method to
access benchmark parameters, which returns BenchmarkParameter instance.

The BenchmarkParameter class then provides access to the parameter
value (as string) along with a number of convenience methods that
convert the string to a typed value.

This will prevent changes to the BenchmarkContext interface whenever
we want to add a common parameter access method.

In addition, we can remove some of the (duplicated) code implementing
BenchmarkContext in the ExecutionDriver and JmhRenaissanceBenchmark.

Benchmark updates to the interface follow.
In most cases, we also use a toPositiveInteger() method to indicate
that the parameter value has to be positive (greater than 0), and
in some cases, we use a common toList() method to get a list of
typed values from a comma-separated list of elements.
lbulej added 5 commits April 22, 2021 12:57
One less option to worry about, based on the assumption that the
--raw-list option is meant for machines (scripts), which should
get a list of compatible benchmarks by default.

Using the --no-jvm-option together with --raw-list will produce
a list of all benchmarks, regardless of compatibility.

The output of the --list option, which is meant for humans, was
modified to indicate whether a benchmark is compatible with the
current JVM.
This is the last of 1.4.x series, 1.5.0 forces the use of the
slash notation when referring to settings (which we don't use
everywhere yet).
Copy link
Collaborator

@farquet farquet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After careful local experimenting with all features from the branch, everything works like a charm as expected.

Feel free to merge it such that we can iterate on smaller PRs for upcoming changes :)

@lbulej
Copy link
Member Author

lbulej commented Apr 27, 2021

Thanks for taking the time to review and experiment with this pile of changes! I'll try to go back to much smaller PRs :-)

@lbulej lbulej merged commit 8e3f720 into master Apr 27, 2021
@lbulej lbulej deleted the topic/update-scala-sbt branch April 28, 2021 11:44
@lbulej lbulej mentioned this pull request Apr 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants