Investigate changes to the trait encoding that could improve (cold) performance #5928

smarter · 2019-02-15T15:03:27Z

There's a lot of moving parts involved here so let me try to recap, note that this is all based on my own understanding which might be incomplete, feedback welcome (/cc @retronym @lrytz @adriaanm). I'm thinking that maybe we should turn this into a talk at the JVM Language Summit, maybe the JDK folks will take pity on us and fix some of this stuff.

1. Dotty implementation of super[MyTrait].foo for the JVM

A trait on the JVM is compiled down to an interface with default methods, in Scala we can write super[A].foo to call the default method foo implemented in A even if it's been overridden, in Dotty this compiles down to the following bytecode:

invokespecial A.foo

but this will only have the correct semantics if A is explicitly listed in the bytecode of the current class as a a super-interface, otherwise the wrong method might be called, here's an example:

trait A { def foo = "A" }
trait B extends A { override def foo = "B" }
class C extends A with B {
  def bar = super[A].foo
}

if we run javap on C.class, we'll see:

public class C implements A, B {

Because B implements A, the A part may seem redundant but in fact if we omit it then invokespecial A.foo will call the foo override in B !

2. Cold performance issue when doing default method resolution and listing all transitive parent interfaces

It turns out that listing all interfaces all can affect performance of default method resolution as investigated by @retronym in scala/scala-dev#98 (comment), therefore it pays to avoid listing interfaces which aren't necessary to get the correct semantics in the generated bytecode, and in fact Dotty already does better than what @retronym reported in his commen, I have no clue what the code involved is and when it changed but now if I compile:

trait A
trait B extends A
class C extends B

I get:

public class C implements B {

A does get listed in C if we add a super[A].something call in it, as in the example in section 1.

So far so good right ? Well, things are about to get even more complicated.

3. Cold performance issue when doing default method resolution (even when parent interfaces are pruned)

As painstakingly investigated by the Scala 2 team, too much default method resolution leads to bad cold performance. This was worked around by emitting mixin forwarders unconditionally.

4. Mixin forwarders

Given:

trait A {
  def foo = 1
}
trait B { self: A =>
  override def foo = 2
}
class C extends A with B
object Test {
  (new C).foo
}

The semantics of Scala requires that the call to foo on C ends up calling foo in B, but since B and A are unrelated from the point of view of the JVM and both have a foo, you'll get an IncompatibleClassChangeError unless an explicit forwarder is added to C:

class C extends A with B {
  override def foo = super[B].foo
}

Both Scala 2 and Dotty add these forwarders automatically for correctness. But Scala 2 adds them even when they're not strictly needed to avoid doing default method resolution when possible for (cold, maybe hot too ?) performance, eventually Dotty should do the same: #2563.

5. Scala 2 implementation of super[MyTrait].foo for the JVM

But wait, we need mixin forwarders to improve cold performance, but mixin forwarders are implemented using super-calls, and we just saw that super-calls require listing more interfaces, and listing more interfaces worsens cold performance 🤯. Can we do anything about this ? Scala 2 did by going back to the drawing board and changing the scheme for doing super-calls, let's look at the example from section 1 again:

trait A { def foo = "A" }
trait B extends A { override def foo = "B" }
class C extends A with B {
  def bar = super[A].foo
}

In Scala 2 this becomes when decompiled with cfr and slightly simplified:

public interface A {
  public static /* synthetic */ String foo$(A $this) {
    return $this.foo();
  }
  default public String foo() {
    return "A";
  }
}
public interface B extends A {
   public static /* synthetic */ String foo$(B $this) {
    return $this.foo();
  }
  @Override default public String foo() {
    return "B";
  }
}
public class C implements B {
  @Override public String foo() {
    return B.foo$(this);
  }
   public String bar() {
    return A.foo$(this)
  }
}

There's a few important things to notice here:

For every trait method implemented as a default method, a corresponding static method that just forwards to the default method is emitted, the static method name is always the default method name with a $ appended.
C lists B as a parent interface but not A, even though it contains a super[A].foo call
This works fine because the super-call is not implemented using invokespecial A.foo, instead invokestatic A.foo$ is used.

So Scala 2 trades off listing some interface parents against adding static methods everywhere, see scala/scala#5251 (comment) for more details.

6. OK but what do we do now ?

The good news is that as far as I can tell, the differences between Scala 2 and Dotty in trait encoding don't affect semantics, this is purely about performance so no urgent action is needed.

6.1. Benchmarking

The first thing we should do to investigate this is setup some benchmarking infrastructure to reason about hot and cold performance of the code generated by the compiler. In Scala 2 the benchmarking involved in making the decisions that lead to their particular encoding was (I think, correct me if I'm wrong) mostly based on https://github.com/scala/compiler-benchmark which benchmarks the performance of the compiler, this doesn't really work for Dotty currently because we don't compile scala-library and instead use the own compiled by Scala 2, and most of the impact of default method resolution on performance comes from the deep collection hierarchy, so we need to:

Be able to compile and run scala-library reliably (compiling it is part of our community build but we need to check that it actually works)
Benchmark the compiler with a Dotty-compiled scala-library
Benchmark cold and hot performance of the compiler (https://github.com/scala/compiler-benchmark has infrastructure for this)
Benchmark everything in this quadrant (~~A is what Dotty does currently~~ EDIT: Dotty has now switched to B, D is what Scala 2 does):

emit mixin forwarders when needed always emit mixin forwarders

super-call to trait with invokespecial A B

super-call to trait with invokestatic C D
Benchmark with a recent JDK, it looks like things might be improving finally

6.2. Possible implementation improvements

I think the most promising corner of the quadrant is B: we need logic to generate mixin forwarders anyway for correctness, so generating them all the time is not a big deal, by contrast adding a bunch o f static methods is extra complexity I'd rather avoid.

6.2.1. Avoid listing unnecessary extra interfaces

Mixin forwarders require extra super-calls on traits, but as @lrytz observed in scala/scala-dev#228, many super-calls don't actually require listing extra interfaces to work correctly and that reasoning seems to apply to mixin forwarders, from the issue:

trait T { def f = 1 }
trait U extends T
class C extends U { def t = super.f }

Here, emitting invokespecial U.f is correct, we don't need to add T as a direct parent of C.

And yet, if we look at the code generated by Dotty, it's not smart enough to avoid listing T:

public class C implements T, U {
   public int t() {
    return T.super.f();
  }
}

6.2.2. Emit trait constructors statically

In both Scala 2 and Dotty, the constructor of a trait is emitted as a $init$ method, but in Scala 2 it's a static method. Aligning ourselves with Scala 2 here would be an easy win because all transitive children of a trait with a constructor are required to call it, so in Dotty we end up with mandatory super-calls that force us to list transitive interfaces just to call $init$. Since $init$ is special anyway, making it static doesn't seem like a big deal.

7. See also

Here are various issues and PRs more or less related, this might not be helpful but at least they'll all have a backlink to this issue now:

The text was updated successfully, but these errors were encountered:

lrytz · 2019-03-04T19:14:11Z

Your understanding of the situation seems right to me, as far as I remember :-) I think we kept the static methods in interfaces because there's still a risk for performance degradation due to adding redundant direct interfaces in complicated hierarchies like collections. I think there was also an argument about binary compatibility for keeping the static accessors, but I don't remember what situation was being considered.

Drop vestigial code related to Scala 2.11 support. In particular, we created a fake impl class in AugmentScala2Trait only to add its members to the trait in LinkScala2Impls. We now directly add the members to the trait in LinkScala2Impls. We could potentially simplify things even further by getting rid of LinkScala2Impls since the static `foo$` methods in Scala 2 traits always forward to instance methods `foo`, but that could have performance implication as detailed in scala#5928, so we keep things as-is for now, but eventually we should either switch Dotty trait encoding to also use static forwarders, or not use them at all.

This is what Scala 2.12+ does for cold performance reasons (see scala#5928 for more details) and we should align ourselves with them when possible. About two years ago in scala#2563, Dmitry objected that we may not need to do that if we found another way to get performance back, or if newer JDKs improved the performance of default method resolution. It doesn't look like these things have happened so far (but there's some recent glimmer of hope here: https://bugs.openjdk.java.net/browse/JDK-8036580). Dmitry also said "I don't recall triggering bugs by emitting more forwarders rather then less.", but in fact since then I've found one case where the standard library failed to compile with extra forwarders causing name clashes, requiring a non-binary-compatible change: scala/scala@e3ef657. As of scala#6079 this is no longer a problem since we now emit mixin forwarders after erasure like scalac, but it still seems prudent to emit as many forwarders as scalac to catch potential name clash issues.

This is the best we can do with the current encoding of traits. The problem can be seen from the following example: trait A { def foo() = ??? } trait B { def foo() = ??? } object C extends A with B { super[A].foo() super[B].foo() } In the code above, we cannot translate the following calls from <init> to <clinit>: super[A].foo() super[B].foo() super[A].$iinit$() super[B].$init$() More details can be found here: scala#5928

The problem can be seen from the following example: trait A { def foo() = ??? } trait B { def foo() = ??? } object C extends A with B { super[A].foo() super[B].foo() } In the code above, we cannot translate the following calls from <init> to <clinit>: super[A].foo() super[B].foo() super[A].$iinit$() super[B].$init$() More details can be found here: scala#5928 A principled way would be to generage super accessors as it is done in posttyper. However, the backend has a magic to support prefix to super trees in [1], which is exploited in the Scala 2 fix [2]. [1] scala/scala#5944 [2] scala/scala#7270

smarter added prio:low exp:expert itype:performance labels Feb 15, 2019

smarter mentioned this issue Feb 15, 2019

Emit trait method bodies in statics [rebase of #5177] scala/scala#5251

Merged

smarter added area:benchmarks area:transform labels Feb 15, 2019

smarter mentioned this issue Mar 7, 2019

Simplify Scala 2 trait support #6040

Merged

smarter mentioned this issue May 21, 2019

Add infrastructure to run the JUnit tests of upstream Scala.js. #6365

Merged

smarter mentioned this issue Aug 27, 2019

Enable more Scala.js tests, fix two issues #7112

Merged

smarter mentioned this issue Sep 27, 2019

Traits do not have static forwarders in bytecode #7328

Closed

liufengyun mentioned this issue Jun 26, 2020

Fix #1441: init MODULE$ in <clinit> #9181

Merged

smarter mentioned this issue Sep 18, 2020

foldRight override on indexedSeqOps is forward incompatible scala/bug#12137

Closed

joroKr21 mentioned this issue Apr 27, 2021

Optimise boilerplate generators, use instance constructors typelevel/cats#3871

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate changes to the trait encoding that could improve (cold) performance #5928

Investigate changes to the trait encoding that could improve (cold) performance #5928

smarter commented Feb 15, 2019 •

edited

Loading

lrytz commented Mar 4, 2019

Investigate changes to the trait encoding that could improve (cold) performance #5928

Investigate changes to the trait encoding that could improve (cold) performance #5928

Comments

smarter commented Feb 15, 2019 • edited Loading

1. Dotty implementation of super[MyTrait].foo for the JVM

2. Cold performance issue when doing default method resolution and listing all transitive parent interfaces

3. Cold performance issue when doing default method resolution (even when parent interfaces are pruned)

4. Mixin forwarders

5. Scala 2 implementation of super[MyTrait].foo for the JVM

6. OK but what do we do now ?

6.1. Benchmarking

6.2. Possible implementation improvements

6.2.1. Avoid listing unnecessary extra interfaces

6.2.2. Emit trait constructors statically

7. See also

lrytz commented Mar 4, 2019

smarter commented Feb 15, 2019 •

edited

Loading