Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3x slower IfVsCaseBenchmarks_ifBench6In #9165

Open
JaroslavTulach opened this issue Feb 23, 2024 · 11 comments · May be fixed by #11365
Open

3x slower IfVsCaseBenchmarks_ifBench6In #9165

JaroslavTulach opened this issue Feb 23, 2024 · 11 comments · May be fixed by #11365

Comments

@JaroslavTulach
Copy link
Member

Since January 26, 2024 the IfVsCaseBenchmarks_ifBench6In slowed down three times:

obrazek

First slowdown has happened during fourteen days when the measurements were off. Then there was another drop in performance on February 19, 2024.

@Akirathan
Copy link
Member

Akirathan commented Mar 5, 2024

After bisecting, I discovered that the regression was introduced by #8779

@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented Mar 5, 2024

After bisecting, I discovered that the regression was introduced by #8779

Thank you for the investigation. I am eager to hear more details! Sorry for regressing (again).

@Akirathan
Copy link
Member

Thank you for the investigation. I am eager to hear more details! Sorry for regressing (again).

@JaroslavTulach I believe the regression is caused by the newly introduced typechecks in Number binary operations, for example, in Number.<.

Let's have the following program:

from Standard.Base import all
my_bench x y = x < y
main = my_bench 1 2

And run it with:

env JAVA_OPTS='-Dpolyglot.engine.CompileImmediately=true -Dgraal.Dump=Truffle:1 -Dgraal.PrintGraph=File -Dpolyglot.engine.MultiTier=false' enso --log-level INFO --run ~/tmp/tmp.enso

And the inspect the generated Truffle ASTs in IGV. In the bad revision (the new one that introduced the regression), we can see a compilation for Integer.< method, its AST looks like this:
image
We can see there are nodes for type checks, and also, the AST is pretty large.
Looking at the After Truffle Tier graph:
image
We can see a lot of If nodes, Deopts, and other red nodes. All of these are somehow connected to the type checking code.

In the good revision (the old one without regression), there is no compilation for Integer.<, as it gets inlined into the tmp.my_bench function:
image
In the picture we can see that LessNodeGen.execute got inlined into a single BoxNode.


The reason why org.enso.interpreter.bench.benchmarks.semantic.IfVsCaseBenchmarks.ifBench6In benchmark is affected by the type checks on binary operations on numbers is that the benchmark uses Range.up_to and Vector.fold (and other methods on Vector and Range) which uses Integer.< to check for the end of iteration.

@Akirathan
Copy link
Member

I believe the regression is simply caused by adding type checks (by introducing inline type ascriptions). Runtime type checking is not for free. The question remains, what kind of slowdown are we willing to tolerate? The next step that I will do will be to create a benchmark that will compare the performance of type checks, i.e., a method without type checks VS a method with type checks. I think we do not have any benchmark like that. Once I am done with that benchmark, I will report my further findings here.

@enso-bot
Copy link

enso-bot bot commented Mar 5, 2024

Pavel Marek reports a new STANDUP for today (2024-03-05):

Progress: - Bisected and investigated the regression and figured out the root cause - #9165 (comment)

  • Next step is to create a benchmark for runtime type checking. It should be finished by 2024-03-08.

@JaroslavTulach
Copy link
Member Author

benchmark that will compare the performance of type checks

There are various sieve benchmarks including one without type ascriptions and another with ascribed type checks and yet another one with ascribed return types. Yes, ascribed versions are slightly ~10% slower.

There is another set of benchmarks in ListBenchmarks that compares ascribed and non-ascribed behavior.

@JaroslavTulach
Copy link
Member Author

env JAVA_OPTS='-Dpolyglot.engine.CompileImmediately=true -Dgraal.Dump=Truffle:1 -Dgraal.PrintGraph=File -Dpolyglot.engine.MultiTier=false' enso --log-level INFO --run ~/tmp/tmp.enso

Thanks for the instructions. I see that the Integer.< graph is huge and includes crap that shouldn't be there. This change makes it smaller:

diff --git engine/runtime/src/main/java/org/enso/interpreter/node/callable/argument/ReadArgumentCheckNode.java engine/runtime/src/main/java/org/enso/interpreter/node/callable/argument/ReadArgumentCheckNode.java
index f78a439b20..0d0d0c69a9 100644
--- engine/runtime/src/main/java/org/enso/interpreter/node/callable/argument/ReadArgumentCheckNode.java
+++ engine/runtime/src/main/java/org/enso/interpreter/node/callable/argument/ReadArgumentCheckNode.java
@@ -15,6 +15,7 @@ import com.oracle.truffle.api.nodes.InvalidAssumptionException;
 import com.oracle.truffle.api.nodes.Node;
 import com.oracle.truffle.api.nodes.NodeUtil;
 import com.oracle.truffle.api.nodes.RootNode;
+import com.oracle.truffle.api.profiles.BranchProfile;
 import java.util.Arrays;
 import java.util.List;
 import java.util.stream.Collectors;
@@ -252,6 +253,7 @@ public abstract class ReadArgumentCheckNode extends Node {
     @Child IsValueOfTypeNode checkType;
     @CompilerDirectives.CompilationFinal private String expectedTypeMessage;
     @CompilerDirectives.CompilationFinal private LazyCheckRootNode lazyCheck;
+    private final BranchProfile multiValueProfile = BranchProfile.create();
 
     TypeCheckNode(String name, Type expectedType) {
       super(name);
@@ -319,6 +321,7 @@ public abstract class ReadArgumentCheckNode extends Node {
         return lazyCheckFn;
       }
       if (v instanceof EnsoMultiValue mv) {
+        multiValueProfile.enter();
         var result = mv.castTo(expectedType);
         if (result != null) {
           return result;

we can cut the multi value support off as EnsoMultiValue is extremely unlikely to appear where Number is expected. Not sure whether this change has any impact on the benchmark yet.

@enso-bot
Copy link

enso-bot bot commented Mar 6, 2024

Pavel Marek reports a new STANDUP for today (2024-03-06):

Progress: - Looking into more recent performance regressions.

  • Playing with: Synchronous graal compilation, Tier2 only compilation, tracing compilation details, splitting, inlining, ...
  • Looking and comparing IGV graphs.
  • Meetings, discussions, and reviews. It should be finished by 2024-03-08.

@Akirathan
Copy link
Member

After a few hours of more in-depth investigation, I am giving up for now with the conclusion that the regression is most-likely caused by just having more Truffle and Graal nodes in the compilation. The initial suspicion of @JaroslavTulach that the bad revision does not inline if_bench_6_in method did not prove to be correct, as everything is inlined into a single AST as expected.

I have mostly followed this guide and made some notes along the way (Generated outputs and snapshots are in
if_bench_6_in.zip)
):
My notes:

  • Bench name: if_bench_6_in.enso
  • Data saved in: ~/tmp/perf/if_bench_6_in
  • Bench params:
    • INPUT_VEC_SIZE = 100 * 1000
    • ITERS = 10 * 1000
  • General observations:
    • JVM versions:
      • good: 21.0.1-graalce
      • bad: 21.0.2-graalce
    • Wall time:
      • good: 12 secs
      • bad: 30 secs
    1. CPUSampler
    • env JAVA_OPTS='-Dpolyglot.cpusampler=true -Dpolyglot.cpusampler.Delay=4000' enso --no-ir-caches --log-level INFO --run ~/tmp/perf/if_bench_6_in/if_bench_6_in.enso
      • Just says that most of the time is spent in Range.enso:241:go
    1. VisualVM CPU sampler
    • Bad:

      org.enso.EngineRunnerBootLoader (pid 28166)

      • 89% CPU time spent in org.enso.interpreter.runtime.control.TailCallException.
    • Good:
      • 77% CPU time spent in org.enso.interpreter.runtime.control.TailCallException.
    • Output in: snapshot-*.nps
    • Inconclusive
    1. --compiler.TracePerformanceWarnings=all does not reveal anything
    1. Trace compilation
    • env JAVA_OPTS='-Dpolyglot.engine.TraceCompilation=true -Dpolyglot.engine.CompilationStatisticDetails=true' enso --no-ir-caches --log-level INFO --run ~/tmp/perf/if_bench_6_in/if_bench_6_in.enso > ~/tmp/perf/if_bench_6_in/bad/trace-compilation.txt 2>&1
    • Output in: trace-compilation.txt
    • Bad:
      • Target inlined into only caller: 16
      • Temporary bailouts: 6
        • CalcellationBailoutException: 4

          Expected, same as in good

        • jdk.vm.ci.code.BailoutException: Code installation failed: dependencies failed
          • Failed dependency of type abstract_with_unique_concrete_subtype
            • context = *org.enso.interpreter.runtime.data.atom.Atom
            • class = org.enso.interpreter.runtime.data.atom.BoxingAtom
            • witness = org.enso.interpreter.runtime.data.atom.Layout_Atom_3_0
          • Cannot reproduce this BailoutException on the second run
      • Interrupted compilations: -3

        Not in good

      • Total Truffle node count:
        • Monomorphic: 13635

          More Truffle nodes than in good

    • Good:
      • Target inlined into only caller: 26
      • Temporary bailouts: 4
        • CancellationBailoutException: 4

          This is expected, as the process had exited before the compilation was finished

      • Total Truffle node count:
        • Monomorphic: 11367
    • Conclusion:
    1. Instrument boundary calls
    • env JAVA_OPTS='-Dpolyglot.compiler.InstrumentBoundaries=true' enso --no-ir-caches --log-level INFO --run ~/tmp/perf/if_bench_6_in/if_bench_6_in.enso

      Did not finish

    1. Inspect graphs in IGV
    • Nothing suspicious found

@JaroslavTulach
Copy link
Member Author

Thank you for your investigation and different point of view. I'll take the issue back - at the end it is clearly my regression, so I should be responsible for fixing it.

@enso-bot
Copy link

enso-bot bot commented Mar 7, 2024

Pavel Marek reports a new STANDUP for today (2024-03-07):

Progress: - Giving up further investigation for now, my conclusion is that there are just too many Graal and Truffle nodes in the compilation. It should be finished by 2024-03-08.

@JaroslavTulach JaroslavTulach linked a pull request Oct 19, 2024 that will close this issue
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: ⚙️ Design
Development

Successfully merging a pull request may close this issue.

2 participants