3x slower IfVsCaseBenchmarks_ifBench6In #9165

JaroslavTulach · 2024-02-23T14:27:32Z

Since January 26, 2024, the IfVsCaseBenchmarks_ifBench6In benchmark has become three times slower:

The first slowdown happened during the fourteen days when the measurements were off. Then there was another drop in performance on February 19, 2024.


Akirathan · 2024-03-05T12:37:31Z

After bisecting, I discovered that the regression was introduced by #8779

JaroslavTulach · 2024-03-05T15:34:24Z

After bisecting, I discovered that the regression was introduced by #8779

Thank you for the investigation. I am eager to hear more details! Sorry for regressing (again).

Akirathan · 2024-03-05T17:58:37Z

Thank you for the investigation. I am eager to hear more details! Sorry for regressing (again).

@JaroslavTulach I believe the regression is caused by the newly introduced typechecks in Number binary operations, for example, in Number.<.

Let's have the following program:

from Standard.Base import all
my_bench x y = x < y
main = my_bench 1 2

And run it with:

env JAVA_OPTS='-Dpolyglot.engine.CompileImmediately=true -Dgraal.Dump=Truffle:1 -Dgraal.PrintGraph=File -Dpolyglot.engine.MultiTier=false' enso --log-level INFO --run ~/tmp/tmp.enso

And then inspect the generated Truffle ASTs in IGV. In the bad revision (the new one that introduced the regression), we can see a compilation for the Integer.< method. Its AST looks like this:

We can see there are nodes for type checks, and also, the AST is pretty large.
Looking at the After Truffle Tier graph:

We can see a lot of If nodes, Deopts, and other red nodes. All of these are somehow connected to the type checking code.

In the good revision (the old one without regression), there is no compilation for Integer.<, as it gets inlined into the tmp.my_bench function:

In the picture we can see that LessNodeGen.execute got inlined into a single BoxNode.

The org.enso.interpreter.bench.benchmarks.semantic.IfVsCaseBenchmarks.ifBench6In benchmark is affected by the type checks on binary number operations because it uses Range.up_to and Vector.fold (and other methods on Vector and Range), which use Integer.< to check for the end of iteration.
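To illustrate why a check inside the comparison matters for a loop-heavy benchmark, here is a minimal Java sketch of a range fold whose end-of-iteration test goes through a comparison with runtime type checks. It is an analogy only: RangeFoldSketch, lessChecked, and fold are hypothetical names and do not correspond to the Enso runtime code.

```java
import java.util.function.LongBinaryOperator;

final class RangeFoldSketch {
  // Counts how often the checked comparison runs (for illustration only).
  static long checks = 0;

  // Hypothetical stand-in for a type-checked Integer.<:
  // both arguments are verified at run time before they are compared.
  static boolean lessChecked(Object a, Object b) {
    checks++;
    if (!(a instanceof Long) || !(b instanceof Long)) {
      throw new IllegalArgumentException("expected two integers");
    }
    return ((Long) a).longValue() < ((Long) b).longValue();
  }

  // Folds f over the half-open range [start, end).
  // Every iteration pays for one checked comparison.
  static long fold(long start, long end, long init, LongBinaryOperator f) {
    long acc = init;
    long i = start;
    while (lessChecked(i, end)) { // long values are boxed to Long here
      acc = f.applyAsLong(acc, i);
      i++;
    }
    return acc;
  }

  public static void main(String[] args) {
    long sum = fold(0, 100_000, 0, Long::sum);
    System.out.println("sum = " + sum + ", checked comparisons = " + checks);
  }
}
```

For a range of n elements the checked comparison runs n + 1 times, so any cost that the compiler cannot remove from the check is multiplied by the length of the iteration.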

Akirathan · 2024-03-05T18:05:35Z

I believe the regression is simply caused by adding type checks (by introducing inline type ascriptions). Runtime type checking is not free. The question remains: what kind of slowdown are we willing to tolerate? My next step is to create a benchmark that compares a method without type checks against the same method with type checks. I think we do not have any benchmark like that. Once I am done with that benchmark, I will report my further findings here.

enso-bot · 2024-03-05T18:08:06Z

Pavel Marek reports a new STANDUP for today (2024-03-05):

Progress: - Bisected and investigated the regression and figured out the root cause - #9165 (comment)

Next step is to create a benchmark for runtime type checking. It should be finished by 2024-03-08.

JaroslavTulach · 2024-03-06T06:01:31Z

benchmark that will compare the performance of type checks

There are various sieve benchmarks, including one without type ascriptions, another with ascribed argument type checks, and yet another with ascribed return types. Yes, the ascribed versions are slightly (~10%) slower.

There is another set of benchmarks in ListBenchmarks that compares ascribed and non-ascribed behavior.

JaroslavTulach · 2024-03-06T10:01:12Z

env JAVA_OPTS='-Dpolyglot.engine.CompileImmediately=true -Dgraal.Dump=Truffle:1 -Dgraal.PrintGraph=File -Dpolyglot.engine.MultiTier=false' enso --log-level INFO --run ~/tmp/tmp.enso

Thanks for the instructions. I see that the Integer.< graph is huge and includes crap that shouldn't be there. This change makes it smaller:

diff --git engine/runtime/src/main/java/org/enso/interpreter/node/callable/argument/ReadArgumentCheckNode.java engine/runtime/src/main/java/org/enso/interpreter/node/callable/argument/ReadArgumentCheckNode.java
index f78a439b20..0d0d0c69a9 100644
--- engine/runtime/src/main/java/org/enso/interpreter/node/callable/argument/ReadArgumentCheckNode.java
+++ engine/runtime/src/main/java/org/enso/interpreter/node/callable/argument/ReadArgumentCheckNode.java
@@ -15,6 +15,7 @@ import com.oracle.truffle.api.nodes.InvalidAssumptionException;
 import com.oracle.truffle.api.nodes.Node;
 import com.oracle.truffle.api.nodes.NodeUtil;
 import com.oracle.truffle.api.nodes.RootNode;
+import com.oracle.truffle.api.profiles.BranchProfile;
 import java.util.Arrays;
 import java.util.List;
 import java.util.stream.Collectors;
@@ -252,6 +253,7 @@ public abstract class ReadArgumentCheckNode extends Node {
     @Child IsValueOfTypeNode checkType;
     @CompilerDirectives.CompilationFinal private String expectedTypeMessage;
     @CompilerDirectives.CompilationFinal private LazyCheckRootNode lazyCheck;
+    private final BranchProfile multiValueProfile = BranchProfile.create();
 
     TypeCheckNode(String name, Type expectedType) {
       super(name);
@@ -319,6 +321,7 @@ public abstract class ReadArgumentCheckNode extends Node {
         return lazyCheckFn;
       }
       if (v instanceof EnsoMultiValue mv) {
+        multiValueProfile.enter();
         var result = mv.castTo(expectedType);
         if (result != null) {
           return result;

We can cut off the multi-value support, as EnsoMultiValue is extremely unlikely to appear where a Number is expected. I am not sure yet whether this change has any impact on the benchmark.
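The idea behind the profile in the diff above can be sketched without Truffle. The class below is a simplified, hypothetical model: Profile is not Truffle's BranchProfile, and MultiValue only stands in for EnsoMultiValue. It records whether the rare branch was ever entered. In Truffle, BranchProfile.enter() additionally deoptimizes on the first visit from compiled code, so a branch that was never visited does not have to be compiled.

```java
final class BranchProfileSketch {
  /** Simplified model of a branch profile; not Truffle's implementation. */
  static final class Profile {
    private boolean visited;

    void enter() {
      // Truffle would transfer to the interpreter and invalidate the
      // compiled code on the first visit; this model only records it.
      visited = true;
    }

    boolean wasVisited() {
      return visited;
    }
  }

  /** Hypothetical stand-in for EnsoMultiValue: a value with several alternatives. */
  static final class MultiValue {
    private final Object[] alternatives;

    MultiValue(Object... alternatives) {
      this.alternatives = alternatives;
    }

    // Returns the first alternative of the requested type, or null if none matches.
    Object castTo(Class<?> type) {
      for (Object o : alternatives) {
        if (type.isInstance(o)) {
          return o;
        }
      }
      return null;
    }
  }

  final Profile multiValueProfile = new Profile();

  // Mirrors the shape of the patched check: the rare multi-value path
  // is guarded by the profile, the common path is a plain type test.
  Object check(Object v, Class<?> expected) {
    if (v instanceof MultiValue) {
      multiValueProfile.enter();
      Object result = ((MultiValue) v).castTo(expected);
      if (result != null) {
        return result;
      }
    }
    if (expected.isInstance(v)) {
      return v;
    }
    throw new IllegalArgumentException("expected " + expected.getSimpleName());
  }

  public static void main(String[] args) {
    BranchProfileSketch sketch = new BranchProfileSketch();
    sketch.check(1L, Long.class);
    System.out.println("rare branch visited: " + sketch.multiValueProfile.wasVisited());
    sketch.check(new MultiValue("text", 2L), Long.class);
    System.out.println("rare branch visited: " + sketch.multiValueProfile.wasVisited());
  }
}
```

As long as only plain numbers reach the check, the profile stays unvisited, which is the situation the patch relies on for Integer.<.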

enso-bot · 2024-03-06T17:54:02Z

Pavel Marek reports a new STANDUP for today (2024-03-06):

Progress: - Looking into more recent performance regressions.

Playing with: Synchronous graal compilation, Tier2 only compilation, tracing compilation details, splitting, inlining, ...
Looking and comparing IGV graphs.
Meetings, discussions, and reviews. It should be finished by 2024-03-08.

Akirathan · 2024-03-07T11:56:56Z

After a few hours of more in-depth investigation, I am giving up for now with the conclusion that the regression is most likely caused by just having more Truffle and Graal nodes in the compilation. The initial suspicion of @JaroslavTulach that the bad revision does not inline the if_bench_6_in method did not prove to be correct, as everything is inlined into a single AST as expected.

I have mostly followed this guide and made some notes along the way (generated outputs and snapshots are in if_bench_6_in.zip).
My notes:

Bench name: if_bench_6_in.enso
Data saved in: ~/tmp/perf/if_bench_6_in
Bench params:
- INPUT_VEC_SIZE = 100 * 1000
- ITERS = 10 * 1000
General observations:
- JVM versions:
  - good: 21.0.1-graalce
  - bad: 21.0.2-graalce
- Wall time:
  - good: 12 secs
  - bad: 30 secs
1. CPUSampler
- env JAVA_OPTS='-Dpolyglot.cpusampler=true -Dpolyglot.cpusampler.Delay=4000' enso --no-ir-caches --log-level INFO --run ~/tmp/perf/if_bench_6_in/if_bench_6_in.enso
  - Just says that most of the time is spent in Range.enso:241:go
2. VisualVM CPU sampler
- Bad:
  
  org.enso.EngineRunnerBootLoader (pid 28166)
  - 89% CPU time spent in org.enso.interpreter.runtime.control.TailCallException.
- Good:
  - 77% CPU time spent in org.enso.interpreter.runtime.control.TailCallException.
- Output in: snapshot-*.nps
- Inconclusive
3. --compiler.TracePerformanceWarnings=all does not reveal anything
4. Trace compilation
- env JAVA_OPTS='-Dpolyglot.engine.TraceCompilation=true -Dpolyglot.engine.CompilationStatisticDetails=true' enso --no-ir-caches --log-level INFO --run ~/tmp/perf/if_bench_6_in/if_bench_6_in.enso > ~/tmp/perf/if_bench_6_in/bad/trace-compilation.txt 2>&1
- Output in: trace-compilation.txt
- Bad:
  - Target inlined into only caller: 16
  - Temporary bailouts: 6
    - CancellationBailoutException: 4
      
      Expected, same as in good
    - jdk.vm.ci.code.BailoutException: Code installation failed: dependencies failed
      - Failed dependency of type abstract_with_unique_concrete_subtype
        
        context = *org.enso.interpreter.runtime.data.atom.Atom
        
        class = org.enso.interpreter.runtime.data.atom.BoxingAtom
        
        witness = org.enso.interpreter.runtime.data.atom.Layout_Atom_3_0
      - Cannot reproduce this BailoutException on the second run
  - Interrupted compilations: -3
    
    Not in good
  - Total Truffle node count:
    - Monomorphic: 13635
      
      More Truffle nodes than in good
- Good:
  - Target inlined into only caller: 26
  - Temporary bailouts: 4
    - CancellationBailoutException: 4
      
      This is expected, as the process had exited before the compilation was finished
  - Total Truffle node count:
    - Monomorphic: 11367
- Conclusion:
  - There are more Graal and Truffle nodes in the bad version and thus, there is more stuff to compile. Moreover, there is a suspicious BailoutException in bad revision: "Code installation failed: dependencies failed".
    - oracle/graal@45d2ea1
    - Can't find any other code related to this BailoutException in either Truffle or the Compiler
    - This error is provided from HotSpotCodeCacheProvider
      - https://github.com/search?q=repo%3Aopenjdk%2Fjdk+dependencies_failed&type=code
5. Instrument boundary calls
- env JAVA_OPTS='-Dpolyglot.compiler.InstrumentBoundaries=true' enso --no-ir-caches --log-level INFO --run ~/tmp/perf/if_bench_6_in/if_bench_6_in.enso
  
  Did not finish
6. Inspect graphs in IGV
- Nothing suspicious found
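
The figures in the notes above can be put side by side with a few lines of arithmetic. The Java sketch below only recomputes ratios from the numbers reported in the notes (wall time and monomorphic Truffle node count); the class and method names are made up for illustration.

```java
final class RegressionNumbers {
  // How many times larger the bad measurement is than the good one.
  static double ratio(double bad, double good) {
    return bad / good;
  }

  // Relative increase from good to bad, in percent.
  static double percentIncrease(double bad, double good) {
    return (bad - good) / good * 100.0;
  }

  public static void main(String[] args) {
    // Wall time from the notes: good 12 secs, bad 30 secs.
    System.out.printf("wall time ratio: %.2f%n", ratio(30, 12));
    // Monomorphic Truffle node count from the notes: good 11367, bad 13635.
    System.out.printf("node count increase: %.1f%%%n", percentIncrease(13635, 11367));
  }
}
```

With these inputs the wall time ratio is 2.5 and the node count in the bad revision is roughly 20% higher than in the good one.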

JaroslavTulach · 2024-03-07T16:23:16Z

Thank you for your investigation and the different point of view. I'll take the issue back. In the end it is clearly my regression, so I should be responsible for fixing it.

enso-bot · 2024-03-07T17:55:43Z

Pavel Marek reports a new STANDUP for today (2024-03-07):

Progress: - Giving up further investigation for now, my conclusion is that there are just too many Graal and Truffle nodes in the compilation. It should be finished by 2024-03-08.

JaroslavTulach added -compiler --low-performance triage labels Feb 23, 2024

JaroslavTulach mentioned this issue Feb 23, 2024

10x slowdown in Collections_list_meta_fold #9166

Closed

enso-bot bot mentioned this issue Feb 24, 2024

if then else and chained block syntax #8489

Closed

JaroslavTulach assigned Akirathan Feb 27, 2024

JaroslavTulach removed the triage label Feb 27, 2024

enso-bot bot mentioned this issue Mar 7, 2024

Prefix autoscoped constructors with .. #9275

Closed

JaroslavTulach assigned JaroslavTulach and unassigned Akirathan Mar 7, 2024

JaroslavTulach linked a pull request Oct 19, 2024 that will close this issue

Recognize if-like Tree.MultiSegmentApp as IfThenElse IR #11365

Draft
