
Convert Array_Like_Helpers.map to a builtin to reduce stack size #11363

Merged
merged 39 commits on Nov 6, 2024

Conversation

Akirathan
Member

@Akirathan Akirathan commented Oct 18, 2024

Fixes #11329

Pull Request Description

The ultimate goal is to reduce the method calls necessary for Vector.map.

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • Run (at least engine) benchmarks to ensure no regressions.
  • The documentation has been updated, if necessary.
  • All code follows the Scala and Java style guides.
  • Unit tests have been written where possible.

@Akirathan Akirathan added the "CI: No changelog needed" label (do not require a changelog entry for this PR) Oct 18, 2024
@Akirathan Akirathan self-assigned this Oct 18, 2024
@Akirathan
Member Author

The experiment in Stack_Size_Spec.enso generates code for nested Vector.map calls and invokes an enso subprocess with a custom -Xss option (and with the Truffle compiler disabled). The current output is:

{nesting: 3, stack_size: 256k} SUCCEEDED
{nesting: 6, stack_size: 256k} FAILED with StackOverflow
{nesting: 9, stack_size: 256k} SKIPPED (SO already encountered with the same stack size in lower nesting levels)
{nesting: 3, stack_size: 300k} SUCCEEDED
{nesting: 6, stack_size: 300k} SUCCEEDED
{nesting: 9, stack_size: 300k} FAILED with StackOverflow

This means that, e.g., a Vector.map call on the nested [[[[[[42]]]]]] vector with -Xss256k failed with a StackOverflowError.
The code for "nesting: 3" looks like this:

from Standard.Base import all
main =
    vec = [[[42]]]
    vec.map e0->
        e0.map e1->
            e1.map e2->
                e2 + 1

I am not sure yet what to do with that experiment. We should probably integrate it somehow later as a regression test.
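For illustration, the nested-map sources used in the experiment can be generated mechanically for any nesting level. The generator below is a hypothetical sketch (not the actual Stack_Size_Spec code); it reproduces the shape of the "nesting: 3" program above:

```java
public class NestedMapGen {
    // Builds Enso source equivalent to the "nesting: 3" example above,
    // for an arbitrary nesting level n >= 1.
    static String generate(int nesting) {
        StringBuilder sb = new StringBuilder("from Standard.Base import all\nmain =\n");
        // A literal like [[[42]]] for nesting == 3.
        sb.append("    vec = ")
          .append("[".repeat(nesting)).append("42").append("]".repeat(nesting))
          .append("\n");
        sb.append("    vec.map e0->\n");
        // One nested .map per remaining level, each indented one step further.
        for (int i = 1; i < nesting; i++) {
            sb.append("    ".repeat(i + 1))
              .append("e").append(i - 1).append(".map e").append(i).append("->\n");
        }
        // Innermost body: increment the scalar element.
        sb.append("    ".repeat(nesting + 1))
          .append("e").append(nesting - 1).append(" + 1\n");
        return sb.toString();
    }
}
```

A regression test could then feed the generated source, for increasing nesting levels, to a subprocess with a fixed -Xss.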

@Akirathan
Member Author

990ee6a introduces a regression test Stack_Size_Spec that invokes enso as a subprocess with specific JVM options, counts the Java stack frames between invocations of Vector.map, and asserts that the count is less than 115. On develop, this test fails: there are 150 stack frames between calls. In this PR, it succeeds: there are 103 stack frames. With @Tail_Call annotations added to Vector.map and Array_Like_Helpers.map:

diff --git a/distribution/lib/Standard/Base/0.0.0-dev/src/Data/Vector.enso b/distribution/lib/Standard/Base/0.0.0-dev/src/Data/Vector.enso
index 4ce7c5bdb4..27c9bed438 100644
--- a/distribution/lib/Standard/Base/0.0.0-dev/src/Data/Vector.enso
+++ b/distribution/lib/Standard/Base/0.0.0-dev/src/Data/Vector.enso
@@ -700,7 +700,7 @@ type Vector a
              [1, 2, 3] . map +1
     map : (Any -> Any) -> Problem_Behavior | No_Wrap -> Vector Any
     map self function on_problems:(Problem_Behavior | No_Wrap)=..Report_Error =
-        Array_Like_Helpers.map self function on_problems
+        @Tail_Call Array_Like_Helpers.map self function on_problems
 
     ## ICON union
        Applies a function to each element of the vector, returning the `Vector`
diff --git a/distribution/lib/Standard/Base/0.0.0-dev/src/Internal/Array_Like_Helpers.enso b/distribution/lib/Standard/Base/0.0.0-dev/src/Internal/Array_Like_Helpers.enso
index aaf2442940..416be3d4a5 100644
--- a/distribution/lib/Standard/Base/0.0.0-dev/src/Internal/Array_Like_Helpers.enso
+++ b/distribution/lib/Standard/Base/0.0.0-dev/src/Internal/Array_Like_Helpers.enso
@@ -229,7 +229,7 @@ transpose vec_of_vecs =
             Vector.from_polyglot_array proxy
 
 map vector function on_problems =
-    vector_from_function vector.length (i-> function (vector.at i)) on_problems
+    @Tail_Call vector_from_function vector.length (i-> function (vector.at i)) on_problems
 
 map_with_index vector function on_problems =
     vector_from_function vector.length (i-> function i (vector.at i)) on_problems

the number of stack frames drops to 22, with the disadvantage that the Vector.map call is no longer visible in Runtime.get_stack_trace. In the end, I decided not to add the @Tail_Call annotations for now. I think that a reduction of 35 Java stack frames per Vector.map method call is a sufficient improvement.
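The measurement idea behind Stack_Size_Spec (counting Java frames between two points on the stack) can be sketched in plain Java. This is a hypothetical illustration using StackWalker, not the actual test code; it shows that a plain Java call adds one frame per level, whereas each guest-level Vector.map call on develop added around 150:

```java
public class FrameCounter {
    // Depth of the current Java stack, counted with StackWalker.
    static int stackDepth() {
        return StackWalker.getInstance().walk(frames -> (int) frames.count());
    }

    // Recurse n times and record the stack depth at the innermost level.
    static int depthAtRecursion(int n, int[] out) {
        if (n == 0) {
            out[0] = stackDepth();
            return 0;
        }
        return depthAtRecursion(n - 1, out);
    }

    // Frames added per recursion level: measure at two depths and divide
    // by the difference in recursion counts.
    static int framesPerLevel() {
        int[] shallow = new int[1];
        int[] deep = new int[1];
        depthAtRecursion(1, shallow);
        depthAtRecursion(4, deep);
        return (deep[0] - shallow[0]) / 3;
    }
}
```

The real test does the analogous computation on the stack trace of a nested Vector.map invocation.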

Member

@JaroslavTulach JaroslavTulach left a comment


A few code comments inlined.

In the end, I decided not to add the @Tail_Call annotations for now. I think that a reduction of 35 Java stack frames per Vector.map method call is a sufficient improvement.

I don't think I agree with that decision. I am more concerned about performance and avoiding StackOverflowError than about the shape of the stack. Moreover ...

the number of stack frames drops to 22, with the disadvantage that the Vector.map call is no longer visible in Runtime.get_stack_trace.

...if we want to see something that resembles Vector.map in the stack, then rename Array_Like_Helpers to Vector_Impl and vector_from_function to map, and we'll get reasonable stack names while keeping the stack size down to 22.

@Cached BranchProfile errorEncounteredProfile) {
var ctx = EnsoContext.get(this);
var onProblems = processOnProblemsArg(onProblemsAtom, typesLib);
var len = Math.toIntExact(length);
Member


What will happen on ArithmeticException? It shouldn't crash the interpreter.
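Math.toIntExact throws ArithmeticException when the long does not fit into an int, so the conversion needs a guard. A minimal plain-Java sketch of the pattern (the error type and message here are hypothetical; the real builtin would raise a guest-language error via EnsoContext rather than a Java exception):

```java
public class SafeLength {
    // Converts a long length to int, translating overflow into a
    // recoverable error instead of letting ArithmeticException escape
    // and crash the interpreter.
    static int toIntLength(long length) {
        try {
            return Math.toIntExact(length);
        } catch (ArithmeticException e) {
            // Stand-in for raising an Enso dataflow error / panic.
            throw new IllegalArgumentException(
                "Vector length " + length + " does not fit into an int", e);
        }
    }
}
```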

import org.enso.interpreter.dsl.BuiltinType;
import org.enso.interpreter.node.expression.builtin.Builtin;

@BuiltinType(name = "Standard.Base.Data.Vector.No_Wrap")
Member


Off-topic, but I'd like to know the answer: why is the name attribute optional? It is specified on some of the builtins, but not on all. Shouldn't it be specified all the time? I am asking because

the proposed Standard.Prelude should include all the builtins and their methods, and we want their names to be exactly the same as in Standard.Base. Can we generate such a Prelude.enso from these annotations?
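Generating names from the annotations seems feasible in principle: the name attribute is readable via reflection (or, at build time, an annotation processor). A minimal sketch with a stand-in annotation (the real @BuiltinType lives in org.enso.interpreter.dsl; the classes below are hypothetical):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class BuiltinNames {
    // Stand-in for org.enso.interpreter.dsl.BuiltinType, with the same
    // optional `name` attribute.
    @Retention(RetentionPolicy.RUNTIME)
    @interface BuiltinType {
        String name() default "";
    }

    @BuiltinType(name = "Standard.Base.Data.Vector.No_Wrap")
    static class NoWrap {}

    @BuiltinType // `name` left at its default
    static class Anonymous {}

    // Returns the declared name, or a fallback derived from the class when
    // the attribute was omitted -- which is exactly why an always-present
    // `name` would make generating a Prelude.enso simpler.
    static String builtinName(Class<?> cls) {
        BuiltinType ann = cls.getAnnotation(BuiltinType.class);
        String name = ann.name();
        return name.isEmpty() ? cls.getSimpleName() : name;
    }
}
```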

test/Base_Tests/src/Runtime/Stack_Size_Spec.enso (outdated review thread, resolved)
@Akirathan
Member Author

Akirathan commented Oct 28, 2024

(links to scheduled benchmark runs)

@Akirathan
Member Author

Akirathan commented Oct 29, 2024

Another round of benchmarks scheduled at:

(links to the scheduled benchmark runs)

@Akirathan
Member Author

Caching the onProblems argument apparently helped:

[benchmark results screenshot]

@Akirathan
Member Author

Locally verified a regression of IfVSCaseBenchmarks.ifBench6In:

[benchmark results screenshot]

local develop results: 1.685 ± 0.534
local results on this PR: 2.259 ± 0.320

@JaroslavTulach
Member

Locally verified regression of IfVSCaseBenchmarks.ifBench6In:...
local develop results: 1.685 ± 0.534 local results on this PR: 2.259 ± 0.320

I'd like to point out that #11365 is going to change the way we do if/then/else. Once #11365 is in, we will no longer try to invoke the if_then_else method on the condition. However, take a look at the ifBench6In benchmark:

        if_bench_6_in vec =
            vec.fold 0 acc-> curr->
                curr.f1.not.if_then_else acc <|
                    curr.f2.not.if_then_else acc <|
                        curr.f3.not.if_then_else acc <|
                            curr.f4.not.if_then_else acc <|
                                curr.f5.not.if_then_else acc <|
                                    curr.f6.not.if_then_else acc <|
                                        acc + 1

The ifBench6In benchmark is doing exactly that! It invokes the if_then_else method.

If the slowdown in the ifBench6In benchmark is the only blocker, then I suggest ignoring it.

@Akirathan
Member Author

Another locally verified regression, this time in the org.enso.interpreter.bench.benchmarks.semantic.TypePatternBenchmarks.matchOverAny benchmark, after increasing the dataset size (so that one iteration takes more than 10E-6 ms):

diff --git a/engine/runtime-benchmarks/src/main/java/org/enso/interpreter/bench/benchmarks/semantic/TypePatternBenchmarks.java b/engine/runtime-benchmarks/src/main/java/org/enso/interpreter/bench/benchmarks/semantic/TypePatternBenchmarks.java
index 9343bd153..97a742657 100644
--- a/engine/runtime-benchmarks/src/main/java/org/enso/interpreter/bench/benchmarks/semantic/TypePatternBenchmarks.java
+++ b/engine/runtime-benchmarks/src/main/java/org/enso/interpreter/bench/benchmarks/semantic/TypePatternBenchmarks.java
@@ -66,7 +66,7 @@ public class TypePatternBenchmarks {
 
     Function<String, Value> getMethod = (name) -> module.invokeMember(Module.EVAL_EXPRESSION, name);
 
-    var length = 100;
+    var length = 100_000;
     this.vec = getMethod.apply("gen_vec").execute(length, 1.1);
     switch (SrcUtil.findName(params)) {
       case "matchOverAny" -> this.patternMatch = getMethod.apply("match_any");

On develop: 0.213 ± 0.020
On this PR: 0.411 ± 0.004

@Akirathan
Member Author

Regressions resolved. It turns out I was caching the Atom onProblems argument incorrectly. Fixed by converting the No_Wrap builtin type to a uniquely constructible builtin and caching the AtomConstructor directly in d2dcc04. That commit also introduces a LoopConditionProfile, which speeds up the benchmark tremendously. Local results of org.enso.interpreter.bench.benchmarks.semantic.TypePatternBenchmarks.matchOverAny are now 2x faster than on develop.
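The general pattern behind the fix, caching the resolved constructor once instead of re-resolving it on every call, can be sketched in plain Java with hypothetical names (the real code caches an AtomConstructor via @Cached in a Truffle specialization):

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class ConstructorCache {
    // Counts slow-path resolutions, to show the cache only pays once.
    static final AtomicInteger lookups = new AtomicInteger();

    // Stand-in for the slow path: resolving a constructor by name.
    static Object slowLookup(Map<String, Object> scope, String name) {
        lookups.incrementAndGet();
        return scope.get(name);
    }

    private Object cached;

    // Fast path: resolve once, then reuse the cached constructor on every
    // subsequent call, mirroring a cached field in a specialization node.
    Object resolve(Map<String, Object> scope, String name) {
        if (cached == null) {
            cached = slowLookup(scope, name);
        }
        return cached;
    }
}
```

In a Truffle interpreter this matters because the cached value becomes a compilation constant, so the partial evaluator can fold the lookup away entirely.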

@Akirathan
Member Author

Wait for the tests to be green, and then let's schedule another round of benchmarks.

@JaroslavTulach
Member

Wait for the tests to be green, and then let's schedule another round of benchmarks.

Benchmarks may run even before the tests are fully green. Trying to run the benchmarks right now might speed things up.

@Akirathan
Member Author

Akirathan commented Nov 5, 2024

Benchmarks scheduled:

Benchmark Engine · 2cbc1e9
Benchmark Standard Libraries · 2cbc1e9

@Akirathan
Member Author

All (stable) benchmarks are either the same or better; let's integrate.

@Akirathan Akirathan added the "CI: Ready to merge" label (this PR is eligible for automatic merge) Nov 6, 2024
@mergify mergify bot merged commit 701bba6 into develop Nov 6, 2024
39 of 42 checks passed
@mergify mergify bot deleted the wip/akirathan/11329-map-reduce-stack branch November 6, 2024 11:14
GregoryTravis pushed a commit that referenced this pull request Nov 6, 2024
…11363)

The ultimate goal is to reduce the method calls necessary for `Vector.map`.

# Important Notes
- I managed to reduce the number of Java stack frames needed for each `Vector.map` call from **150** to **22** (see #11363 (comment))
- Introduced a `Stack_Size_Spec` regression test that ensures the number of Java stack frames needed for a `Vector.map` method call does not exceed **40**.
Labels
  • CI: Clean build required (CI runners will be cleaned before and after this PR is built.)
  • CI: No changelog needed (Do not require a changelog entry for this PR.)
  • CI: Ready to merge (This PR is eligible for automatic merge.)