Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve inlining of <| on (GraalVM EE) #6384

Merged
merged 6 commits into from
Apr 25, 2023

Conversation

JaroslavTulach
Copy link
Member

@JaroslavTulach JaroslavTulach commented Apr 24, 2023

Pull Request Description

Benchmark that demonstrates deep hierarchy of nested <| doesn't inline. Moving most of the "function" operators (except |>) to Function as they can only work on functions. Cloning each BuiltinRootNode before use - builtins are lightweight and cloning helps to overcome Truffle recursive inlining limit.

Important Notes

The inlining didn't work well because:

  • Truffle sees recursive call to <| target and hits its default (2) inlining limit
  • Function.<| is currently handled as builtin, but uses VirtualFrame - e.g. wasn't subject to Speed cascade of if statements up #6255

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • All code follows the
    Scala,
    Java,
  • All code has been tested:
    • Unit tests continue to work
    • Benchmarks are fast on GraalVM EE

@JaroslavTulach JaroslavTulach self-assigned this Apr 24, 2023
@JaroslavTulach JaroslavTulach linked an issue Apr 24, 2023 that may be closed by this pull request
4 tasks
@JaroslavTulach JaroslavTulach added the CI: No changelog needed Do not require a changelog entry for this PR. label Apr 24, 2023
@JaroslavTulach JaroslavTulach requested a review from kustosz April 24, 2023 10:56
@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented Apr 24, 2023

The PR is ready for your review. Benchmarks are being executed, but running them locally indicates a cascade of <| is now fast enough. The generic speed up of every function is going to be tackled by

While the benchmarks seem to be fine on my local computer (0.5ms for ifBench6 and 0.8ms for ifBench6In) the server benchmark run wasn't good. Started a new benchmark run. Results are however still the same:

        <case>
            <label>org.enso.interpreter.bench.benchmarks.semantic.IfVsCaseBenchmarks.ifBench6</label>
            <scores>
                <score>1.1615075573402267</score>
            </scores>
        </case>
        <case>
            <label>org.enso.interpreter.bench.benchmarks.semantic.IfVsCaseBenchmarks.ifBench6In</label>
            <scores>
                <score>20.910878999604243</score>
            </scores>
        </case>

e.g. the ifBench6In is more than 10x slower - clearly a sign of inlining not happening.

Trying harder with 2188140 - new benchmark run is scheduled and even locally I can see some improvement (0.5ms for ifBench6 and 0.6ms for ifBench6In). However the ifBench6In is still slow on the benchmarking machine: 21.25580650263044...

Explanation

Locally I was using GraalVM EE, however the server is using GraalVM CE. One of the differences between CE and EE is handling of inlining and it shows here. CE gives up and as a result one benchmark round takes 21ms while EE keeps harder and one round then takes less then 1ms.

@@ -42,7 +42,7 @@ protected BuiltinRootNode(EnsoLanguage language) {
* @return new node to use to call this builtin
*/
public DirectCallNode createDirectCallNode() {
return DirectCallNode.create(getCallTarget());
return DirectCallNode.create(cloneUninitialized().getCallTarget());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cloning each BuiltinRootNode before use - builtins are lightweight and cloning helps to overcome Truffle recursive inlining limit.

Builtins that don't require VirtualFrame in their method signature are explicitly inlined by #6255 (comment) - those that require VirtualFrame are "cloned" before each use. That makes Truffle believe that each call into the builtin is a call the different function.

Such a cloning is not as effective direct inlining, but certainly more effective than giving up on inlining completely due to recursive limit for inlining.

@enso-bot enso-bot bot mentioned this pull request Apr 25, 2023
4 tasks
@JaroslavTulach JaroslavTulach changed the title Investigation of <| inlining Improve inlining of <| on (GraalVM EE) Apr 25, 2023
@JaroslavTulach JaroslavTulach merged commit 8e08a3b into develop Apr 25, 2023
@JaroslavTulach JaroslavTulach deleted the wip/jtulach/CacadeOfFunctions_6269 branch April 25, 2023 10:02
Procrat added a commit that referenced this pull request Apr 25, 2023
* develop:
  Turn null into UnexpectedExpression when Union type is incomplete (#6415)
  Ensure that IO.println does not fail if `to_text` returned a non-Text value. (#6223)
  Improve inlining of `<|` on (GraalVM EE) (#6384)
  Feat: update templates styles and feature (#6071)
  5127 Add Table.parse_to_columns to parse a single column to a set of columns. (#6383)
  URL handling (#6243)
  Exclude comparison operators from modifier logic (#6370)
  Better cleanup of project management test suite (#6392)
  Fix all eslint errors (#6267)
  Infer SQLite types locally (#6381)
  Vector Editor with List Editor and adding elements. (#6363)
  Add typechecks to Aggregate and Cross Tab (#6380)
  Forbid edits in read-only mode (#6342)
  Add Table.parse_text_to_table to convert Text to a Table. (#6294)
  Adding typechecks to Column Operations (#6298)
  Rollback cloud options groups (#6331)
  Project folder not renamed after project name change (#6369)
  Add `replace`, `trim` to Column. Better number parsing. (#6253)
  Read-only mode for controllers (#6259)
  Prevent default for all events, fixing multiple IDE bugs (#6364)
Akirathan pushed a commit that referenced this pull request Apr 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
--low-performance CI: No changelog needed Do not require a changelog entry for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cascade <| functions is slow
3 participants