
Investigate performance of a + b #4352

Closed
1 task done
wdanilo opened this issue Feb 5, 2023 · 0 comments
Assignees
Labels
--low-performance -syntax p-low Low priority x-new-feature Type: new feature request

Comments

@wdanilo
Member

wdanilo commented Feb 5, 2023

This task was automatically imported from the old Task Issue Board; it was originally created by jaroslavtulach.
Original issue is here.


Why

As an Enso user
I want UI updates to be as quick as possible
So that changing a value feels instant.

Acceptance Criteria

```
main =
  number1 = 4
  number2 = 8
  sum1 = 5 + number2
```
Scenario:
Given the program with the previous three nodes
When the second parameter of `+` is changed from `number1` to `number2`
Then the engine should respond in less than 16ms.

Notes:

Tasks:

  • Investigate where the time is currently spent

Comments:

Switching the input of a second parameter to + shall be fast. (jaroslavtulach - May 19, 2022)


To gather insight into the language server engine runtime I downloaded `ide-macos` from [May 5, 2022](https://github.com/enso-org/ci-build/actions/runs/2274643280) and installed the `.dmg` on my system. Then I could launch the IDE with the following parameters:

```
ENSO_JVM_OPTS=-Dcom.sun.management.jmxremote.authenticate=false\ -Dcom.sun.management.jmxremote.ssl=false\ -Dcom.sun.management.jmxremote.port=9090 /Applications/Enso.app/Contents/MacOS/Enso
```

After setting the project up I connected to its engine to record what it does:

```
/Applications/Enso.app/Contents/Resources/enso/runtime/Home/bin/java -jar /Applications/Enso.app/Contents/Resources/enso/runtime/Home/lib/visualvm/platform/modules/org-netbeans-modules-sampler.jar 9090 sum.npss
```

I am attaching two `.npss` files (to be opened in VisualVM: `/Applications/Enso.app/Contents/Resources/enso/runtime/Home/bin/jvisualvm`). The `sum-many.npss` records more than five _input switches_, each identified by a deeper _stack depth_. The `sum-one.npss` contains a recording of just a single _input switch_. Hint: select the _time interval_ around the _"event"_ to see a summary of what was happening in just that _"moment"_. (jaroslavtulach - May 19, 2022)
**Jaroslav Tulach** reports a new **STANDUP** for yesterday (2022-05-19):

Progress: Verifying our work on startup - the issues of last week are gone, but: too many suggestions are sent too early. Thoughts and discussion about dynamic drop-downs. Measuring a+b modifications. It should be finished by 2022-05-27.

Next Day: Fully focus on speeding up a+b modification. (Enso Bot - May 20, 2022)


I've extracted a 67.8ms interval representing one modification of the program from the above `.npss` files. What I see:

  • 26ms in `Compiler.parse`
  • 20.7ms in `GatherDiagnostics$.gatherMetadata` calling the `distinctBy` operation
  • 11ms in `ChangesetBuilder$$Lambda$5442.0x000000084181e840.get$Lambda`
  • 9.76ms executing the user program - e.g. `ExpressionNodeWrapper.executeGeneric`

The data may not be completely accurate - it is based on only eleven snapshots, i.e. roughly one per 5ms. There is some work in other threads as well - I am not yet sure whether it blocks the main computation:

  • 32ms in slick.basic.BasicBackend$DatabaseDef
  • 7.76ms in org.enso.searcher.sql.SqlSuggestionsRepo
  • 7ms in akka.stream.impl.io.TcpConnectionStage$TcpStreamLogic

(jaroslavtulach - May 20, 2022)


**Jaroslav Tulach** reports a new **STANDUP** for yesterday (2022-05-20):

Progress: Analyzing a+b modifications. Bugfixing sampler: apache/netbeans#4134 It should be finished by 2022-05-27.

Next Day: Fully focus on speeding up a+b modification. (Enso Bot - May 21, 2022)


**Jaroslav Tulach** reports a new **STANDUP** for yesterday (2022-05-21):

Progress: Analysis of a+b modifications found concatenation of strings - being removed in #3465; discussion with up-stream project whether to merge a sampler improvement now or later: apache/netbeans#4134 It should be finished by 2022-05-27.

Next Day: Focus on speeding up a+b modification. (Enso Bot - May 22, 2022)


**Jaroslav Tulach** reports a new **STANDUP** for yesterday (2022-05-23):

Progress: Analysis of a+b modifications via profiling interface #3458; re-measuring; reporting issues into tracker; meetings&discussions - also about Vector vs. Array; Upstream PR apache/netbeans#4134 got merged and we shall have better sampler.jar with NetBeans 14 It should be finished by 2022-05-27.

Next Day: Focus on speeding up a+b modification. (Enso Bot - May 24, 2022)


Commit by Jaroslav Tulach https://github.com/enso-org/enso/commit/2973cd2db2c9d0dc85bba39754d3b2f76a826f86 enso

Avoid needless concatenations of warning/error messages (#3465)

While investigating #182234656 I've noticed that GatherDiagnostics calls distinctBy using the message string. However, such a string has to be composed, and that takes time:


This PR eliminates the need for concatenation by associating each `Diagnostic` with an array of keys used to compute `equals` and `hashCode`. (jaroslavtulach - May 24, 2022)
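The idea behind the PR can be sketched as follows. This is a minimal, hypothetical model, not the engine's actual classes: the fields of `Diagnostic` and the `message()` method here are illustrative. The point is that deduplication compares precomputed key arrays, so the expensive message string is never composed during `distinctBy`.

```java
import java.util.*;

// Minimal model of the idea in PR #3465: deduplicate diagnostics by
// precomputed keys instead of by their (expensively composed) message.
final class Diagnostic {
    final String kind;
    final int line;
    final Object[] keys; // used for equals/hashCode instead of the message

    Diagnostic(String kind, int line) {
        this.kind = kind;
        this.line = line;
        this.keys = new Object[] { kind, line };
    }

    // Composing the full message is the costly operation we want to avoid
    // during deduplication; it is only needed when the message is displayed.
    String message() {
        return "Error [" + kind + "] at line " + line;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Diagnostic)) return false;
        return Arrays.equals(keys, ((Diagnostic) o).keys);
    }

    @Override public int hashCode() {
        return Arrays.hashCode(keys);
    }
}

public class DistinctDiagnostics {
    public static void main(String[] args) {
        List<Diagnostic> all = List.of(
            new Diagnostic("type-error", 3),
            new Diagnostic("type-error", 3),  // duplicate: same keys
            new Diagnostic("unused", 7));
        // The set dedups via the key arrays; message() is never called here.
        Set<Diagnostic> distinct = new LinkedHashSet<>(all);
        System.out.println(distinct.size()); // prints 2
    }
}
```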


**Jaroslav Tulach** reports a new **STANDUP** for yesterday (2022-05-24):

Progress: Analyzed what it would take to support Mac M1/GraalVM 22.1 https://www.pivotaltracker.com/n/projects/2539304/stories/182269744 per request of Enso community. Discussion about sbt input/output files. Analyzing a+b modification from a perspective of IGV graphs. Meetings. It should be finished by 2022-05-27.

Next Day: Meetings & focus on analyzing up a+b modification. (Enso Bot - May 25, 2022)


Turns out, after analyzing the protocol communication between the IDE and the engine, that the IDE represents a single "drag and drop" operation as two separate _edit messages_. That is inefficient, as it _duplicates_ the amount of work on the engine side. The attached picture shows the messages being sent/received. There are two `text/applyEdit` messages, which then result in multiple (partially duplicated) replies from the engine side. I'd say that the time between the first and second sent `text/applyEdit` message (9ms in this measurement) is essentially wasted. Sending `executionStatus` twice is not necessary either. `suggestionDatabaseUpdates` could also be transferred just once.

Clearly it would be more effective if the code modification were sent in a single `text/applyEdit` message. Whether this deserves to be addressed remains to be seen. However, it is clear that this use-case doesn't properly simulate sliding a single value. If we want to analyze slider behavior, we may need a different way to mimic it.

Update, May 27: Debugging the language server engine runtime gave me additional insight into the two `text/applyEdit` messages case. The messages are accepted and processed asynchronously. The receiving thread puts them into a pending-edits queue and the processing thread reads and handles them later. The behavior depends on thread scheduling. If the processing thread is fast, it may get to the pending edits before the second edit arrives - then the whole work (parsing, compiling, executing) is duplicated. If the processing thread is slower and the second edit is queued in the meantime, it processes both edits at once and performs the parsing, compiling, and execution only once.

When debugging, I was able to simulate both scenarios. When running without the debugger I only ever saw both pending edits processed at once. (jaroslavtulach - May 25, 2022)
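The batching behavior described above can be sketched roughly like this. This is a simplified model under assumed names (`PendingEditsQueue`, `enqueue`, `drainAndProcess` are illustrative, not the engine's API): the receiving thread enqueues edits, and the processing thread drains everything currently pending in one pass, so the cost depends on how the two threads interleave.

```java
import java.util.*;
import java.util.concurrent.locks.*;

// Simplified model of the pending-edits behavior described above:
// the receiving thread enqueues edits; the processing thread drains
// everything that is currently pending and handles it in one pass.
final class PendingEditsQueue {
    private final Lock lock = new ReentrantLock();
    private final List<String> pending = new ArrayList<>();
    int compileRuns = 0; // how many parse/compile/execute passes happened

    // Called by the receiving thread for each text/applyEdit message.
    void enqueue(String edit) {
        lock.lock();
        try { pending.add(edit); } finally { lock.unlock(); }
    }

    // Called by the processing thread: takes ALL pending edits at once,
    // so two queued edits cost a single parse/compile/execute pass.
    List<String> drainAndProcess() {
        List<String> batch;
        lock.lock();
        try {
            batch = new ArrayList<>(pending);
            pending.clear();
        } finally { lock.unlock(); }
        if (!batch.isEmpty()) compileRuns++; // one pass per drained batch
        return batch;
    }
}

public class ApplyEditBatching {
    public static void main(String[] args) {
        PendingEditsQueue q = new PendingEditsQueue();
        // Slow processing thread: both edits arrive before it drains,
        // so parsing/compiling/executing happens only once.
        q.enqueue("edit-1");
        q.enqueue("edit-2");
        q.drainAndProcess();
        System.out.println(q.compileRuns); // prints 1

        // Fast processing thread: it drains after each edit,
        // so the work is duplicated (two more passes).
        q.enqueue("edit-1");
        q.drainAndProcess();
        q.enqueue("edit-2");
        q.drainAndProcess();
        System.out.println(q.compileRuns); // prints 3
    }
}
```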


**Jaroslav Tulach** reports a new **STANDUP** for yesterday (2022-05-25):

Progress: Why there are two applyEdits? That's not a slider change! Reporting "too many threads" problem https://www.pivotaltracker.com/n/projects/2539304/stories/182283991; Meetings; Reporting "polyglot Vector" story: https://www.pivotaltracker.com/n/projects/2539304/stories/182286277 It should be finished by 2022-05-27.

Next Day: Find new slider-like example for analyzing up a+b modification. (Enso Bot - May 26, 2022)


How much time has been saved by implementing [PR-3465](https://github.com//pull/3465)? We are missing an objective, automatic system for benchmarking these actions. Moreover, the manual measurement is subject to high variance. Still, there is a message in the log:

```
Job EnsureCompiledJob finished in xyz ms
```

and we can use it as an indicator of the time the engine spends compiling and processing the result.
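Extracting that indicator from a log mechanically might look like the sketch below. The log lines are modeled after the message quoted above; the class and method names are mine, not part of the engine.

```java
import java.util.*;
import java.util.regex.*;

// Sketch: scan log lines for the "EnsureCompiledJob finished" message
// and report the smallest observed duration as the indicator.
public class MinCompileTime {
    static OptionalLong minFinishedMs(List<String> logLines) {
        Pattern p = Pattern.compile("Job EnsureCompiledJob finished in (\\d+) ms");
        return logLines.stream()
            .map(p::matcher)
            .filter(Matcher::find)           // keep only matching lines
            .mapToLong(m -> Long.parseLong(m.group(1)))
            .min();                          // the best (smallest) time
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "Job EnsureCompiledJob finished in 13 ms",
            "Job UpdateSuggestions finished in 4 ms",
            "Job EnsureCompiledJob finished in 11 ms");
        System.out.println(minFinishedMs(log).getAsLong()); // prints 11
    }
}
```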

If I repeatedly change the input of the a + b expression on my Mac and observe the smallest value, I can see some improvement. Without the PR-3465 fix the `Job EnsureCompiledJob finished` value could get down to 13ms. With the PR-3465 fix, I am able to get to 11ms, i.e. the speedup is somewhere between 10-20%. That correlates with https://user-images.githubusercontent.com/26887752/169638644-cf7dd826-1016-402b-842a-fc27a77a64e1.png which claims that 15% of the time was taken by the "needless concatenation of messages". (jaroslavtulach - May 26, 2022)
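For the record, the back-of-the-envelope arithmetic behind the 10-20% claim, using the best observed timings (13ms before the fix, 11ms after):

```java
public class Speedup {
    public static void main(String[] args) {
        double before = 13.0; // best "EnsureCompiledJob finished" time, ms
        double after = 11.0;  // best time with the PR-3465 fix, ms
        // Relative improvement: (13 - 11) / 13 = 0.1538... ~ 15%
        double speedup = (before - after) / before * 100.0;
        System.out.println(Math.round(speedup * 10) / 10.0 + "%"); // prints 15.4%
    }
}
```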


**Jaroslav Tulach** reports a new **STANDUP** for yesterday (2022-05-26):

Progress: Confirming 10-20% speedup: https://www.pivotaltracker.com/story/show/182234656/comments/231464219; Meeting about performance. Measuring "change node value" scenario. Modifying sampler to use 5ms rate. It should be finished by 2022-05-27.

Next Day: Conclude the a+b modification measurements. (Enso Bot - May 27, 2022)


To avoid the pitfalls of double `text/applyEdit` messages I have switched to a `2 + a` program where I change the value of node `a` from `5` to `7` and back and measure what happens. The `5ms-rate.npss` has been recorded with a 5ms sampling rate, i.e. it contains twice as many raw samples. The important section is selected in `5ms-action.nps` - both files can be opened with `graalvm/bin/jvisualvm` and analyzed manually.

The snapshot clearly shows that the majority of the work (ten snapshots, i.e. 50.3ms) is spent in `job-pool1`. It is divided as:

  • 44.5ms in org.enso.interpreter.instrument.job.EnsureCompiledJob.ensureCompiledFiles() compiling
    • 27.6ms spent in parsing, i.e. org.enso.compiler.Compiler.parse
    • one snapshot (~5.42ms) shows org.enso.compiler.Compiler.runMethodBodyPasses - processing the body data
    • one snapshot (~6.41ms) captures org.enso.interpreter.instrument.job.EnsureCompiledJob.sendModuleUpdate and conversions to sendBinary
    • one snapshot (~5.5ms) captures org.enso.interpreter.runtime.Context.getModuleForFile()
  • 5.80ms (i.e. one snapshot) shows org.enso.interpreter.service.ExecutionService.execute() running the new program

Prior to that there is also work in the logging-service-client-akka.actor.default-dispatcher-5 thread, showing receipt of the message (two snapshots). After that there is work in the searcher.db-1 thread updating an SQL database.

Update on Saturday, May 28: I've added System.currentTimeMillis() calls to the suspected methods and got the following times in ms for multiple invocations of the methods:

```
EnsureCompiledJob.ensureCompiledFiles,55,55,29,42,37,35,40,40,36
Compiler.parse,34,37,18,25,22,19,24,24,23
Compiler.runMethodBodyPasses,2,3,3,3,1,2,1,2,1,1,2
EnsureCompiledJob.sendModuleUpdate,2,1,0,1,1,2,1,0,0
Context.getModuleForFile,0,0,1,0,0,0,0,1,0,1,0,0,1,0,0,1,0,0
EnsureCompiledJob.applyEdits,2,2,1,3,3,2,2,3,2
ExecutionService.execute,6,8,3,10,5,5
```

These measurements confirm that EnsureCompiledJob.ensureCompiledFiles, Compiler.parse, and ExecutionService.execute perform significant work. On the other hand, the measurements seem to indicate that EnsureCompiledJob.sendModuleUpdate and Context.getModuleForFile aren't real bottlenecks. EnsureCompiledJob.applyEdits might deserve some interest.
(jaroslavtulach - May 27, 2022)
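The instrumentation used above is simple wall-clock bracketing. A minimal sketch of the approach (the wrapper and the labels are illustrative; the real measurements were inserted directly into the engine's methods):

```java
// Minimal sketch of the currentTimeMillis bracketing used above:
// wrap a suspected method and print "label,elapsedMs" per invocation,
// matching the CSV-style rows in the comment.
public class TimedCall {
    static void timed(String label, Runnable work) {
        long start = System.currentTimeMillis();
        work.run();
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(label + "," + elapsed);
    }

    public static void main(String[] args) {
        // Stand-in for e.g. Compiler.parse; here the "work" just sleeps ~20ms.
        timed("Compiler.parse", () -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }
}
```

Note that wall-clock timings like these are noisy (scheduling, JIT warm-up), which is why the comment above reports many invocations per method rather than a single number.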


Just FYI: I've got [perf-map-agent](https://github.com/jvm-profiling-tools/perf-map-agent) working and convinced it to produce a flamegraph. The steps to reproduce the same are:
  • build the perf-map-agent project - i.e. `cmake .` and `make`
  • launch the project manager with the `-XX:+PreserveFramePointer` option
  • let the project manager start a language server engine runtime JVM
  • find out its pid (for example via `jps -v`)
  • connect to it: `./perf-map-agent/bin/perf-java-flames <pid>`

This is the command used to start the project manager JVM:

```
ENSO_JVM_OPTS=-XX:+PreserveFramePointer ./built-distribution/enso-project-manager-0.0.0-dev-linux-amd64/enso/bin/project-manager -vv --no-log-masking
```

(jaroslavtulach - May 27, 2022)
<hr />
**Jaroslav Tulach** reports a new **STANDUP** for yesterday (2022-05-27):

**Progress:** Presenting results of `a+b` measurements. Summary: _the less parsing/AST/Truffle tree rebuilding - the better_. Trying new ways of measurements. Unsuccessful with Oracle Developer Studio Performance Analyzer. Successful with Perf Map Agent. Got a decent flamegraph: https://www.pivotaltracker.com/story/show/182234656/comments/231492901 It should be finished by 2022-05-27.

**Next Day:** I still need to finish _checking the "current time millis" of the "node editing"_ to eliminate the "sampling" effect. (Enso Bot - May 28, 2022)
<hr />
Let's close the endeavors of this story for now. During the course of this task we have identified and optimized [unnecessary string concatenation](https://github.com/enso-org/enso/pull/3465), giving us a 10-20% speedup, and realized that the IDE often sends two `text/applyEdit` messages - luckily, the engine usually merges these events and processes them at once.

We have developed various ways to analyze the performance: sampling using the recently added `project-manager --profiling-path <file>` option; external sampling via `ENSO_JVM_OPTS`; generating flamegraphs using the cross-language `perf` tool; and verifying the results via good old `currentTimeMillis` & `println` statements.

Let's close the investigation until the new parser is out (reported #182323525 for tracking purposes). Meanwhile, we should investigate what API to create so that the IDE can tell the engine "hey, this value on a node changed, update the program", to make sure the _amount of work is proportional_ to the user's change (tracked as #182323784). (jaroslavtulach - May 28, 2022)
<hr />
