Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression on todays Nightly build in comparison to 2023.5.9 Nightly build #6630

Closed
1 of 2 tasks
xvcgreg opened this issue May 10, 2023 · 16 comments · Fixed by #6786
Closed
1 of 2 tasks
Assignees
Labels
--bug Type: bug --low-performance --regression Important: regression p-highest Should be completed ASAP

Comments

@xvcgreg
Copy link

xvcgreg commented May 10, 2023

Discord username

No response

What type of issue is this?

Permanent – Occurring repeatably

Is this issue blocking you from using Enso?

  • Yes, I can't use Enso because of this issue.

Is this a regression?

  • Yes, previous version of Enso did not have this issue.

What issue are you facing?

over 2x performance drop on starting the project

Nightly 2023.5.10:

2305101020_shareX.mp4

Nightly 2023.5.9:

2305101025_shareX.mp4

Expected behaviour

Enso performing at least at the level from the previous build.

How we can reproduce it?

Unnamed_5.zip

launch on today and yesterday Nightly builds

Screenshots or screencasts

No response

Enso Version

2023.5.10 nightly

Browser or standalone distribution

Standalone distribution

Browser Version or standalone distribution

standalone

Operating System

Windows

Operating System Version

Win11pro 22H2 22621.1555

Hardware you are using

12th Gen Intel(R) Core(TM) i9-12900HK / RTX3060 Laptop / Nvidia Drivers 531.68

@sylwiabr
Copy link
Member

Let's check what caused this performance drop, revert the commit and fix it

@xvcgreg
Copy link
Author

xvcgreg commented May 12, 2023

I have pinpointed the performance drop of adding 1min to IDE's readiness to work state time to dropdowns. Please observe in the attached video the startup time of IDE (timecode provided) till the state of the graph editor is ready to be tinkered with (I have marked it as an infinite symbol shape movement of mouse panning the camera along).

testing.mp4

please look at parse node and how the "dropdown triangles" appear

  1. You can observe in the top left video (without the dropdown widget) no performance impact
  2. You can observe in the bottom left video the almost instant "two step" appearing behavior of showing those triangles - this also indicates no problems with performance
  3. on right videos you can observe not instant "two step" loading - and on both those videos there's almost additional 1 minute of load time

I have tested:
Nightly 2023.5.9
https://github.com/enso-org/enso/actions/runs/4923717702
https://github.com/enso-org/enso/actions/runs/4924183486
https://github.com/enso-org/enso/actions/runs/4925571865
https://github.com/enso-org/enso/actions/runs/4927876368
https://github.com/enso-org/enso/actions/runs/4928603001
https://github.com/enso-org/enso/actions/runs/4929054323

and
https://github.com/enso-org/enso/actions/runs/4924057089
That one was the only one with consistant performance (w/o this additional minute) - all of the previous builds have some inconsistencies regarding this "two step" effect:

2023.5.9 Nightly - 2:20 Xe
ide-windows 4923717702 - 2:33 Xe
ide-windows 4924183486 - 2:58 Xe
ide-windows 4925571865 - 2:41 Xe
ide-windows 4927876368 - 2:36 Xe
ide-windows 4928603001 - 2:30 Xe
ide-windows 4929054323 - 2:28 Xe
ide-windows 4924057089 - 1:10 GPU

--restart

2023.5.9 Nightly - 1:25 GPU
ide-windows 4923717702 - 2:10 GPU
ide-windows 4924183486 - 1:06 GPU
ide-windows 4925571865 - 1:56 GPU
ide-windows 4927876368 - 2:07 GPU
ide-windows 4928603001 - 2:08 GPU
ide-windows 4929054323 - 2:07 GPU
ide-windows 4924057089 - 1:08 GPU (dropdowns not loaded))


--3rd run
2023.5.9 Nightly - 1:58 GPU
2023.5.9 Nightly - 2:06 GPU
ide-windows 4924183486 - 2:02 GPU
ide-windows 4924057089 - 1:06 GPU (dropdowns not loaded)
ide-windows 4924183486 - 2:00 GPU

--restart

ide-windows 4924183486 - 1:06 GPU (dropdowns not loaded)
ide-windows 4924183486 - 2:07 GPU (dropdowns gradually)
ide-windows 4924183486 - 2:02 GPU (dropdowns gradually)
ide-windows 4924057089 - 1:05 GPU (dropdowns not loaded))```

@Frizi
Copy link
Contributor

Frizi commented May 15, 2023

I have found that the suggestion database is absolutely stormed with updates on this test project. There are about 3000 individual updates sent basically all at the same time. The IDE handles that very poorly, as each of those updates ends up triggering a view update of the graph editor, which in turn creates and compares span-trees for each node.

image

I still don't understand the exact cause of the performance degradation between those versions. Both are unacceptably slow on this project anyway. I guess just showing a few more dropdowns is enough to make that much difference, especially when there are so many updates coming. The IDE should be able to handle that load, so it is the main focus here. But so many update messages being sent is also worrying and it should be investigated. Batching and deduplicating those updates on engine side might improve this situation significantly as well.

@hubertp
Copy link
Collaborator

hubertp commented May 15, 2023

I'm getting worried this is starting to mix a few things - a nightly from 04/28 would start showing nodes much quicker than 05/09 or 05/15. But based on the conversation in #6626 (comment) it seems to be intentional. I find it confusing though.

Regarding a large number of suggestions' updates - it has been like that for a while. We should still fix it but it feels like a separate issue.

@wdanilo
Copy link
Member

wdanilo commented May 15, 2023

@hubertp why showing nodes later would be intentional? I don't see any mention of that in the conversation you linked. We should show nodes to the user as soon as possible. Any delay in that is bad, so I doubt anything slowing that down was ever intentional.

Regarding these thousands of updates from the engine - even if it was like that for some time, this definitely affects our performance, not only here, but in other cases. This is probably even more visible on weaker hardware. We've been chatting about it during todays weekly meeting, and if I understand correctly, there is a plan to investigate and fix it, right? Is there an issue we can track? If there is, could you please link it (or create it otherwise)?

@hubertp
Copy link
Collaborator

hubertp commented May 16, 2023

@hubertp why showing nodes later would be intentional?

I said quite the opposite.

@farmaazon
Copy link
Contributor

@hubertp why showing nodes later would be intentional?

I said quite the opposite.

TBH I also don't see any mention of "delaying showing nodes" in the discussion you've linked. The comment the link leads to mentions only "improving startup performance".

@hubertp
Copy link
Collaborator

hubertp commented May 16, 2023

@hubertp why showing nodes later would be intentional?

I said quite the opposite.

TBH I also don't see any mention of "delaying showing nodes" in the discussion you've linked. The comment the link leads to mentions only "improving startup performance".

That ticket was linked as related to the performance downgrade in GUI. And it was closed as a non-issue. That's why I was puzzled because I understood the ticket as "this is how nodes will be displayed now, the problem is somewhere else".

Anyway, like I mentioned before, the flooding of messages is not something new. It's been like that for some time now. The degradation in GUI, even if caused by it, is. Maybe it wasn't perfect but it also wasn't so bad until around a week ago, as illustrated by @xvcgreg's videos. I separated the LS work into a new ticket. Just trying to avoid a situation where the whole fault is put on LS, that's all.

@hubertp
Copy link
Collaborator

hubertp commented May 16, 2023

Also, just to maybe double down on those videos - I used a single, latest, engine and different GUI versions going back 2 weeks back. That's when I saw a similar (subjective) slowdown.

@wdanilo
Copy link
Member

wdanilo commented May 16, 2023

@hubertp why showing nodes later would be intentional?

I said quite the opposite.

TBH I also don't see any mention of "delaying showing nodes" in the discussion you've linked. The comment the link leads to mentions only "improving startup performance".

That ticket was linked as related to the performance downgrade in GUI. And it was closed as a non-issue. That's why I was puzzled because I understood the ticket as "this is how nodes will be displayed now, the problem is somewhere else".

Anyway, like I mentioned before, the flooding of messages is not something new. It's been like that for some time now. The degradation in GUI, even if caused by it, is. Maybe it wasn't perfect but it also wasn't so bad until around a week ago, as illustrated by @xvcgreg's videos. I separated the LS work into a new ticket. Just trying to avoid a situation where the whole fault is put on LS, that's all.

The ticked you linked describes that Greg was seeing scaling issues of Enso. It was closed because the scaling artifacts are not solvable now. The performance downgrade issue was extracted to a separate issue (this one). I see no information there that the worse performance is expected nor I have seen there information that "this is how nodes will be displayed now, the problem is somewhere else" - in fact, I don't understand it even more now. What do you mean by "this is how nodes will be displayed"? The ticket does not refer to how nodes are displayed, it describes Enso window scaling issues.

Regarding the slowdown, we need to fix it.

I said quite the opposite.

You said a nightly from 04/28 would start showing nodes much quicker than 05/09 or 05/15. But based on the conversation in https://github.com/enso-org/enso/issues/6626#issuecomment-1547598732 it seems to be intentional. - so, an older version was showing nodes faster than newer version, and it seemed intentional to you. I don't see how I could interpret this sentence other than this.

Also, @hubertp, can you please reply to all the questions from my previous message? I was asking about plan/ticket for fixing the big amount of messages from engine.

@Akirathan
Copy link
Member

Also, @hubertp, can you please reply to all the questions from my previous message? I was asking about plan/ticket for fixing the big amount of messages from engine.

The ticket is #6706. Hubert is working on that right now. Nevertheless, this is an LS-only ticket. Moreover, there were always a huge number of updateSuggestionDatabase messages sent from LS, so we don't see how this is related to the recent sudden performance regression (x2 project startup time). Therefore, we think that there may be some IDE-related changes that affect project startup time.

The commits between nightly 5.9. and nightly 5.10 are here. With @Frizi, we were unable to discover which of these commits might affect the startup performance that much. Moreover, there is no commit touching LS or even any Engine parts whatsoever.

The #6626 was closed as a non-issue mostly because the scaling is, AFAIK, handled by the browser and there is nothing much we can do to make scaling responsive while the main thread is blocked.

@hubertp
Copy link
Collaborator

hubertp commented May 16, 2023

You said a nightly from 04/28 would start showing nodes much quicker than 05/09 or 05/15. But based on the conversation in https://github.com/enso-org/enso/issues/6626#issuecomment-1547598732 it seems to be intentional. - so, an older version was showing nodes faster than newer version, and it seemed intentional to you. I don't see how I could interpret this sentence other than this.

But that's not what I'm saying. I'm saying there is a slowdown. A linked #6626 mentioned a, what appeared to be a related problem, and the ticket was closed. I interpreted the "Closing this as a non-issue." as a "this is how nodes will be displayed".

I think we can both agree that I would never say that performance degradation is intentional or desired.

Also, @hubertp, can you please reply to all the questions from my previous message? I was asking about plan/ticket for fixing the big amount of messages from engine.

I did, see the 2nd link below your original message.

@wdanilo
Copy link
Member

wdanilo commented May 16, 2023

I think we can both agree that I would never say that performance degradation is intentional or desired.

Oh, absolutely :D But I understood, that you understood, that displaying nodes later is intentional (maybe because of some additional work on GUI side). I'm happy though we are both on the same side - we should display nodes to people as soon as possible :)

I did, see the 2nd link below your original message.

I totally missed it. I replied there seeing that in my GH notifications in the meantime, but I totally missed it was linked in-between discussions. Thanks!

@Akirathan Akirathan removed the triage label May 17, 2023
@Frizi Frizi mentioned this issue May 18, 2023
2 tasks
@hubertp
Copy link
Collaborator

hubertp commented May 18, 2023

The problem with suggestions, as described in #6706, is fixed. That will rather drastically improve the performance of this project since we now reduced a number of suggestions updates from ~4.5k to ~100.

Having said that, the problem with a large number of suggestions has been introduced in 09/22(!). And the problem described in this ticket refers to a regression from 2 weeks ago. I continue to claim that the noticeable performance regression happened in GUI, not LS. Not sure if anyone on the GUI side will continue investigation of this ticket though.

Also, GUI has a problem with a large number of updates coming in a short period of time, as demonstrated by this ticket. But as @wdanilo mentioned in #6706 that should likely be fixed on the GUI-side by doing some batching. We can certainly come up with some synthetic projects with hundreds of nodes that illustrate the problem. @wdanilo should we have a separate ticket for that as well? I know very little of the underlying processing problem so I'm not sure how to describe it.

@farmaazon
Copy link
Contributor

I propose to just keep this task open and assigned to someone from GUI (Performance) team.

@sylwiabr
Copy link
Member

Actually @kazcw is already assigned and it's already scheduled for investigation.

@farmaazon farmaazon moved this from ❓New to 📤 Backlog in Issues Board May 19, 2023
@kazcw kazcw moved this from 📤 Backlog to 👁️ Code review in Issues Board May 19, 2023
@sylwiabr sylwiabr added this to the Design Partners milestone May 22, 2023
@mergify mergify bot closed this as completed in #6786 May 29, 2023
mergify bot pushed a commit that referenced this issue May 29, 2023
Only invalidate the graph editor view at most once per frame. On develop, this saves about 70ms (2%). Testing a recent backend without #6755 as a stress-test, this saves about 5s (45%). This reflects better scalability to large numbers of `SuggestionUpdate` messages.

Fixes #6630.

# Important Notes
- Also fix intermittent profiling failures occurring since the introduction of microtasks.
@github-project-automation github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board May 29, 2023
@farmaazon farmaazon moved this from 🟢 Accepted to 🗄️ Archived in Issues Board May 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
--bug Type: bug --low-performance --regression Important: regression p-highest Should be completed ASAP
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

8 participants