Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance drop in 3.3 RC 7 due to change in shadow cubemap size to match resolution #48036

Closed
cybereality opened this issue Apr 20, 2021 · 23 comments

Comments

@cybereality
Copy link
Contributor

Godot version:

v3.3 RC9

OS/device including version:

Windows 10, RTX 2080 Ti, Driver 466.11, GLES3

Issue description:

In several small test projects, the performance has dropped significantly. Typically I would get around 160 fps with V-Sync, now it has dropped to around 60 - 70 fps (and this is with test apps with only a few cubes/planes, not even a full game). I have went back and tested the same projects in v3.2.4 Beta6 and performance is fine. Note, I have a few other Godot projects where performance is still great, so I don't think it's my machine. But something there is causing a performance issue.

Steps to reproduce:
I've attached a test project (on Google Drive, too big to upload here). Try opening it in the latest RC, and see if you can reproduce my issue. This is not a demanding scene, just a few simple models.

Minimal reproduction project:

https://drive.google.com/file/d/1hAv2FRAHCVT3p25Upduhm_auqcB5vudT/view?usp=sharing

@lawnjelly
Copy link
Member

lawnjelly commented Apr 20, 2021

I get 36fps in beta6 and 21fps in RC9. It's not the 2D (I tried even turning off the fps label, and no change).

Also RC3 gets 36fps, so it is fairly recent.
RC4 has 26fps. So the bulk of the change is between RC3 and RC4.
RC6 has 26fps.

@akien-mga
Copy link
Member

Note that there's a baked lightmap in the MRP, so it needs to be rebaked when switching versions (at least between 3.2.3 and 3.2.4 beta 6+ which added the new CPU lightmapper).

My tests results on Linux with:

System specs
$ inxi -CSG
System:    Host: cauldron Kernel: 5.11.15-desktop-1.mga9 x86_64 bits: 64 Desktop: KDE Plasma 5.21.4 Distro: Mageia 9 mga9 
CPU:       Info: Quad Core model: Intel Core i7-8705G bits: 64 type: MT MCP cache: L2: 8 MiB 
           Speed: 3101 MHz min/max: 800/3100 MHz Core speeds (MHz): 1: 3101 2: 3100 3: 3101 4: 3100 5: 3100 6: 3100 7: 2674 
           8: 3100 
Graphics:  Device-1: Intel HD Graphics 630 driver: i915 v: kernel 
           Device-2: Advanced Micro Devices [AMD/ATI] Polaris 22 XL [Radeon RX Vega M GL] driver: amdgpu v: kernel 
           Device-3: Cheng Uei Precision Industry (Foxlink) HP Wide Vision FHD Camera type: USB driver: uvcvideo 
           Display: x11 server: Mageia X.org 1.20.11 driver: loaded: intel,v4l resolution: 1920x1080~60Hz 
           OpenGL: renderer: Mesa Intel HD Graphics 630 (KBL GT2) v: 4.6 Mesa 21.0.2
GPU 3.2.3-stable 3.2.4 beta 1 3.2.4 RC 3 3.2.4 RC 4 3.3 RC 6 3.3 RC 7 3.3 RC 9
Intel HD 630 47 FPS 47 FPS 46 FPS 45 FPS 45 FPS 38 FPS 38 FPS
Radeon RX Vega M 160 FPS 160 FPS 155 FPS 155 FPS 155 FPS 137 FPS 137 FPS

So I can't reproduce @lawnjelly's performance drop between RC 3 and RC 4. The main difference between the two is that RC 4 spams errors because the MRP is using invalid input mappings. This could cause a FPS drop (in debug builds only) when spamming the stdout (especially on Windows):

ERROR: get_action_strength: Request for nonexistent InputMap action 'look_right'.
   At: main/input_default.cpp:142.
ERROR: get_action_strength: Request for nonexistent InputMap action 'look_left'.
   At: main/input_default.cpp:142.
ERROR: get_action_strength: Request for nonexistent InputMap action 'look_down'.
   At: main/input_default.cpp:142.
ERROR: get_action_strength: Request for nonexistent InputMap action 'look_up'.
   At: main/input_default.cpp:142.

I do reproduce a performance drop between 3.3 RC 6 and RC 7 however, on both Intel and Radeon on Linux.

And I seem to have a tiny performance drop somewhere between 3.2.4 beta 1 and 3.2.4 RC 4, but not sure it's significant enough to be worth looking into.

@lawnjelly
Copy link
Member

lawnjelly commented Apr 20, 2021

I've tracked down the major change I think between RC3 and RC4. In RC4 the error message is spamming about using an input map that isn't set, and in RC3 that error message isn't there.

Oo beaten to it!

When I remove those inputs, the fps is the same in RC3 and RC4.

I agree it seems possibly related to the big changes in lightmapping and the project use of lightmap, and if this is happening between RC6 and RC7 this does match in the timeline.

@akien-mga
Copy link
Member

@cybereality Could you try to fix the InputMap error locally and see if you still get a performance drop?

If so, can you try intermediate RCs to see if you can pinpoint which one(s) cause a performance drop? From my test it seems to be between RC 6 and RC 7 but maybe it would be different for your hardware.

@lawnjelly Can you reproduce the drop between RC 6 and RC 7?

Relevant changes to look into: 15ff752...cca2637

Possible "culprit": 8d156d9

@cybereality
Copy link
Contributor Author

cybereality commented Apr 20, 2021

Ah, noob mistake. I didn't look at the console window (it was hidden behind the editor). After adding those input mappings, now performance is back to what it was before (60fps to 400fps). So it was definitely the standard out that was slowing it down.

@lawnjelly
Copy link
Member

Yes I get from 79fps (RC6) -> 55fps (RC7). I decreased the size of the window a bit so I could see the output, that's why it is faster than the previous figures.

@akien-mga
Copy link
Member

akien-mga commented Apr 20, 2021

Yes I get from 79fps (RC6) -> 55fps (RC7). I decreased the size of the window a bit so I could see the output, that's why it is faster than the previous figures.

...

Possible "culprit": 8d156d9

Confirmed that this commit explains the slight performance drop experienced on my hardware.

To my untrained eyes the shadow quality in 3.3 RC 7 might be slightly better indeed, but I'm not sure it's significant enough to warrant having default settings which lead to a performance reduction. CC @clayjohn @puchik

If I read this correctly with this commit we're going from a cubemap sampler size of 512 to 4096, i.e. 8 times bigger (or is that 8²?). Maybe we should reduce the default values?

However if I set rendering/quality/shadow_atlas/size to 512 then I do get awful looking shadows.
So maybe the cubemap sample size should get its own config value that could default to e.g. 512 instead of reusing the value of rendering/quality/shadow_atlas/size?

@akien-mga akien-mga changed the title Serious Performance Regression Performance drop in 3.3 RC 7 due to change in shadow cubemap size to match resolution Apr 20, 2021
@akien-mga
Copy link
Member

I retitled the issue to track the new issue that was identified with this MRP (regression from #47160 in 3.3 RC 7), since the other one was just the consequence of error spam.

@akien-mga
Copy link
Member

akien-mga commented Apr 20, 2021

For comparison:

3.3 RC 6, default settings (shadow atlas size 4096, cubemap sampler size 512):
Screenshot_20210420_094452

3.3 RC 7, default settings (shadow atlas size 4096, cubemap sampler size 4096):
Screenshot_20210420_094622

3.3 RC 7, shadow atlas size and cubemap sampler size 2048:
Screenshot_20210420_161820

3.3 RC 7, shadow atlas size and cubemap sampler size 1024:
Screenshot_20210420_162008

3.3 RC 7, shadow atlas size and cubemap smapler size 512:
Screenshot_20210420_094721

So lowering the common setting to get back the 3.3 RC 6 performance is definitely not an option.

@Zireael07
Copy link
Contributor

How does 1024 look?

@clayjohn
Copy link
Member

I'm a little surprised that #47160 is causing such a performance regression. In my testing the difference was about 0.1 to 0.5 ms when using 32 dynamic lights with shadows.

That being said, we should consider reducing the default shadow atlas size to 2048 now. The quality is better than 4096 without #47160 and speed should be improved.

See also for further discussion. puchik#1

@akien-mga
Copy link
Member

akien-mga commented Apr 20, 2021

I updated #48036 (comment) to include screenshots using 3.3 RC 7 with shadow atlas size 2048 and 1024.

The quality on this MRP does suffer quite a few from reducing to 2048 if you compare to 3.3 RC 6:

RC6: Shadow atlas 4096, cubemap sampler 512:
Screenshot_20210420_162207

RC7: Shadow atlas and cubemap sampler 2048:
Screenshot_20210420_162223

@clayjohn
Copy link
Member

@akien-mga thanks. That does indeed look like an unacceptable tradeoff.

I don't want want to revert #47160 entirely as it makes it possible to have much better shadow quality (at the expense of speed) than was possible in 3.2.

I will make a PR tonight that introduces a project setting for max_shadow_cubemap_sampler_size:

int max_shadow_cubemap_sampler_size = CLAMP(int(next_power_of_2(GLOBAL_GET("rendering/quality/shadow_atlas/size")) >> 1), 256, storage->config.max_cubemap_texture_size);

We can default to 512 so that the quality/performance is the same in 3.3 as it was in 3.2 and then users who want more quality can increase it as they need.

@Calinou
Copy link
Member

Calinou commented Apr 20, 2021

That being said, we should consider reducing the default shadow atlas size to 2048 now. The quality is better than 4096 without #47160 and speed should be improved.

Sharper shadows aren't always better. Using a lower resolution introduces blurriness, but shadows in real life aren't perfectly sharp, especially when the shadow is far away from its caster.

In fact, I tend to find the default OmniLight/SpotLight shadows too sharp right now (when used at the typical OmniLight/SpotLight sizes). This is especially the case when you compare them to the default DirectionalLight shadow quality, which makes them look inconsistent.

Anyway, I think @clayjohn's solution is the way to go for 3.3.

@Cykyrios
Copy link
Contributor

I can confirm as well that #47160 caused a significant framerate drop on my project, which made it to go from roughly 50-ish fps (4 instances as I test multiplayer) to literally single digit fps (on a GTX 1060). I have however not touched any shadow settings, so if decreasing the default size results in roughly the same performance and quality as 3.2, this is probably fine.

Below is a quick comparison between the latest 3.2 commit vs 8d156d9, no project settings changed:

mahjong_3.2.mp4
mahjong_3.3.mp4

@akien-mga
Copy link
Member

That's a very massive performance hit, that sounds surprising compared to what we were discussing above. Could you test 3.3 RC 6 and RC 7 to confirm that you get good performance in RC 6 and the drop in RC 7?

@Cykyrios
Copy link
Contributor

Cykyrios commented Apr 20, 2021

I just tried with RC6 and RC7, I can confirm I get the same performance drop on RC7 only. For what it's worth, I notice some really high CPU usage on a single thread with RC7 (I do have quite a few Tweens at work), which doesn't happen with RC6. My system specs below just in case:

OS: Linux Mint 20.1 (kernel is 5.4.0)
CPU: Intel i7-8700K
GPU: Nvidia GTX 1060 6GB
RAM: 16 GB, most likely irrelevant here

Below is a screenshot of the framerate monitor in RC7 (the two dips to ~15 fps correspond to the animations). It doesn't seem to go down quite as low as what the above video shows, probably because of averaged values. In RC6, framerate drops to 30-40 during the animations.
3.3 RC7 FPS Monitor

My issue may be a somewhat extreme case where high CPU usage causes framerate to drop more than it should, but RC7 definitely did impact framerate.

@puchik
Copy link
Contributor

puchik commented Apr 20, 2021

We can default to 512 so that the quality/performance is the same in 3.3 as it was in 3.2 and then users who want more quality can increase it as they need.

This is the way to go. It's possible the large frame drops are a combination of things but the shadow resolution would definitely contribute to it. For my project having the cubemap capped to 512 looked terrible (indoor environment with low lighting) and having it increased caused minimal drops.

@clayjohn
Copy link
Member

clayjohn commented Apr 21, 2021

Funny enough, running on my hardware I can't reproduce a performance drop between RC 6 and RC 7. In both cases performance is highly variable for the first 10 seconds or so and then plateaus at about 125 FPS. Note, I am not re-baking the lightmap between runs which shouldn't matter if 8d156d9 is indeed the culprit.

On desktop Windows 10 AMD RX 5600 XT

RC6
Screenshot (139)

RC7
Screenshot (140)

I am going to test on my laptop using integrated graphics to see if I can reproduce there.

On laptop (intel integrated) Windows 10 UHD Graphics 620

RC6
Screenshot (1)

RC7
Screenshot (10)

I am seeing a performance drop on my laptop with intel integrated graphics. PR incoming with the changes described above.

@akien-mga
Copy link
Member

Fixed by #48059.

@Cykyrios I'm not sure yet if your performance regression was exactly the same as the one fixed by #48059, or if it's that one + something else that also happened in RC 7. I'll make you a Linux build quickly to test.

@lawnjelly
Copy link
Member

It could be that the shadow map resolution problem is affecting @Cykyrios much more, because he is using split screen. 4 screens would seem to be 4x as many shadow map renders, for the same final screen area. That said there still could be other changes in RC7.

@akien-mga
Copy link
Member

@Cykyrios I'm not sure yet if your performance regression was exactly the same as the one fixed by #48059, or if it's that one + something else that also happened in RC 7. I'll make you a Linux build quickly to test.

Could you test this build? It contains #48059 so if you use default shadow map settings, it should behave the same as 3.2.3 / 3.3 RC 6.

https://downloads.tuxfamily.org/godotengine/testing/Godot_v3.3-rc.d91b780_x11.64.zip

@Cykyrios
Copy link
Contributor

Cykyrios commented Apr 21, 2021

@akien-mga This build does have the same performance as RC6 for me, and I also compiled from the latest commit just in case, same result.
Thanks a lot to you and @clayjohn!

Edit: Actually I just increased the cubemap size from the default 512 to 2048 (shadow atlas size was 8192 already) and I still get the same performance as RC6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants