Fix Vulkan CPU/GPU synchronization #80566
I need to test this sometime during the week with VR. I've had a lot of timing issues because our frame timing seemed to clash with OpenXRs
OK, I have evidence that lag has been reduced. I couldn't use OBS because it introduced atrocious lag when enabled, so I had to use an external camera. I ran multiple non-scientific tests and it felt better, but I was heavily biased (I want to claim my version does something), so I had to use a more precise, scientific method.
Methodology
while true
do
xte 'mousemove 100 100'
xte 'mousemove 100 125'
xte 'mousemove 100 150'
xte 'mousemove 100 175'
xte 'mousemove 100 200'
xte 'mousemove 100 225'
xte 'mousemove 100 250'
xte 'mousemove 100 275'
xte 'mousemove 100 300'
xte 'mousemove 100 325'
xte 'mousemove 100 350'
xte 'mousemove 100 325'
xte 'mousemove 100 300'
xte 'mousemove 100 275'
xte 'mousemove 100 250'
xte 'mousemove 100 225'
xte 'mousemove 100 200'
xte 'mousemove 100 175'
xte 'mousemove 100 150'
xte 'mousemove 100 125'
xte 'mousemove 100 100'
done
These are the results (I had to re-encode the videos because they were too big for GitHub):
Before PR
After PR
We can notice:
Overall, I believe the data is consistent with latency being reduced by 1 frame (16 ms). I encourage others to do their own research and validate the result. Unfortunately I only ran the test once (I'm in a bit of a hurry, tomorrow is election day). IMHO it feels noticeably snappier too. Of course it will never reach the level of snappiness you can get with VSync off; however, theoretically the latency with VSync off should be reduced as well.
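As a rough illustration of the "1 frame (16 ms)" figure above, here is a minimal sketch of the arithmetic, assuming a 60 Hz display. The function names are hypothetical and not part of Godot.

```cpp
#include <cassert>

// Duration of one refresh interval in milliseconds.
// Hypothetical helper for illustration only.
double frame_time_ms(double refresh_rate_hz) {
    assert(refresh_rate_hz > 0.0);
    return 1000.0 / refresh_rate_hz;
}

// Latency contributed by a number of whole frames queued ahead of display.
double queued_latency_ms(int frames_queued, double refresh_rate_hz) {
    assert(frames_queued >= 0);
    return frames_queued * frame_time_ms(refresh_rate_hz);
}
```

At 60 Hz one frame is roughly 16.7 ms, so removing one queued frame would be expected to save about that much latency.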
Some initial code review. Overall, I think this change is fantastic. But we indeed need to get as much testing as possible.
I tested locally on PopOS 22.04 with Intel integrated graphics, X11 WM. I didn't see any perceptible changes in performance/lag/stability, although my experience was already quite good.
@@ -8933,11 +8968,8 @@ void RenderingDeviceVulkan::initialize(VulkanContext *p_context, bool p_local_de
context = p_context;
device = p_context->get_device();
if (p_local_device) {
frame_count = 1;
The implications of removing this aren't totally clear to me. The logic with p_local_device is slightly different as it isn't connected to a window in that case. I'll have to think about this more.
Hi!
There are two independent things going on:
- What frame_count was supposed to do. I've been meaning to ask you that (and it sounds like this should be user-provided rather than assuming a value).
- Right now FRAME_LAG does the job of the max frames to be ahead. It is hardcoded but I will be modifying this to be user-tweakable.
- Right now frame_count was completely broken, as I described in "Vulkan GPU/CPU synchronization is wrong" (#80550).
The issue I am referring to comes from the fact that the local device doesn't have a swapchain and never calls swap_buffers. It is essentially one frame, and it has to call submit() then sync() in order to submit more operations. In those cases, frame_count should always be 1, as there will literally never be more than one "frame".
Obtaining a global frame count from the context fundamentally breaks things for local devices.
I'm not opposed to grabbing the frame count from the context, but if we do that, then we need more checks when using a local device to ensure we aren't referencing resources that don't exist.
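To make the concern above concrete, here is a minimal sketch of the kind of check being asked for. It assumes a hypothetical helper `effective_frame_count`; it is not the actual Godot code, only an illustration of special-casing local devices.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not Godot API: decides how many frames a rendering
// device should cycle through.
uint32_t effective_frame_count(bool is_local_device, uint32_t context_frame_count) {
    if (is_local_device) {
        // A local device has no swapchain and never calls swap_buffers,
        // so it only ever has a single "frame".
        return 1;
    }
    // A windowed device follows the context, which must be initialized first.
    assert(context_frame_count > 0);
    return context_frame_count;
}
```

With this shape, a local device never indexes per-frame resources beyond slot 0, regardless of what the context reports.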
I commented more on this in another spot. I think this PR will break local rendering devices in a subtle way.
31be2e4 to c264a30
fad5ff5 to 91cb1ee
I'm adding new functionality. In Since it must be provided at very early initialization, I didn't add a GDScript interface for it. That can be added and figured out later by anyone else. The plumbing for that is done now (you may have feedback for changes, of course). This setting should also address @clayjohn's concerns, although they may be misplaced. Even if Godot has no windows,
Edit: I've pushed it as two commits because I want CI to double-check that I got everything right for every platform. Having it as 2 commits also makes evaluation easier. If all is well, I will squash the commits into one.
This may require something like godotengine/godot-proposals#6423 to be changeable at run-time.
doc/classes/ProjectSettings.xml
Outdated
Sets the number of buffers before stalling to wait for the GPU.
2 corresponds to double buffering and 3 to triple buffering.
Single buffering may give you the lowest lag but at the high cost of no parallelism between CPU & GPU
Assuming you're able to reach 60fps all the time, then double buffering gives you the lowest lag.
Use Triple buffering if V-Sync is enabled and the GPU is struggling to keep 60 fps (or whatever the monitor refresh rate is).
Triple buffering is almost a must if you're using V-Sync MAILBOX mode.
[b]Note:[/b] This setting is not supported by all APIs.
[b]Note:[/b] This property is only read when the project starts.
Every line break creates a new paragraph, so I suggest using those sparingly.
Sets the number of buffers before stalling to wait for the GPU. Higher values allow for higher framerates in some situations at the cost of higher visual latency. 2 corresponds to double buffering and 3 to triple buffering. Single buffering may give you the lowest latency but at the high cost of no parallelism between CPU & GPU.
Assuming you're able to reach the monitor's refresh rate all the time, double buffering gives you the lowest latency while still having some level of parallelism.
Use triple buffering if V-Sync is enabled and the GPU is struggling to keep up with the monitor's refresh rate. Triple buffering is almost a must if you're using the [b]Mailbox[/b] V-Sync mode.
[b]Note:[/b] This setting is only supported with Vulkan-based rendering methods (Forward+ and Mobile), not Compatibility.
[b]Note:[/b] This property is only read when the project starts. There is currently no way to change this value at run-time.
Also, are the terms "double buffering" and "triple buffering" correct here? I see those being used to refer to V-Sync behavior, with double buffering offering less latency but a "permanent" drop to 30 FPS (or 20 FPS, 15 FPS, …) whenever you can't sustain 60 FPS. I thought this was configured separately, see godotengine/godot-proposals#4065.
2023-10-22.12-42-22.1.1.mp4
I think you guys still need to test before merging. This bug can still happen if you overload too much.
@@ -154,7 +159,7 @@ class VulkanContext {
VkSwapchainKHR swapchain = VK_NULL_HANDLE;
SwapchainImageResources *swapchain_image_resources = VK_NULL_HANDLE;
VkPresentModeKHR presentMode = VK_PRESENT_MODE_FIFO_KHR;
VkSemaphore image_acquired_semaphores[FRAME_LAG];
VkSemaphore image_acquired_semaphores[MAX_FRAME_LAG] = {};
Could you elaborate why multiple acquired semaphores are needed per window?
AFAIK, only one swapchain image is generally acquired at a time. This is actually an issue I was looking into for #84137, and in fact I saw (in superficial testing) no detriment in keeping just 1 image_acquired_semaphore per window.
(I'm quite possibly bad and wrong about this)
Hi!
I had the same question actually!
Here's the thing:
- Indeed, just one semaphore for all windows is how I think it should properly be done (unless you're careful with barriers and want to explicitly start doing compute work using async compute for the next frame while the current frame is still being worked on).
- In OgreNext we use one semaphore and, funnily enough, that triggered driver bugs for various drivers (which have long since been fixed).
- Official and unofficial samples use one semaphore per swapchain (this is most likely why the drivers didn't catch that bug for OgreNext...), and since I saw that Godot was doing the same, I decided not to change this.
But yes, for the record for desktop*, I am more comfortable with one semaphore for everything because that's how I think proper synchronization should be.
* I say desktop because I'm having timing issues on Android that I can't fix in OgreNext and I never tested the theory of whether they're caused because of having one semaphore for everything. But on the other hand, Android has a lot of problems of all sorts.
@@ -1680,7 +1680,10 @@ Error VulkanContext::_create_semaphores() {
/*pNext*/ nullptr,
/*flags*/ VK_FENCE_CREATE_SIGNALED_BIT
};
for (uint32_t i = 0; i < FRAME_LAG; i++) {
CRASH_COND_MSG(frame_count == 0, "Used before initialization.");
Wonderful! I ran into that same issue trying a provisional fix for what your PR does; this seems much cleaner than the current logic on master.
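For readers following the diff, the guard above can be sketched in isolation as follows. This is a simplified stand-in, assuming a hypothetical `FrameRing` type with a plain `assert` in place of Godot's `CRASH_COND_MSG`; it only illustrates why a zero frame count must be rejected before cycling through per-frame resources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-frame bookkeeping; not Godot code.
struct FrameRing {
    uint32_t frame_count = 0; // 0 means "not initialized yet"
    uint32_t frame_index = 0;

    // Moves to the next frame slot, wrapping around at frame_count.
    void advance() {
        // A zero count would make the modulo below undefined, so fail loudly.
        assert(frame_count != 0 && "Used before initialization.");
        frame_index = (frame_index + 1) % frame_count;
    }
};
```

With `frame_count` set to 2 the index alternates between 0 and 1, which is the double-buffered case discussed in this thread.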
[code]2[/code] corresponds to double buffering and [code]3[/code] to triple buffering. A value of [code]1[/code] is not recommended.
Double buffering may give you the lowest lag/latency but if VSync is on and the system can't render at 60 fps, the framerate will go down in multiples of it (e.g. 30 fps, 15, 7.5, etc ). Triple buffering gives you higher framerate (specially if the system can't reach a constant 60 fps) at the cost of up to 1 frame of latency (when VSync is On in FIFO mode).
Use Double buffering if V-Sync is off. Triple buffering is a must if you plan on using V-Sync MAILBOX mode.
Try the [url=https://darksylinc.github.io/vsync_simulator/]V-Sync Simulator[/url], an interactive interface that simulates presentation to better understand how it is affected by different variables under various conditions.
I don't think this link should go into the Godot documentation.
Regarding the simulator itself, nice tool! I wish the sliders had manual input fields and/or more precision at the bottom of its range (logarithmic scale). I'm generally interested in high frame rate scenarios (<10 ms vblank time, and values like 8 and 6 ms are really difficult to adjust with how the sliders work)
> I don't think this link should go into the Godot documentation.
Why not? We have plenty of external links in the class reference and documentation in general.
Try the [url=https://darksylinc.github.io/vsync_simulator/]V-Sync Simulator[/url], an interactive interface that simulates presentation to better understand how it is affected by different variables under various conditions.
Try the [url=https://darksylinc.github.io/vsync_simulator/]V-Sync Simulator[/url], an interactive interface that simulates presentation, to better understand how it is affected by different variables under various conditions.
Add punctuation to separate clause
if (desiredNumOfSwapchainImages < surfCapabilities.minImageCount) {
desiredNumOfSwapchainImages = surfCapabilities.minImageCount;
}
uint32_t desiredNumOfSwapchainImages = MAX(surfCapabilities.minImageCount, swapchain_desired_count);
Some sources I found seem to state it's sensible to pick the lower bound as 1 + surfCapabilities.minImageCount
(this might, on particularly conservative implementations, relax the contention for driver resources a bit when acquiring the next image; the common case here is still that the default 3 images are the value that you are gonna get)
Since you have made the desired values configurable, a potential extra setting could make this configurable as well, defaulting to an offset of 0 or 1. (or a simple flag)
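A minimal sketch of what such a configurable offset could look like is below. It assumes a hypothetical `extra_over_min` parameter and a plain struct standing in for the two `VkSurfaceCapabilitiesKHR` fields involved; it is not what the PR implements. The clamp to `maxImageCount` relies on the Vulkan convention that a value of 0 means there is no upper limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Minimal stand-in for the two VkSurfaceCapabilitiesKHR fields used here.
struct SurfaceCaps {
    uint32_t minImageCount;
    uint32_t maxImageCount; // 0 means the surface imposes no upper limit
};

// Hypothetical helper: picks the number of swapchain images to request.
// extra_over_min is the suggested offset (0 or 1) above the driver minimum.
uint32_t pick_swapchain_image_count(const SurfaceCaps &caps, uint32_t desired, uint32_t extra_over_min) {
    assert(caps.minImageCount >= 1);
    // Never go below the driver minimum plus the requested offset.
    uint32_t count = std::max(caps.minImageCount + extra_over_min, desired);
    // Respect the upper limit when the surface reports one.
    if (caps.maxImageCount > 0) {
        count = std::min(count, caps.maxImageCount);
    }
    return count;
}
```

An offset of 0 reproduces the `MAX(surfCapabilities.minImageCount, swapchain_desired_count)` behavior from the diff, apart from the added upper clamp.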
I found all references to this info trace back to the Vulkan Tutorial.
After research I found the following:
- The OSes that have sync issues (like X11 on Linux) already set minImageCount to a large enough value like 4 or 5.
- I suspect it is highly likely that the tutorial confused "wait on the driver to complete internal operations" with how VSync is supposed to work. This is known and can be observed in my VSync Simulator: if you set swapchain_count to 2 and the rendering time is longer than the refresh interval, the framerate will drop to an integer divisor of the refresh rate (e.g. 60 fps, 30 fps, 15 fps). Depending on how the OS compositor works internally, this might also happen with swapchain_count > 2.
- Seeing the other ticket, it is clear some users explicitly want this behavior (because if the CPU & GPU are fast enough to not miss any VBLANK, then latency is minimum).
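The integer-divisor behavior described in the list above can be approximated with a very simplified model, sketched here. It assumes FIFO presentation with two swapchain images, a constant render time per frame, and no compositor effects; the function name is hypothetical and the model is much cruder than the simulator.

```cpp
#include <cassert>
#include <cmath>

// Simplified model, for illustration only: with two swapchain images and
// FIFO V-Sync, a frame that misses a VBLANK has to wait for the next one,
// so each frame occupies a whole number of refresh intervals.
double double_buffered_fps(double refresh_rate_hz, double render_time_ms) {
    assert(refresh_rate_hz > 0.0 && render_time_ms > 0.0);
    const double interval_ms = 1000.0 / refresh_rate_hz;
    // Number of refresh intervals consumed by one frame, rounded up.
    const double intervals_per_frame = std::ceil(render_time_ms / interval_ms);
    return refresh_rate_hz / intervals_per_frame;
}
```

Under these assumptions, a 20 ms frame on a 60 Hz display yields 30 fps rather than 50 fps, which is the drop that triple buffering is meant to avoid.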
@@ -57,7 +57,8 @@
return _window_create(p_window_id, p_vsync_mode, surface, p_width, p_height);
}

VulkanContextIOS::VulkanContextIOS() {}
VulkanContextIOS::VulkanContextIOS() {
accidental line break.
<member name="display/window/vsync/buffer_count" type="int" setter="" getter="" default="2">
Sets the number of buffers before stalling to wait for the GPU.
1 may give you the lowest lag/latency but at the high cost of no parallelism between CPU & GPU.
Try the [url=https://darksylinc.github.io/vsync_simulator/]V-Sync Simulator[/url], an interactive interface that simulates presentation to better understand how it is affected by different variables under various conditions.
Try the [url=https://darksylinc.github.io/vsync_simulator/]V-Sync Simulator[/url], an interactive interface that simulates presentation to better understand how it is affected by different variables under various conditions.
Try the [url=https://darksylinc.github.io/vsync_simulator/]V-Sync Simulator[/url], an interactive interface that simulates presentation, to better understand how it is affected by different variables under various conditions.
Hi, would it be possible to get the build artifacts of this PR? They are sadly not available anymore. Thank you very much.
@Opalie unfortunately the artifacts have expired and it looks like we can't restart the jobs, sorry. It was suggested to me that a viable route would be to fork this branch and make a small change so as to re-trigger them for your fork. This should all be doable from the web UI but might be a bit cumbersome. I'd personally recommend building it from source, but only if you feel comfortable with the idea. Here are some docs in case you do: https://docs.godotengine.org/en/stable/contributing/development/compiling/index.html
Is this just waiting on review, or has it effectively been superseded by #87340?
It'd effectively be superseded by that, so if you're waiting on this one I'd suggest confirming whether that PR fixes the problem as well. The PRs will be completely incompatible. Matias has reviewed that PR to check that I correctly implemented what he detected, as well as fetched the settings/doc changes from it.
Just a minor follow-up: I did test the other PR and it doesn't seem to fix the obnoxious stuttering. The curious thing is that neither of the DirectX implementations (both the ANGLE compatibility mode and the recent DX12 stuff) suffers from the stuttering issues, so this may be some insufferable NVIDIA Vulkan/OpenGL issue. The hunt for "why does vsync cause so much misery" continues. Still, at least there are options now.
<member name="display/window/vsync/swapchain_count" type="int" setter="" getter="" default="3">
Sets the number of swapchains (back buffer + front buffer).
[code]2[/code] corresponds to double buffering and [code]3[/code] to triple buffering. A value of [code]1[/code] is not recommended.
Double buffering may give you the lowest lag/latency but if VSync is on and the system can't render at 60 fps, the framerate will go down in multiples of it (e.g. 30 fps, 15, 7.5, etc ). Triple buffering gives you higher framerate (specially if the system can't reach a constant 60 fps) at the cost of up to 1 frame of latency (when VSync is On in FIFO mode).
Double buffering may give you the lowest lag/latency but if VSync is on and the system can't render at 60 fps, the framerate will go down in multiples of it (e.g. 30 fps, 15, 7.5, etc ). Triple buffering gives you higher framerate (specially if the system can't reach a constant 60 fps) at the cost of up to 1 frame of latency (when VSync is On in FIFO mode).
Double buffering may give you the lowest lag/latency but if VSync is on and the system can't render at 60 fps, the framerate will go down in multiples of it (e.g. 30 fps, 15, 7.5, etc.). Triple buffering gives you higher framerate (specially if the system can't reach a constant 60 fps) at the cost of up to 1 frame of latency (when VSync is On in FIFO mode).
Superseded by #87340. Thanks for the great work on this!
The original code had two bugs:
This commit gets rid of RenderingDeviceVulkan::frame_count & RenderingDeviceVulkan::frame, which should've never existed, and makes everything depend on VulkanContext::frame_index & VulkanContext::FRAME_LAG.
See #80550 for details
Fix #80550
Likely fixes #81670.
Specific things that need testing:
Aside from overall stability and ensuring there are no regressions.
Note: This PR may uncover bugs that were previously hidden or that were rare to reproduce. This may give the illusion that the code has become more unstable when in reality it's just uncovering hidden flaws.