Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suballocate DX12 buffer creation #3163

Merged
merged 28 commits into from
Dec 20, 2022
Merged

Conversation

Elabajaba
Copy link
Contributor

@Elabajaba Elabajaba commented Nov 2, 2022

Checklist

  • [ ] Blocked on Migrate to Windows-rs from winapi #3207 Worked around by feature gating it behind the windows_rs feature
  • Run cargo clippy.
  • Add change to CHANGELOG.md. See simple instructions inside file.
  • Fix the buffer and texture usage flags.
  • Don't panic if deallocation fails when destroying a texture or resource.
    • Is panic() on deallocation failure even a real issue?
    • What should happen on deallocation failure, considering it can't (currently) return an error?
  • Figure out what to do with heap_created_not_zeroed.
  • Fix the error handling so the gpu-allocator -> wgpu error implementation is not just a bunch of todo!().
  • Consider waiting until Add support for presser Traverse-Research/gpu-allocator#138 lands, and see if migrating to presser would be needed
    • More safety!
    • Might break a lot of user code...
    • That user code is probably UB anyways?
  • Investigate if changing gpu-alloc to support dx12 is feasible, and would be accepted

Connections
~~ Blocked on #3207 ~~ Worked around by feature gating it behind the windows_rs feature
closes #2720

Description
DX12 is currently quite slow in wgpu. This uses gpu-allocator to batch together allocations into heaps and uses CreatePlacedResource instead of CreateCommittedResource to create buffers and textures, which leads to large performance gains (~30-50% in "normal" scenarios, with significantly larger gains in write_buffer heavy scenarios (~250x+ in an unrealistic scenario where it calls write_buffer 1000x in a loop, going from ~1fps to ~250fps)), and in my testing no performance decreases.

Testing
Tested the examples, ran cargo test, backported it to 0.14 and tested against bevy+bistro, and tested against a modified water example where it loops the render write_buffer 1000x times on the main thread, 500x each on 2 scoped threads, or 100x each on 10 scoped threads to make sure multithreading wouldn't panic.

It was quite a bit faster in all of these scenarios, except for bevy+bistro at 4k where it was heavily gpu limited and ran about the same.

Potential Future Improvements

  • Make these into issues
  • Consider if adding a way to get the vulkan/dx12/etc allocator is worthwhile
  • Consider actually suballocating resources instead of just suballocating heaps. DX12 allows for multiple bindings to the same resource through subresources.
    • This would probably require changing the unmap_buffer trait to pass in the subresource id for unmapping a buffer
  • Figure out why dx12 is still slower than Vulkan when calling write_buffer a lot
    • Basic profiling seems to point to windows syscalls taking the majority of the time, with ntdllZwAllocateLocallyUniqueId taking almost 40% of the time in the 1000x looped write_buffer test
  • Most dx12 things are already thread safe, see if it's possible to avoid wrapping them in rust sync primitives to avoid multiple levels of locks and/or reference counting

@Elabajaba Elabajaba changed the title Temp dx12alloc Suballocate DX12 buffer creation Nov 2, 2022
@Jasper-Bekkers
Copy link

Nice to see these changes @Elabajaba, if there are any features that wgpu would like to see in our allocator please just file an issue on the repo.

@Elabajaba
Copy link
Contributor Author

Blocked on #3207 until Mozilla gets around to vendoring windows-rs.

@codecov-commenter
Copy link

codecov-commenter commented Nov 21, 2022

Codecov Report

Merging #3163 (719d26c) into master (052bd17) will increase coverage by 0.05%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##           master    #3163      +/-   ##
==========================================
+ Coverage   64.30%   64.36%   +0.05%     
==========================================
  Files          83       85       +2     
  Lines       42270    42397     +127     
==========================================
+ Hits        27181    27287     +106     
- Misses      15089    15110      +21     
Impacted Files Coverage Δ
wgpu-hal/src/dx12/mod.rs 27.27% <0.00%> (-0.09%) ⬇️
wgpu-types/src/lib.rs 88.30% <ø> (ø)
wgpu-types/src/assertions.rs 50.00% <50.00%> (ø)
wgpu-hal/src/dx12/suballocation.rs 85.49% <85.49%> (ø)
wgpu-hal/src/dx12/device.rs 88.45% <97.14%> (+0.25%) ⬆️
wgpu-hal/src/auxil/dxgi/result.rs 65.38% <0.00%> (-11.54%) ⬇️
wgpu-core/src/validation.rs 58.87% <0.00%> (-0.14%) ⬇️
wgpu-core/src/device/mod.rs 66.72% <0.00%> (+0.04%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@MarijnS95
Copy link
Contributor

Just stumbling upon this PR, I'll see to accelerating Traverse-Research/gpu-allocator#138 so that you're unblocked on that regard!

@cwfitzgerald
Copy link
Member

As #3207 is basically perma-blocked until further notice, I think we should work around the situation by having a feature flag and falling back to the old behavior when it's disabled. This will let us continue to innovate, and also not force the issue with moz.

wgpu-hal/src/dx12/device.rs Outdated Show resolved Hide resolved
wgpu-hal/src/dx12/mod.rs Outdated Show resolved Hide resolved
wgpu-hal/src/dx12/device.rs Outdated Show resolved Hide resolved
wgpu-hal/src/dx12/device.rs Outdated Show resolved Hide resolved
wgpu-hal/src/dx12/device.rs Outdated Show resolved Hide resolved
wgpu-hal/src/dx12/windows_rs_suballocation.rs Outdated Show resolved Hide resolved
@Elabajaba Elabajaba marked this pull request as ready for review December 19, 2022 08:04
wgpu-hal/Cargo.toml Show resolved Hide resolved
wgpu-hal/src/dx12/suballocation.rs Outdated Show resolved Hide resolved
wgpu-hal/src/dx12/suballocation.rs Outdated Show resolved Hide resolved
wgpu-hal/src/dx12/suballocation.rs Outdated Show resolved Hide resolved
wgpu-hal/src/dx12/suballocation.rs Outdated Show resolved Hide resolved
wgpu-hal/src/dx12/suballocation.rs Outdated Show resolved Hide resolved
wgpu-hal/src/dx12/suballocation.rs Outdated Show resolved Hide resolved
Copy link
Member

@cwfitzgerald cwfitzgerald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for all this work! Looks great!

@cwfitzgerald cwfitzgerald merged commit f3c5091 into gfx-rs:master Dec 20, 2022
@cwfitzgerald cwfitzgerald mentioned this pull request Dec 20, 2022
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Suballocate Buffers in DX12
5 participants