-
Notifications
You must be signed in to change notification settings - Fork 474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Call for feedback on Provisional Vulkan Ray Tracing extensions #1206
Comments
I finished up a very rough port of the NV extension in my code-base. So far I think the only real surprise was that Another oddity was that now the Once I get the Nvidia driver up and running I'll provide some more feedback wrt the rest of the toolchain if I have any. |
Is there a reason why VkAccelerationStructureBuildOffsetInfoKHR doesn't have a structure type member and next pointer? Might be wise to add this to future proof the extension so we don't get more vkGetPhysicalDeviceProperties2 shenanigans. |
Thank you so much, I've been experimenting with the NV extension waiting for this to happen 😍. I'm currently using a huge (pixels * 1KB) scratch buffer* from which I reserve some space per pixel for per ray scratch information. The current 1KB per ray will probably be enough in most cases but not all, and even then it's a huge waste of memory to reserve it per pixel when it's only needed per ray (eg. ~1GB vs >1MB). I've tried using gl_SubgroupInvocationID etc. to limit the memory allocation to "active subgroup" but seems without subgroup memory barriers they end up trampling over each other (get nice artifacts that show workgroups and subgroups if you squint). Am I correct in assuming the new shadercallcoherent and GL_KHR_shader_subgroup interactions should help me with that? Ie. I can mark a buffer shadercallcoherent and/or use subgroupMemoryBarrierBuffer etc. to manually issue barriers** and hopefully it'll make my memory issues go away? PS. I've also tried having the scratch data as part of the ray payload, but that gets slow fast (>20x difference, I assume all of it's copied by value between the stages) and exceedingly clunky since you can't get a reference to it in any way (I'm using GL_EXT_buffer_reference2 to replicate pointers). * I'm actually currently using a hacky suffle between N buffers that hopefully have time to settle down before being bulldozed by another workgroup 😅. ** I know barriers are bad for performance 😉. If somebody has a better idea how to build and pass around complex structures (eg. for photon/beam mapping etc.) in the Vulkan RT shaders I'm all ears. TL;DR; I have memory issues, I probably need some barriers. |
There are lot of broken VUIDs in the spec there. Lot of visible "VUID-{refpage}". |
@krOoze they will be fixed in the next spec update. |
Is it possible to expand the spec on details as to when the actual geometry for the acceleration structures is consumed and can be discarded? There is some wording on consuming e.g. triangles, but to me it's not clear what this actually means and when/if geometry is copied over to the AS no longer requiring the initial buffers, or if this is implementation dependant. |
The spirenv chapter says |
Minor nit: despite it having a length, (Why is this member not just a pointer to an array of |
In PS: @expipiplus1 Yea, that is not really expressible by |
I submitted a suggestion regarding adding level of detail tracing, or "cone" tracing, here: #1221 |
It was originally, but probably not anymore since we changed the capability for provisional. Will remove any illusions of that in an upcoming spec update. |
@mbehm, if you're going to be using subgroup operations be aware of the "invocation repack instruction" description in the shader in the "Shader Call Instructions" section of the spec. Overall, I suspect that you probably either do need some more barriers, but it's hard to say with just the information given. |
@ewerness-nv Thanks I'll try to keep that in mind. Yeah the description was a bit vague, a detailed version would've been way longer and reproducing would probably require actual code. Basically if I understood correctly (and based on the memory issue/artifacts I'm having) the RT kernels are scheduled as workgroups/batches of X. So if I can barrier memory per group would probably solve my problems. Currently without barriers to control memory access I need to allocate the scratch space per pixel which is few gigabytes more than I'd want to. Another thing that'd solve at least my problem (and probably quite a few) easily... If one could declare a "local" buffer reference eg, |
Elsewhere in the spec anything sized to Why not split this into two arrays if there is an important distinction between device ID and acceleration structure compatability ID?
AFAIK array of counts isn't used anywhere so far in the spec, it might be weird to have to create an array of Another solution would be to have a pointer to an array of That would avoid having to have an array of counts and the lengths of all the arrays are clearly specified. |
One thing that threw me off slightly is the |
my question is about AS host build. I found I can assign a VkDeviceOrHostAddressKHR to scratchData in VkAccelerationStructureBuildGeometryInfoKHR in host AS build, but for binding memory to AS,there is not a VkDeviceOrHostAddressKHR, instead only a VkDeviceMemory in the VkBindAccelerationStructureMemoryInfoKHR, does that mean we don't need to bind memory when use host AS build, or we bind device memory when using host AS build? |
Whilst the host builds accept host pointers, the spec requires the use of device-mapped memory resources so they can be used (for example) with device copy commands. So you do need to bind a VkDeviceMemory to these AS builds, but it has to be from a type that includes host-visibility (and ideally host coherence). If you want to use a pre-existing host pointer for this purpose, you can import a host pointer using VK_EXT_external_memory_host. |
@krOoze were you able to consider my comment here: #1206 (comment), thanks. |
@expipiplus1 I have considered it, but I had not much to add. In the end I am not the one that needs to consider it. The two UUIDs stuck together as dynamic array seems weird. They could just be
I was getting bit beside the point there, the pointer taking version should be able to handle more complicated cases too (e.g. if it is array, then one could spam pointers to individual elements). Though pointer is potentially 64 bits, count could be less, amortized over the array length, but the [1,1,1] case would indeed be weird. Depends what they were going for with this parameter. It feels like they were trying to optimize something there; it needs to be identified what. |
@krOoze I see, thanks. I think it would be most consistent with the spec to have an array of While we're here. Shouldn't |
Yes - this structure is also used on the GPU for indirect builds so it's not something that can easily change, and we don't want to be chasing pNext pointers on the GPU! |
@W4RH4WK wrote
so we don't plan to make a change for this. |
@SaschaWillems wrote:
This was addressed in the 1.2.140 spec update by adding the following language:
Hopefully this helps to clarify. |
The values returned by vkGetAccelerationStructureMemoryRequirementsKHR which are to be ignored could be set to better default values. Certain values would make this special cases not actually special, making the specification easier to follow. More info on that here: #1272 |
Thanks @krOoze, who would be the best person to ask? If there's a good reason for this geometry representation then it's all cool, but I'd like to know why. |
@Jasper-Bekkers wrote:
Yes. An oversight. We'll fix in a future API update.
We've improved the documentation for these (already released). Hopefully it's clearer now. |
@krOoze wrote:
@expipiplus1 wrote:
@krOoze wrote:
@expipiplus1 wrote:
We are making some changes to the signature of vkGetDeviceAccelerationStructureCompatibilityKHR and we have renamed versionData to pVersionData amongst other changes here. Initially I agreed that the pointer to 2xUUID was weird, but after staring at it for a while (and improving the documentation), I realized why it is the way it is: pVersionData is a pointer to an array of 2*VK_UUID_SIZE uint8_t values instead of two VK_UUID_SIZE arrays as the expected use case for this member is to be pointed at the header of an previously serialized acceleration structure (via vkCmdCopyAccelerationStructureToMemoryKHR or vkCopyAccelerationStructureToMemoryKHR) that is loaded in memory. Using arrays would necessitate extra memory copies of the UUIDs. |
Thanks for explaining, @dgkoch. Would it be possible to know the reasoning behind the unusual layout of |
It was trying to replicate the functionality of D3D12_ELEMENTS_LAYOUT_ARRAY / D3D12_ELEMENTS_LAYOUT_ARRAY_OF_POINTERS in D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_INPUTS without using unions, and it has obviously failed to do that well (we've had a lot of confusion and questions about it, and it makes validation annoying). We are planning to change this for the final version. |
Combining of TLAS and BLAS into the same structures is a mess. Just because they are both acceleration structures does not mean that it is necessary to combine everything together. I think it would be better, if there will be two separate set of structures and functions for TLAS and BLAS without that "unions" (triangles/aabbs/instances), until BLAS can be node of another BLAS and/or TLAS can be node of another TLAS. Currently it is even hard to write "hello world" using present API (I succeeded, but it was quite hard to rewrite one even from VK_NV to VK_KHR).
Above condition make it even more meaningless. |
It would be good if semantics of |
It seems intuitive to me as is. The "upstream" shader writes sends data to the "child" shader via the rayPayloadEXT variable (so it is out from that shaders' perspective). The child shader receives this as input via it's rayPayloadInEXT variable (which yes, can be modified and sent back to the upstream shader), but the child shader can also result in more child shaders which would be passed the ray payload via it's own rayPayloadEXT that gets provided it's downstream shaders. IMO reversing the naming would make it even more confusing. |
If we separate the TLAS and BLAS now, it'll be harder to add multi-level AS support at some point in the future. |
Can we have some clarification on what buffers VK_BUFFER_USAGE_RAY_TRACING_BIT_KHR is needed on exactly and why ? Is it all buffer types in descriptor sets referenced during a trace, or just SBT + tlas/blas source. If it is the former, why is this needed for things like texel buffers ? |
8 bits in Generally, the use of C bit fields in
|
This structure is consumed on the GPU so we want keep the layout the same as the corresponding structure in DXR. The NV extension originally took the approach of the not defining the structure but just defining the layout in the spec. The KHR extension added the struct with bitfields for ease-of-use, but it's really non-normative (as noted in the xml file: The bitfields in this structure are non-normative since bitfield ordering is implementation-defined in C. The specification defines the normative layout.) |
Minor nit: In all of the acceleration structure building commands the Edit,162: I think this is still the case for
Looks like this was fixed in 1.2.162! Thanks
Makes sense! |
Final extensions have been released now. #1402 |
In #1205 we announced the launch of Vulkan provisional ray tracing extensions.
Khronos welcomes feedback on the Vulkan Ray Tracing set of provisional specifications from the developer and content creation communities through the Khronos Developer Slack and in this issue. Developers are also encouraged to share comments with their preferred hardware vendors. A provisional release enables us to ship beta drivers and enable application prototyping to catalyze developer feedback. It also enables us to work on various open-source ecosystem artifacts in public, such as high-level compilers, validation layers, and debuggers, before spec finalization.
Although we do not have a specific timeframe for specification finalization to announce at this time, we want to move forward as quickly as we can, while ensuring the developer community is happy and we have a completed set of conformance tests and at least two implementations that can pass those tests.
Your feedback is critical to enable us to finalize the first version of Vulkan Ray Tracing and make it genuinely meet your needs!
To provide feedback: either leave a comment below (if minor), or create a new issue and include a link in the comments below.
The text was updated successfully, but these errors were encountered: