-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory packing difference C++ <-> HLSL #58
Comments
Hi @fstrugar, that is indeed a limitation of the system. In SIMD mode it would be impossible to do, because internally all types are 16 bytes as you've observed. In HLSLPP_SCALAR it would in theory be possible, if you changed the n128 type and made it so that there's n128, n64, n32 or something like that. I don't think it's an easy change though although I could be mistaken. What I've ended up doing in my toy renderer is to have actual interop types (e.g. shader::float3) that are trivially convertible to the hlslpp types, and they work fine, if that's what you need. This library wasn't thought as a way to interop but as a way to migrate hlsl code from the GPU to the CPU or viceversa so it can be a bit awkward. |
Thanks! Yeah I'm currently using something like the shader::float3 as you suggested, and HLSL++ looked like it could clean & unify that up, as well as provide full interop on math functions. I'll play around a bit, see what happens, thanks for the quick answer, much appreciated! |
Yeah I don't know, maybe a header called hlsl++_interop would be useful so that not everyone has to write the same code many times. Let me know how your experience goes and what problems you encounter, it could guide a future implementation. |
Every type would need a template tag argument which would tell what packing convention to follow, then you could modify the aligments. Pretty invasive change though. |
@devshgraphicsprogramming In terms of code, what are you suggesting? (just so I understand what you mean by tagging) |
enum class Packing
{
Scalar, // align to component type
ConstantBuffer, // std140 in GLSL
StructuredBuffer // std430 in GLSL
}; Then template<Packing align_tag=Scalar/*or whatever you want the default to be*/> in front of every type you have or as last template arg (you probably don't use variadics anywhere). then depending on the tag you can P.S. C++ is a bit weird, you can't have a size thats not a multiple of alignment, so there's no way to achieve this struct MyStruct
{
float3<ConstantBuffer> rgb;
float a;
};
static_assert(sizeof(MyStruct)==16); if See this: https://godbolt.org/z/dhvqGvGGc |
P.S. is hlsl++'s P.P.S. Are matrix types stored as rows and is their alignment and size same as the row types? |
I see what you mean, and I agree it's quite intrusive. This isn't the first time this size and alignment problem has come up. It's an unfortunate tradeoff that is imposed by the SIMD requirement. What I do in my own code which I mention elsewhere is that I use some lightweight translation types through the store() and load(), I've yet to migrate this to hlsl++ but it's in the back of my mind.
hlsl++ uses the minimum size it can get away with on the platform, which is 4 floats (i.e. 16 bytes). Once upon a time I thought the __m64 types were a viable option but MSVC decided to deprecate them in 64-bit builds for no particular reason (as they are still supported by the CPUs and the other compilers like Clang and GCC) I've always wanted to just make my own implementation of the types and live with it but it's work I need to do and I don't know how well it plays with the rest of the float types (in terms of conversions). If this is something that would be useful to you I can add it to the list to look at.
hlsl++ stores data in whatever way is best to save memory, which means there is no particular physical layout they prefer. For example, float4x2 and float2x4 are the same thing (two "rows"), and float2x2 is a single __m128. The multiplication functions take care of abstracting all that and extract the data as needed. That's why interfacing with shaders and data storage is a tricky thing, this library is meant for manipulation of data, and not necessarily shader upload which is what some people have complained about. I don't provide a solution in the library but it shouldn't be hard to make your own as all the translation primitives are there. |
cause thats MMX, not even SSE... I don't know if CPUs removed MMX and now you're just getting silent CISC2CISC firmware patch-ups that promote old MMX to SSE. float2 really needs to be stored as scalar or NEON width-2, you're not gaining any perf from promoting them to float4 (because SSE isn't 4x a lot of the time, and compilers autovectorize quite well). There's a potential 2x ALU gain (realistically 1.7x) but you'll end up fetching 2x the cache lines, so you're just as likely to have a performance regression as you are to have a gain. Silent promotion and demotion (like you do for your own float3 code) is much better because you're not bloating the storage, and are likely to win on RAM bandwidth which bottlenecks more than pure FLOPS.
HLSL is GLSL row_major by default, meaning that if I were to cast a matrix pointer to a For matrices aligned to 16 bytes, using float4 as internal storage its not a problem (2x2, 4x2, 2x4, 3x4, 4x4). The corner case is really just the 3x2 which should have a sizeof equal to 3 times float2 which is 24.
Do you recommend I hard fork and call it something else then? |
You are 100% in the right here with respect to NEON, and swizzles in NEON could do with some improvements. However I have limited time to dedicate to the library :) If there's interest and you'd like to help out I'm all for it. The idea behind float2 being SSE is that we never convert back and forth between types.
This is really up to you, I wouldn't want it in all honesty but I wonder if you have hard requirements that hlsl++ doesn't meet and I can't fix. I'm open to everything else we've talked about except making the codebase templated. I can see an acceptable compromise where float4x4 optionally derives from some floatNxM<4, 4> type (and so on for the other types) and you use that in your code. That way the codebase stays the same and you get what you need. |
You've saved a register pack/unpack (or unaligned load, masked store), but now you're wasting 2x bandwidth. I'd just let float2 be scalar (unless NEON) and 8 bytes. |
We're derailing this issue already quite a bit, and there are quite a few requests in the mix. If you can measure that this change you're talking about is considerably faster I can take a look. It's not trivial to change. Otherwise we can stick to the issue or open a new one. |
In my case its a hard requirement that |
There is no way for this to happen automatically without C++ Reflection TS, a mountain of templates which rebuild your types with correct member alignments. The only compromise with the current state of C++ is to provide two flavours of Std430 (storage buffer) is not possible in C++ because it would allow another member in a structure to alias the padding. There' is a dirty hack possible with a macro and a template #define HLSL_MEMBER(TYPE) alignas(align_trait<TYPE>::alignment) align_trait<TYPE>::type
template<AlignmentTrait align_trait=constant_buffer_alignment_trait>
struct MyStruct
{
HLSL_MEMBER(float3) rgb;
float a;
}; but then this will make your structs have Also IIRC this would STILL require that you have a special This is all nuts, you will go insane and I just recommend not worrying about this at all and pretending everyone supports
|
support for |
HLSLPP_SCALAR means non-SIMD, it is unrelated to any extension of SPIR-V or Vulkan. I only created it to compare performance between the SIMD version and one that isn't. I'm going to have to make a clear note in the front page to say that shader interop of this kind is not a feature of this library. |
Unlike in HLSL, HLSL++ will still pack all types in 16b n128, even if running in HLSLPP_SCALAR. That breaks the C++<->HLSL interop with constant/structured buffer packing which is an useful use case. Any ideas on how it could be (easily) hacked to support HLSL-like alignment? :) Thanks!
The text was updated successfully, but these errors were encountered: