-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default to 16-byte alignment in malloc #14456
Conversation
c7750f3
to
5c01997
Compare
correct (since `alignof(max_align_t)` is 16 for the WebAssembly clang target) | ||
and fixes several issues we have seen in the wild. Since some programs can | ||
benefit having a lower alignment we have added `dlmalloc-align8` and | ||
`emmalloc-align8` variants which can use used with the `-s MALLOC` option to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which can use used
typo
We had some hope of avoiding the speed and size regressions this causes, by changing the alignment of long double to be just 8 and not 16 (which would have no harmful effects) and by ignoring SIMD (which is not mentioned in the C/C++ specs AFAIK - but may have concerns in practice). Has there been more investigation there? |
I considered that option but I don't think its a reasonable path right now (literally today) for a couple of reasons:
The idea behind this change is to get something that we can land today and keep everyone mostly happy. We can always try to change our |
5a4f642
to
62c6ab6
Compare
By default we now use `alignof(max_align_t)` in both `emmalloc` and `dlmalloc`. This is a requirement of the C and C++ standards. Developers can opt into the old behviour using `-s MALLOC=dlmalloc-align8` or `-s MALLOC=emmalloc-align8`. Based on #10110 which was authored by @Akaricchi. Fixes: #10072
Emscripten had always behaved this way with its 8-byte aligning malloc, so it can't possibly be a regression. Personally I don't think it should be malloc's responsibility to hold SIMD programmers's hands, so I'd be in favor of reducing |
Focusing on the practical side, where have the bug reports on this happened? I might change my mind here if it's on SIMD code, in particular. But if it's on float128 or something else, I'd prefer to not regress. About the ABI aspect, do you think it would be controversial? float128 doesn't need to be aligned for any reason I can see, as no hardware actually loads it and no wasm spec refers to it. Or has that changed? |
As far as I can tell the bug reports relate to max_align_t not be honored by malloc. They don't relate to SIMD per say, just our spec-non-conformance breaking folks assumptions.
|
Speaking of the practical side do we have any evidence that real world programs are actually effected by this increase in the malloc alignment? Its hard to imagine a performance-sensitive program that also happen to contains millions of micro allocations. Isn't looking at allocation strategies one of the first things anyone optimizing their code would do? |
I admit the evidence is not from real-world program, but just from benchmarks like Havlak that do focus on mallocing many small things.
I do wish I understood the issue here better though. The only things we misalign are float128 and SIMD, so it seems like every place a user notices this has to be one of those? And, how do they notice this - false positives in UBSan? (I think I remember such a report) Or runtime errors on misaligned SIMD loads (which trap?) If it's hard to collect that data then I wouldn't insist on it, but I thought maybe you already knew the answer. |
IIUC its not about hardware or trapping, it just about the assumption and assertions made in higher level user code. See #10072 (comment) for example. |
If it's just for asserting on Note that even if benchmarks like Havlak show a large effect, there will be smaller effects in less extreme cases, possibly even real-world ones. It's overhead for no good reason, if the only reason is float128. I think this would be more than good enough reason to reopen the issue of float128, if it comes to that (we only compromised on float128 because we found technical solutions to its overhead in printf, and we thought that with that work we avoided overhead for our users). |
It looks like max_align_t sometimes defined by musl and sometimes by the compiler: emscripten/system/lib/libc/musl/include/stddef.h Lines 13 to 15 in 9ea47e0
emscripten/system/lib/libc/musl/arch/emscripten/bits/alltypes.h Lines 290 to 293 in 9ea47e0
I will look into the llvm-side change now.
|
Oops I didn't read that correctly, it looks like it is only defined for certain versions of C/C++ and always defined by musl. So we do have more control here than I thought. There may also be some builtin defines in clang that needs updating though. |
Although you could hack I still think the best way to go is to fix this on the LLVM side by reducing |
Does that solution have anything to do with alignment of |
The issue Alon is referring to is the age old discussion/debate/fight we had about if/why |
Yeah, but that's just about the size, not alignment. I've said it multiple times in the previous thread(s), you can keep the 128 bits size while reducing just the alignment requirement. That's still an ABI change but a somewhat less significant one, and shouldn't break anyone who actually wants super high precision float math for some reason. |
I think we understand that. IIUC the point is that reducing the size of long double to 64bit would fix both of these issues, and potentially more that we have not yet hit. |
Ok, sounds like changing I think we should change the alignment (but not the size, as mentioned) of long double to 8. There is no benefit to a higher alignment for it. I hope we can do that across the entire wasm ecosystem for consistency. Does that sound good to work towards? |
I like the idea of reducing the alignment of long double (since it would allow proper use of That still leaves the issue that slots aligned to |
Nobody should be doing that. |
Please, let's drop the pretense of stability. You guys break my Taisei builds literally every release, either with your inability to commit to a stable CLI, or random runtime regressions, or JS API breaks, or a combination of the three. A tiny ABI change is literally nothing in comparison. I would never expect linking code compiled with different versions of emscripten to actually work correctly, if at all. It wouldn't be a good practice even if you did care about stability, anyway. |
Personally, it makes my day a little bit worse to read a comment like yours. I think the comment is not productive. I'm sorry to hear that we've broken your project that often, and I'd like to look into ways to avoid that in the future - in particular, perhaps there are more automatic tests that we need. But it's not helpful or fair to say that we "do not care about stability" - that's inaccurate and unnecessarily harsh. It is true that we sometimes break backwards compatibility intentionally, but we try to do that only when strongly justified, and to mention it in the changelog. (Maybe we've gotten that wrong sometimes, of course.) In addition, the comment you are responding to did not claim we had a stable ABI: " Again, I'm not dismissing your concerns. Let's find ways to improve stability - it may be worth filing issues for specific things. But we should do so in a way that's pleasant for everyone here. Returning to the topic, I don't think there is disagreement between you, me, and that comment. We do not have a stable C or C++ ABI, and so an ABI change here is possible for us to do. But, we should do so only if it's justified, which I think in this case it might be for the reasons discussed earlier. |
Exactly — we do not have a stable ABI, so why does it take us over a year to agree on a barely significant, clearly beneficial ABI change? Why are we even talking about stability and performance when we don't have correctness in this instance? I would shut up if you approached the arguably more important stability concerns (such as CLI stability) with half as much deliberation, but it is quite frustrating to engage in this nebulous "ABI breakage" discussion when in reality, whenever I unwisely decide to update emscripten for my project, I end up having to do at least one of:
Yes, it is harsh, and perhaps unwarranted, but I hope you can see how it's possible to arrive at the "you don't care about stability" conclusion from my perspective. Anyway, rant over.The way I see it, this PR can and should land ASAP. It's not blocked on the |
I agree this PR should land. My only suggestion at this point is that, since I think we all agree that it is ok to change the ABI to reduce the alignment of float128, that if we do that first then we would avoid raising the malloc alignment and then reducing it. Raising and reducing it could be confusing for users, and might be annoying in bisections as well. So if we can lower the float128 alignment quickly, we can finish all of this properly. @sbc100 are there concerns on the LLVM side about changing the float128 alignment down? |
Ok, lets see how the llvm-side change goes: https://reviews.llvm.org/D104808 |
Fyi, obscure, but Windows heap does this:
so 16 is a good number imho, given the prevalence of Win64 (I realize wasm is 32bit for now). |
#ifndef MALLOC_ALIGNMENT | ||
#include <stddef.h> | ||
/* `malloc`ed pointers must be aligned at least as strictly as max_align_t. */ | ||
#define MALLOC_ALIGNMENT (__alignof__(max_align_t)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Static assert that it is 16?
This means `max_align_t` is 8 bytes which also sets the alignment malloc. Since this is technically and ABI breaking change we have limited to just the emscripten OS target. It is also relatively low import breakage since it will only effect the alignement of struct that contai `long double`s (extremerly rare I imagine). Emscripten's malloc implementation already use 8 byte alignement (dlmalloc uses and alignement of 2*sizeof(void*) == 8 rather than checking max_align_t) so will not be effected by this change. By bringing the ABI in line with the current malloc code this will fix several issue we have seen in the wild. See: emscripten-core/emscripten#14456 Differential Revision: https://reviews.llvm.org/D104808
Fixed instead on the llvm side. See #14634 |
This means `max_align_t` is 8 bytes which also sets the alignment malloc. Since this is technically and ABI breaking change we have limited to just the emscripten OS target. It is also relatively low import breakage since it will only effect the alignement of struct that contai `long double`s (extremerly rare I imagine). Emscripten's malloc implementation already use 8 byte alignement (dlmalloc uses and alignement of 2*sizeof(void*) == 8 rather than checking max_align_t) so will not be effected by this change. By bringing the ABI in line with the current malloc code this will fix several issue we have seen in the wild. See: emscripten-core/emscripten#14456 Differential Revision: https://reviews.llvm.org/D104808
This means `max_align_t` is 8 bytes which also sets the alignment malloc. Since this is technically and ABI breaking change we have limited to just the emscripten OS target. It is also relatively low import breakage since it will only effect the alignement of struct that contai `long double`s (extremerly rare I imagine). Emscripten's malloc implementation already use 8 byte alignement (dlmalloc uses and alignement of 2*sizeof(void*) == 8 rather than checking max_align_t) so will not be effected by this change. By bringing the ABI in line with the current malloc code this will fix several issue we have seen in the wild. See: emscripten-core/emscripten#14456 Differential Revision: https://reviews.llvm.org/D104808
This means `max_align_t` is 8 bytes which also sets the alignment malloc. Since this is technically and ABI breaking change we have limited to just the emscripten OS target. It is also relatively low import breakage since it will only effect the alignement of struct that contai `long double`s (extremerly rare I imagine). Emscripten's malloc implementation already use 8 byte alignement (dlmalloc uses and alignement of 2*sizeof(void*) == 8 rather than checking max_align_t) so will not be effected by this change. By bringing the ABI in line with the current malloc code this will fix several issue we have seen in the wild. See: emscripten-core/emscripten#14456 Differential Revision: https://reviews.llvm.org/D104808 llvm-monorepo: d1a96e906cc03a95cfd41a1f22bdda92651250c7
Is there current guidance on what to do if I need 16-byte alignment? Right now I allocate with extra padding and manually align my pointer inside of the allocation, but it's a little awkward, and more importantly that makes it impossible to use realloc. I was going to ask "is there some way to make emscripten's dlmalloc use 16-byte alignment" but from reading over this and some of the related issues, it sounds like dlmalloc is tuned around 8-byte alignment, and there are concerns that changing it would negatively impact software. My motivation for this is 128-bit SIMD, for which natural alignment is 16 bytes. My understanding is that SIMD is part of why other ABIs have standardized on 16-byte alignment for stack frames, i.e. Mac OS X and (iirc) Windows x64. (Incidentally, I'm not sure how I would tell what the alignment of emscripten's stack frames is, and how that interacts with JS code invoking Maybe the guidance should just be 'use It looks like the current guidance in the official docs re:alignment is from the asm.js days, and talks about not having support for x86 unaligned memory ops. |
If you want allocations aligned higher than If you want to change the default you would need to modify the allocator (either emmalloc.c, dlmalloc.c or mimalloc). I imagine its just a case of changing a single line in each one. e.g: emscripten/system/lib/emmalloc.c Lines 71 to 74 in a8b489d
I don't think we will be changing the default upstream since using 8-byte alignment has showed to have an impact on some benchmarks. For stack alignment we do use 16 bytes, as does llvm, so that should be fine. |
Emscripten uses a bit different ABI from others. References: emscripten-core/emscripten#14456 https://reviews.llvm.org/D104808
Emscripten uses a bit different ABI from others. References: emscripten-core/emscripten#14456 https://reviews.llvm.org/D104808
By default we now use
alignof(max_align_t)
in bothemmalloc
anddlmalloc
. This is a requirement of the C and C++ standards.Developers can opt into the old behviour using
-s MALLOC=dlmalloc-align8
or-s MALLOC=emmalloc-align8
.Based on #10110 which was authored by @Akaricchi.
Fixes: #10072