-
Notifications
You must be signed in to change notification settings - Fork 694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature suggestion (memcpy/memset) #236
Comments
Aren't memcpy / memset / memmove usually implemented at an higher level, ie in the C stdlib? Example implememtation in XNU: http://www.opensource.apple.com/source/xnu/xnu-2050.18.24/libsyscall/wrappers/memcpy.c |
In emscripten, |
Interesting thought. You might be able to rely on the wasm engine recognizing this pattern and optimizing the loop/removing bound checks (our engine is starting to do some optimizations when it recognizes the "memcpy" pattern in js). And when we have SIMD support we can implement memcpy pretty efficiently in wasm without as much reliance on optimizations from the wasm engine. To this end, it would be interesting to see how much SIMD helps memcpy in asm.js. I think this should be less bloated in wasm (even in asm.js I think we could probably just not inline memcpy and clear up the code bloat -- I'd think in most cases inlining them won't really improve perf because call overhead would be dwarfed by the actual copy). But it might be worth considering because it could help perf for baseline compiler (which might not have BCE), and it could reduce AST size (obviously important). |
@MikeHolman The issue with not inlining in emscripten is that various patterns of code get converted to |
WebAssembly does have fewer pain points than asm.js in this area, but there would still be advantages to adding these. For memcpy, as opposed to memmove, it's not clear that introducing nondeterminism for the operlapping case is worth the minor performance difference, and we can always add it in the future if we wish. |
Agreed on adding
How should we define intrinsics in WebAssembly is the next question. Unresolved external functions with a special name? That's getting dangerously close to having some limited ABI, but I'm OK in this case. |
An improvement to the idea could be memcpy4/memcpy8/memcpy16 that only allows a multiple of 4/8/16 bytes to be copied, so some checks could be skipped. For example when using SIMD instructions for copying. |
@jfbastien If we add new builtin operations, we should just add them as plain AST node kinds. I don't think we need an instruction/intrinsic distinction as LLVM has. @lskyum I agree that some provision for copying with known size multiples, and known alignments too for that matter, is worth considering. |
@sunfishcode AST nodes sound good. On known size / alignment: what's the behavior when the code was incorrect? Force realign+overcopy, trap, implementation defined? |
For the size multiples, one possibility would be to have a scale operand, possibly required to be a literal, that the size operand would be implicitly shifted by. For alignment, perhaps we could just do what load+store do and say "it's really slow" if the given alignment is wrong. |
I think memmove would be OK, but I would prefer not to have many variants On Mon, Jun 29, 2015 at 7:26 PM, Dan Gohman [email protected]
|
Just so it won't be forgotten. A memset (or memclear) would be usefull for implementing a C#/Java VM because all objects and arrays are initialized to 0/null on creation. |
I am contemplating a proposal as follows:
Semantics are hopefully obvious :-). With >4GiB heaps, 64-bit versions would be added too. I'm now leaning away from size/alignment hints for now because I don't think they're the most critical, and there's value in starting simple. Return value is the value of |
I understand the concerns with nondeterminism, but I can't help thinking that memcpy would be usefull for copying structs around without having to do overlap checking. I think it could replace many load/store operations and allow them to be optimized as well - especially when the size is a constant. |
Checks for overlapping memory regions probably aren't that expensive in On Tue, Jul 28, 2015 at 8:20 AM, lskyum [email protected] wrote:
|
On Tue, Jul 28, 2015 at 5:48 AM, Dan Gohman [email protected]
|
@titzer allright I'll take your word for it :-) |
The overlap check wouldn't be expensive relative to the copy? If we're talking about structs, they might be 12, 16, 20, 24 bytes... |
@titzer Yes, the 32 is for 32-bit operands. All sizes and addresses are in bytes here. @qwertie Small known-size known-alignment copies like that can be done with plain loads and stores (and no overlap check) rather than memmove at the wasm level, when speed is a concern. Also, I believe the door is still open to adding size, alignment, and perhaps even aliasing hints in the future. I just don't want to block solving the medium/large cases on design and agreement on the small/partially-known cases. |
@sunfishcode: I find the Agreed with @sunfishcode: small copies should be generated by the compiler as something else than @sunfishcode: add |
I would actually strongly advocate in favor of arbitrary-sized copies being emitted as memmove with constant sizes instead of arbitrary sets of loads/stores. That lets the runtime intelligently figure out the best way to copy N bytes. The alternative has proven to be a hellpit in JavaScript. I think it will pretty much always be smaller on the wire and less work to decode, also. |
@jfbastien The 32 is just meant to leave namespace room for a 64 version in the future. I'm open to suggestions. @jfbastien |
@kg Would you be ok if we just do this simple memmove/memclear proposal first, get consensus that we should do something first, and then extend it with size/alignment/aliasing/locality hints in a follow-up issue? |
Yes, punting on it is totally fine. Just responding to |
|
@jfbastien And @kg Right, if you have a constant size, then you just need alignment of the source, alignment of the destination (should we have separate hints, or one for both?) and maybe also non-overlappingness of source and destination. |
@sunfishcode that's a pretty strawman argument... |
I've been looking at perf profiles for wasm unity benchmark a bit recently and see that some of the hottest functions are doing memcpy or memset like things. If this is any indication of normal wasm code patterns, I think we could see significant improvement with an intrinsic so it may be worth prioritizing. |
Yes, we've seen that in several cases as well so agreed on prioritizing for next batch of features. |
Addressed by #1057. |
First of all - sorry if I missed it in the spec!
The suggestion is for simple memcpy/memset instructions, so copying/clearing can be done at maximum speed for any given platform/browser. For example bounds-checking doesn't have to be done pr. read/write, but only once. (Could be an issue on some implementations I think. Maybe also the polyfill...)
The memcpy could be provided in two versions, one that checks for overlaps and one that doesn't.
The text was updated successfully, but these errors were encountered: