[RFC] Assign part of wasm address space to a shared heap to share memory among wasm modules with zero-copying #3546
Implementation tasks:
Do you mean to have a single global shared memory that every wasm module with a linear memory on the system can fully access? Semantically, is the shared region always treated as if it were a shared memory?
Awesome. At the W3C in-person event, Deepti presented on providing a … Is it worthwhile checking in with Deepti to ensure there are no clashes? I'm kind of wondering if we could end up with the lower chunk of linear memory made available to an …
Yes. In the current discussion there is supposed to be only one global shared heap, which every wasm module can access, and the shared heap is created during runtime initialization. Your idea sounds reasonable, but then the runtime should create the shared heap lazily, and the workflow may be like below:
For performance reasons, I think we had better restrict each wasm instance to being associated with only one shared heap; otherwise it will be too complex and might greatly impact performance. What do you think?
Yes, the shared region is always mapped to the shared heap. My suggestion is that we always use a software boundary check for it, since we have to add an extra check for which region the wasm address belongs to anyway, and this should also prevent the dead load elimination optimization.
Yes. If you want, you can still create the first shared heap during runtime initialization.
I agree.
Thanks @woodsmc. I don't know how the mmap function would be used: does it mean that the wasm app can call the mmap function, and the runtime maps the mmapped memory to the lower range of the wasm memory address space and then changes the behavior of wasm load/store accordingly? And can the wasm app call the mmap function multiple times? If so, it may impact performance a lot. I'm also not sure why it would map to the lower range of the wasm address space: (1) IIUC, address 0 of the wasm address space is reserved by clang for the C NULL pointer check; clang reserves a space starting from 0 and doesn't put the app's global data at 0; (2) if it maps to the lower range, the wasm app has to reserve a relatively large space for it, which may not be convenient for toolchains/developers; at least for clang, developers should add
Yes, it would be great to discuss this more with Deepti. I think we should be able to support both mmap and the shared heap if needed, since one uses the lower range of the wasm address space and the other uses the higher range. But maybe we don't need to support mmap when the shared heap is enabled, since the runtime can also use the mmap function to allocate the memory for the shared heap, or even provide a callback for developers to allocate that memory.
I guess that implementing shared memory using either the shared heap or the mmap method will make memory boundary checks more complex. Therefore, I have an idea about boundary checks in #3548. Perhaps it could serve as the basis for implementing the shared memory functionality. What do you think?
@no1wudi Yes, maybe we can add another option for wamrc to allow calling a runtime API to do the boundary check in AOT/JIT mode, but I am not sure whether it is good to make it the default mode for the shared-heap/mmap functionality. We had better test the performance and see the result first.
The performance overhead does need to be tested. What I can confirm is that implementing boundary checks as an if-else-if sequence in LLVM IR significantly increases code size. In some of our applications, the code size could double as a result.
Could this association conform to the principles of the component model in the future? Being able to restrict the accessible area per component (or a similar concept) would support a variety of use cases.
Yes, when the component model is implemented in the future, I think we can also associate some or all of the instances inside a component with a shared heap; for the latter, we may add an API for the component to associate all its instances with a shared heap. It depends on the requirements.
Would it be possible to have a module default to using … Example use case:
Do you mean that in the runtime's module malloc implementation, when it fails to allocate memory from the private heap, the runtime should continue by allocating memory from the shared heap?
Sorry, let me try to explain with a concrete example. In the example below, I am calling
Hi, one possible way I can think of is to find which object file implements the
Thank you for the suggestion. Do you have an estimate for when this feature will be available in the main branch?
About the requirement
Many scenarios require sharing a memory buffer between two wasm modules without copying data (zero-copying), and developers have asked about this. But since the wasm spec assumes that a wasm app can only access data inside its own linear memory(ies), it is difficult to achieve: normally we have to copy data from the caller app's linear memory to the callee app's linear memory in order to call the callee's function. People may use other approaches, such as multi-memory, GC references, or core module dynamic linking, but these have limitations: toolchain support, the user experience of writing the wasm application, the requirement for advanced wasm features, footprint, and so on. Here we propose a solution: assign part of the wasm address space to a shared heap to share memory among wasm modules with zero-copying.
Overview of the solution
As we know, there is an address mapping/conversion between the wasm address space of a linear memory and the host address space. For example, in wasm32 the address space of a wasm linear memory runs from 0 to linear_mem_size-1, with a maximum range of [0, 4GB-1], and there is a corresponding native (host) address range for the linear memory allocated by the runtime, say from linear_mem_base_addr to linear_mem_base_addr+linear_mem_size-1. The mapping is simple and linear: [0, linear_mem_size-1] in the wasm world <=> [linear_mem_base_addr, linear_mem_base_addr+linear_mem_size-1] in the host world. But since in most cases the maximum linear memory size is far smaller than 4GB, we can use the higher region of the wasm address space and map it to another runtime-managed heap to share memory among wasm modules (and also host native).
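The linear mapping above can be written out as follows, using B_lin = linear_mem_base_addr, S_lin = linear_mem_size and S_sh = shared_heap_size (shorthand symbols introduced here, not runtime names). The second line is the condition, implied by the proposal, for a shared region placed at the top of the wasm32 space not to overlap the linear memory:

$$
\mathrm{native}(o) = B_{\mathrm{lin}} + o, \qquad 0 \le o \le S_{\mathrm{lin}} - 1
$$

$$
S_{\mathrm{lin}} \le 2^{32} - S_{\mathrm{sh}}
$$

For instance, a 64 MB linear memory (2^26 bytes) leaves room for a shared region of up to 2^32 - 2^26 bytes, i.e. 4032 MB.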
The idea is mainly to let the runtime create a shared heap for all wasm modules (and host native): all of them can allocate memory from the shared heap and pass the allocated buffer to other wasm modules and host native to access. The allocated buffer is mapped into the higher region of the wasm address space. In wasm32, the address space (often called the offset) of a wasm app runs from 0 to 4GB-1 (a relative address, not a native absolute address). Supposing the wasm app's linear memory doesn't use all of that space (it uses 0 to linear_mem_size-1, and linear_mem_size is normally far smaller than 4GB), the runtime can use the higher region for the shared heap and map the shared heap's native address space into that region, for example from 4GB - shared_heap_size to 4GB - 1. The runtime then does a hack when executing the wasm load/store opcodes: if the offset to access is in the higher region (from 4GB - shared_heap_size to 4GB - 1), the runtime converts the offset into a native address in the shared heap; otherwise it converts the offset into a native address in the wasm app's private linear memory. Since the wasm address range of the higher region is the same for all wasm modules and the runtime accesses it in the same way, a wasm module can pass a buffer in that region to another wasm module and so share the data with zero-copying.
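A minimal C sketch of the load/store address translation described above, assuming a single shared heap mapped at [4GB - shared_heap_size, 4GB - 1]. The MemView struct and the wasm_addr_to_native function are hypothetical names for illustration, not WAMR's actual data structures or API; the region test followed by a bounds check on each side corresponds to the software boundary check discussed in the comments.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-instance view of memory (illustrative, not WAMR's structs).
 * Assumes linear_mem_size <= 4GB - shared_heap_size, so the regions don't overlap. */
typedef struct {
    uint8_t *linear_mem_base;  /* native base of the private linear memory */
    uint64_t linear_mem_size;  /* private linear memory size in bytes */
    uint8_t *shared_heap_base; /* native base of the global shared heap */
    uint64_t shared_heap_size; /* shared heap size in bytes */
} MemView;

#define WASM32_SPACE ((uint64_t)1 << 32) /* the 4GB wasm32 offset space */

/* Translate the access [offset, offset + len) into a native pointer.
 * Returns NULL when the access is out of bounds for its region. */
static uint8_t *
wasm_addr_to_native(const MemView *v, uint32_t offset, uint32_t len)
{
    uint64_t start = offset;
    uint64_t end = start + len; /* 64-bit sum, so it cannot wrap */
    uint64_t shared_start = WASM32_SPACE - v->shared_heap_size;

    if (start >= shared_start) {
        /* Higher region: same range in every instance, backed by the shared heap */
        if (end > WASM32_SPACE)
            return NULL;
        return v->shared_heap_base + (start - shared_start);
    }
    /* Lower region: this instance's private linear memory */
    if (end > v->linear_mem_size)
        return NULL;
    return v->linear_mem_base + start;
}
```

With two instances that have separate linear memories but the same shared heap, an offset in the higher region resolves to the same native pointer in both, while a low offset resolves to each instance's own memory. Computing the end of the access in 64 bits keeps an access near 4GB - 1 from wrapping around to a small value.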
The runtime also provides APIs to allocate/free memory from the shared heap. For example, a wasm app can import functions like (env, shared_malloc) and (env, shared_free) and call them; these import functions are implemented by the runtime. For host native, the runtime may provide APIs like wasm_runtime_shared_malloc and wasm_runtime_shared_free. The shared heap size can be specified by the developer during runtime initialization.
From the view of a wasm app, it has two separate address regions. This is not standard behavior per the wasm spec, but it doesn't break the wasm sandbox, since memory access boundary checks can be applied to both regions. There is a performance penalty because additional boundary checks have to be added for the higher region, but I think it should be relatively small and acceptable compared to copying buffers.
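To illustrate what the runtime side of (env, shared_malloc) and (env, shared_free) might do, here is a simplified C sketch in which the allocator hands out wasm offsets in the higher region rather than native pointers. It assumes a tiny fixed-size shared heap split into 64-byte blocks with a first-fit search; the signatures are simplified (a real runtime native would likely also receive an execution-environment argument), and shared_offset_to_native is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tiny shared heap: SH_BLOCKS blocks of SH_BLOCK_SIZE bytes,
 * mapped at the top of the wasm32 offset space. */
#define SH_BLOCK_SIZE 64u
#define SH_BLOCKS 64u
#define SH_SIZE (SH_BLOCK_SIZE * SH_BLOCKS)
#define SH_START_OFF ((uint32_t)(((uint64_t)1 << 32) - SH_SIZE))

static uint8_t sh_mem[SH_SIZE];    /* native backing store of the shared heap */
static uint8_t sh_used[SH_BLOCKS]; /* 1 if the block belongs to a live allocation */
static uint32_t sh_run[SH_BLOCKS]; /* at an allocation's first block: its length in blocks */

/* Allocate size bytes; returns a wasm offset in the shared region, or 0 on
 * failure (0 is never a shared-region offset). Simplified signature. */
uint32_t shared_malloc(uint32_t size)
{
    if (size == 0 || size > SH_SIZE)
        return 0;
    uint32_t need = (size + SH_BLOCK_SIZE - 1) / SH_BLOCK_SIZE;
    for (uint32_t i = 0; i + need <= SH_BLOCKS; i++) {
        uint32_t j = 0;
        while (j < need && !sh_used[i + j])
            j++;
        if (j == need) { /* first fit: `need` free blocks starting at i */
            for (j = 0; j < need; j++)
                sh_used[i + j] = 1;
            sh_run[i] = need;
            return SH_START_OFF + i * SH_BLOCK_SIZE;
        }
        i += j; /* block i + j is used; the loop's i++ resumes right after it */
    }
    return 0;
}

/* Free an offset previously returned by shared_malloc; other offsets are ignored. */
void shared_free(uint32_t offset)
{
    if (offset < SH_START_OFF)
        return; /* private-region offset, not from the shared heap */
    uint32_t rel = offset - SH_START_OFF;
    if (rel % SH_BLOCK_SIZE != 0)
        return;
    uint32_t i = rel / SH_BLOCK_SIZE;
    for (uint32_t j = 0; j < sh_run[i]; j++)
        sh_used[i + j] = 0;
    sh_run[i] = 0;
}

/* Hypothetical helper: map a shared-region offset to its native address. */
uint8_t *shared_offset_to_native(uint32_t offset)
{
    return offset >= SH_START_OFF ? sh_mem + (offset - SH_START_OFF) : NULL;
}
```

Returning an offset rather than a native pointer is what lets the buffer be passed unchanged between modules: every module translates the same offset to the same shared-heap address. Offset 0 can double as the failure value because it always lies in the private region.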
Eventually, when a wasm app wants to share a buffer with another wasm app, the code may look like:
Main changes
- wamrc --enable-shared-heap flag
- … DefPointer(const uint32 *, stack_sizes), and the layout is known to the AOT compiler at compilation time
- … wasm_runtime_shared_malloc/wasm_runtime_shared_free
- … shared_malloc/shared_free to allocate/free memory from/to the shared heap
- … else keep the same as the original (both for hw bound check and non-hw bound check)
Others