-
Notifications
You must be signed in to change notification settings - Fork 627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
aot/jit native stack bound check improvement #2244
Conversation
7919527
to
f984dfb
Compare
c3594ff
to
f47dc28
Compare
no functional changes are intended.
as inlining can make our stack check almost meaningless.
i fixed xtensa case. it's still inefficient, but not broken. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
lightly tested on esp32-devkitc. it worked as expected so far. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
OK, it seems there is no comment from other developers, let's merge this PR? |
i have no problem with it |
Move the native stack overflow check from the caller to the callee because the former doesn't work for call_indirect and imported functions. Make the stack usage estimation more accurate. Instead of making a guess from the number of wasm locals in the function, use the LLVM's idea of the stack size of each MachineFunction. The former is inaccurate because a) it doesn't reflect optimization passes, and b) wasm locals are not the only reason to use stack. To use the post-compilation stack usage information without requiring 2-pass compilation or machine-code imm rewriting, introduce a global array to store stack consumption of each functions: For JIT, use a custom IRCompiler with an extra pass to fill the array. For AOT, use `clang -fstack-usage` equivalent because we support external llc. Re-implement function call stack usage estimation to reflect the real calling conventions better. (aot_estimate_stack_usage_for_function_call) Re-implement stack estimation logic (--enable-memory-profiling) based on the new machinery. Discussions: bytecodealliance#2105.
summary:
Move the native stack overflow check from the caller to the callee because the former doesn't work for call_indirect and imported functions.
Make the stack usage estimation more accurate.
Instead of making a guess from the number of wasm locals in the function, use the LLVM's idea of the stack size of each MachineFunction. The former is inaccurate because a) it doesn't reflect optimization passes and b) wasm locals are not the only reason to use stack.
To use the post-compilation stack usage information without requiring 2-pass compilation or machine-code imm rewrites, introduce a global array to store stack consumption of each functions.
for JIT, use a custom IRCompiler with an extra pass to fill the array.
for AOT, use
clang -fstack-usage
equivalent instead because we support external llc.Re-implement function call stack usage estimation to reflect the real calling conventions better.
(
aot_estimate_stack_usage_for_function_call
)Re-implement stack estimation logic (
--enable-memory-profiling
) based on the new machinery.discussions:
#2105
todo/known issues/open questions:
implement 32-bit casefill the stack_sizes array for jit (use something similar to experiment to query stack sizes for jit #2216)fix jit tier up (or confirm it isn't broken)reading the code, i couldn't find anything broken.ensure appropriate jit partitioning (ensure to compile the function body before executing the corresponding wrapper)account caller-side stack consumption (cf "native stack overflow" detection is sometimes inappropriate #2105 (comment))what to do for native function calls?do nothing special, at least within this PRfix external llc. pass -fstack-usage to the external command?re-implementenable_stack_estimation
based on the new machinarythis test module consumes about 8MB of stack (wamr aot with llvm 14, amd64) https://github.com/yamt/toywasm/blob/1cc6d551b0fcd10cc8c8b3516c48ba08015e6ad6/wat/many_stack.wat.jinja#L37-L38worked as expectedinvestigate assertion failure seen with js.wasminvestigate app heap corruptions seen with aotapp heap insertion issues #2275benchmark.aot/jit native stack bound check improvement #2244 (comment)noinline
can have severe implications for certain type of modules. however, as wasm is usually a compiler target, hopefully fundamental inlining has already been done before aot/jit compilation.remove/disable debug codedo something for func_ctx->debug_func. just disable for the wrapper func?add missing error checksreduce code dup. probably make aot_create_func_context use create_basic_func_context.fix errors caused by empty functionlook at x86_32 failure on the ciSegv aot/jit: set module layout #2260*
Nan related issue: ani32.reinterpret_f32
test in conversions.wast was failing. it's x87 flds/fstp which doesn't preserve sNaN. the problem is not specific to this PR. it seems working on main branch just by luck. (it would fail if you disable optimizations.) i don't think there's a simple way to fix it w/o changing the aot ABI. spec-test-script: disable conversions.wast on i386 #2269