aot/jit native stack bound check improvement #2244

yamt · 2023-05-30T10:53:07Z

summary:

Move the native stack overflow check from the caller to the callee because the former doesn't work for call_indirect and imported functions.
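A minimal sketch of why a callee-side check covers call_indirect, under simplifying assumptions: `ExecEnv`, `func_internal`, `func_precheck`, `FUNC_STACK_SIZE` and the explicit `sp` parameter are all hypothetical stand-ins (generated code would read the real stack pointer and the real exec_env instead). The point is only that the check travels with the function, so a caller that holds nothing but a table entry still gets it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified execution environment. */
typedef struct {
    uintptr_t native_stack_boundary; /* lowest address the stack may reach */
    bool stack_overflow;             /* set when a precheck fails */
} ExecEnv;

/* `sp` stands in for the hardware stack pointer in this simulation. */
typedef int32_t (*WasmFunc)(ExecEnv *env, uintptr_t sp, int32_t arg);

/* Assumed stack requirement of the function body, in bytes. */
#define FUNC_STACK_SIZE 256u

/* The actual function body. */
static int32_t
func_internal(ExecEnv *env, uintptr_t sp, int32_t arg)
{
    (void)env;
    (void)sp;
    return arg * 2;
}

/* Callee-side wrapper: check the stack first, then run the body. */
static int32_t
func_precheck(ExecEnv *env, uintptr_t sp, int32_t arg)
{
    assert(env != NULL);
    if (sp < env->native_stack_boundary
        || sp - env->native_stack_boundary < FUNC_STACK_SIZE) {
        env->stack_overflow = true;
        return 0;
    }
    return func_internal(env, sp, arg);
}

/* A function table as seen by call_indirect: the caller only has a
   pointer and knows nothing about the callee's stack needs. */
static const WasmFunc table[] = { func_precheck };

static int32_t
call_indirect(ExecEnv *env, uintptr_t sp, uint32_t index, int32_t arg)
{
    return table[index](env, sp, arg);
}
```

Because the table holds the wrapper rather than the body, every call path, direct or indirect, goes through the same check.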

Make the stack usage estimation more accurate.
Instead of guessing from the number of wasm locals in the function, use LLVM's idea of the stack size of each MachineFunction. The former is inaccurate because a) it doesn't reflect optimization passes and b) wasm locals are not the only reason to use the stack.

To use the post-compilation stack usage information without requiring 2-pass compilation or machine-code immediate rewrites, introduce a global array to store the stack consumption of each function.
For JIT, use a custom IRCompiler with an extra pass to fill the array.
For AOT, use the equivalent of clang -fstack-usage instead, because we support external llc.
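For the AOT path, filling the array from a stack usage file could look roughly like the sketch below. It assumes each line has the form `<prefix>:<function>\t<size>\t<qualifier>` and that function bodies are named `aot_func_internal#<index>`; the exact file format depends on the LLVM version, and `parse_stack_usage_line` and `fill_stack_sizes` are hypothetical helpers, not functions from this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed naming scheme for the function bodies. */
#define FUNC_PREFIX "aot_func_internal#"

/* Parse one line; return 0 on success, -1 if the line is not usable. */
static int
parse_stack_usage_line(const char *line, uint32_t *func_idx, uint64_t *size)
{
    assert(line != NULL && func_idx != NULL && size != NULL);

    const char *tab = strchr(line, '\t');
    if (tab == NULL)
        return -1;

    /* The function name must appear before the first tab. */
    const char *p = strstr(line, FUNC_PREFIX);
    if (p == NULL || p > tab)
        return -1;
    p += strlen(FUNC_PREFIX);

    char *end;
    unsigned long idx = strtoul(p, &end, 10);
    if (end == p || end != tab || idx > UINT32_MAX)
        return -1;

    unsigned long long sz = strtoull(tab + 1, &end, 10);
    if (end == tab + 1 || *end != '\t')
        return -1;

    *func_idx = (uint32_t)idx;
    *size = (uint64_t)sz;
    return 0;
}

/* Fill stack_sizes[] from parsed lines; return the number of entries set. */
static uint32_t
fill_stack_sizes(const char *const *lines, size_t n_lines,
                 uint64_t *stack_sizes, uint32_t n_funcs)
{
    uint32_t filled = 0;
    for (size_t i = 0; i < n_lines; i++) {
        uint32_t idx;
        uint64_t size;
        if (parse_stack_usage_line(lines[i], &idx, &size) == 0
            && idx < n_funcs) {
            stack_sizes[idx] = size;
            filled++;
        }
    }
    return filled;
}
```

Lines for other symbols (for example the wrappers) are simply skipped, so the array ends up holding one size per function body.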

Re-implement function call stack usage estimation to reflect the real calling conventions better.
(aot_estimate_stack_usage_for_function_call)
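As a rough illustration of what a calling-convention-aware estimate involves, the sketch below conservatively assumes that every argument is passed on the stack in a pointer-aligned slot, plus a return address and one context pointer. This is not the logic of aot_estimate_stack_usage_for_function_call; `ValueType` and `estimate_call_stack_usage` are hypothetical names, and real targets pass some arguments in registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of wasm value types. */
typedef enum { TYPE_I32, TYPE_I64, TYPE_F32, TYPE_F64 } ValueType;

static unsigned int
value_type_size(ValueType t)
{
    return (t == TYPE_I64 || t == TYPE_F64) ? 8 : 4;
}

/* Round n up to a multiple of align. */
static unsigned int
align_up(unsigned int n, unsigned int align)
{
    return (n + align - 1) / align * align;
}

/* Conservative estimate of the stack needed to make one call,
   excluding the callee's own frame. */
static unsigned int
estimate_call_stack_usage(const ValueType *params, size_t n_params,
                          unsigned int pointer_size)
{
    assert(pointer_size == 4 || pointer_size == 8);

    unsigned int size = 0;
    size += pointer_size; /* return address */
    size += pointer_size; /* context pointer passed as the first argument */
    for (size_t i = 0; i < n_params; i++) {
        /* Assume each argument occupies a pointer-aligned stack slot. */
        size += align_up(value_type_size(params[i]), pointer_size);
    }
    return size;
}
```

Overestimating here is relatively harmless for overflow detection, while underestimating could let a call run past the boundary, which is why the sketch rounds everything up.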

Re-implement stack estimation logic (--enable-memory-profiling) based on the new machinery.

discussions:
#2105

todo/known issues/open questions:

- ~~implement 32-bit case~~
- ~~fill the stack_sizes array for jit (use something similar to experiment to query stack sizes for jit #2216)~~
- ~~fix jit tier up (or confirm it isn't broken)~~ reading the code, i couldn't find anything broken.
- ~~ensure appropriate jit partitioning (ensure to compile the function body before executing the corresponding wrapper)~~
- ~~account for caller-side stack consumption (cf. "native stack overflow" detection is sometimes inappropriate #2105 (comment))~~
  - ~~what to do for native function calls?~~ do nothing special, at least within this PR
- ~~fix external llc. pass -fstack-usage to the external command?~~
- ~~re-implement enable_stack_estimation based on the new machinery~~
- what to do for RtlAddFunctionTable?
  - it seems broken regardless of this PR. but this PR might break it further.
  - i'm not even sure how i can test it. is it for AddVectoredExceptionHandler?
  - see also: aot_loader RtlAddFunctionTable logic seems to assume a particular order of functions #2242
  - references:
    - https://learn.microsoft.com/en-us/windows/win32/api/winnt/nf-winnt-rtladdfunctiontable
    - https://learn.microsoft.com/en-us/windows/win32/debug/pe-format?redirectedfrom=MSDN#the-pdata-section
- test
  - ~~this test module consumes about 8MB of stack (wamr aot with llvm 14, amd64) https://github.com/yamt/toywasm/blob/1cc6d551b0fcd10cc8c8b3516c48ba08015e6ad6/wat/many_stack.wat.jinja#L37-L38~~ worked as expected
  - ~~investigate assertion failure seen with js.wasm~~
  - ~~investigate app heap corruptions seen with aot~~ app heap insertion issues #2275
  - non x86 archs
    - xtensa looks ok
    - i have no plan to test others right now
- benchmark. noinline can have severe implications for certain types of modules. however, as wasm is usually a compiler target, hopefully fundamental inlining has already been done before aot/jit compilation. aot/jit native stack bound check improvement #2244 (comment)
- ~~remove/disable debug code~~
- ~~do something for func_ctx->debug_func. just disable for the wrapper func?~~
- ~~add missing error checks~~
- ~~reduce code dup. probably make aot_create_func_context use create_basic_func_context.~~
- ~~fix errors caused by empty function~~
- ~~look at x86_32 failure on the ci~~
  - ~~Segv aot/jit: set module layout #2260~~
  - NaN related issue: an i32.reinterpret_f32 test in conversions.wast was failing. it's x87 flds/fstp which doesn't preserve sNaN. the problem is not specific to this PR. it seems to work on the main branch just by luck. (it would fail if you disable optimizations.) i don't think there's a simple way to fix it w/o changing the aot ABI. spec-test-script: disable conversions.wast on i386 #2269


yamt · 2023-06-19T06:24:31Z

i fixed xtensa case. it's still inefficient, but not broken.
while i haven't tested on a real hardware yet, the wamrc output looks reasonable.


wenyongh

LGTM

yamt · 2023-06-21T02:39:00Z

> i fixed xtensa case. it's still inefficient, but not broken. while i haven't tested on a real hardware yet, the wamrc output looks reasonable.

lightly tested on esp32-devkitc. it worked as expected so far.

xujuntwt95329

LGTM

wenyongh · 2023-06-21T03:41:05Z

> > i fixed xtensa case. it's still inefficient, but not broken. while i haven't tested on a real hardware yet, the wamrc output looks reasonable.
>
> lightly tested on esp32-devkitc. it worked as expected so far.

OK, it seems there is no comment from other developers, let's merge this PR?

yamt · 2023-06-21T13:52:45Z

> > > i fixed xtensa case. it's still inefficient, but not broken. while i haven't tested on a real hardware yet, the wamrc output looks reasonable.
> >
> > lightly tested on esp32-devkitc. it worked as expected so far.
>
> OK, it seems there is no comment from other developers, let's merge this PR?

i have no problem with it

Move the native stack overflow check from the caller to the callee because the former doesn't work for call_indirect and imported functions.

Make the stack usage estimation more accurate. Instead of making a guess from the number of wasm locals in the function, use LLVM's idea of the stack size of each MachineFunction. The former is inaccurate because a) it doesn't reflect optimization passes, and b) wasm locals are not the only reason to use stack.

To use the post-compilation stack usage information without requiring 2-pass compilation or machine-code imm rewriting, introduce a global array to store the stack consumption of each function:
For JIT, use a custom IRCompiler with an extra pass to fill the array.
For AOT, use `clang -fstack-usage` equivalent because we support external llc.

Re-implement function call stack usage estimation to reflect the real calling conventions better. (aot_estimate_stack_usage_for_function_call)

Re-implement stack estimation logic (--enable-memory-profiling) based on the new machinery.

Discussions: bytecodealliance#2105.

yamt mentioned this pull request Jun 1, 2023

experiment to query stack sizes for jit #2216

Closed

yamt force-pushed the stack-sizes branch 2 times, most recently from 7919527 to f984dfb Compare June 5, 2023 08:05

yamt mentioned this pull request Jun 5, 2023

aot/jit: set module layout #2260

Merged

yamt force-pushed the stack-sizes branch 3 times, most recently from c3594ff to f47dc28 Compare June 6, 2023 13:37

This was referenced Jun 7, 2023

spec-test-script: disable conversions.wast on i386 #2269

Merged

"native stack overflow" detection is sometimes inappropriate #2105

Closed

yamt force-pushed the stack-sizes branch from 9d20ebd to 4337669 Compare June 9, 2023 08:51

yamt added 20 commits June 12, 2023 12:41

aot_llvm.c: create a global to store stack sizes

991834a

aot: update stack_sizes (with a dummy value for now)

11910cf

aot_add_llvm_func: extract llvm function creation logic

32d2213

no functional changes are intended.

aot/llvm-jit: generate "precheck" function wrappers

62ec1b5

aot: debug

0cbd140

mark the function body noinline

22b5d14

as inlining can make our stack check almost meaningless.

aot/jit: separate aot_add_precheck_function

699913d

record stack_sizes global in in comp ctx

f08429e

aot_add_precheck_function: add some IRs (WIP)

3a5ab29

wip

f206aea

wip

07cbe04

revert a few (wrong) constifications

f8a75f6

aot: read stack usage file

f8b6be9

fix a few inverted conditions

d373c01

clang-format

c5d26e4

call correct func

4e0641b

make the precheck func noinline as well

4a7ed3e

debug code

07114a8

retire create_native_stack_bound_from_exec_env

52dea3d

start making the new thing conditional on enable_stack_bound_check

0a56499

yamt added 9 commits June 19, 2023 13:01

aot_estimate_stack_usage_for_function_call: tweak for xtensa

7487102

aot_add_precheck_function: use musttail where it's safe

34fd5b1

export aot_estimate_stack_usage_for_function_call

6475975

move aot_estimate_stack_usage_for_function_call to aot_llvm.c

d082d69

aot_resolve_stack_sizes: fix the estimation w/o tail call optimization

2575b0c

add a warning

b9da93f

jit_stack_size_callback: sync with the aot version

f9c32bb

aot_emit_aot_file.c: downgrade a few scary warnings

9976da8

a comment

04ffbc4

yamt marked this pull request as ready for review June 19, 2023 06:22

whitespace

36bd181

wenyongh reviewed Jun 20, 2023


yamt added 5 commits June 20, 2023 16:37

rename aot_func2# -> aot_func_internal#

18ef14e

use bh_memcpy_s

699ebaf

also, add a comment

remove an extra LLVMOrcDisposeThreadSafeModule call

090552c

use uint64 instead of uint64_t

3c0d82d

add an assertion and comments

a338714

wenyongh reviewed Jun 20, 2023


xujuntwt95329 approved these changes Jun 21, 2023


wenyongh merged commit cd7941c into bytecodealliance:main Jun 21, 2023

no1wudi mentioned this pull request Jun 25, 2023

spec test on nuttx is broken #2312

Closed

loganek mentioned this pull request Aug 1, 2023

LLVM ERROR: failed to perform tail call elimination with MIPS AOT #2412

Closed

loganek mentioned this pull request Aug 11, 2023

Propose a new (1.2.3) release #2378

Merged

yamt mentioned this pull request Aug 14, 2023

aot: disable musttail for mips #2457

Merged

yamt added the native stack overflow detection label Apr 17, 2024
