Panic on stack overflow #91

jclark · 2021-06-20T01:15:58Z

At the moment, we seg fault on stack overflow.

We ought to detect this and call the runtime panic function.

I think LLVM has some stuff to help with this.

KavinduZoysa · 2021-06-21T17:49:19Z

For our knowledge, this is how rust implemented this.

Rust detects stack overflow using SIGSEGV. When SIGSEGV was detected, we cannot say that the root cause is stack overflow because stack overflow is not the only cause to SIGSEGV(https://en.wikipedia.org/wiki/Segmentation_fault). So when SIGSEGV was detected, their signal handler checks whether the address lies within the stack guard page.

jclark · 2021-06-21T22:01:51Z

Lots of non-portable, low-level code there. It would be nice if we could do something more straightforward for now.

KavinduZoysa · 2021-06-22T18:25:26Z

@jclark, Please check the below approach with a working sample and add your thoughts/improvements. Sample codes can be found here.

In this approach, we can detect the stack overflow by comparing the remaining stack size with the stack frame size we are going to allocate. I have written an equivalent C code for this approach. I have taken the address of the beginning of the stack using a dummy variable and this may not be the best way to do it. I wrote this just to verify the approach.

Then, I have written this LLVM IR by looking at above equivalent C code. To get the starting address of the stack frame I have used the llvm.frameaddress intrinsic. Here I have hardcoded STACK_FRAME_SIZE as 10 bytes, but ideally, it should be the stack frame size of the function we are going to check. This can be taken using LLVM stack map intrinsic. I worked on generating and parsing the LLVM stack map when I was exploring garbage collectors. Using this we can get the stack frame size of function and use it instead of the STACK_FRAME_SIZE constant.

jclark · 2021-06-23T00:28:49Z

Stepping back a bit, the approach is:

in the runtime's main, we identify the stack limit (ie a pointer to the end of the stack) and store it in a global
at the beginning of each function, we panic if there's a possibility that this function call will exceed the stack limit (except in so far as it calls other functions)

I like this approach. On details:

stack limit (we can have a system dependent function in the runtime to return this)
1. stacks can grow up or down and LLVM supports both; not sure if they grow up on any system we care about
2. on Linux, execve uses the stack to store argv/argc/envp (both the pointers and the strings), so a local variable in main isn't the beginning of the stack; it's the beginning of the stack + the space used by execve
3. we can use rlimit to get the stack size
for the function's stack usage we need to account for:
1. local variables we alloca
2. space we use to store arguments passed to other Ballerina functions
3. stack space used by C runtime functions, where we don't want to do an additional check
we need to be careful not to mess up LLVM's mem2reg optimization
1. I think I've read that mem2reg only works for alloca's in the the prologue
2. if we do the stack check before the alloca, then we need to make sure that it doesn't stop the alloca's from being considered as being in the prologue (I guess that means the first basic block) - maybe you can do an experiment here

A general point is that we do not need to be able to use every byte of available stack space. We want things to be simple and efficient and 100% guarantee that we never use more than the limit. We should not get into trying to exactly calculate the stack usage of a function: rather we should compute an upper limit (i.e. a value X such that we sure that it the stack usage won't be more than X). For example, if the number of BIR registers is r, can we make the upper limit be X*r + Y for some X and Y?

jclark · 2021-06-23T01:09:46Z

We should also be careful we do not do anything that inhibits LLVM inlining.

jclark · 2021-06-23T02:56:17Z

We can divide between C and LL as follows:

C runtime exports global char *_bal_stack_limit for use by LL code
C runtime has system dependent function setStackLimit(argc,argv,envp), which is called by main before calling _B_main
LL code tests whether current stack pointer < _bal_stack_limit + something TBD, and calls _bal_panic if it is

I wonder whether llvm.frameaddress can be relied on.

jclark · 2021-06-23T03:07:11Z

Eventually we will want to be able to grow stacks, so strands can start off with small stacks. Here is what Go does

https://docs.google.com/document/d/1wAaf1rYoM4S4gtnPh0zOlGzWtrZFQ5suE8qr2sD8uWQ/pub

jclark · 2021-06-23T03:16:43Z

Have a look at the section "// go:nosplit" in this

https://dave.cheney.net/2018/01/08/gos-hidden-pragmas

where it shows the code the Go inserts in its prolog to check for stack overflow

KavinduZoysa · 2021-06-23T03:22:10Z

Thank you @jclark, I will go through these details and come back.

jclark · 2021-06-23T03:27:04Z

I suggest looking at how llgo (Go on LLVM) deals with stack overflow

https://github.com/llvm/llvm-project/tree/release/10.x/llgo

I couldn't find the relevant code.

KavinduZoysa · 2021-06-23T03:28:43Z

Sure, I will check that.

jclark · 2021-06-23T03:29:35Z

Also see

https://lists.llvm.org/pipermail/llvm-dev/2013-September/065333.html

It seems like a good start would be to add up all the space allocated by the alloca instructions (which we can do easily).

jclark · 2021-06-23T03:51:37Z

I wonder if this will work: after doing all the allocas, but before spilling the parameters into stack, we check whether the address of the last alloca is beyond the limit: we also make sure that there's some space between _bal_stack_limit and the real stack limit, to allow for anything that LLVM adds itself.

manuranga · 2021-06-23T05:07:08Z

Since LLVM alloca instruction can allocate dynamic arrays (even though we are using it to allocate statically sized things), I don't think there will be a way to get the current stack frame size statically from LLVM.
In the initial version, we should do something simple that doesn't involve finding frame sizes. eg

if (stack_size - currently_used < 1 kb) {
   panic_stackoverflow();
}

If it messes with the LLVM optimization, we might have to do this as a LLVM pass.

KavinduZoysa · 2021-06-24T01:44:52Z

package main

func main() {
	foo()
}

func foo() {
	a := 5
	_ = a
	b := 6
	_ = b
	c := 7
	_ = c
}

This is the output of llvm-goc.

define void @main.main(i8* nest %nest.0) #0 !dbg !14 {
entry:
  call void @main.foo(i8* nest undef), !dbg !15
  ret void
}

define void @main.foo(i8* nest %nest.1) #0 !dbg !16 {
entry:
  %a = alloca i64, align 8
  %b = alloca i64, align 8
  %c = alloca i64, align 8
  %0 = bitcast i64* %a to i8*
  call void @llvm.lifetime.start.p0i8(i64 8, i8* %0)
  %1 = bitcast i64* %b to i8*
  call void @llvm.lifetime.start.p0i8(i64 8, i8* %1)
  %2 = bitcast i64* %c to i8*
  call void @llvm.lifetime.start.p0i8(i64 8, i8* %2)
  store i64 5, i64* %a, align 8
  call void @llvm.dbg.declare(metadata i64* %a, metadata !17, metadata !DIExpression()), !dbg !20
  %a.ld.0 = load i64, i64* %a, align 8, !dbg !21
  store i64 6, i64* %b, align 8
  call void @llvm.dbg.declare(metadata i64* %b, metadata !22, metadata !DIExpression()), !dbg !23
  %b.ld.0 = load i64, i64* %b, align 8, !dbg !24
  store i64 7, i64* %c, align 8
  call void @llvm.dbg.declare(metadata i64* %c, metadata !25, metadata !DIExpression()), !dbg !26
  %c.ld.0 = load i64, i64* %c, align 8, !dbg !27
  %3 = bitcast i64* %a to i8*
  call void @llvm.lifetime.end.p0i8(i64 8, i8* %3)
  %4 = bitcast i64* %b to i8*
  call void @llvm.lifetime.end.p0i8(i64 8, i8* %4)
  %5 = bitcast i64* %c to i8*
  call void @llvm.lifetime.end.p0i8(i64 8, i8* %5)
  ret void
}

KavinduZoysa · 2021-06-24T05:19:39Z

I did a test to verify the output of llvm.frameaddress and compare its' output with the address of the first local var in function. Sample codes can be found here.

First I have written a C code to print the address of the first local var.
Then I generated the llvm IR for that and modified it. Modifications done :

Add llvm.frameaddress intrinsic
Add printf to print the about output

Then I ran the llvm IR with enabling debug(This is done to debug using gdb)

So the output is something like this.

0x7fffffffde8c <---- address of local var (alloc)
0x7fffffffde90 <---- output of llvm.frameaddress

Ideally, the frame address should be the address of the base pointer(rbp in x86) of a given frame. Because the base pointer is pointing to the beginning of the stack frame. By looking at the generated assmbly we can verify that `llvm.frameaddress' gives us the address of rbp.

Also, I have debugged the code using gdb to get the base pointer.

@jclark, what do you think about this?

jclark · 2021-06-24T05:27:20Z

@KavinduZoysa How does this work with function inlining? We want to compare using the address from alloca vs using llvm.frameaddress. I am not clear which is more appropriate for our use. If there are multiple allocas, is the address of the last at the end of the stack (i.e. bottom if growing down)?

KavinduZoysa · 2021-06-24T11:37:02Z

When we consider function inlining,

llvm.frameaddress: This intrinsic gives us the frame address of the function after inlining happens. In the above example, let's say foo was inlined inside main. Then this intrinsic provides us the frame address of main.
address from alloca: If we consider the first address of the variables in foo, when inlining, those variables are allocated after the allocation of the main function's variables.

I have compared several llvm IRs with assembly codes that have multiple allocas. In all cases, the difference between the base pointer and stack pointer is greater than the sum of allocas. According to that, we cannot say the address of the last is the end of the stack.

jclark · 2021-06-24T11:59:51Z

In all cases, the difference between the base pointer and stack pointer is greater than the sum of allocas.

Yes. The frame size for a function F needs to include at least

all the allocas of F,
the maximum of the space needed for the arguments of each function that F calls
space to save registers

What else is there? I'm not too worried about 3, because a fixed amount of space can deal with this. Question is: is there anything other than 1 and 2 that can cause unbounded increase in frame size?

KavinduZoysa · 2021-06-24T18:10:54Z

The difference between llvm.frameaddress and address from alloca depends on the size of the first alloca. If the size of alloca is i64, then the difference is 8 bytes, if it is i32 then the difference is 4 bytes.

Question is: is there anything other than 1 and 2 that can cause unbounded increase in frame size?

Yes, I also did not found any other reason for unbounded increase in frame size. I will check more on that.

jclark · 2021-06-24T22:23:49Z

Given the point about llvm.frameaddress and inlining, I don't think llvm.frameaddress is going to work well for stack overflow checking, because we cannot estimate our frame size in a way that's consistent with llvm.frameaddress when inlining has been done. Of course, we could add a LLVM pass to insert these checks, but then we don't need llvm.frameaddress. So it's clear to me that at this stage we should use an alloca address as the basis for stack checking.

So we are going to insert a check at the beginning of the function that does the equivalent of:

extern char *stack_guard;
if (address_from_alloca < stack_guard) panic(STACK_OVERFLOW);

The remaining question is: where can we put this check without negatively impacting LLVM optimization specifically mem2reg and function inlining?

KavinduZoysa · 2021-06-25T04:10:44Z

The remaining question is: where can we put this check without negatively impacting LLVM optimization specifically mem2reg and function inlining?

I am working on this, I will update as soon as possible.

extern char *stack_guard;
if (address_from_alloca < stack_guard) panic(STACK_OVERFLOW);

Just to verify this, address_from_alloca is the address of the last alloca for a given function and stack_guard is the stack_limit + CONSTANT memory size. Please correct me if I am wrong.

jclark · 2021-06-25T06:07:43Z

Just to verify this, address_from_alloca is the address of the last alloca for a given function and stack_guard is the stack_limit + CONSTANT memory size. Please correct me if I am wrong.

Eventually, I would expect address_from_alloca to be the first alloca of the function minus an estimate of the frame size (or the last plus an estimate of space used for arguments of called functions); stack_guard to be as you say.

For now, I would expect stack_guard to be the address of a local variable in main minus a constant for the allowed size of the stack (e.g. a megabyte) and address_from_alloca to be the first or last alloca of the function (doesn’t really matter with that stack_guard).

KavinduZoysa · 2021-06-25T07:31:30Z

I have run a couple of tests to check the impact of mem2reg optimization. If we have used a particular alloca as shown below then those allocas will not be removed after mem2reg optimization.

before mem2reg:

define dso_local i32 @bar() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  ;; Read the 2nd address and compare
  %"%100" = bitcast i32* %2 to i8*
  %"%101" = load i8*, i8** @stak_guard 
  %"%102" = icmp slt i8* %"%100", %"%101"
  %"%103" = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i1 %"%102")

  store i32 5, i32* %1, align 4
  store i32 2, i32* %2, align 4
  %3 = load i32, i32* %2, align 4
  %4 = add nsw i32 %3, 1
  store i32 %4, i32* %2, align 4
  %5 = load i32, i32* %1, align 4
  %6 = add nsw i32 %3, %5
  ret i32 %6
}

after mem2reg:

define dso_local i32 @bar() #0 {
  %1 = alloca i32, align 4
  ;; Remains the alloca which is needed
  %"%100" = bitcast i32* %1 to i8*
  %"%101" = load i8*, i8** @stak_guard, align 8
  %"%102" = icmp slt i8* %"%100", %"%101"
  %"%103" = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i1 %"%102")

  store i32 2, i32* %1, align 4
  %2 = load i32, i32* %1, align 4
  %3 = add nsw i32 %2, 1
  store i32 %3, i32* %1, align 4
  %4 = add nsw i32 %2, 5
  ret i32 %4
}

When we consider the function inlining, the callee function's allocas are merged with callers allocas. But the alloca we used to get address_from_alloca will be there in callers after inlining.

manuranga · 2021-06-25T08:06:58Z

Since are not planing to be precise at this stage, wouldn't it be better to do:

extern char *stack_guard;
if (llvm.frameaddress  < stack_guard) panic(STACK_OVERFLOW);

It will not mess with any optimizations. From the llvm docs:

Note that calling this intrinsic does not prevent function inlining or other aggressive transformations, so the value returned may not be that of the obvious source-language caller.

Check will be redundant when inlined but not incorrect.

jclark · 2021-06-25T09:26:07Z

@KavinduZoysa What does the assembly look like for that example?

@manuranga llvm.frameaddress is a step in the wrong direction: I can’t see any way to make an approach based in llvm.frameaddress robust. Using the address of an alloca, we can make it robust by putting an upper bound on stack usage based on number of allocas and number of arguments passed to called functions. Optimization cost is not yet clear. Need to see assembly and try taking the address of the last alloca.

jclark · 2021-06-25T09:27:48Z

I think the C equivalent of llvm.frameaddress __builtin_frame_address is useful in main. Taking the address of a stack variable and adding to it is undefined behaviour in C.

KavinduZoysa · 2021-06-25T10:30:17Z

@KavinduZoysa What does the assembly look like for that example?

Please find the below assembly for the checking code above example.

before mem2reg:

0000000000401130 <bar>:
  401130:	55                      push   %rbp
  401131:	48 89 e5             	mov    %rsp,%rbp
  401134:	48 83 ec 10          	sub    $0x10,%rsp
  401138:	48 8d 45 f8          	lea    -0x8(%rbp),%rax // 2nd alloca (last one)
  40113c:	48 3b 04 25 40 40 40 	cmp    0x404040,%rax

after mem2reg:

0000000000401130 <bar>:
  401130:	55                   	push   %rbp
  401131:	48 89 e5                mov    %rsp,%rbp
  401134:	48 83 ec 10          	sub    $0x10,%rsp
  401138:	48 8d 45 fc          	lea    -0x4(%rbp),%rax // 1st alloca (last one)
  40113c:	48 3b 04 25 40 40 40 	cmp    0x404040,%rax

After mem2reg optimization, since only one alloca is left, that is taken to compare.

jclark · 2021-06-26T02:14:12Z

What I implemented was to do use an extra alloca for the compare against stack_guard. After optimization, this seems to end up pretty much the same as frameaddress in the presence, so it's not as robust as I would like. I will create a separate issue to improve this.

jclark assigned KavinduZoysa Jun 20, 2021

jclark added this to the Implement subset 2 of the language milestone Jun 20, 2021

This was referenced Jun 24, 2021

Explore on llvm.lifetime.* intrinsics on allocas #126

Open

Add support for global variables in nback.llvm #127

Closed

jclark closed this as completed in 7c458c3 Jun 26, 2021

jclark mentioned this issue Jun 26, 2021

Improve robustness of stack overflow detection #140

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Panic on stack overflow #91

Panic on stack overflow #91

jclark commented Jun 20, 2021

KavinduZoysa commented Jun 21, 2021

jclark commented Jun 21, 2021

KavinduZoysa commented Jun 22, 2021

jclark commented Jun 23, 2021

jclark commented Jun 23, 2021

jclark commented Jun 23, 2021

jclark commented Jun 23, 2021

jclark commented Jun 23, 2021

KavinduZoysa commented Jun 23, 2021

jclark commented Jun 23, 2021

KavinduZoysa commented Jun 23, 2021

jclark commented Jun 23, 2021

jclark commented Jun 23, 2021 •

edited

Loading

manuranga commented Jun 23, 2021

KavinduZoysa commented Jun 24, 2021 •

edited

Loading

KavinduZoysa commented Jun 24, 2021

jclark commented Jun 24, 2021

KavinduZoysa commented Jun 24, 2021

jclark commented Jun 24, 2021

KavinduZoysa commented Jun 24, 2021

jclark commented Jun 24, 2021

KavinduZoysa commented Jun 25, 2021

jclark commented Jun 25, 2021

KavinduZoysa commented Jun 25, 2021

manuranga commented Jun 25, 2021

jclark commented Jun 25, 2021

jclark commented Jun 25, 2021

KavinduZoysa commented Jun 25, 2021 •

edited

Loading

jclark commented Jun 26, 2021

Panic on stack overflow #91

Panic on stack overflow #91

Comments

jclark commented Jun 20, 2021

KavinduZoysa commented Jun 21, 2021

jclark commented Jun 21, 2021

KavinduZoysa commented Jun 22, 2021

jclark commented Jun 23, 2021

jclark commented Jun 23, 2021

jclark commented Jun 23, 2021

jclark commented Jun 23, 2021

jclark commented Jun 23, 2021

KavinduZoysa commented Jun 23, 2021

jclark commented Jun 23, 2021

KavinduZoysa commented Jun 23, 2021

jclark commented Jun 23, 2021

jclark commented Jun 23, 2021 • edited Loading

manuranga commented Jun 23, 2021

KavinduZoysa commented Jun 24, 2021 • edited Loading

KavinduZoysa commented Jun 24, 2021

jclark commented Jun 24, 2021

KavinduZoysa commented Jun 24, 2021

jclark commented Jun 24, 2021

KavinduZoysa commented Jun 24, 2021

jclark commented Jun 24, 2021

KavinduZoysa commented Jun 25, 2021

jclark commented Jun 25, 2021

KavinduZoysa commented Jun 25, 2021

manuranga commented Jun 25, 2021

jclark commented Jun 25, 2021

jclark commented Jun 25, 2021

KavinduZoysa commented Jun 25, 2021 • edited Loading

jclark commented Jun 26, 2021

jclark commented Jun 23, 2021 •

edited

Loading

KavinduZoysa commented Jun 24, 2021 •

edited

Loading

KavinduZoysa commented Jun 25, 2021 •

edited

Loading