Panic on stack overflow #91
For reference, this is how Rust implements it. Rust detects stack overflow using |
Lots of non-portable, low-level code there. It would be nice if we could do something more straightforward for now. |
@jclark, please check the approach below, which comes with a working sample, and add your thoughts and improvements. Sample code can be found here. In this approach, we detect stack overflow by comparing the remaining stack size with the size of the stack frame we are about to allocate. I have written equivalent C code for this approach. I took the address of the beginning of the stack using a dummy variable, which may not be the best way to do it; I wrote this just to verify the approach. I then wrote this LLVM IR based on the equivalent C code. To get the starting address of the stack frame I used the llvm.frameaddress intrinsic. Here I have hardcoded STACK_FRAME_SIZE as 10 bytes, but ideally it should be the stack frame size of the function being checked. This can be obtained using the LLVM stack map intrinsic. I worked on generating and parsing the LLVM stack map when I was exploring garbage collectors. Using this, we can get the stack frame size of a function and use it instead of the STACK_FRAME_SIZE constant. |
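The comparison described above can be sketched in C roughly as follows. This is not the sample linked in the comment, only a minimal illustration under stated assumptions: the stack grows downwards, `record_stack_base`, `frame_fits` and `STACK_SIZE_LIMIT` are hypothetical names, and the 1 MiB budget is arbitrary. `STACK_FRAME_SIZE` keeps the hardcoded 10 bytes mentioned in the comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical budget: total stack we allow ourselves (assumed 1 MiB). */
#define STACK_SIZE_LIMIT ((uintptr_t)1 << 20)
/* Hardcoded frame size, as in the comment above; ideally per function. */
#define STACK_FRAME_SIZE ((uintptr_t)10)

/* Approximate address of the beginning of the stack. */
static uintptr_t stack_base;

/* Record the stack base from the address of a dummy local in the caller. */
void record_stack_base(const void *dummy_local) {
    stack_base = (uintptr_t)dummy_local;
}

/* Return true if a frame of frame_size bytes still fits below `current`.
 * Assumes the stack grows towards lower addresses. */
bool frame_fits(uintptr_t current, uintptr_t frame_size) {
    /* Bytes consumed so far; treat addresses above the base as "nothing used". */
    uintptr_t used = current > stack_base ? 0 : stack_base - current;
    if (used >= STACK_SIZE_LIMIT) {
        return false;
    }
    return STACK_SIZE_LIMIT - used >= frame_size;
}
```

A caller would record the base once at start-up, for example with the address of a local variable in `main`, and then call `frame_fits` with the current frame address and `STACK_FRAME_SIZE` before entering a function.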
Stepping back a bit, the approach is:
I like this approach. On details:
A general point is that we do not need to be able to use every byte of available stack space. We want things to be simple and efficient, with a 100% guarantee that we never use more than the limit. We should not try to calculate the exact stack usage of a function; rather, we should compute an upper limit (i.e. a value U such that we are sure the stack usage won't be more than U). For example, if the number of BIR registers is r, can we make the upper limit be X*r + Y for some X and Y? |
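As a rough illustration of the X*r + Y idea, the bound could be computed like this. The two constants are placeholders chosen for the example, not values measured for any backend, and `frame_size_upper_bound` is a hypothetical name.

```c
#include <stddef.h>

/* Assumed constants (not measured): X and Y in the formula X*r + Y. */
#define BYTES_PER_REGISTER ((size_t)16)  /* X: worst-case bytes per BIR register */
#define FIXED_OVERHEAD     ((size_t)128) /* Y: return address, spills, padding, ... */

/* Upper bound on the frame size of a function with num_registers BIR registers. */
size_t frame_size_upper_bound(size_t num_registers) {
    return BYTES_PER_REGISTER * num_registers + FIXED_OVERHEAD;
}
```

The point of such a bound is that it only needs to be safe, not tight: overestimating wastes a little stack, while underestimating would defeat the check.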
We should also be careful we do not do anything that inhibits LLVM inlining. |
We can divide between C and LL as follows:
I wonder whether llvm.frameaddress can be relied on. |
Eventually we will want to be able to grow stacks, so strands can start off with small stacks. Here is what Go does https://docs.google.com/document/d/1wAaf1rYoM4S4gtnPh0zOlGzWtrZFQ5suE8qr2sD8uWQ/pub |
Have a look at the section "// go:nosplit" in https://dave.cheney.net/2018/01/08/gos-hidden-pragmas, which shows the code that Go inserts in its prologue to check for stack overflow |
Thank you @jclark, I will go through these details and come back. |
I suggest looking at how llgo (Go on LLVM) deals with stack overflow https://github.com/llvm/llvm-project/tree/release/10.x/llgo I couldn't find the relevant code. |
Sure, I will check that. |
Also see https://lists.llvm.org/pipermail/llvm-dev/2013-September/065333.html It seems like a good start would be to add up all the space allocated by the alloca instructions (which we can do easily). |
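Adding up the alloca sizes might look something like the sketch below. It assumes the compiler front end already knows the size and alignment of each alloca it emits; `struct alloca_info` and `alloca_bytes_upper_bound` are hypothetical names. Each entry is padded by `align - 1` bytes so that the sum stays an upper bound on the alloca space whatever padding is needed; it does not account for anything else LLVM places in the frame.

```c
#include <stddef.h>

/* Hypothetical description of one alloca emitted for a function. */
struct alloca_info {
    size_t size;  /* bytes requested */
    size_t align; /* required alignment, a power of two, at least 1 */
};

/* Upper bound on the stack space taken by the given allocas,
 * allowing for worst-case alignment padding before each one. */
size_t alloca_bytes_upper_bound(const struct alloca_info *allocas, size_t count) {
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += allocas[i].size + (allocas[i].align - 1);
    }
    return total;
}
```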
I wonder if this will work: after doing all the allocas, but before spilling the parameters into stack, we check whether the address of the last alloca is beyond the limit: we also make sure that there's some space between _bal_stack_limit and the real stack limit, to allow for anything that LLVM adds itself. |
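In C terms, the check proposed here could be sketched as below. `_bal_stack_limit` is the name used in the comment; `set_bal_stack_limit`, `last_alloca_overflows` and the 4096-byte `LLVM_SLACK` margin are assumptions made for the example, and the stack is assumed to grow downwards.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed margin between _bal_stack_limit and the real limit,
 * reserved for whatever LLVM adds to the frame itself. */
#define LLVM_SLACK ((uintptr_t)4096)

/* Lowest address our own allocas are allowed to reach. */
static uintptr_t _bal_stack_limit;

/* Place the soft limit LLVM_SLACK bytes above the real stack limit. */
void set_bal_stack_limit(uintptr_t real_limit) {
    _bal_stack_limit = real_limit + LLVM_SLACK;
}

/* True if the last alloca lies beyond (below) the soft limit. */
bool last_alloca_overflows(const void *last_alloca) {
    return (uintptr_t)last_alloca < _bal_stack_limit;
}
```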
|
This is the output of llvm-goc.
|
I did a test to verify the output of First I have written a C code to print the address of the first local var.
Then I ran the LLVM IR with debug information enabled (so that it can be debugged using gdb). The output is something like this:
Ideally, the frame address should be the address held in the base pointer (rbp on x86) of a given frame, because the base pointer points to the beginning of the stack frame. By looking at the generated assembly we can verify that `llvm.frameaddress` gives us the address in rbp. I have also debugged the code using gdb to get the base pointer. @jclark, what do you think about this? |
@KavinduZoysa How does this work with function inlining? We want to compare using the address from alloca vs using llvm.frameaddress. I am not clear which is more appropriate for our use. If there are multiple allocas, is the address of the last at the end of the stack (i.e. bottom if growing down)? |
When we consider function inlining,
I have compared several LLVM IR files that have multiple allocas with their generated assembly. In all cases, the difference between the base pointer and the stack pointer is greater than the sum of the allocas. Given that, we cannot say that the address of the last alloca is the end of the stack. |
Yes. The frame size for a function F needs to include at least
What else is there? I'm not too worried about 3, because a fixed amount of space can deal with this. Question is: is there anything other than 1 and 2 that can cause unbounded increase in frame size? |
The difference between llvm.frameaddress and address from alloca depends on the size of the first alloca. If the size of alloca is i64, then the difference is 8 bytes, if it is i32 then the difference is 4 bytes.
Yes, I also did not find any other reason for an unbounded increase in frame size. I will check more on that. |
Given the point about llvm.frameaddress and inlining, I don't think llvm.frameaddress is going to work well for stack overflow checking, because we cannot estimate our frame size in a way that's consistent with llvm.frameaddress when inlining has been done. Of course, we could add an LLVM pass to insert these checks, but then we don't need llvm.frameaddress. So it's clear to me that at this stage we should use an alloca address as the basis for stack checking. So we are going to insert a check at the beginning of the function that does the equivalent of:
The remaining question is: where can we put this check without negatively impacting LLVM optimization, specifically mem2reg and function inlining? |
I am working on this, I will update as soon as possible.
Just to verify this, |
Eventually, I would expect address_from_alloca to be the first alloca of the function minus an estimate of the frame size (or the last plus an estimate of space used for arguments of called functions); stack_guard to be as you say. For now, I would expect stack_guard to be the address of a local variable in main minus a constant for the allowed size of the stack (e.g. a megabyte) and address_from_alloca to be the first or last alloca of the function (doesn’t really matter with that stack_guard). |
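The "for now" variant described above can be sketched in C as follows. The names `stack_guard` and the idea of comparing an alloca address against it come from the discussion; `init_stack_guard`, `panic_stack_overflow`, `recurse` and the 1024-byte local buffer are hypothetical and only there to make the example self-contained. The sketch assumes a downward-growing stack, and the panic handler merely sets a flag, whereas a real runtime would call its panic function.

```c
#include <stdbool.h>
#include <stdint.h>

static uintptr_t stack_guard; /* lowest address we allow locals to reach */
static bool panicked;         /* stands in for calling the runtime panic */

static void panic_stack_overflow(void) {
    panicked = true;
}

/* Set stack_guard to the address of a local in main (or similar)
 * minus a constant for the allowed stack size. */
void init_stack_guard(const void *local_in_main, uintptr_t allowed_size) {
    stack_guard = (uintptr_t)local_in_main - allowed_size;
    panicked = false;
}

/* Recursive function with the check at its start; returns the depth
 * at which the check fired. */
unsigned recurse(unsigned depth) {
    volatile char frame[1024]; /* stands in for the function's allocas */
    frame[0] = (char)depth;
    /* address_from_alloca < stack_guard means we are past the limit */
    if ((uintptr_t)frame < stack_guard) {
        panic_stack_overflow();
        return depth;
    }
    /* Reading frame[0] after the call keeps this from becoming a tail call. */
    return recurse(depth + 1) + (unsigned)(frame[0] & 0);
}
```

With a guard set 64 KiB below a local in the caller, the recursion stops after a few dozen frames and sets the flag instead of running into the end of the real stack.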
I have run a couple of tests to check the impact of the mem2reg optimization. If we use a particular alloca for the check, as shown below, then that alloca is not removed by mem2reg. before mem2reg:
after mem2reg:
When we consider the function inlining, the callee function's allocas are merged with callers allocas. But the alloca we used to get |
Since we are not planning to be precise at this stage, wouldn't it be better to do:
|
@KavinduZoysa What does the assembly look like for that example? @manuranga llvm.frameaddress is a step in the wrong direction: I can’t see any way to make an approach based on llvm.frameaddress robust. Using the address of an alloca, we can make it robust by putting an upper bound on stack usage based on the number of allocas and the number of arguments passed to called functions. The optimization cost is not yet clear. We need to see the assembly and try taking the address of the last alloca. |
I think __builtin_frame_address, the C equivalent of llvm.frameaddress, is useful in main. Taking the address of a stack variable and adding to it is undefined behaviour in C. |
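A possible way to use `__builtin_frame_address` for this is sketched below. It relies on the GCC/Clang builtin `__builtin_frame_address(0)`, converts the result to `uintptr_t` and does the subtraction on the integer, so no arithmetic is performed on a pointer to a stack variable. `compute_stack_guard` and the 1 MiB `STACK_BUDGET` are assumptions for the example, and the resulting value is only meant to be compared against other addresses, never dereferenced.

```c
#include <stdint.h>

/* Assumed allowed stack size (1 MiB); not a value from the discussion's code. */
#define STACK_BUDGET ((uintptr_t)1 << 20)

/* Intended to be called once from main: take the current frame address
 * and subtract the budget to obtain a guard address (stack grows down). */
uintptr_t compute_stack_guard(void) {
    uintptr_t frame = (uintptr_t)__builtin_frame_address(0);
    return frame - STACK_BUDGET;
}
```

If the function is not inlined, the frame address is that of `compute_stack_guard` itself rather than of `main`, which shifts the guard by a few bytes; with a budget this coarse the difference should not matter.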
Please find below the assembly for the checking code in the example above. before mem2reg:
after mem2reg:
After the mem2reg optimization, only one alloca is left, so that one is used for the comparison. |
What I implemented was to use an extra alloca for the compare against stack_guard. After optimization, this seems to end up pretty much the same as frameaddress in the presence, so it's not as robust as I would like. I will create a separate issue to improve this. |
At the moment, we seg fault on stack overflow.
We ought to detect this and call the runtime panic function.
I think LLVM has some stuff to help with this.