Match expressions use O(n) stack space with n branches in debug mode #34283
Comments
I have investigated this somewhat and have found that the stack usage is from register spilling. For some reason LLVM is spilling the registers to a separate location in each branch around the overflow check. I have no idea why it might be doing this though. Note that without the |
/cc @rust-lang/compiler |
Oh interesting, yeah maybe it only happens when there are early exits inside a match expression. Removing my heavy use of |
I would expect that this may be due to the specifics of the lifetimes that we emit for the bindings. |
Oh, never mind, just saw @Aatch's comment. |
@Aatch seems like the fast register allocator spills all live registers at the end of each basic block. |
I suspect this may have far reaching implications making code worse all over the place. For example Stylo is full of very large match expressions and I wonder if that limitation is making it unnecessarily bloated. I may be completely wrong though, given @Aatch's comment about Cc @rust-lang/wg-codegen |
Triage (1.44.1) We definitely hit this in debug in rust-analyzer. Our code for expression lowering is a single giant recursive match, and it uses 20k of stack space per recursion level in debug if all branches are there. If I comment out all the branches which are not used in my specific test, stack space usage goes down dramatically. In release, I think I am seeing stack space proportional to the max branch, as commenting branches doesn't make that huge a difference with See rust-lang/rust-analyzer@12d52a7 for a real-world example of this. |
Maybe @rust-lang/wg-mir-opt can do something to clean up the match logic to make it easier for llvm to figure out the register spilling correctly. |
The problem here is having live values across BB boundaries, because the register allocator in debug mode simply spills and reloads everything, even for unconditional branches. Silly example:

    define internal i8 @testcase(i8 %0) {
      br label %bb2
    bb2:
      ret i8 %0
    }

becomes:

    testcase:                       # @testcase
        .cfi_startproc
    # %bb.0:
        # kill: def $dil killed $dil killed $edi
        movb  %dil, -1(%rsp)        # 1-byte Spill
        jmp   .LBB15_1
    .LBB15_1:                       # %bb2
        movb  -1(%rsp), %al         # 1-byte Reload
        retq

And in this example, it's not so much the match itself, but the overflow check that causes values that are live across BB boundaries. Compiling with
Each overflow check causes two spill/reload pairs. One for Also, a good bit of the stack usage is actually used by the In the general case, the difference between debug and release mode can probably be explained by the fact that in release mode, not only do we get a better register allocator, but we also use lifetime intrinsics in LLVM, which allow stack-allocated values that are used in only one arm to share space with values used only in other arms. The latter would explain why the observed stack usage in the rust-analyzer example goes from |
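To make the overflow-check point concrete, here is a minimal sketch. The function names `step_checked` and `step_wrapping` are hypothetical and do not come from this thread. With overflow checks enabled (the default for debug builds), each integer `+` compiles to an add followed by a conditional branch to a panic path, so every arm gains an extra basic-block boundary. `wrapping_add` has no such branch.

```rust
/// Hypothetical example: every arm uses `+`, which is overflow-checked
/// in debug builds and therefore branches to a panic path.
#[inline(never)]
pub fn step_checked(op: u8, x: u32) -> u32 {
    match op {
        0 => x + 1,
        1 => x + 2,
        2 => x + 3,
        _ => x,
    }
}

/// Same logic, but `wrapping_add` never panics, so no overflow branch
/// is emitted for the arms.
#[inline(never)]
pub fn step_wrapping(op: u8, x: u32) -> u32 {
    match op {
        0 => x.wrapping_add(1),
        1 => x.wrapping_add(2),
        2 => x.wrapping_add(3),
        _ => x,
    }
}

fn main() {
    // For inputs that do not overflow, both versions agree.
    for op in 0..4u8 {
        assert_eq!(step_checked(op, 10), step_wrapping(op, 10));
    }
    println!("checked and wrapping variants agree on non-overflowing input");
}
```

Building both functions in debug mode with `rustc --emit=asm` (or `--emit=llvm-ir`) is one way to compare the spill/reload traffic around each arm; the exact output depends on the compiler version and target.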
While we can move the |
This hit full-moon, where users are getting stack overflows for standard input. I am not performing one long match, but several in a row; if this is not the same bug, let me know. |
Because of rust-lang/rust#34283, we ran out of stack space in the get_decoder() function. Each CFA instance is ~19,000 bytes on the stack, and each decoder instance contains a camera member, which in turn contains a cfa member. This was found by:

    cargo +nightly rustc --lib -- -Zprint-type-sizes 2>&1 | grep print-type > type-sizes.txt
    egrep "[[:digit:]]{5,9} bytes" type-sizes.txt
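As a rough cross-check on stable Rust, `std::mem::size_of` reports the same kind of information for individual types. The sketch below uses hypothetical stand-ins (`Cfa`, `Camera`, `Decoder`, and all their fields are assumptions; the real layouts are not shown here) to illustrate how one large member inflates every type that embeds it by value.

```rust
use std::mem::size_of;

// Hypothetical stand-in for the large type: a ~19,000-byte inline array.
pub struct Cfa {
    pub pattern: [u8; 19_000],
}

// Hypothetical: a camera embeds a Cfa by value.
pub struct Camera {
    pub cfa: Cfa,
    pub iso: u32,
}

// Hypothetical: a decoder embeds a Camera by value, so it is at least
// as large as the Cfa it transitively contains.
pub struct Decoder {
    pub camera: Camera,
    pub offset: u64,
}

fn main() {
    println!("Cfa:     {} bytes", size_of::<Cfa>());
    println!("Camera:  {} bytes", size_of::<Camera>());
    println!("Decoder: {} bytes", size_of::<Decoder>());
}
```

If a function constructs one such value per match arm, a debug build may reserve a separate stack slot for each, which is how a single large member can multiply into a stack overflow.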
I ran into this problem while working on a parser (https://github.com/evanw/esbuild/tree/rust). Here's a reduced test case: https://gist.github.com/evanw/06e074a1d6d5c21e8d32e2c26de07714. It contains two recursive functions, small and large, that each contain a match expression. Every call prints out the amount of stack space used.
In debug:
In release:
I would expect the amount of stack space used by a match expression to be proportional to the stack space of the largest branch, not to the total stack space of all branches. The problem isn't too bad here but it causes my actual parser to use huge amounts of stack space and to crash with a stack overflow when parsing virtually all normal-sized inputs.
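For readers who want to reproduce this kind of measurement, here is a minimal sketch. It is not the code from the gist above; `stack_marker`, `recurse`, and `per_frame_bytes` are hypothetical names. It approximates the per-call stack usage of a recursive function containing a match by recording the address of a local variable at each recursion level.

```rust
use std::hint::black_box;

/// Address of a local variable, used as a rough proxy for the stack pointer.
#[inline(never)]
fn stack_marker() -> usize {
    let local = 0u8;
    black_box(&local) as *const u8 as usize
}

/// Hypothetical recursive function whose match arms each need a
/// differently sized buffer. The result depends on the recursive call,
/// so the call cannot be turned into a tail call.
#[inline(never)]
pub fn recurse(depth: u32, tag: u8, marks: &mut Vec<usize>) -> u64 {
    marks.push(stack_marker());
    if depth == 0 {
        return 0;
    }
    let below = recurse(depth - 1, tag.wrapping_add(1), marks);
    match tag % 3 {
        0 => {
            let buf = black_box([1u8; 64]);
            (below ^ buf[0] as u64).rotate_left(1)
        }
        1 => {
            let buf = black_box([2u8; 128]);
            (below ^ buf[0] as u64).rotate_left(1)
        }
        _ => {
            let buf = black_box([3u8; 256]);
            (below ^ buf[0] as u64).rotate_left(1)
        }
    }
}

/// Approximate stack bytes consumed by one level of `recurse`.
pub fn per_frame_bytes() -> usize {
    let mut marks = Vec::new();
    black_box(recurse(4, 0, &mut marks));
    // Distance between markers taken at two adjacent recursion levels.
    marks[0].abs_diff(marks[1])
}

fn main() {
    println!("approx. stack per call: {} bytes", per_frame_bytes());
}
```

Running this once with `cargo run` and once with `cargo run --release` gives a rough comparison of debug and release frame sizes. The numbers vary by compiler version and target, so they are only meaningful relative to each other.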