Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Match expressions use O(n) stack space with n branches in debug mode #34283

Open
evanw opened this issue Jun 15, 2016 · 12 comments
Open

Match expressions use O(n) stack space with n branches in debug mode #34283

evanw opened this issue Jun 15, 2016 · 12 comments
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-mir-opt Area: MIR optimizations A-patterns Relating to patterns and pattern matching C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@evanw
Copy link

evanw commented Jun 15, 2016

I ran into this problem while working on a parser (https://github.com/evanw/esbuild/tree/rust). Here's a reduced test case: https://gist.github.com/evanw/06e074a1d6d5c21e8d32e2c26de07714. It contains two recursive functions, small and large, that each contain a match expression. Every call prints out the amount of stack space used.

In debug:

small:
stack space: 0.3kb
stack space: 0.7kb
stack space: 1.0kb
stack space: 1.4kb
stack space: 1.7kb
stack space: 2.1kb
stack space: 2.4kb
stack space: 2.8kb
stack space: 3.1kb
stack space: 3.4kb
stack space: 3.8kb
large:
stack space: 0.6kb
stack space: 1.3kb
stack space: 1.9kb
stack space: 2.6kb
stack space: 3.2kb
stack space: 3.8kb
stack space: 4.5kb
stack space: 5.1kb
stack space: 5.8kb
stack space: 6.4kb
stack space: 7.0kb

In release:

small:
stack space: 0.0kb
stack space: 0.1kb
stack space: 0.2kb
stack space: 0.4kb
stack space: 0.5kb
stack space: 0.6kb
stack space: 0.7kb
stack space: 0.8kb
stack space: 0.9kb
stack space: 1.0kb
stack space: 1.1kb
large:
stack space: 0.0kb
stack space: 0.1kb
stack space: 0.2kb
stack space: 0.4kb
stack space: 0.5kb
stack space: 0.6kb
stack space: 0.7kb
stack space: 0.8kb
stack space: 0.9kb
stack space: 1.0kb
stack space: 1.1kb

I would expect the amount of stack space used by a match expression to be proportional to the stack space of the largest branch, not to the total stack space of all branches. The problem isn't too bad here but it causes my actual parser to use huge amounts of stack space and to crash with a stack overflow when parsing virtually all normal-sized inputs.

@Aatch
Copy link
Contributor

Aatch commented Jun 15, 2016

I have investigated this somewhat and have found that the stack usage is from register spilling. For some reason LLVM is spilling the registers to a separate location in each branch around the overflow check. I have no idea why it might be doing this though.

Note that without the println! the IR we produce only creates stack slots for the three arguments.

@Aatch Aatch added the A-codegen Area: Code generation label Jun 15, 2016
@Aatch
Copy link
Contributor

Aatch commented Jun 15, 2016

/cc @rust-lang/compiler

@evanw
Copy link
Author

evanw commented Jun 15, 2016

Oh interesting, yeah maybe it only happens when there are early exits inside a match expression. Removing my heavy use of try! inside the match expressions in my actual parser makes my stack size problem completely disappear, but then of course I can't recover from parse errors.

@nikomatsakis
Copy link
Contributor

I would expect that this may be due to the specifics of the lifetimes that we emit for the bindings.

@nikomatsakis
Copy link
Contributor

Oh, never mind, just saw @Aatch's comment.

@dotdash
Copy link
Contributor

dotdash commented Jun 15, 2016

@Aatch seems like the fast register allocator spills all live registers at the end of each basic block.

@Mark-Simulacrum Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 25, 2017
@nox
Copy link
Contributor

nox commented Apr 2, 2018

I suspect this may have far reaching implications making code worse all over the place. For example Stylo is full of very large match expressions and I wonder if that limitation is making it unnecessarily bloated. I may be completely wrong though, given @Aatch's comment about println!.

Cc @rust-lang/wg-codegen

@matklad
Copy link
Member

matklad commented Jul 11, 2020

Triage (1.44.1)

We definitely hit this in debug in rust-analyzer. Our code for expression lowering is a single giant recursive match, and it uses 20k of stack space per recursion level in debug if all branches are there. If I comment out all the branches which are not used in my specific test, stack space usage goes down dramatically.

In release, I think I am seeing stack space proportional to the max branch, as commenting branches doesn't make that huge a difference with --release. I think this is true about the original report as well? With --release, both small and large use the same amount of stack space.

See rust-lang/rust-analyzer@12d52a7 for a real-world example of this.

@jonas-schievink jonas-schievink added I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-patterns Relating to patterns and pattern matching labels Jul 11, 2020
@jonas-schievink jonas-schievink added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Jul 11, 2020
@oli-obk oli-obk added the A-mir-opt Area: MIR optimizations label Jul 12, 2020
@oli-obk
Copy link
Contributor

oli-obk commented Jul 12, 2020

Maybe @rust-lang/wg-mir-opt can do something to clean up the match logic to make it easier for llvm to figure out the register spilling correctly.

@dotdash
Copy link
Contributor

dotdash commented Jul 12, 2020

The problem here is having live values across BB boundaries, because the register allocator in debug mode simply spills and reloads everything, even for unconditional branches.

Silly example:

define internal i8 @testcase(i8 %0) {
  br label %bb2

bb2:
  ret i8 %0
}

becomes:

testcase:                               # @testcase
  .cfi_startproc
# %bb.0:
                                          # kill: def $dil killed $dil killed $edi
  movb>-%dil, -1(%rsp)          # 1-byte Spill
  jmp>.LBB15_1
.LBB15_1:                               # %bb2
  movb>--1(%rsp), %al           # 1-byte Reload
  retq

And in this example, it's not so much the match itself, but the overflow check that causes values that are live across BB boundaries. Compiling with -Cdebug-assertions=no gives the same stack usage for small and large.

small:
stack space: 0.2kb
stack space: 0.4kb
stack space: 0.6kb
stack space: 0.8kb
stack space: 0.9kb
stack space: 1.1kb
stack space: 1.3kb
stack space: 1.5kb
stack space: 1.7kb
stack space: 1.9kb
stack space: 2.1kb
large:
stack space: 0.2kb
stack space: 0.4kb
stack space: 0.6kb
stack space: 0.8kb
stack space: 0.9kb
stack space: 1.1kb
stack space: 1.3kb
stack space: 1.5kb
stack space: 1.7kb
stack space: 1.9kb
stack space: 2.1kb

Each overflow check causes two spill/reload pairs. One for token (1 byte) and one for the result of the subtraction. Which, for alignment reasons, adds up to 8 bytes of stack usage each. I'm not sure there's much we can do there in terms of MIR construction, but I'd love to be proven wrong here :-)

Also, a good bit of the stack usage is actually used by the println! call. Without the output, small uses 136 bytes of stack with debug assertions, and large uses 440 bytes. Without debug assertions, both use 40 bytes of stack.

In the general case the difference between debug and release mode, can probably be explained by the fact that in release mode, not only do we get a better register allocator, but we also use lifetime intrinsics in LLVM, which allow stack allocated values that are used in only one arm to share space with values only used in other arms. The latter would explain why the observed stack usage in the rust analyzer example goes from Sum(arms) to Max(arms). Short of doing some form of stack coloring of our own, I don't see a way to improve this in terms of MIR generation either.

@oli-obk
Copy link
Contributor

oli-obk commented Jul 12, 2020

While we can move the count - 1 computation out of the match arms or create a mir optimization that can figure out that the overflow check is unnecessary, because we already checked for count > 0, this won't help in general. E.g. a developer replacing all the count - 1 with count.wrapping_sub(1) just places a function call where there was an overflow check, giving us the same additional basic block. So assuming we want to handle, yea I agree that we can't do anything with a mir-optimization beyond a bunch of not too-high-hanging fruit.

@Kampfkarren
Copy link
Contributor

This hit full-moon, where users are getting stack overflows for standard input. I am not performing one long match, but several in a row; if this is not the same bug, let me know.

https://users.rust-lang.org/t/stack-overflow-on-debug-mode-but-not-release-cant-figure-out-solution-without-increasing-stack-allocation-for-release/59869

cytrinox added a commit to cytrinox/dnglab that referenced this issue Jun 5, 2022
Because of rust-lang/rust#34283, in the
get_decoder() function we ran out of stack space.
Each CFA instance is ~19.000 bytes on the stack, and each decoder
instance contains a camera member which contains a cfa member.

This found by:

cargo +nightly rustc --lib -- -Zprint-type-sizes 2>&1 | grep print-type > type-sizes.txt
egrep "[[:digit:]]{5,9} bytes" type-sizes.txt
@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023
@Nadrieril Nadrieril changed the title Match expressions use O(n) stack space with n branches Match expressions use O(n) stack space with n branches in debug mode Dec 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-mir-opt Area: MIR optimizations A-patterns Relating to patterns and pattern matching C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests