Match expressions use O(n) stack space with n branches in debug mode #34283

evanw · 2016-06-15T06:25:51Z

I ran into this problem while working on a parser (https://github.com/evanw/esbuild/tree/rust). Here's a reduced test case: https://gist.github.com/evanw/06e074a1d6d5c21e8d32e2c26de07714. It contains two recursive functions, small and large, that each contain a match expression. Every call prints out the amount of stack space used.

In debug:

small:
stack space: 0.3kb
stack space: 0.7kb
stack space: 1.0kb
stack space: 1.4kb
stack space: 1.7kb
stack space: 2.1kb
stack space: 2.4kb
stack space: 2.8kb
stack space: 3.1kb
stack space: 3.4kb
stack space: 3.8kb
large:
stack space: 0.6kb
stack space: 1.3kb
stack space: 1.9kb
stack space: 2.6kb
stack space: 3.2kb
stack space: 3.8kb
stack space: 4.5kb
stack space: 5.1kb
stack space: 5.8kb
stack space: 6.4kb
stack space: 7.0kb

In release:

small:
stack space: 0.0kb
stack space: 0.1kb
stack space: 0.2kb
stack space: 0.4kb
stack space: 0.5kb
stack space: 0.6kb
stack space: 0.7kb
stack space: 0.8kb
stack space: 0.9kb
stack space: 1.0kb
stack space: 1.1kb
large:
stack space: 0.0kb
stack space: 0.1kb
stack space: 0.2kb
stack space: 0.4kb
stack space: 0.5kb
stack space: 0.6kb
stack space: 0.7kb
stack space: 0.8kb
stack space: 0.9kb
stack space: 1.0kb
stack space: 1.1kb

I would expect the amount of stack space used by a match expression to be proportional to the stack space of the largest branch, not to the total stack space of all branches. The problem isn't too bad here but it causes my actual parser to use huge amounts of stack space and to crash with a stack overflow when parsing virtually all normal-sized inputs.
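For readers without access to the gist, the shape of the test case can be sketched roughly as follows. This is a hypothetical reconstruction, not the gist itself: the names small, large, token and count come from this thread, while stack_position, depths, the arm counts and the measuring technique (comparing addresses of locals) are assumptions made for illustration. The printed numbers depend on the compiler version and build mode, so none are claimed here.

```rust
use std::hint::black_box;

/// Approximates the current stack position by taking the address of a local.
#[inline(never)]
fn stack_position() -> usize {
    let marker = 0u8;
    black_box(&marker) as *const u8 as usize
}

/// Recursive function whose match has three arms (hypothetical arm count).
fn small(token: u8, count: u32, base: usize, depths: &mut Vec<usize>) {
    depths.push(base.abs_diff(stack_position()));
    if count > 0 {
        match token {
            0 => small(token, count - 1, base, depths),
            1 => small(token, count - 1, base, depths),
            _ => small(token, count - 1, base, depths),
        }
    }
}

/// Same body as `small`, but with eight arms (hypothetical arm count).
fn large(token: u8, count: u32, base: usize, depths: &mut Vec<usize>) {
    depths.push(base.abs_diff(stack_position()));
    if count > 0 {
        match token {
            0 => large(token, count - 1, base, depths),
            1 => large(token, count - 1, base, depths),
            2 => large(token, count - 1, base, depths),
            3 => large(token, count - 1, base, depths),
            4 => large(token, count - 1, base, depths),
            5 => large(token, count - 1, base, depths),
            6 => large(token, count - 1, base, depths),
            _ => large(token, count - 1, base, depths),
        }
    }
}

/// Prints one line per recursion level, in the style of the logs above.
fn report(name: &str, depths: &[usize]) {
    println!("{name}:");
    for d in depths {
        println!("stack space: {:.1}kb", *d as f64 / 1024.0);
    }
}

fn main() {
    let base = stack_position();
    let mut depths = Vec::new();
    small(0, 10, base, &mut depths);
    report("small", &depths);
    depths.clear();
    large(0, 10, base, &mut depths);
    report("large", &depths);
}
```

Comparing the output of a plain cargo run with cargo run --release is the intended use; whether this particular sketch reproduces the exact growth reported above is not guaranteed.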


Aatch · 2016-06-15T06:33:22Z

I have investigated this somewhat and have found that the stack usage is from register spilling. For some reason LLVM is spilling the registers to a separate location in each branch around the overflow check. I have no idea why it might be doing this though.

Note that without the println! the IR we produce only creates stack slots for the three arguments.

Aatch · 2016-06-15T06:33:45Z

/cc @rust-lang/compiler

evanw · 2016-06-15T07:09:14Z

Oh interesting, yeah maybe it only happens when there are early exits inside a match expression. Removing my heavy use of try! inside the match expressions in my actual parser makes my stack size problem completely disappear, but then of course I can't recover from parse errors.
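To make "early exits inside a match expression" concrete, here is a minimal sketch of a recursive parser whose match arms can return early on error. It uses the ? operator, the modern spelling of try!. Everything in it (Expr, ParseError, parse, expect) is hypothetical and is not taken from the esbuild parser mentioned above.

```rust
#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Neg(Box<Expr>),
    Paren(Box<Expr>),
}

#[derive(Debug, PartialEq)]
struct ParseError(String);

/// Consumes `want` at the current position or reports an error.
fn expect(tokens: &[char], pos: &mut usize, want: char) -> Result<(), ParseError> {
    match tokens.get(*pos) {
        Some(&c) if c == want => {
            *pos += 1;
            Ok(())
        }
        _ => Err(ParseError(format!("expected {want:?}"))),
    }
}

/// Recursive descent over single-character tokens.
/// Every `?` below is an early exit out of the enclosing match arm.
fn parse(tokens: &[char], pos: &mut usize) -> Result<Expr, ParseError> {
    let tok = *tokens
        .get(*pos)
        .ok_or_else(|| ParseError("unexpected end of input".to_string()))?;
    *pos += 1;
    match tok {
        '-' => Ok(Expr::Neg(Box::new(parse(tokens, pos)?))),
        '(' => {
            let inner = parse(tokens, pos)?;
            expect(tokens, pos, ')')?;
            Ok(Expr::Paren(Box::new(inner)))
        }
        other => match other.to_digit(10) {
            Some(d) => Ok(Expr::Num(i64::from(d))),
            None => Err(ParseError(format!("unexpected {other:?}"))),
        },
    }
}

fn main() {
    let tokens: Vec<char> = "-(3)".chars().collect();
    let mut pos = 0;
    println!("{:?}", parse(&tokens, &mut pos));
}
```

Each early exit adds another edge out of its arm, which is the kind of control flow the comments above associate with the extra spills in debug builds.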

nikomatsakis · 2016-06-15T07:51:40Z

I would expect that this may be due to the specifics of the lifetimes that we emit for the bindings.

nikomatsakis · 2016-06-15T07:51:57Z

Oh, never mind, just saw @Aatch's comment.

dotdash · 2016-06-15T12:53:05Z

@Aatch seems like the fast register allocator spills all live registers at the end of each basic block.

nox · 2018-04-02T15:15:16Z

I suspect this may have far-reaching implications, making code worse all over the place. For example, Stylo is full of very large match expressions, and I wonder if that limitation is making it unnecessarily bloated. I may be completely wrong though, given @Aatch's comment about println!.

Cc @rust-lang/wg-codegen

matklad · 2020-07-11T16:02:53Z

Triage (1.44.1)

We definitely hit this in debug in rust-analyzer. Our code for expression lowering is a single giant recursive match, and it uses 20k of stack space per recursion level in debug if all branches are there. If I comment out all the branches which are not used in my specific test, stack space usage goes down dramatically.

In release, I think I am seeing stack space proportional to the max branch, as commenting branches doesn't make that huge a difference with --release. I think this is true about the original report as well? With --release, both small and large use the same amount of stack space.

See rust-lang/rust-analyzer@12d52a7 for a real-world example of this.
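One mitigation that fits the behaviour described here, and that at least one project linking to this issue names in a PR title ("split match arms ... into smaller methods"), is to keep the recursive match as a thin dispatcher and move each arm's body into its own function. The sketch below is an illustration under assumptions: Node, eval, eval_add and eval_neg are hypothetical names, and the #[inline(never)] attribute is an assumption about how one might keep the helpers' locals out of the dispatcher's frame. It is not a description of what the rust-analyzer commit does.

```rust
enum Node {
    Lit(i64),
    Add(Box<Node>, Box<Node>),
    Neg(Box<Node>),
}

/// Thin dispatcher: its frame holds little beyond the matched bindings.
fn eval(node: &Node) -> i64 {
    match node {
        Node::Lit(v) => *v,
        Node::Add(l, r) => eval_add(l, r),
        Node::Neg(inner) => eval_neg(inner),
    }
}

/// Locals and temporaries of this arm live in this helper's frame only.
#[inline(never)]
fn eval_add(l: &Node, r: &Node) -> i64 {
    let left = eval(l);
    let right = eval(r);
    left + right
}

/// Same idea for the negation arm.
#[inline(never)]
fn eval_neg(inner: &Node) -> i64 {
    -eval(inner)
}

fn main() {
    let tree = Node::Add(
        Box::new(Node::Lit(2)),
        Box::new(Node::Neg(Box::new(Node::Lit(5)))),
    );
    println!("{}", eval(&tree));
}
```

With this layout each recursion level pays for the dispatcher frame plus the frame of the one helper that actually runs, rather than for the locals of every arm. How much that saves in a given code base has to be measured.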

oli-obk · 2020-07-12T11:26:40Z

Maybe @rust-lang/wg-mir-opt can do something to clean up the match logic to make it easier for llvm to figure out the register spilling correctly.

dotdash · 2020-07-12T12:59:53Z

The problem here is having live values across BB boundaries, because the register allocator in debug mode simply spills and reloads everything, even for unconditional branches.

Silly example:

define internal i8 @testcase(i8 %0) {
  br label %bb2

bb2:
  ret i8 %0
}

becomes:

testcase:                               # @testcase
  .cfi_startproc
# %bb.0:
                                          # kill: def $dil killed $dil killed $edi
  movb    %dil, -1(%rsp)          # 1-byte Spill
  jmp     .LBB15_1
.LBB15_1:                               # %bb2
  movb    -1(%rsp), %al           # 1-byte Reload
  retq

And in this example, it's not so much the match itself as the overflow check that causes values to be live across BB boundaries. Compiling with -Cdebug-assertions=no gives the same stack usage for small and large.

small:
stack space: 0.2kb
stack space: 0.4kb
stack space: 0.6kb
stack space: 0.8kb
stack space: 0.9kb
stack space: 1.1kb
stack space: 1.3kb
stack space: 1.5kb
stack space: 1.7kb
stack space: 1.9kb
stack space: 2.1kb
large:
stack space: 0.2kb
stack space: 0.4kb
stack space: 0.6kb
stack space: 0.8kb
stack space: 0.9kb
stack space: 1.1kb
stack space: 1.3kb
stack space: 1.5kb
stack space: 1.7kb
stack space: 1.9kb
stack space: 2.1kb

Each overflow check causes two spill/reload pairs: one for token (1 byte) and one for the result of the subtraction. For alignment reasons, that adds up to 8 bytes of stack usage each. I'm not sure there's much we can do there in terms of MIR construction, but I'd love to be proven wrong here :-)
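To see why the overflow check introduces an extra basic block, it can help to spell out what count - 1 means semantically when overflow checks are on. The sketch below is a source-level analogy, not the MIR or LLVM IR the compiler actually emits; checked_decrement and wrapping_decrement are hypothetical names. Overflow checks follow the debug-assertions setting unless configured separately, which is why -Cdebug-assertions=no removes them.

```rust
/// Roughly what `count - 1` does with overflow checks enabled:
/// a conditional branch to a panic path, hence an extra basic block.
fn checked_decrement(count: u32) -> u32 {
    match count.checked_sub(1) {
        Some(next) => next,
        None => panic!("attempt to subtract with overflow"),
    }
}

/// Roughly what `count - 1` does with overflow checks disabled:
/// plain two's-complement wrapping, no branch.
fn wrapping_decrement(count: u32) -> u32 {
    count.wrapping_sub(1)
}

fn main() {
    println!("{}", checked_decrement(5));
    println!("{}", wrapping_decrement(0));
}
```

Any value that is still needed after the check, such as token and the subtraction result, is live across the branch, and that is where the debug-mode register allocator spills.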

Also, a good bit of the stack usage is actually used by the println! call. Without the output, small uses 136 bytes of stack with debug assertions, and large uses 440 bytes. Without debug assertions, both use 40 bytes of stack.

In the general case, the difference between debug and release mode can probably be explained by the fact that in release mode we not only get a better register allocator but also use lifetime intrinsics in LLVM, which allow stack-allocated values that are used in only one arm to share space with values used only in other arms. The latter would explain why the observed stack usage in the rust-analyzer example goes from Sum(arms) to Max(arms). Short of doing some form of stack coloring of our own, I don't see a way to improve this in terms of MIR generation either.
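The Sum(arms) versus Max(arms) distinction can be probed with a function whose arms each own a local buffer of a different size. This is a minimal sketch under assumptions: frame_probe, checksum, stack_position and the buffer sizes are hypothetical, and the frame size is only estimated from addresses of locals. If the explanation above holds, one would expect the debug figure to be near the sum of the three buffers and the release figure near the largest one, but the exact numbers are not claimed here.

```rust
use std::hint::black_box;

/// Approximates the current stack position by taking the address of a local.
#[inline(never)]
fn stack_position() -> usize {
    let marker = 0u8;
    black_box(&marker) as *const u8 as usize
}

/// Forces the buffer to be read so it cannot be optimized away.
fn checksum(buf: &[u8]) -> u64 {
    buf.iter().map(|&b| u64::from(b)).sum()
}

/// Each arm has its own buffer; only one arm runs per call.
/// Returns the buffer checksum and the estimated frame size in bytes.
#[inline(never)]
fn frame_probe(selector: u8, base: usize) -> (u64, usize) {
    match selector {
        0 => {
            let buf = [1u8; 1024];
            (checksum(black_box(&buf[..])), base.abs_diff(stack_position()))
        }
        1 => {
            let buf = [1u8; 2048];
            (checksum(black_box(&buf[..])), base.abs_diff(stack_position()))
        }
        _ => {
            let buf = [1u8; 4096];
            (checksum(black_box(&buf[..])), base.abs_diff(stack_position()))
        }
    }
}

fn main() {
    let base = stack_position();
    for selector in 0..3u8 {
        let (sum, frame) = frame_probe(selector, base);
        println!("arm {selector}: checksum {sum}, approx. frame {frame} bytes");
    }
}
```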

oli-obk · 2020-07-12T13:29:04Z

While we can move the count - 1 computation out of the match arms, or create a MIR optimization that figures out that the overflow check is unnecessary because we already checked for count > 0, this won't help in general. For example, a developer replacing every count - 1 with count.wrapping_sub(1) just places a function call where there was an overflow check, giving us the same additional basic block. So assuming we want to handle the general case, yeah, I agree that we can't do anything with a MIR optimization beyond picking a bunch of not-too-high-hanging fruit.
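The first option mentioned in this comment, moving the count - 1 computation out of the match arms, can be sketched as below. The function name countdown, the visited log and the three arms are hypothetical. As the comment says, this only helps the specific case where every arm repeats the same checked arithmetic.

```rust
/// Hoists the decrement so the overflow check happens once,
/// before the match, instead of once per arm.
fn countdown(token: u8, count: u32, visited: &mut Vec<u32>) {
    visited.push(count);
    if count > 0 {
        let next = count - 1; // single overflow check shared by all arms
        match token {
            0 => countdown(1, next, visited),
            1 => countdown(2, next, visited),
            _ => countdown(0, next, visited),
        }
    }
}

fn main() {
    let mut visited = Vec::new();
    countdown(0, 3, &mut visited);
    println!("{visited:?}");
}
```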

Kampfkarren · 2021-05-17T09:36:10Z

This hit full-moon, where users are getting stack overflows for standard input. I am not performing one long match, but several in a row; if this is not the same bug, let me know.

https://users.rust-lang.org/t/stack-overflow-on-debug-mode-but-not-release-cant-figure-out-solution-without-increasing-stack-allocation-for-release/59869
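The linked forum thread's title mentions increasing the stack allocation as the fallback. For completeness, a minimal sketch of that fallback is to run the deeply recursive work on a thread with an explicitly sized stack, using std::thread::Builder::stack_size. The helper name run_with_stack, the recursive stand-in depth and the 64 MiB figure are assumptions for illustration; this works around the symptom and does not reduce per-frame usage.

```rust
use std::thread;

/// Stand-in for a deeply recursive parser or lowering pass (hypothetical).
fn depth(n: u32) -> u32 {
    if n == 0 {
        0
    } else {
        1 + depth(n - 1)
    }
}

/// Runs `f` on a new thread with `stack_bytes` of stack and returns its result.
fn run_with_stack<T, F>(stack_bytes: usize, f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(f)
        .expect("failed to spawn worker thread")
        .join()
        .expect("worker thread panicked")
}

fn main() {
    let levels = run_with_stack(64 * 1024 * 1024, || depth(100_000));
    println!("recursed {levels} levels");
}
```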

Because of rust-lang/rust#34283, we ran out of stack space in the get_decoder() function. Each CFA instance is ~19,000 bytes on the stack, and each decoder instance contains a camera member, which in turn contains a cfa member. This was found by running:

  cargo +nightly rustc --lib -- -Zprint-type-sizes 2>&1 | grep print-type > type-sizes.txt
  egrep "[[:digit:]]{5,9} bytes" type-sizes.txt
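The effect of a large inline member on a containing type's size can also be checked on stable Rust with std::mem::size_of. The sketch below is hypothetical throughout: Cfa, Camera, BoxedCamera and the 19,000-byte array only mimic the sizes described above and are not the dnglab types, and boxing the member is just one possible way to shrink the containing type, not necessarily the fix that project chose.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a type of roughly 19 KB stored inline.
#[allow(dead_code)]
struct Cfa {
    pattern: [u8; 19_000],
}

/// Embeds `Cfa` by value, so every `Camera` on the stack carries all of it.
#[allow(dead_code)]
struct Camera {
    cfa: Cfa,
}

/// Stores `Cfa` behind a pointer, so the struct itself is pointer-sized.
#[allow(dead_code)]
struct BoxedCamera {
    cfa: Box<Cfa>,
}

fn main() {
    println!("Camera:      {} bytes", size_of::<Camera>());
    println!("BoxedCamera: {} bytes", size_of::<BoxedCamera>());
}
```

Note that in an unoptimized build, constructing a value with Box::new may still materialize it on the stack briefly before moving it to the heap, so the size of the containing type is only part of the picture.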

Aatch added the A-codegen Area: Code generation label Jun 15, 2016

Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 25, 2017

jonas-schievink added I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-patterns Relating to patterns and pattern matching labels Jul 11, 2020

matklad mentioned this issue Jul 11, 2020

Expression lowering uses unreasonable amount of stack space rust-lang/rust-analyzer#5317

Open

jonas-schievink added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Jul 11, 2020

oli-obk added the A-mir-opt Area: MIR optimizations label Jul 12, 2020

ImmemorConsultrixContrarie mentioned this issue Oct 5, 2020

enum refactoring google/flatbuffers#6158

Closed

jonas-schievink mentioned this issue May 28, 2021

stack overflow in match pattern with '?' #85787

Closed

matklad mentioned this issue Aug 8, 2021

minor: Update npm deps rust-lang/rust-analyzer#9179

Merged

Shadow53 mentioned this issue Apr 21, 2022

refactor!: optimize operations with multithreaded tokio Shadow53/hoard#127

Merged

cytrinox mentioned this issue Jun 5, 2022

Reduce stack usage for CFA to prevent stack overflows dnglab/dnglab#185

Merged

Bromeon mentioned this issue Jun 6, 2022

Stack space used by assertions ever-growing with opt-level=0, causing stack overflow #97790

Closed

andreaphylum mentioned this issue Jun 13, 2022

Added build script as workaround for Window debug builds phylum-dev/cli#462

Merged

evenyag mentioned this issue Dec 9, 2022

Tracking Issue: migrate Arrow/Parquet to the official implementation. GreptimeTeam/greptimedb#555

Closed

52 tasks

waynexia mentioned this issue Dec 9, 2022

fix: pre-cast to avoid tremendous match arms GreptimeTeam/greptimedb#734

Merged

2 tasks

This was referenced Apr 24, 2023

Stack overflow issue in scalecodec while decoding inputs paritytech/parity-scale-codec#425

Closed

Rework decoding of Boxes, Rcs, Arcs, arrays and enums (stack overflow fix) paritytech/parity-scale-codec#426

Merged

yrong mentioned this issue Jun 6, 2023

Increase test coverage & Fix for goerli setup Snowfork/snowbridge#853

Merged

3 tasks

workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023

Nadrieril changed the title ~~Match expressions use O(n) stack space with n branches~~ Match expressions use O(n) stack space with n branches in debug mode Dec 1, 2023

waynexia mentioned this issue Jul 8, 2024

refactor: split match arms in prom_expr_to_plan into smaller methods GreptimeTeam/greptimedb#4317

Merged
