Avoid stack overflow by reducing stack usage of `BinaryExpr::evaluate` in debug builds #1047

alamb · 2021-09-24T14:44:40Z

Which issue does this PR close?

Closes #419

Rationale for this change

Prior to this PR, trying to evaluate BinaryExprs nested more than a few levels deep (e.g. something like a + a + a + a + a + a + a + a ...) would result in a stack overflow.

For example, the test included in this PR fails like this on master with a tree_depth of 10 (in debug builds):

thread 'physical_plan::expressions::binary::tests::relatively_deeply_nested' has overflowed its stack
fatal runtime error: stack overflow
error: test failed, to rerun pass '-p datafusion --lib'

Caused by:
  process didn't exit successfully: `/Users/alamb/Software/arrow-datafusion/target/debug/deps/datafusion-68280be86fef135e deeply` (signal: 6, SIGABRT: process abort signal)

What changes are included in this PR?

Break BinaryExpr::evaluate into a few smaller functions
Remove special case workaround added for Avro Table Provider #910

Are there any user-facing changes?

Not really, other than avoiding stack overflows while evaluating queries in debug builds.

Technical Backstory

I believe the issue is that in debug builds, each local variable gets its own unique space on the stack. Due to the size of BinaryExpr::evaluate (largely hidden by macros), this results in a ludicrous amount of stack required for each call to BinaryExpr::evaluate. Since BinaryExpr::evaluate is implemented recursively, even a few nesting levels exhaust a 2MB stack.
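To make the arithmetic concrete, here is a rough sketch that divides a stack budget by the per-call frame size. It uses the debug-build frame sizes quoted in this description (0x55a10 before, 0x770 after). The 2 MiB budget is an assumption (the default for threads spawned by the Rust standard library), and `frames_that_fit` is a hypothetical helper, not something in DataFusion. Real recursion also pays for other frames between each `evaluate` call, so treat the results as upper bounds.

```rust
/// Hypothetical helper: how many whole frames of `frame_bytes`
/// fit in a stack of `stack_bytes`. Ignores any other frames
/// that sit between recursive calls, so this is an upper bound.
fn frames_that_fit(stack_bytes: u64, frame_bytes: u64) -> u64 {
    stack_bytes / frame_bytes
}

fn main() {
    // Assumption: 2 MiB stack, the default for std-spawned threads.
    let stack: u64 = 2 * 1024 * 1024;

    // Debug-build frame sizes of BinaryExpr::evaluate quoted in this PR.
    let before: u64 = 0x55a10; // 350736 bytes
    let after: u64 = 0x770; // 1904 bytes

    println!("before: at most {} nested calls", frames_that_fit(stack, before));
    println!("after:  at most {} nested calls", frames_that_fit(stack, after));
}
```

Under these assumptions only a handful of nested calls fit before the change, which is consistent with the overflow seen at a tree_depth of 10.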

You can see evidence of this by looking at the disassembly. Here is how I did it on a Mac:

otool -vt target/debug/deps/datafusion-68280be86fef135e > /tmp/df.asm

And the associated assembly shows the stack frame size to be 0x55a10 (350736 bytes):

__ZN118_$LT$datafusion..physical_plan..expressions..binary..BinaryExpr$u20$as$u20$datafusion..physical_plan..PhysicalExpr$GT$8evaluate17h2877dfaf102fa64eE:
0000000100d316d0	pushq	%rbp
0000000100d316d1	movq	%rsp, %rbp
0000000100d316d4	movl	$0x55a10, %eax                  ## the subq instruction below uses this value
0000000100d316d9	callq	0x1025dc540                     ## to effectively subtract 350736 (350K!) from the stack pointer
0000000100d316de	subq	%rax, %rsp
0000000100d316e1	movq	%rdx, -0x4a998(%rbp)
0000000100d316e8	movq	%rsi, -0x4a990(%rbp)
0000000100d316ef	movq	%rdi, %rax
...

In case you were curious, the same function in a release build only uses 0x398 (920 bytes) of stack space.

__ZN118_$LT$datafusion..physical_plan..expressions..binary..BinaryExpr$u20$as$u20$datafusion..physical_plan..PhysicalExpr$GT$8evaluate17h51f0595db0e2c70cE:
00000001009468a0        pushq   %rbp
00000001009468a1        movq    %rsp, %rbp
00000001009468a4        pushq   %r15
00000001009468a6        pushq   %r14
00000001009468a8        pushq   %r13
00000001009468aa        pushq   %r12
00000001009468ac        pushq   %rbx
00000001009468ad        subq    $0x398, %rsp                    ## imm = 0x398
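Reading the frame size out of a prologue like the two above can be scripted. The following is only a sketch: `frame_size` and `parse_hex_imm` are hypothetical helpers, and they recognise just the two prologue shapes shown in this description (an immediate `subq $0x..., %rsp`, or a `movl $0x..., %eax` followed later by `subq %rax, %rsp`). It is not a general disassembly parser.

```rust
/// Parse an AT&T-syntax hex immediate such as "$0x398," into a number.
fn parse_hex_imm(token: &str) -> Option<u64> {
    let hex = token.trim().trim_end_matches(',').strip_prefix("$0x")?;
    u64::from_str_radix(hex, 16).ok()
}

/// Hypothetical helper: return the number of bytes the prologue
/// subtracts from %rsp, for the two shapes seen in this PR.
fn frame_size(prologue: &[&str]) -> Option<u64> {
    let mut rax: Option<u64> = None;
    for line in prologue {
        // Drop the trailing "## ..." comment, then split into
        // [address, mnemonic, operands...].
        let code = line.split("##").next().unwrap_or("");
        let tokens: Vec<&str> = code.split_whitespace().collect();
        match tokens.as_slice() {
            // Large frames: size is loaded into %eax first.
            [_, "movl", imm, "%eax"] => rax = parse_hex_imm(imm),
            [_, "subq", "%rax,", "%rsp"] => return rax,
            // Small frames: size is an immediate operand.
            [_, "subq", imm, "%rsp"] => return parse_hex_imm(imm),
            _ => {}
        }
    }
    None
}

fn main() {
    let debug = [
        "0000000100d316d0 pushq %rbp",
        "0000000100d316d1 movq %rsp, %rbp",
        "0000000100d316d4 movl $0x55a10, %eax",
        "0000000100d316d9 callq 0x1025dc540",
        "0000000100d316de subq %rax, %rsp",
    ];
    let release = ["00000001009468ad subq $0x398, %rsp ## imm = 0x398"];
    println!("debug:   {:?}", frame_size(&debug));
    println!("release: {:?}", frame_size(&release));
}
```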

For those of you visually minded, here is an illustration:

┌ ─ ─ ─ ─ ─ ─ ─ ─ ─
                   │
│     Function                      Base Pointer
     Parameters    │     ┌──────       (%rbp)
│                        │
┌──────────────────┤     │
│  Return Address  │     │
├──────────────────┤     │
│Saved Base Pointer│◀────┘
├──────────────────┘
    Local Var 1    │
├ ─ ─ ─ ─ ─ ─ ─ ─ ─
    Local Var 2    │             I theorize that in a debug build, a
├ ─ ─ ─ ─ ─ ─ ─ ─ ─           distinct space is reserved for each local
        ...        │                  variable of the function.
├ ─ ─ ─ ─ ─ ─ ─ ─ ─
    Local Var N    │            Thus a large number of local variables
├ ─ ─ ─ ─ ─ ─ ─ ─ ─            will result in a large stack frame (and
                   │              thus a large amount of stack space
│     Function                         consumed for each call)
     Parameters    │
│
┌──────────────────┤
│  Return Address  │◀────┐
└──────────────────┘     │
                         │
                         │
                         │         Stack Pointer
                         └──────       (%rsp)

The "fix", if you will, is to break BinaryExpr::evaluate into smaller functions that each require less stack space.

The stack frame size of BinaryExpr::evaluate after this PR is 0x770 (1904 bytes):

__ZN118_$LT$datafusion..physical_plan..expressions..binary..BinaryExpr$u20$as$u20$datafusion..physical_plan..PhysicalExpr$GT$8evaluate17h2877dfaf102fa64eE:
0000000100d31a50        pushq   %rbp
0000000100d31a51        movq    %rsp, %rbp
0000000100d31a54        subq    $0x770, %rsp                    ## imm = 0x770
0000000100d31a5b        movq    %rdx, -0x6a8(%rbp)
0000000100d31a62        movq    %rsi, -0x6a0(%rbp)

Note that even though some of the new functions require a non-trivial stack size (listed below), a major difference is that they are not called recursively, and thus there is only ever one frame of each on the stack:

evaluate_array_scalar:       0x1DE80 (122496 bytes)
evaluate_scalar_array:       0x162A0 (90784 bytes)
evaluate_with_resolved_args: 0x1F970 (129392 bytes)
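The shape of the fix can be illustrated with a toy evaluator. This is a minimal sketch, not the DataFusion code: `Expr`, `add_with_big_frame`, and `left_deep_tree` are made-up names, and the 16 KiB scratch buffer merely stands in for the large frame that the macro-heavy kernel dispatch produces. The point is that the recursive function keeps a small frame, while the expensive frame lives in a non-recursive helper that is only called after both children have been evaluated.

```rust
use std::hint::black_box;

// Toy expression tree (hypothetical, for illustration only).
enum Expr {
    Leaf(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Stand-in for the large, non-recursive helpers: it has a big
/// frame, but at most one such frame is live at any time.
#[inline(never)]
fn add_with_big_frame(l: i64, r: i64) -> i64 {
    let mut scratch = [0u8; 16 * 1024];
    scratch[0] = 1;
    black_box(&mut scratch); // keep the buffer from being optimized away
    l + r
}

/// The recursive part: only small locals live across the recursive calls.
fn evaluate(expr: &Expr) -> i64 {
    match expr {
        Expr::Leaf(v) => *v,
        Expr::Add(l, r) => {
            let l = evaluate(l);
            let r = evaluate(r);
            add_with_big_frame(l, r)
        }
    }
}

/// Build a left-deep tree ((((1 + 1) + 1) + 1) ...) with `depth` additions.
fn left_deep_tree(depth: u32) -> Expr {
    let mut e = Expr::Leaf(1);
    for _ in 0..depth {
        e = Expr::Add(Box::new(e), Box::new(Expr::Leaf(1)));
    }
    e
}

fn main() {
    let tree = left_deep_tree(100);
    println!("{}", evaluate(&tree));
}
```

If the scratch buffer were declared inside `evaluate` itself, every level of nesting would pay for it, which is roughly the situation before this PR.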

alamb · 2021-09-24T14:45:08Z

.github/workflows/rust.yml

@@ -105,8 +105,6 @@ jobs:
        run: |
          export ARROW_TEST_DATA=$(pwd)/testing/data
          export PARQUET_TEST_DATA=$(pwd)/parquet-testing/data
-          # run tests on all workspace members with default feature list + avro


This is the workaround added in #910

alamb · 2021-09-24T14:45:43Z

datafusion/src/physical_plan/expressions/binary.rs

@@ -543,86 +543,17 @@ impl PhysicalExpr for BinaryExpr {
            )));
        }

+        // Attempt to use special kernels if one input is scalar and the other is an array


There are no intended changes to this function's behavior; it is simply broken up into several smaller functions.

alamb · 2021-09-24T14:46:56Z

datafusion/src/physical_plan/expressions/binary.rs

+        let schema = batch.schema();
+
+        // build a left deep tree ((((a + a) + a) + a ....
+        let tree_depth: i32 = 100;


On master, this test causes a stack overflow with a tree_depth of 10. After the changes in this PR, it passes successfully with a tree_depth of 100 (I didn't try anything bigger).

NGA-TRAN

Thanks for the deep-dive evaluation and the fix, @alamb

NGA-TRAN · 2021-09-24T15:16:54Z

datafusion/src/physical_plan/expressions/binary.rs

+        let schema = batch.schema();
+
+        // build a left deep tree ((((a + a) + a) + a ....
+        let tree_depth: i32 = 100;


alamb · 2021-09-24T18:09:17Z

I'll plan to merge this tomorrow unless there are objections

houqp

Great job on the deep dive and detailed diagram :)

houqp · 2021-09-25T22:31:29Z

This kind of "match to many arms with macros" pattern is very common in our code base. If we want to get fancy in the future, we could automate detection of such problems in our code base, for example by having a linter tool automatically find all recursive functions and check whether their stack size exceeds a certain threshold.

alamb · 2021-09-26T10:43:14Z

> This kind of "match to many arms with macros" pattern is very common in our code base. If we want to get fancy in the future, we could automate detection of such problems in our code base, for example by having a linter tool automatically find all recursive functions and check whether their stack size exceeds a certain threshold.

I agree -- if we run into this problem again, investing in an automated tool like that sounds like a good idea.

alamb added 5 commits September 24, 2021 09:12

Test for stack overflow

f07e811

Remove STACK SIZE workaround

d2c8bb0

Move out a bit more into a different function

453425d

move more

8837c8b

Increase tree depth to 100 to test

2ae2ab8

github-actions bot added the datafusion Changes in the datafusion crate label Sep 24, 2021

Dandandan approved these changes Sep 24, 2021

alamb commented Sep 24, 2021

NGA-TRAN approved these changes Sep 24, 2021

houqp approved these changes Sep 25, 2021

houqp added the performance Make DataFusion faster label Sep 25, 2021

houqp merged commit 26399ed into apache:master Sep 25, 2021

alamb deleted the alamb/reduce_stack_usage branch September 26, 2021 10:42

alamb mentioned this pull request Dec 13, 2021

Query with 100 OR conditions overflows stack #1434

Closed

alamb mentioned this pull request Jul 16, 2022

Boxed Query body to save some stack space apache/datafusion-sqlparser-rs#540

Merged

alamb mentioned this pull request Aug 11, 2022

Remove unnecessary Box apache/datafusion-sqlparser-rs#556

Closed

This was referenced Dec 29, 2022

Stack overflow planning complex query #4065

Closed

Fix Stack overflow in sql planning in debug builds #4779

Merged

Jefffrey mentioned this pull request Apr 5, 2024

Fix tpcds planning stack overflows - Join planning refactoring #9962

Merged

alamb mentioned this pull request Sep 9, 2024

Refactor SqlToRel::sql_expr_to_logical_expr_internal to reduce stack size #12384

Merged
