Runtime stack overflow that does not make any sense #58164

Closed
shisoft opened this issue Feb 4, 2019 · 7 comments


@shisoft

shisoft commented Feb 4, 2019

I am developing a B+ tree whose page size is controlled by array types passed as generic parameters.

Before using it at large scale, I wrote tests, and they pass. But when I set the page size as large as 240000, I got a stack overflow error. I have narrowed the problem down to one function and written a test for it, but the behaviour looks strange to me. The test code is:

const LARGE_PAGE_SIZE: usize = 240000;
struct LargeKeySlice {
    inner: [EntryKey; LARGE_PAGE_SIZE]
}
type LargePtrSlice = [NodeCellRef; LARGE_PAGE_SIZE + 1];
type LargeLevelBPlusTree = BPlusTree<LargeKeySlice, LargePtrSlice>;

#[test]
#[should_panic]
fn large_page() {
    // this test should panic but not stack overflow
    env_logger::init();
    debug!("testing");
    ExtNode::<LargeKeySlice, LargePtrSlice>::new(Id::rand(), smallvec!(0));
}

It has a debug! call that should print "testing" before the new function is invoked. The test case actually overflows the stack without printing "testing". If we replace the call to new with panic!(), the test passes and "testing" is printed. The most interesting part is that if we empty the body of new and put an explicit panic in it, the test still overflows the stack.

If we decrease LARGE_PAGE_SIZE to 24000, the test case panics without overflowing the stack, which is exactly what I expected. This does not seem to be a problem in my code, because the new function has not yet been invoked when the stack overflows.

Because I have no clue how this happens, I was unable to reproduce this issue on the playground. In my attempts so far, this works without a stack overflow.

Feel free to check out the project and reproduce it by running:
cargo test --package neb --lib index::btree::test::large_page -- --nocapture

Error

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
error: process didn't exit successfully: `/Users/shisoft/Documents/OSS Projects/Nebuchadnezzar/target/debug/deps/neb-96fd92871c91115d 'index::lsmtree::test::insertions' --nocapture --test-threads=1` (signal: 6, SIGABRT: process abort signal)

Rust Version

rustc 1.34.0-nightly (f6fac4225 2019-02-03)

@hellow554
Contributor

hellow554 commented Feb 5, 2019

I can't copy and paste your code. Could you please reformat it and take care of the line endings? :| Looks like a GitHub issue, sorry!

Also, it does not look like it would compile. Can you try to provide an MCVE?

@shisoft
Author

shisoft commented Feb 5, 2019

@hellow554 I will try. But as I said, I was unable to reproduce this issue in an MCVE. Is there any way to diagnose this problem? Can I print the stack that overflowed?

@hellow554
Contributor

You could try to use gdb to debug the problem. Raising the stack limit with ulimit -s unlimited may also help to solve the issue.
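One way to check whether stack size is the cause, without changing shell limits, is to run the suspect code on a thread whose stack size is set explicitly. The sketch below assumes a hypothetical `BigPage` type standing in for a page with a large inline array; the 32 MiB figure is an arbitrary generous value, not a recommendation. If the code works with the larger stack and fails with the default, the frame size is the likely culprit.

```rust
use std::thread;

// Hypothetical stand-in for a page type holding a large inline array.
struct BigPage {
    slots: [u64; 240_000], // 240_000 * 8 bytes = 1_920_000 bytes held inline
}

// Builds the page on a worker thread with an explicitly enlarged stack
// and returns the sum of its slots.
fn sum_on_big_stack() -> u64 {
    let handle = thread::Builder::new()
        .stack_size(32 * 1024 * 1024) // 32 MiB, chosen generously for the sketch
        .spawn(|| {
            // The whole array lives in this closure's stack frame.
            let page = BigPage { slots: [1; 240_000] };
            page.slots.iter().sum::<u64>()
        })
        .expect("failed to spawn worker thread");
    handle.join().expect("worker thread panicked")
}
```

Calling `sum_on_big_stack()` from `main` or from a test exercises the large frame on the enlarged stack. For threads spawned by the standard library without an explicit size, the `RUST_MIN_STACK` environment variable (a size in bytes) adjusts the default in a similar way.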

@shisoft
Author

shisoft commented Feb 5, 2019

I ran this test program on its own and it produced a core dump, but gdb does not show anything useful. Is there anything else I can try?

Type "apropos word" to search for commands related to "word".
[New LWP 19884]
[New LWP 19885]
[New LWP 19886]
[New LWP 19887]
[New LWP 19888]
[New LWP 19889]
[New LWP 19890]
[New LWP 19891]
[New LWP 19892]
[New LWP 19893]
[New LWP 19894]
[New LWP 19895]
[New LWP 19896]
[New LWP 19897]
[New LWP 19898]
[New LWP 19899]
[New LWP 19900]
[New LWP 19901]
[New LWP 19902]
[New LWP 19903]
[New LWP 19904]
[New LWP 19905]
[New LWP 19906]
[New LWP 19907]
[New LWP 19908]
[New LWP 19909]
[New LWP 19910]
[New LWP 19911]
[New LWP 19912]
[New LWP 19913]
[New LWP 19914]
[New LWP 19915]
[New LWP 19916]
[New LWP 19917]
[New LWP 19918]
[New LWP 19919]
[New LWP 19921]
[New LWP 19922]
[New LWP 19920]
[New LWP 19923]
[New LWP 19932]
[New LWP 19931]
[New LWP 19930]
[New LWP 19928]
[New LWP 19927]
[New LWP 19929]
[New LWP 19926]
[New LWP 19924]
[New LWP 19925]
Core was generated by `/home/shisoft/Nebuchadnezzar/target/debug/deps/neb-8043d1c65fede661'.
Program terminated with signal SIGABRT, Aborted.
#0  0xb76fbc31 in __kernel_vsyscall ()
[Current thread is 1 (LWP 19884)]
(gdb) bt
#0  0xb76fbc31 in __kernel_vsyscall ()
#1  0xb751eea9 in ?? ()
#2  0xb76a5000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)

@shisoft
Author

shisoft commented Feb 5, 2019

@hellow554 I just realized I have the right tools for debugging. But the debugger run produced another segmentation fault.

* thread #2, name = 'index::btree::test::large_page', stop reason = EXC_BAD_ACCESS (code=2, address=0x700005aec840)
  * frame #0: 0x000000010ed21f93 neb-96fd92871c91115d`__rust_probestack at probestack.rs:55 [opt]
    frame #1: 0x000000010e404f01 neb-96fd92871c91115d`neb::index::btree::test::large_page::_$u7b$$u7b$closure$u7d$$u7d$::h889a369a7e721f35 + 17
    frame #2: 0x000000010e2466a1 neb-96fd92871c91115d`core::ops::function::FnOnce::call_once::h97c0f1df1e597879 + 17
    frame #3: 0x000000010e483602 neb-96fd92871c91115d`call_box<(),closure> [inlined] {{closure}} at lib.rs:1474 [opt]
    frame #4: 0x000000010e4835fd neb-96fd92871c91115d`call_box<(),closure> [inlined] call_once<closure,()> at function.rs:231 [opt]
    frame #5: 0x000000010e4835fd neb-96fd92871c91115d`call_box<(),closure> at boxed.rs:734 [opt]
    frame #6: 0x000000010ed0783f neb-96fd92871c91115d`__rust_maybe_catch_panic at lib.rs:92 [opt]
    frame #7: 0x000000010e49f417 neb-96fd92871c91115d`{{closure}} [inlined] try<(),std::panic::AssertUnwindSafe<alloc::boxed::Box<FnBox<()>>>> at panicking.rs:276 [opt]
    frame #8: 0x000000010e49f3d2 neb-96fd92871c91115d`{{closure}} [inlined] catch_unwind<std::panic::AssertUnwindSafe<alloc::boxed::Box<FnBox<()>>>,()> at panic.rs:388 [opt]
    frame #9: 0x000000010e49f3d2 neb-96fd92871c91115d`{{closure}} at lib.rs:1429 [opt]
    frame #10: 0x000000010e47bfc5 neb-96fd92871c91115d`__rust_begin_short_backtrace<closure,()> at backtrace.rs:135 [opt]
    frame #11: 0x000000010e47c5e5 neb-96fd92871c91115d`do_call<std::panic::AssertUnwindSafe<closure>,()> [inlined] {{closure}}<closure,()> at mod.rs:469 [opt]
    frame #12: 0x000000010e47c5d2 neb-96fd92871c91115d`do_call<std::panic::AssertUnwindSafe<closure>,()> [inlined] call_once<(),closure> at panic.rs:309 [opt]
    frame #13: 0x000000010e47c5d2 neb-96fd92871c91115d`do_call<std::panic::AssertUnwindSafe<closure>,()> at panicking.rs:297 [opt]
    frame #14: 0x000000010ed0783f neb-96fd92871c91115d`__rust_maybe_catch_panic at lib.rs:92 [opt]
    frame #15: 0x000000010e4837f5 neb-96fd92871c91115d`call_box<(),closure> [inlined] try<(),std::panic::AssertUnwindSafe<closure>> at panicking.rs:276 [opt]
    frame #16: 0x000000010e4837bc neb-96fd92871c91115d`call_box<(),closure> [inlined] catch_unwind<std::panic::AssertUnwindSafe<closure>,()> at panic.rs:388 [opt]
    frame #17: 0x000000010e4837bc neb-96fd92871c91115d`call_box<(),closure> [inlined] {{closure}}<closure,()> at mod.rs:468 [opt]
    frame #18: 0x000000010e48377e neb-96fd92871c91115d`call_box<(),closure> at boxed.rs:734 [opt]
    frame #19: 0x000000010ed06e1c neb-96fd92871c91115d`thread_start [inlined] call_once<(),()> at boxed.rs:744 [opt]
    frame #20: 0x000000010ed06e19 neb-96fd92871c91115d`thread_start [inlined] start_thread at thread.rs:14 [opt]
    frame #21: 0x000000010ed06d9e neb-96fd92871c91115d`thread_start at thread.rs:81 [opt]
    frame #22: 0x00007fff5e628305 libsystem_pthread.dylib`_pthread_body + 126
    frame #23: 0x00007fff5e62b26f libsystem_pthread.dylib`_pthread_start + 70
    frame #24: 0x00007fff5e627415 libsystem_pthread.dylib`thread_start + 13

Maybe this is related? #43102
Debugging fails for test cases that use large slices, whether or not the test cases would pass.

@lnicola
Member

lnicola commented Feb 5, 2019

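If you're declaring or returning large local variables, they might overflow the stack. The default main-thread stack size is 8 MB or so on Linux and 1 MB (IIRC) on Windows, and threads spawned by Rust's standard library, which includes test threads, default to 2 MiB. Your array might be larger than that.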

Note that I haven't looked at your original code.
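The arithmetic behind this can be sketched with `std::mem::size_of`. The element type below is a hypothetical 32-byte `Key`, since the real size of `EntryKey` is not given in this thread, and the 2 MiB budget is the assumed default for threads spawned by Rust's standard library. Under those assumptions, a page of 24000 keys fits in the budget while a page of 240000 keys does not.

```rust
use std::mem::size_of;

const LARGE_PAGE_SIZE: usize = 240_000;
const SMALL_PAGE_SIZE: usize = 24_000;

// Assumed stack budget: the 2 MiB default for std-spawned threads.
const DEFAULT_THREAD_STACK: usize = 2 * 1024 * 1024;

// Hypothetical 32-byte key type; the real EntryKey may have a different size.
type Key = [u8; 32];

// Bytes occupied by one inline key array of each page size.
fn large_page_bytes() -> usize {
    size_of::<[Key; LARGE_PAGE_SIZE]>()
}

fn small_page_bytes() -> usize {
    size_of::<[Key; SMALL_PAGE_SIZE]>()
}

// True if a value of this size fits within the assumed stack budget,
// ignoring everything else that shares the stack.
fn fits_on_default_stack(bytes: usize) -> bool {
    bytes < DEFAULT_THREAD_STACK
}
```

Because a function reserves its whole frame on entry, a local or return value of this size can fault before the first statement of the function runs, which would be consistent with the log line never being printed.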

@shisoft
Author

shisoft commented Feb 5, 2019

@lnicola Yes, there are functions returning large values. That explains the problem. I guess I should allocate pages on the heap instead. Thanks.
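A minimal sketch of the heap-allocated approach, assuming a hypothetical `HeapPage` type with `u64` keys rather than the project's real node types: the page stores a boxed slice, so only a pointer and a length live in the stack frame. The storage is built with `vec!` because `Box::new([0u64; N])` may still construct the array on the stack first in unoptimized builds.

```rust
use std::thread;

const LARGE_PAGE_SIZE: usize = 240_000;

// Hypothetical page type: the key storage lives on the heap.
struct HeapPage {
    keys: Box<[u64]>,
}

impl HeapPage {
    fn new() -> Self {
        // vec! allocates and zero-fills on the heap directly, so no
        // large array temporary is placed in this function's frame.
        HeapPage {
            keys: vec![0u64; LARGE_PAGE_SIZE].into_boxed_slice(),
        }
    }
}

// Builds a page on a thread with a deliberately small 256 KiB stack,
// far less than the 1_920_000 bytes of key data, and returns its length.
fn build_on_small_stack() -> usize {
    thread::Builder::new()
        .stack_size(256 * 1024)
        .spawn(|| HeapPage::new().keys.len())
        .expect("failed to spawn worker thread")
        .join()
        .expect("worker thread panicked")
}
```

The trade-off is an extra pointer indirection on each page access and a page length that is checked at run time rather than encoded in the type.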

shisoft closed this as completed Feb 5, 2019