Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

example_bitcoind_rpc_polling sync command ends with stack overflow #1634

Closed
ValuedMammal opened this issue Oct 1, 2024 · 12 comments · Fixed by #1731
Closed

example_bitcoind_rpc_polling sync command ends with stack overflow #1634

ValuedMammal opened this issue Oct 1, 2024 · 12 comments · Fixed by #1731
Assignees
Labels
bug Something isn't working module-blockchain
Milestone

Comments

@ValuedMammal
Copy link
Contributor

Describe the bug
I ran a sync from genesis using a local signet node. It took around 25 minutes to reach the new tip but ended abruptly before finishing.

[  1442.151s] committed to db (took 0.11852204s)
[  1443.706s] synced to 0000003dee9b4362728c83d684d8490ddf4ffeb6ad7fe5ccda3f2203bffc24f2 @ 210819 | total: 0.00020300 BTC
[ 1449.7129s] synced to 000000a55e6a22bfd21c68f46774abda1e403ea95983f7d26981ccb24984ac60 @ 211577 | total: 0.00020300 BTC
[ 1455.7239s] synced to 00000071de5c82e264071bd7f0a8a91eab1e1b5461d44fce3b0fa99092e3f314 @ 212323 | total: 0.00020300 BTC
[ 1461.7284s] synced to 000000eeb493d04aeb69f4363bd01eb30540ff125a978c25ba082ad5f405071f @ 212890 | total: 0.00020300 BTC
[ 1467.7336s] synced to 0000002fdc5b95799d03f8c150fe7ead118bb9348f532edbdd5eda1bd02c1de7 @ 213657 | total: 0.00020300 BTC
[  1473.861s] synced to 000000b71b4299131fec5052912e975a10058a0c5c7532af945da9e081c76275 @ 214637 | total: 0.00019500 BTC
[   1479.87s] synced to 0000009c12399ad6344fe09264e4292f33b8db09fd26b79ad0f5bae102e9b880 @ 215572 | total: 0.00069500 BTC

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
zsh: abort      cargo run --bin example_bitcoind_rpc_polling -- sync

To Reproduce

DESCRIPTOR=tr([7d94197e/86'/1'/0']tpubDCyQVJj8KzjiQsFjmb3KwECVXPvMwvAxxZGCP9XmWSopmjW3bCV3wD7TgxrUhiGSueDS1MU5X1Vb1YjYcp8jitXc5fXfdC1z68hDDEyKRNr/0/*)
CHANGE_DESCRIPTOR=tr([7d94197e/86'/1'/0']tpubDCyQVJj8KzjiQsFjmb3KwECVXPvMwvAxxZGCP9XmWSopmjW3bCV3wD7TgxrUhiGSueDS1MU5X1Vb1YjYcp8jitXc5fXfdC1z68hDDEyKRNr/1/*)
FALLBACK_HEIGHT=1

cargo run --bin example_bitcoind_rpc_polling -- sync

Expected behavior

Build environment

  • BDK tag/commit: v1.0.0-beta.4
  • OS+version: macOS Sonoma 14.5
  • Rust/Cargo version: 1.81.0
  • Rust/Cargo target: aarch64-apple-darwin

Additional context

@ValuedMammal ValuedMammal added the bug Something isn't working label Oct 1, 2024
@ValuedMammal
Copy link
Contributor Author

New information: the database file is ~7.9M. Interestingly I also get a stack overflow when calling the balance command, indicating that there could be a problem loading the file contents into memory (as a side point this balance also appears wrong to me).

     Running `target/debug/example_bitcoind_rpc_polling balance`
confirmed:
    total             69500 sats
    spendable         69500 sats
    immature              0 sats
unconfirmed:
    total                 0 sats
    trusted               0 sats
    untrusted             0 sats

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
zsh: abort      cargo run --bin example_bitcoind_rpc_polling -- balance

I commented out the line here in example_cli where we apply the chain changeset

let chain = Mutex::new({
let (mut chain, _) =
LocalChain::from_genesis_hash(constants::genesis_block(network).block_hash());
chain.apply_changeset(&changeset.local_chain)?;
chain
});
and while no balance is returned (as expected), it does prevent hitting a stack overflow

     Running `target/debug/example_bitcoind_rpc_polling balance`
confirmed:
    total                 0 sats
    spendable             0 sats
    immature              0 sats
unconfirmed:
    total                 0 sats
    trusted               0 sats
    untrusted             0 sats

@luisschwab
Copy link
Contributor

Interesting, I also had this problem a while ago. Will try to reproduce it again and report back.

@ValuedMammal
Copy link
Contributor Author

Seems to be isolated to the CheckPoint type, but the question remains whether something should be done in bdk_bitcoind_rpc to mitigate it

    #[test]
    fn apply_massive_changeset() {
        let (mut chain, _) = LocalChain::from_genesis_hash(genesis_block(Network::Bitcoin).block_hash());
        let blocks = (1u32..850_000)
            .into_iter()
            .map(|height| {
                let hash = bitcoin::hashes::Hash::hash(height.to_string().as_bytes());
                (height, Some(hash))
            })
            .collect();
        chain.apply_changeset(&ChangeSet { blocks }).unwrap();
    }

@ValuedMammal ValuedMammal added this to BDK Oct 2, 2024
@ValuedMammal ValuedMammal moved this to Todo in BDK Oct 2, 2024
@ValuedMammal ValuedMammal added this to the 1.0.0-beta milestone Oct 2, 2024
@luisschwab
Copy link
Contributor

luisschwab commented Oct 3, 2024

I'm having some trouble reproducing this. Can you provide the command sequence you ran @ValuedMammal ?

btw, the crate's README needs some updating

@ValuedMammal
Copy link
Contributor Author

@luisschwab I usually add my descriptors as environment variables and then just follow the help text of the CLI. I'm interested to see if someone can reproduce it, otherwise it could be something specific to my architecture. You're right the README is outdated

# setup environment
export DESCRIPTOR="tr([7d94197e/86'/1'/0']tpubDCyQVJj8KzjiQsFjmb3KwECVXPvMwvAxxZGCP9XmWSopmjW3bCV3wD7TgxrUhiGSueDS1MU5X1Vb1YjYcp8jitXc5fXfdC1z68hDDEyKRNr/0/*)"
export CHANGE_DESCRIPTOR="tr([7d94197e/86'/1'/0']tpubDCyQVJj8KzjiQsFjmb3KwECVXPvMwvAxxZGCP9XmWSopmjW3bCV3wD7TgxrUhiGSueDS1MU5X1Vb1YjYcp8jitXc5fXfdC1z68hDDEyKRNr/1/*)"
# specific for example_bitcoind_rpc_polling
# assumes a full node setup locally on signet
export RPC_URL="127.0.0.1:38332"
export RPC_COOKIE=<path-to-your-cookie-file> # e.g. /home/satoshi/.bitcoin/signet/.cookie
export FALLBACK_HEIGHT=1

# initialize database
cargo run --bin example_bitcoind_rpc_polling -- init

# sync
cargo run --bin example_bitcoind_rpc_polling -- sync

@luisschwab
Copy link
Contributor

luisschwab commented Oct 3, 2024

Reproduced in v1.0.0-beta.5, Ubuntu 24.04.1 LTS x86_64.

[ 2517.2751s] synced to 00000067cf51b1e213711205dc8828a7933bd6bc0c51b4eecf8ca1921e7e3182 @ 215513 | total: 0.00069500 BTC
[ 2522.3696s] committed to db (took 0.013493876s)
[ 2523.3176s] synced to 0000007467101167136fcc53596a2ee6789cc9da74f43e3d1212c0ec778a117c @ 215775 | total: 0.00069500 BTC
[ 2529.3455s] synced to 0000008ef21a51707a38677076b5380950800af75fd35c8ac0cbe63ac5d87d89 @ 216012 | total: 0.00069500 BTC

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Aborted (core dumped)
$ cargo run --bin example_bitcoind_rpc_polling -- balance
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.34s
     Running `target/debug/example_bitcoind_rpc_polling balance`
confirmed:
    total             69500 sats
    spendable         69500 sats
    immature              0 sats
unconfirmed:
    total                 0 sats
    trusted               0 sats
    untrusted             0 sats

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Aborted (core dumped)

My DB is similarly large at 8MB.

$ du -sh .bdk_example_rpc.db 
8.0M	.bdk_example_rpc.db

@LLFourn
Copy link
Contributor

LLFourn commented Oct 11, 2024

Seems to be isolated to the CheckPoint type, but the question remains whether something should be done in bdk_bitcoind_rpc to mitigate it

    #[test]
    fn apply_massive_changeset() {
        let (mut chain, _) = LocalChain::from_genesis_hash(genesis_block(Network::Bitcoin).block_hash());
        let blocks = (1u32..850_000)
            .into_iter()
            .map(|height| {
                let hash = bitcoin::hashes::Hash::hash(height.to_string().as_bytes());
                (height, Some(hash))
            })
            .collect();
        chain.apply_changeset(&ChangeSet { blocks }).unwrap();
    }

This doesn't really isolate it to the CheckPoint type. It isolates it to the LocalChain type. The most likely explanation in my experience is that somehow this has become recursive. I guess the right thing to do is to use rust-gdb to figure out where this happens.

@ValuedMammal
Copy link
Contributor Author

I'm using this also as a more targeted litmus test

    let mut cp = CheckPoint::new(BlockId {
        height: 0,
        hash: hash!("g"),
    });
    let end = 10_000;
    for height in 1u32..end {
        let hash = bitcoin::hashes::Hash::hash(height.to_be_bytes().as_slice());
        let block = BlockId { height, hash };
        cp = cp.push(block).unwrap();
    }
    assert_eq!(cp.iter().count() as u32, end);

@ValuedMammal
Copy link
Contributor Author

Screen of my debugger showing a lot of drop calls. I experimented with writing a custom Drop impl for CheckPoint which did fix the test mentioned above, but when I tested the change on example_bitcoind_rpc_polling the performance was significantly slower to the point that I gave up syncing on more than one occasion, so I guess the jury is still out.

Screenshot 2024-11-15 at 2 33 04 PM

@LLFourn
Copy link
Contributor

LLFourn commented Nov 18, 2024

I think this is the solution: #1731. Perhaps that's already what you had in mind? I don't think this drop implementation could cause a performance issue.

can you test that for us @ValuedMammal? Thanks!

@ValuedMammal
Copy link
Contributor Author

I ran another sync on signet and this was the last print

[ 1603.0576s] synced to 000000fa810d2c90176b6837b50b0ea53cc0ec63b8e67eae2b1c36e94e0ade34 @ 222615 | total: 0.00069500 BTC

@luisschwab
Copy link
Contributor

Ran #1731 and did not experience any issues.

[ 3341.7742s] synced to 000000ac299519be580c8fb2e1aae0ae6c43cc910c58aff297b59dece1476db8 @ 222742 | total: 0.00069500 BTC

notmandatory added a commit that referenced this issue Nov 21, 2024
2df5861 test(core): test `Drop` implementation of `CPInner` (valued mammal)
67e1dc3 fix(core): Fix checkpoint Drop stack overflow (LLFourn)

Pull request description:

  Fixes #1634

  This needs a test that demonstrates the issue is fixed. I was hoping @ValuedMammal could do that for me.

ACKs for top commit:
  ValuedMammal:
    self-ACK 2df5861
  notmandatory:
    ACK 2df5861

Tree-SHA512: a431cb93505cbde2a9287de09d5faac003b8dfa01342cd22c6ca197d70a73948f94f55dfa365cc06b5be36f78458ed34d4ef5fa8c9e5e2989a21c7ce5b55d9ca
@github-project-automation github-project-automation bot moved this from In Progress to Done in BDK Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working module-blockchain
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants