Consistent crash "FATAL SIGNAL 11 RECEIVED" after catch-up #1114
Comments
Another consistent crash here.

Just wanted to note that I did not have this crash before pulling the latest master; not sure if that is helpful or not. I did, however, have two payments stuck as pending for weeks, and would see something along the lines of "name owned by none" when running "listpeers [node] debug | jq", while the channel state said ONCHAIND_OUR_UNILATERAL even though the depth was well past 144 — somewhere past 1000, maybe even 3000. I reconnected with the peer that had the stuck channel, and the "name owned by none" changed to "name owned by %6/" or something to that effect. Sorry if I am not saying it right; I didn't stop and save the output at the time and am going off memory. I should have saved it, but I was thinking pulling the latest might fix the stuck channel — instead it now crashes every time. Next time I will save everything. Again, just trying to give whatever info I can remember; sorry if it makes no sense, I'm a noob. xD
Now I'm starting to get catch-up crashes too. I reported this in bug #1157.
I would just like to make an observation: the crashing call is a call to … Now, let us turn to the code in lightning/lightningd/peer_control.c, lines 533 to 537 (at 35ce131).

From this, we can suspect some kind of memory corruption or overwriting.

Or maybe worse... maybe we are freeing a non-…
I was able to get a similar crash by killing … I am wondering if we need to set the channel owner to NULL here in the error callback:

@@ -330,6 +330,7 @@ static void onchain_error(struct channel *channel,
 {
 	/* FIXME: re-launch? */
 	log_broken(channel->log, "%s", desc);
 	channel_set_billboard(channel, true, desc);
+	channel_set_owner(channel, NULL);
 }

With this …
It does look like a use-after-free issue. Stepping through the backtrace, it can't be …

Could one of the users experiencing the problem try running with …? This will exit with a return code of 7 when valgrind finds a memory error.
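For reference, a valgrind invocation matching the "exit with return code 7" behaviour described above would look something like the following. The `--error-exitcode=7` flag is a real valgrind option; the binary path and any daemon options are assumptions about a typical source-tree setup, so adjust them for your own build.

```shell
# Run lightningd under valgrind so the process exits with code 7
# as soon as valgrind detects a memory error (path is illustrative).
valgrind --error-exitcode=7 ./lightningd/lightningd --network=bitcoin
```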
@jBarz I highly suspect you are correct. Thank you! I have audited all four places where we use …
@jBarz great catch, could you open a PR with those fixes? |
I've reopened: while we no longer crash, the onchaind failure shouldn't happen. Will see if I can diagnose that today. |
Got one confirmation of the crash being solved from a user. |
First, @sadilek: great bug report, thanks for your patience, and I definitely owe you a beer (or equivalent)! OK, so here's what's happening:

Now, this is a bug. One reason the first proposal and the second differ is that the fees differ, which makes perfect sense and which we should handle. The other reason is that we're paying to a different address. That is a minor bug: if we get a fresh address for onchaind, we don't write it to the db (which we should). I will now reproduce and fix both, and you'll get your 0.00245543 BTC or so back!
The root cause of ElementsProject#1114 was that the restarted onchaind created a different proposal to the one which had previously been mined:

2018-03-01T09:41:08.884Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: STATUS_FAIL_INTERNAL_ERROR: THEIR_UNILATERAL/OUR_HTLC spent with weird witness 3

After the previous patches which fixed the output address difference, we could identify proposals by their outputs, but during the transition (onchaind started with the old buggy version, restarted now) that wouldn't be right, so we match the inputs, discarding signatures, which will be different. This works for all current cases.

Signed-off-by: Rusty Russell <[email protected]>
…fees. This was revealed in ElementsProject#1114; onchaind isn't actually completely idempotent, due to fee changes (and the now-fixed change in keys used). This triggers the bug by restarting with different fees, resulting in onchaind not recognizing its own proposal:

2018-03-05T09:38:15.550Z lightningd(23076): lightning_onchaind-022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59 chan #1: STATUS_FAIL_INTERNAL_ERROR: THEIR_UNILATERAL/OUR_HTLC spent with weird witness 3

Signed-off-by: Rusty Russell <[email protected]>
Thanks for your debugging! I synced to 31b9b6b yesterday, which includes @jBarz's #1164, and restarted the daemon. On my (slow) machine it takes a good 5h to catch up. I now see that the daemon crashed again right after catching up, but differently this time:
I now synced to dace9bf to include @rustyrussell's latest fixes and am trying again.
After syncing to dace9bf, the daemon crashed again in the same way as in my last update:
However, see line 388 (at dace9bf):

We need to determine what … If …
I didn't have time yet to help with further debugging this, but I just synced to f45f18c and am trying again...
Still failing:
It fails the same way as in my last two updates, but differently from my original report. Do you want me to file this as a new issue?
I kept pulling the latest changes, and since syncing to 35e85ab my node no longer crashes after having caught up. Thanks!
My daemon has been consistently crashing for a few days now, right after catching up with the blockchain. I have pulled the latest updates and rebuilt several times; the last crash was at commit 58fae47.
crash.log:
stdout:
getinfo: