Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent crash "FATAL SIGNAL 11 RECEIVED" after catch-up #1114

Closed
sadilek opened this issue Feb 27, 2018 · 19 comments · Fixed by #1164 or #1171
Closed

Consistent crash "FATAL SIGNAL 11 RECEIVED" after catch-up #1114

sadilek opened this issue Feb 27, 2018 · 19 comments · Fixed by #1164 or #1171
Labels
Milestone

Comments

@sadilek
Copy link

sadilek commented Feb 27, 2018

My daemon is consistently crashing since a few days after having caught up with the blockchain. I have pulled the latest updates and rebuilt several times. Last crash was at commit 58fae47.

crash.log:

+14186.849070555 lightningd(9833):DEBUG: Adding block 511107: 0000000000000000001845d43505e86dde6bdbf55fce9b8325d9619a2200901c
+14189.756722369 lightning_gossipd(9841):DEBUG: Received channel_update for channel 510308:736:0(0) now ACTIVE
+14189.756828585 lightning_gossipd(9841):DEBUG: Received channel_update for channel 509559:411:0(0) now ACTIVE
+14189.756915298 lightning_gossipd(9841):DEBUG: Received channel_update for channel 506088:2414:0(0) now ACTIVE
+14189.756972321 lightning_gossipd(9841):DEBUG: Received channel_update for channel 507064:1108:1(1) now ACTIVE
+14189.830532269 lightningd(9833):BROKEN: FATAL SIGNAL 11 RECEIVED
+14189.841756223 lightningd(9833):BROKEN: backtrace: lightningd/log.c:504 (log_crash) 0x414f40
+14189.841841556 lightningd(9833):BROKEN: backtrace: (null):0 ((null)) 0x7f0eba0d84af
+14189.841877372 lightningd(9833):BROKEN: backtrace: (null):0 ((null)) 0x7f0eba0f1cc0
+14189.841900682 lightningd(9833):BROKEN: backtrace: (null):0 ((null)) 0x7f0eba119a48
+14189.842038474 lightningd(9833):BROKEN: backtrace: ccan/ccan/tal/str/str.c:67 (do_vfmt) 0x46a61c
+14189.842068225 lightningd(9833):BROKEN: backtrace: ccan/ccan/tal/str/str.c:91 (tal_vfmt) 0x46a6dc
+14189.842103494 lightningd(9833):BROKEN: backtrace: lightningd/log.c:264 (logv) 0x41435a
+14189.842128505 lightningd(9833):BROKEN: backtrace: lightningd/log.c:320 (log_) 0x4146da
+14189.842471909 lightningd(9833):BROKEN: backtrace: lightningd/peer_control.c:534 (funding_lockin_cb) 0x421a16
+14189.842679274 lightningd(9833):BROKEN: backtrace: lightningd/watch.c:221 (txw_fire) 0x429d4f
+14189.842701448 lightningd(9833):BROKEN: backtrace: lightningd/watch.c:289 (watch_topology_changed) 0x429fb3
+14189.842977929 lightningd(9833):BROKEN: backtrace: lightningd/chaintopology.c:335 (updates_complete) 0x406ed1
+14189.842997248 lightningd(9833):BROKEN: backtrace: lightningd/chaintopology.c:429 (get_new_block) 0x407346
+14189.843299464 lightningd(9833):BROKEN: backtrace: lightningd/bitcoind.c:657 (process_getblockhash) 0x405535
+14189.843319911 lightningd(9833):BROKEN: backtrace: lightningd/bitcoind.c:186 (bcli_finished) 0x403fb7
+14189.843485216 lightningd(9833):BROKEN: backtrace: ccan/ccan/io/poll.c:183 (destroy_conn) 0x4610b1
+14189.843503984 lightningd(9833):BROKEN: backtrace: ccan/ccan/io/poll.c:189 (destroy_conn_close_fd) 0x4610d1
+14189.843773033 lightningd(9833):BROKEN: backtrace: ccan/ccan/tal/tal.c:240 (notify) 0x46b720
+14189.843791471 lightningd(9833):BROKEN: backtrace: ccan/ccan/tal/tal.c:400 (del_tree) 0x46bbd1
+14189.843829697 lightningd(9833):BROKEN: backtrace: ccan/ccan/tal/tal.c:509 (tal_free) 0x46bfe4
+14189.844019283 lightningd(9833):BROKEN: backtrace: ccan/ccan/io/io.c:443 (io_close) 0x45fd39
+14189.844038381 lightningd(9833):BROKEN: backtrace: ccan/ccan/io/poll.c:314 (io_loop) 0x4614c0
+14189.844248188 lightningd(9833):BROKEN: backtrace: lightningd/lightningd.c:403 (main) 0x413644
+14189.844266654 lightningd(9833):BROKEN: backtrace: (null):0 ((null)) 0x7f0eba0c382f
+14189.844371345 lightningd(9833):BROKEN: backtrace: (null):0 ((null)) 0x4033f8
+14189.844388414 lightningd(9833):BROKEN: backtrace: (null):0 ((null)) 0xffffffffffffffff

stdout:

2018-02-27T05:55:47.283Z lightningd(9833): 0273081ce642554d5a68a5236564fe88a3783457dc09e52812517f5724f36daefd chan #89: State changed from FUNDING_SPEND_SEEN to ONCHAIN
2018-02-27T05:59:00.167Z lightningd(9833): FATAL SIGNAL 11 RECEIVED
2018-02-27T05:59:00.178Z lightningd(9833): backtrace: lightningd/log.c:504 (log_crash) 0x414f40
2018-02-27T05:59:00.178Z lightningd(9833): backtrace: (null):0 ((null)) 0x7f0eba0d84af
2018-02-27T05:59:00.178Z lightningd(9833): backtrace: (null):0 ((null)) 0x7f0eba0f1cc0
2018-02-27T05:59:00.178Z lightningd(9833): backtrace: (null):0 ((null)) 0x7f0eba119a48
2018-02-27T05:59:00.178Z lightningd(9833): backtrace: ccan/ccan/tal/str/str.c:67 (do_vfmt) 0x46a61c
2018-02-27T05:59:00.178Z lightningd(9833): backtrace: ccan/ccan/tal/str/str.c:91 (tal_vfmt) 0x46a6dc
2018-02-27T05:59:00.178Z lightningd(9833): backtrace: lightningd/log.c:264 (logv) 0x41435a
2018-02-27T05:59:00.178Z lightningd(9833): backtrace: lightningd/log.c:320 (log_) 0x4146da
2018-02-27T05:59:00.178Z lightningd(9833): backtrace: lightningd/peer_control.c:534 (funding_lockin_cb) 0x421a16
2018-02-27T05:59:00.179Z lightningd(9833): backtrace: lightningd/watch.c:221 (txw_fire) 0x429d4f
2018-02-27T05:59:00.179Z lightningd(9833): backtrace: lightningd/watch.c:289 (watch_topology_changed) 0x429fb3
2018-02-27T05:59:00.179Z lightningd(9833): backtrace: lightningd/chaintopology.c:335 (updates_complete) 0x406ed1
2018-02-27T05:59:00.179Z lightningd(9833): backtrace: lightningd/chaintopology.c:429 (get_new_block) 0x407346
2018-02-27T05:59:00.179Z lightningd(9833): backtrace: lightningd/bitcoind.c:657 (process_getblockhash) 0x405535
2018-02-27T05:59:00.179Z lightningd(9833): backtrace: lightningd/bitcoind.c:186 (bcli_finished) 0x403fb7
2018-02-27T05:59:00.179Z lightningd(9833): backtrace: ccan/ccan/io/poll.c:183 (destroy_conn) 0x4610b1
2018-02-27T05:59:00.179Z lightningd(9833): backtrace: ccan/ccan/io/poll.c:189 (destroy_conn_close_fd) 0x4610d1
2018-02-27T05:59:00.180Z lightningd(9833): backtrace: ccan/ccan/tal/tal.c:240 (notify) 0x46b720
2018-02-27T05:59:00.180Z lightningd(9833): backtrace: ccan/ccan/tal/tal.c:400 (del_tree) 0x46bbd1
2018-02-27T05:59:00.180Z lightningd(9833): backtrace: ccan/ccan/tal/tal.c:509 (tal_free) 0x46bfe4
2018-02-27T05:59:00.180Z lightningd(9833): backtrace: ccan/ccan/io/io.c:443 (io_close) 0x45fd39
2018-02-27T05:59:00.180Z lightningd(9833): backtrace: ccan/ccan/io/poll.c:314 (io_loop) 0x4614c0
2018-02-27T05:59:00.180Z lightningd(9833): backtrace: lightningd/lightningd.c:403 (main) 0x413644
2018-02-27T05:59:00.180Z lightningd(9833): backtrace: (null):0 ((null)) 0x7f0eba0c382f
2018-02-27T05:59:00.180Z lightningd(9833): backtrace: (null):0 ((null)) 0x4033f8
2018-02-27T05:59:00.180Z lightningd(9833): backtrace: (null):0 ((null)) 0xffffffffffffffff
Fatal signal 11. 0x415026 log_crash
        lightningd/log.c:530
0x7f0eba0d84af ???
        ???:0
0x7f0eba0f1cc0 ???
        ???:0
0x7f0eba119a48 ???
        ???:0
0x46a61c do_vfmt
        ccan/ccan/tal/str/str.c:67
0x46a6dc tal_vfmt
        ccan/ccan/tal/str/str.c:91
0x41435a logv
        lightningd/log.c:264
0x4146da log_
        lightningd/log.c:320
0x421a16 funding_lockin_cb
        lightningd/peer_control.c:534
0x429d4f txw_fire
        lightningd/watch.c:221
0x429fb3 watch_topology_changed
        lightningd/watch.c:289
0x406ed1 updates_complete
        lightningd/chaintopology.c:335
0x407346 get_new_block
        lightningd/chaintopology.c:429
0x405535 process_getblockhash
        lightningd/bitcoind.c:657
0x403fb7 bcli_finished
        lightningd/bitcoind.c:186
0x4610b1 destroy_conn
        ccan/ccan/io/poll.c:183
0x4610d1 destroy_conn_close_fd
        ccan/ccan/io/poll.c:189
0x46b720 notify
        ccan/ccan/tal/tal.c:240
0x46bbd1 del_tree
        ccan/ccan/tal/tal.c:400
0x46bfe4 tal_free
        ccan/ccan/tal/tal.c:509
0x45fd39 io_close
        ccan/ccan/io/io.c:443
0x4614c0 io_loop
        ccan/ccan/io/poll.c:314
0x413644 main
        lightningd/lightningd.c:403
0x7f0eba0c382f ???
        ???:0
0x4033f8 ???
        ???:0
0xffffffffffffffff ???
        ???:0
Log dumped in crash.log

getinfo:

{ "id" : "036487e0bd981617c674c2856fc0050472a0182838e0a4a1a67545c6cdee9f49a0", "port" : 9735, "address" : 
	[ 
		{ "type" : "ipv4", "address" : "81.6.40.158", "port" : 9735 } ], "version" : "v0.5.2-2016-11-21-2125-g58fae47", "blockheight" : 510777, "network" : "bitcoin" }
@ZmnSCPxj
Copy link
Contributor

Possibly related: #1103

The crash in this issue points to code that reads channel->owner->name, similar to the code that is crashing in #1103. This seems to suggest more and more a corruption or similar of the name field.

@smabbutt
Copy link

smabbutt commented Mar 1, 2018

Another consistent FATAL SIGNAL 11 RECEIVED crash , after STATUS_FAIL_INTERNAL_ERROR: THEIR_UNILATERAL/OUR_HTLC spent with weird witness 3 . Running the latest cdecker/lightningd image behind an up-to-date bitcoind full node. Setup has been stable, including a recent failed payment, now crashes on every startup. Stdout below, full crash.log at https://goo.gl/2GMMtA

root@ubu17vm:/scratch/bitcoin/mainnet/clightning# docker run --rm --name lightning --network container:bitcoind_mainnet -v /scratch/bitcoin/mainnet/bitcoind:/root/.bitcoin -v /scratch/bitcoin/mainnet/clightning:/root/.lightning --entrypoint /usr/bin/lightningd cdecker/lightningd:latest --network=bitcoin --log-level=debug
2018-03-01T09:39:56.898Z lightningd(1): Not guessing addresses: manually set
2018-03-01T09:39:57.118Z lightningd(1): testing /usr/bin/lightning_channeld
2018-03-01T09:39:57.216Z lightningd(1): testing /usr/bin/lightning_closingd
2018-03-01T09:39:57.300Z lightningd(1): testing /usr/bin/lightning_gossipd
2018-03-01T09:39:57.376Z lightningd(1): testing /usr/bin/lightning_hsmd
2018-03-01T09:39:57.453Z lightningd(1): testing /usr/bin/lightning_onchaind
2018-03-01T09:39:57.543Z lightningd(1): testing /usr/bin/lightning_openingd
2018-03-01T09:39:58.502Z lightningd(1): Client: Received message 11 from client
2018-03-01T09:39:58.511Z lightningd(1): Loaded 0 invoices from DB
2018-03-01T09:39:58.512Z lightningd(1): Client: Received message 9 from client
2018-03-01T09:39:58.512Z lightning_gossipd(14): pid 14, msgfd 13
2018-03-01T09:39:58.512Z lightningd(1): Loaded 1 HTLC signatures from DB
2018-03-01T09:39:58.512Z lightningd(1): Loaded 1 channels from DB
2018-03-01T09:39:58.512Z lightningd(1): Loading HTLCs for channel 1
2018-03-01T09:39:58.513Z lightningd(1): Restored 0 incoming and 1 outgoing HTLCS
2018-03-01T09:39:59.142Z lightning_gossipd(14): req: type WIRE_GOSSIPCTL_INIT len 126
2018-03-01T09:39:59.142Z lightning_gossipd(14): Creating IPv6 listener on port 9735
2018-03-01T09:39:59.585Z lightningd(1): Adding block 511072: 00000000000000000016d355176805b5aecbb6bb44114034e5069835dd48ebe9
2018-03-01T09:39:59.585Z lightningd(1): Listening on 'lightning-rpc'
2018-03-01T09:39:59.585Z lightningd(1): Server started with public key 02786178a2a8e75a6e882a0f56b831330a61728d093b33e0fb3ec2d8a8125ffdf2, alias PON_Z (color #027861) and lightningd v0.5.2-2016-11-21-2092-g7c8883a
2018-03-01T09:39:59.586Z lightning_gossipd(14): req: type WIRE_GOSSIPCTL_PEER_ADDRHINT len 42
2018-03-01T09:40:00.154Z lightningd(1): Adding block 511073: 0000000000000000003992580a8a1f6d6ab49fdf9aece94d2abeee21fca05fe2
2018-03-01T09:40:00.224Z lightningd(1): Immediate feerate 7071 (was 0)
2018-03-01T09:40:00.224Z lightningd(1): Normal feerate 6762 (was 0)
2018-03-01T09:40:00.224Z lightningd(1): Slow feerate 294 (was 0)
2018-03-01T09:40:00.556Z lightningd(1): Adding block 511074: 00000000000000000024daaf32bb7ab9e81d9b9bc0f56bc9e76e7fae7d3b5dcf
...
2018-03-01T09:40:52.498Z lightningd(1): Adding block 511174: 00000000000000000054d095f4630802232e050c64f800e987101dad795f206f
2018-03-01T09:40:52.549Z lightningd(1): Owning output 0 7669 (SEGWIT) txid 28cad7303c12cb129a81ce4afece811f34b8776a5c91eeb1c5f0f0393f3cbd8f
2018-03-01T09:40:53.304Z lightningd(1): Adding block 511175: 0000000000000000000046f58239416342f3c9265d2dec327058b198555f9df7
2018-03-01T09:40:53.817Z lightningd(1): Adding block 511176: 0000000000000000001156f3d5b4aac51b10eb9090f488be5a54bd12b1c20cfa
2018-03-01T09:40:54.672Z lightningd(1): Adding block 511177: 0000000000000000004b049bdffd3982fa669f8567c2dd0088bae4660fd185bf
2018-03-01T09:40:54.727Z lightningd(1): 020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Got UTXO spend for 28cad7303c12cb129a81ce4afece811f34b8776a5c91eeb1c5f0f0393f3cbd8f:1: 195e0349cfb268b11909e8a9d8477a0a428634a264761408e50f4d92d0dada02
2018-03-01T09:40:54.727Z lightningd(1): 020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Peer permanent failure in ONCHAIN: Funding transaction spent
2018-03-01T09:40:54.727Z lightningd(1):          (tx 2241a4c7634b5188a6bf6bc61cd926a81daece437be61c46467b04bdc18f743d)
2018-03-01T09:40:54.727Z lightningd(1): sendrawtransaction: 020000000001018fbd3c3f39f0f0c5b1ee915c6a77b8341f81cefe4ace819a12cb123c30d7ca2801000000005590da80023fcf0000000000002200206add3ba5fee4677ed6a856ef40d6a1cd246f0e9c2a7dcad40ed420925854a90dfcc1030000000000220020aa2992121527cf39e976668a6760acd6317c890259497cf889bf44769e44cf93040047304402200c3384bfa8ced08122055e596cb2fa7c663ca643a18877fe6ff161c081c56b2402202c2f9d35c27ea464819839f9d4a1d5c0a2295b10b850b6765588b4c43fe1b2b80147304402207ae4bf0523020ab5d3187fff65721da159197329135e6397706a0f721901697002203fb8612a2d836679074431e13b01d31a67838260ce6b322be2834780fe74f1a7014752210264f866df20230d4896bba59c7cbf9d9a23b818f3d9572b1b2f91086fdc88b7f621036f610e5a39ed53073879a23f4782cd44fb8ffa252824512bdcf36be17d4ec81d52ae62cc9420
2018-03-01T09:40:54.729Z lightningd(1): 020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: State changed from ONCHAIN to FUNDING_SPEND_SEEN
2018-03-01T09:40:54.729Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: pid 236, msgfd 17
2018-03-01T09:40:54.730Z lightningd(1): 020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: We want to know if htlc 0 is missing (immediate)
2018-03-01T09:40:55.045Z lightning_gossipd(14): req: type WIRE_GOSSIP_DISABLE_CHANNEL len 12
2018-03-01T09:40:55.045Z lightning_gossipd(14): Unable to find channel 511174:864:1/1
2018-03-01T09:40:55.110Z lightningd(1): sendrawtx exit 25, gave error code: -25?error message:?Missing inputs?
2018-03-01T09:40:55.164Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Tracking output 1 of 28cad7303c12cb129a81ce4afece811f34b8776a5c91eeb1c5f0f0393f3cbd8f: FUNDING_TRANSACTION/FUNDING_OUTPUT
2018-03-01T09:40:55.164Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Remote per-commit point: 02b4919ed6a7371b8ad6dbca0013bdff6e744542733b7af41841fab20fc2d7c164
2018-03-01T09:40:55.164Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Old remote per-commit point: 03079cc61c953e9648cbce71ece1c91dfd27f5fc0bb0ee571cb9f0d3cbea0784bf
2018-03-01T09:40:55.164Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: commitnum = 1, revocations_received = 1
2018-03-01T09:40:55.164Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Their unilateral tx, old commit point
2018-03-01T09:40:55.164Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: UPDATE WIRE_ONCHAIN_INIT_REPLY
2018-03-01T09:40:55.165Z lightningd(1): 020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: State changed from FUNDING_SPEND_SEEN to ONCHAIN
2018-03-01T09:40:55.741Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Initial feerate 0 to 2314
2018-03-01T09:40:55.741Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Resolved FUNDING_TRANSACTION/FUNDING_OUTPUT by THEIR_UNILATERAL (195e0349cfb268b11909e8a9d8477a0a428634a264761408e50f4d92d0dada02)
2018-03-01T09:40:55.742Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Deriving keyset 1: per_commit_point=03079cc61c953e9648cbce71ece1c91dfd27f5fc0bb0ee571cb9f0d3cbea0784bf self_payment_basepoint=02b42d6cfde33b6506f2ecbc81f363335a970ea98e191bcffe3f12a38da7dcd5e7 other_payment_basepoint=023d3f1f0afb5abe2962dfc0a778d0cd60108df0c2f0aa9ca18b72c2483bd32a04 self_htlc_basepoint=038cbe5f2f2ed0702b83556c1969d3c50097af95ed136dfa7e04e705da34c43af0 other_htlc_basepoint=0211837debf108a08364fc79e66d3855b41cbe1d120897cdcd350018ecb2a4dac6 self_delayed_basepoint=021772c68c9dc546f3b3db0c9633ddeff887790cf6206d5b5cb78ccc464bebccd8 other_revocation_basepoint=03482c7e949688f461e4c28c04b69b3823fca085b02afdce671478fb1370ecab37
2018-03-01T09:40:55.742Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Deconstructing unilateral tx: 1 using keyset:  self_revocation_key: 0329ec6d457d8ef1a825ab411594c78404565afe0fa6e39b4a1ca87f9108a2f60c self_delayed_payment_key: 0248673b2e6a1a9a14ca82c6b9e2900dba1347d2a4eab4873f896d0fc44698f105 self_payment_key: 037701fb4a634720b958f235b2b75020ebbbd79fce9cfb07c6373d05095a4c39ba other_payment_key: 02b4cc9be6edcd9b96191431497c5ffd023c437b5b929f1f897a56b6ad40ddf03d self_htlc_key: 039d64641cd75a223f0047ab47c43a7dcc1d59b4998a27bd48117be2dbee4416ca other_htlc_key: 020b5f344c9f068be9a6cc81534180f8d2b63a2f00cbddbaac00695a590e067214
2018-03-01T09:40:55.742Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Script to-them: 144: 002095e2bf394cc4be60b07565899f3acb6c5a4c4d6a05ccab44fd00312a7a967c1f (63210329ec6d457d8ef1a825ab411594c78404565afe0fa6e39b4a1ca87f9108a2f60c67029000b275210248673b2e6a1a9a14ca82c6b9e2900dba1347d2a4eab4873f896d0fc44698f10568ac)
2018-03-01T09:40:55.743Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Script to-me: 001410573c5ba80852d34bca2386d35e01caa7b0f74f
2018-03-01T09:40:55.743Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Output 0: 001410573c5ba80852d34bca2386d35e01caa7b0f74f
2018-03-01T09:40:55.744Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Output 1: 0020c971f590578906bcb6983424283d2fb6964d692b9c56af07412460edbe14a3d6
2018-03-01T09:40:55.744Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Tracking output 0 of 195e0349cfb268b11909e8a9d8477a0a428634a264761408e50f4d92d0dada02: THEIR_UNILATERAL/OUTPUT_TO_US
2018-03-01T09:40:55.745Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Ignoring output 0 of 195e0349cfb268b11909e8a9d8477a0a428634a264761408e50f4d92d0dada02: THEIR_UNILATERAL/OUTPUT_TO_US
2018-03-01T09:40:55.745Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: UPDATE WIRE_ONCHAIN_ADD_UTXO
2018-03-01T09:40:55.746Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Tracking output 1 of 195e0349cfb268b11909e8a9d8477a0a428634a264761408e50f4d92d0dada02: THEIR_UNILATERAL/OUR_HTLC
2018-03-01T09:40:55.747Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Propose handling THEIR_UNILATERAL/OUR_HTLC by OUR_HTLC_TIMEOUT_TO_US (0200000000010102dadad0924d0fe508147664a23486420a7a47d8a9e80919b168b2cf49035e19010000000000000000019eb3030000000000160014dec0e8970a8c56d16061b686cc9d0271360c973803483045022100c9377a7cf9cda8cd9620454c90cc69a931a1e72344356e019c8c3742dcd8991602200fa19921aed6a2645013736e47756cf7025339c240de75dde58cf5e569c88c1701008b76a9143e37cbbb39a588ac16c0e6c7c7bf8f75971124c78763ac6721020b5f344c9f068be9a6cc81534180f8d2b63a2f00cbddbaac00695a590e0672147c8201208763a9141badf325c619ded6373bc773165cf271121e339788527c21039d64641cd75a223f0047ab47c43a7dcc1d59b4998a27bd48117be2dbee4416ca52ae677503dfcc07b175ac6868dfcc0700) in 22 blocks
2018-03-01T09:40:55.981Z lightningd(1): Adding block 511178: 0000000000000000001e54bbb2fee0413f53e245a1ef1f1be039719b1daf1ecc
...
2018-03-01T09:41:08.808Z lightningd(1): Adding block 511202: 000000000000000000196a873a09204adef35c8777367f180ee7570ddadcc4e2
2018-03-01T09:41:08.855Z lightningd(1): 020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Got UTXO spend for 195e0349cfb268b11909e8a9d8477a0a428634a264761408e50f4d92d0dada02:1: 0470f02a5e69d4010a9df69f5d3b286841fbb877a69b20464c080e2cd41c8984
2018-03-01T09:41:08.855Z lightningd(1): Owning output 0 245543 (SEGWIT) txid 0470f02a5e69d4010a9df69f5d3b286841fbb877a69b20464c080e2cd41c8984
2018-03-01T09:41:08.884Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Got new message WIRE_ONCHAIN_SPENT
2018-03-01T09:41:08.884Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: STATUS_FAIL_INTERNAL_ERROR: THEIR_UNILATERAL/OUR_HTLC spent with weird witness 3
2018-03-01T09:41:08.884Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Status closed, but not exited. Killing
2018-03-01T09:41:08.884Z lightningd(1): 020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Owning subdaemon lightning_onchaind died (61952)
2018-03-01T09:41:09.145Z lightningd(1): Adding block 511203: 0000000000000000004dfd2f9f85ab165ff0f3ad19766a96e2e3301e0beda719
...
2018-03-01T09:43:35.137Z lightningd(1): Adding block 511446: 00000000000000000035bdd76f561499a60b1bed263af938bc9552c3b4775b33
2018-03-01T09:43:35.259Z lightningd(1): 020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Got depth change 0->273 for 28cad7303c12cb129a81ce4afece811f34b8776a5c91eeb1c5f0f0393f3cbd8f
2018-03-01T09:43:35.281Z lightningd(1): 020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: Funding tx 28cad7303c12cb129a81ce4afece811f34b8776a5c91eeb1c5f0f0393f3cbd8f depth 273 of 3
2018-03-01T09:43:35.282Z lightningd(1): FATAL SIGNAL 11 RECEIVED
2018-03-01T09:43:35.341Z lightningd(1): backtrace: lightningd/log.c:504 (log_crash) 0x414647
2018-03-01T09:43:35.341Z lightningd(1): backtrace: (null):0 ((null)) 0x7f58eba994af
2018-03-01T09:43:35.341Z lightningd(1): backtrace: (null):0 ((null)) 0x7f58ebab2cc0
2018-03-01T09:43:35.341Z lightningd(1): backtrace: (null):0 ((null)) 0x7f58ebadaa48
2018-03-01T09:43:35.341Z lightningd(1): backtrace: ccan/ccan/tal/str/str.c:67 (do_vfmt) 0x46d500
2018-03-01T09:43:35.342Z lightningd(1): backtrace: ccan/ccan/tal/str/str.c:91 (tal_vfmt) 0x46d5c0
2018-03-01T09:43:35.342Z lightningd(1): backtrace: lightningd/log.c:264 (logv) 0x413a61
2018-03-01T09:43:35.342Z lightningd(1): backtrace: lightningd/log.c:320 (log_) 0x413de1
2018-03-01T09:43:35.343Z lightningd(1): backtrace: lightningd/peer_control.c:534 (funding_lockin_cb) 0x4211aa
2018-03-01T09:43:35.343Z lightningd(1): backtrace: lightningd/watch.c:221 (txw_fire) 0x429923
2018-03-01T09:43:35.343Z lightningd(1): backtrace: lightningd/watch.c:289 (watch_topology_changed) 0x429b87
2018-03-01T09:43:35.344Z lightningd(1): backtrace: lightningd/chaintopology.c:335 (updates_complete) 0x406cb7
2018-03-01T09:43:35.344Z lightningd(1): backtrace: lightningd/chaintopology.c:429 (get_new_block) 0x40712c
2018-03-01T09:43:35.344Z lightningd(1): backtrace: lightningd/bitcoind.c:630 (process_getblockhash) 0x405331
2018-03-01T09:43:35.344Z lightningd(1): backtrace: lightningd/bitcoind.c:173 (bcli_finished) 0x403e3f
2018-03-01T09:43:35.345Z lightningd(1): backtrace: ccan/ccan/io/poll.c:183 (destroy_conn) 0x463d1c
2018-03-01T09:43:35.345Z lightningd(1): backtrace: ccan/ccan/io/poll.c:189 (destroy_conn_close_fd) 0x463d3c
2018-03-01T09:43:35.345Z lightningd(1): backtrace: ccan/ccan/tal/tal.c:240 (notify) 0x46e604
2018-03-01T09:43:35.345Z lightningd(1): backtrace: ccan/ccan/tal/tal.c:400 (del_tree) 0x46eab5
2018-03-01T09:43:35.346Z lightningd(1): backtrace: ccan/ccan/tal/tal.c:509 (tal_free) 0x46eec8
2018-03-01T09:43:35.346Z lightningd(1): backtrace: ccan/ccan/io/io.c:443 (io_close) 0x4629a4
2018-03-01T09:43:35.346Z lightningd(1): backtrace: ccan/ccan/io/poll.c:314 (io_loop) 0x46412b
2018-03-01T09:43:35.347Z lightningd(1): backtrace: lightningd/lightningd.c:403 (main) 0x412d4b
2018-03-01T09:43:35.347Z lightningd(1): backtrace: (null):0 ((null)) 0x7f58eba8482f
2018-03-01T09:43:35.347Z lightningd(1): backtrace: (null):0 ((null)) 0x403318
2018-03-01T09:43:35.347Z lightningd(1): backtrace: (null):0 ((null)) 0xffffffffffffffff
Fatal signal 11. 0x41472d log_crash
        lightningd/log.c:530
0x7f58eba994af ???
        ???:0
0x7f58ebab2cc0 ???
        ???:0
0x7f58ebadaa48 ???
        ???:0
0x46d500 do_vfmt
        ccan/ccan/tal/str/str.c:67
0x46d5c0 tal_vfmt
        ccan/ccan/tal/str/str.c:91
0x413a61 logv
        lightningd/log.c:264
0x413de1 log_
        lightningd/log.c:320
0x4211aa funding_lockin_cb
        lightningd/peer_control.c:534
0x429923 txw_fire
        lightningd/watch.c:221
0x429b87 watch_topology_changed
        lightningd/watch.c:289
0x406cb7 updates_complete
        lightningd/chaintopology.c:335
0x40712c get_new_block
        lightningd/chaintopology.c:429
0x405331 process_getblockhash
        lightningd/bitcoind.c:630
0x403e3f bcli_finished
        lightningd/bitcoind.c:173
0x463d1c destroy_conn
        ccan/ccan/io/poll.c:183
0x463d3c destroy_conn_close_fd
        ccan/ccan/io/poll.c:189
0x46e604 notify
        ccan/ccan/tal/tal.c:240
0x46eab5 del_tree
        ccan/ccan/tal/tal.c:400
0x46eec8 tal_free
        ccan/ccan/tal/tal.c:509
0x4629a4 io_close
        ccan/ccan/io/io.c:443
0x46412b io_loop
        ccan/ccan/io/poll.c:314
0x412d4b main
        lightningd/lightningd.c:403
0x7f58eba8482f ???
        ???:0
0x403318 ???
        ???:0
0xffffffffffffffff ???
        ???:0
Log dumped in crash.log

@cdecker cdecker added the bug label Mar 1, 2018
@cdecker cdecker added this to the v0.6 milestone Mar 1, 2018
@CryptoChronicus
Copy link

CryptoChronicus commented Mar 2, 2018

Just wanted to note that I did not have this crash happening before pulling the latest master. Just a note. Not sure if that is helpful or not...

I did, however have two payments stuck as pending for weeks, and would see something along the lines of "name owned by none" when running the command "listpeers [node] debug | jq" and the channel state was saying ONCHAIND_OUR_UNILATERAL even thought the depth was well past 144. Depth was somewhere past 1000+.... maybe even 3000.

I reconnected with the peer that had the stuck channel, and the "name owned by none" changed to "name owned by %6/" or something to that effect.

Sorry if I am not saying it right but I didnt stop and save the output at the time and am having to go off memory. I should have saved it, but I was thinking pulling the latest might fix it the stuck channel, but it ended up doing the crash every time now. Next time I will save everything.

Again, just trying to give whatever info I can remember from memory, sorry if it makes no sense, Im a noob. xD

@Ulmo
Copy link

Ulmo commented Mar 3, 2018

Now I'm starting to get catchup crashes. I reported in bug #1157.

@ZmnSCPxj
Copy link
Contributor

ZmnSCPxj commented Mar 3, 2018

I would just like to make an observation: the call that is crashing is a call to tal_fmt which is just a wrapper around sprintf of the platform. As #1118 observed, a NULL pointer passed to "%s" is often rendered as (null) on glibc systems, which is likely the system C library in almost all users.

Now, let us turn to the code in peer_control.c that is in the backtrace:

if (!channel_ready) {
log_debug(channel->log,
"Funding tx confirmed, but channel state %s %s",
channel_state_name(channel),
channel->owner ? channel->owner->name : "unowned");

channel->owner cannot be NULL, as then the string passed would be the constant "unowned" and we would not have crashed. channel->owner must be an accessible object (not necessarily a valid one but at least not something that would cause a segfault), as otherwise the segfault would have occurred in funding_lockin_cb directly instead of deep inside tal_fmt. And channel->owner->name must be non-NULL, because if it were NULL then glibc would have rendered it as (null).

From this, we can consider some kind of memory corruption or overwriting.

  1. Perhaps some code casts a struct subd * to some other type and overwrites the name field.
  2. Perhaps some code frees the struct subd * but fails to clear the owner field. Then another object is allocated in the same memory area and writes its own data. By chance the data in the name field is written with values that are not all 0, leading to our symptoms.

@ZmnSCPxj
Copy link
Contributor

ZmnSCPxj commented Mar 3, 2018

Or maybe worse... Maybe we are freeing a non-struct subd object but retain a pointer to that object, then the memory is reallocated for the struct subd, but we use the old freed pointer and it happens to overwrite name....hmmm

@jBarz
Copy link
Contributor

jBarz commented Mar 3, 2018

I was able to get a similar crash by killing onchaind and then running listpeers soon after.

I am wondering if we need to set the channel owner to NULL here in the error callback.
After this, the subd object is destroyed so the channel will have a reference to a destroyed subd?

@@ -330,6 +330,7 @@ static void onchain_error(struct channel *channel,
 {
        /* FIXME: re-launch? */
        log_broken(channel->log, "%s", desc);
+       channel_set_billboard(channel, true, desc);
+       channel_set_owner(channel, NULL);
 }

With this listpeers won't crash and we can see this on the channel billboard.
"ONCHAIN:Owning subdaemon lightning_onchaind died (9)"

@cdecker
Copy link
Member

cdecker commented Mar 3, 2018

It does look like a use-after-free issue. Stepping through the backtrace, it can't be channel->log being broken, since that'd die one log.c:262, instead it dies when attempting to access channel->owner.

Could one of the users experiencing the problem try running with valgrind:

valgrind -q --trace-children=yes --leak-check=full --trace-children-skip=*bitcoin-cli* --error-exitcode=7 --log-file=valgrind-errors.%p lightningd/lightningd --log-level=debug

This will exit with a return code of 7 when valgrind finds a memory error.

Ping @Ulmo @CryptoChronicus @smabbutt @sadilek

@ZmnSCPxj
Copy link
Contributor

ZmnSCPxj commented Mar 4, 2018

@jBarz I highly suspect you are correct. Thank you!

I have audited all four places where we use new_channel_subd.

  1. In lightningd/channel_control.c and lightningd/closing_control.c, the error handler is channel_errmsg, which correctly clears the channel->owner.
  2. In lightningd/onchain_control.c, the error handler is onchain_error, which as seen above does not clear the channel->owner.
  3. In lightningd/opening_control.c, the error handler is opening_channel_errmsg. This is something of a special case as the structure is different in opening (it is a struct uncommited_channel), and it frees the uncommitted channel, meaning that changing its owner is moot.

@cdecker
Copy link
Member

cdecker commented Mar 4, 2018

@jBarz great catch, could you open a PR with those fixes?

@rustyrussell
Copy link
Contributor

I've reopened: while we no longer crash, the onchaind failure shouldn't happen. Will see if I can diagnose that today.

@cdecker
Copy link
Member

cdecker commented Mar 4, 2018

Got one confirmation of the crash being solved from a user.

@rustyrussell
Copy link
Contributor

First, @sadilek Great bug report, thanks for your patience and I definitely owe you a beer (or equiv)!

OK, so here's what's happening:

  1. They do a unilateral close. One output is an HTLC we sent.
  2. We propose to resolve this by spending to our own address once it's timed out (at block 511199).
  3. We publish that.
  4. Something happens, and we restart.
  5. We replay, and propose a different tx.
  6. We see the spend we didn't propose (different txid) and get really upset.

Now, this is a bug. One reason the first proposal and the second are different is because the fees are different, which makes perfect sense and we should handle.

The other reason is that we're paying to a different address. That is a minor bug: if we get a fresh one for onchaind we don't write it to the db (which we should).

I will now reproduce and fix both, and you'll get your 0.00245543 BTC or so back!

rustyrussell added a commit to rustyrussell/lightning that referenced this issue Mar 5, 2018
The root cause of ElementsProject#1114 was that the restarted onchaind created a
different proposal to the one which had previously been mined:

	2018-03-01T09:41:08.884Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: STATUS_FAIL_INTERNAL_ERROR: THEIR_UNILATERAL/OUR_HTLC spent with weird witness 3

After the previous patches which fixed the output address difference,
we could identify proposals by their outputs, but during the
transition (onchaind started with old buggy version, restarted now)
that wouldn't be right, so we match the inputs, discarding signatures
which will be different.  This works for all current cases.

Signed-off-by: Rusty Russell <[email protected]>
rustyrussell added a commit to rustyrussell/lightning that referenced this issue Mar 5, 2018
…fees.

This was revealed in ElementsProject#1114; onchaind isn't actually completely idempotent
due to fee changes (and the now-fixed change in keys used).

This triggers the bug by restarting with different fees, resulting in
onchaind not recognizing its own proposal:

	2018-03-05T09:38:15.550Z lightningd(23076): lightning_onchaind-022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59 chan #1: STATUS_FAIL_INTERNAL_ERROR: THEIR_UNILATERAL/OUR_HTLC spent with weird witness 3

Signed-off-by: Rusty Russell <[email protected]>
@sadilek
Copy link
Author

sadilek commented Mar 5, 2018

Thanks for your debugging! I synced to 31b9b6b yesterday, which includes @jBarz's #1164, and restarted the daemon. On my (slow) machine it takes a good 5h to catch up. I now see that the daemon crashed again right after catching up, but differently this time:

+19160.585491986 lightningd(905):DEBUG: Adding block 512096: 0000000000000000004e8762067ec46314a1edd0d6905e42c4391c0646cd1e05
+19165.458807994 lightning_gossipd(914):DEBUG: peer_out WIRE_INIT
+19165.458876397 lightning_gossipd(914):DEBUG: peer_in WIRE_INIT
+19165.458940432 lightning_gossipd(914):DEBUG: UPDATE WIRE_GOSSIP_PEER_CONNECTED
+19165.459019900 lightning_gossipd(914):DEBUG: UPDATE WIRE_GOSSIP_PEER_CONNECTED
+19165.459127329 lightningd(905):DEBUG: Client: Received message 1 from client
+19165.459353848 lightningd(905):DEBUG: Client: Received message 9 from client
+19165.463467990 lightning_gossipd(914):DEBUG: Received channel_update for channel 508314:1679:0(0) now ACTIVE
+19165.463597171 lightning_gossipd(914):DEBUG: Received channel_update for channel 508530:1408:0(0) now ACTIVE
+19165.675780662 lightningd(905):DEBUG: sendrawtransaction: 020000000001013e5269a1410afe868da93922f34331ee0b39e84447d31bd385e053359d4cb0300100000000
f4bdee80013e41020000000000160014755bbf1f2f2d8f2b156ff0889abd10d52c91ab5f0400483045022100ded544124dbbb824034e99955c4d077be56cc07067630eb3b776e0d7727f
375c02200ea98bef98cc042f579ee069672e4351b5b1cfe19619defc2d3c79814c74effe0147304402203b7caa18d20c6f2ec4841712fe6893d47c255bbf248b86ac8cda7bae99e7225f
022031475265316d8cd87c9bf4873faf82653725aaf2c80274a205f0a78ac59949320147522102d4da6b235a04f172a8c8c4cb1246300e0d9b8cbba209376e49cb338fb9a2d4ed2103f4
2fae489b1cf98c0314528033b0b2ace9ab13c5a8af830d8a8696962f4c6ee552ae8b1c2c20
+19165.690665483 lightning_gossipd(914):DEBUG: Forgetting remote peer 0271d98f7dfe66198f045690f552a9126abb0aa1585f0061854e780b7e08e6dccd
+19165.704476016 lightningd(905):INFO: lightning_channeld-0271d98f7dfe66198f045690f552a9126abb0aa1585f0061854e780b7e08e6dccd chan #33: Peer connecti
on lost
+19165.704693858 lightningd(905):INFO: 0271d98f7dfe66198f045690f552a9126abb0aa1585f0061854e780b7e08e6dccd chan #33: Peer transient failure in CHANNE
LD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
+19165.705517875 lightning_gossipd(914):DEBUG: req: type WIRE_GOSSIPCTL_REACH_PEER len 35
+19165.706828321 lightningd(905):BROKEN: FATAL SIGNAL 11 RECEIVED
+19165.720037345 lightningd(905):BROKEN: backtrace: lightningd/log.c:504 (log_crash) 0x415442
+19165.720114865 lightningd(905):BROKEN: backtrace: (null):0 ((null)) 0x7f7c4e62c4af
+19165.720405262 lightningd(905):BROKEN: backtrace: lightningd/pay.c:388 (payment_failed) 0x41de70
+19165.720817762 lightningd(905):BROKEN: backtrace: lightningd/peer_htlcs.c:811 (onchain_failed_our_htlc) 0x4259ec
+19165.720997545 lightningd(905):BROKEN: backtrace: lightningd/onchain_control.c:202 (handle_onchain_htlc_timeout) 0x417401
+19165.721018802 lightningd(905):BROKEN: backtrace: lightningd/onchain_control.c:266 (onchain_msg) 0x4176b1
+19165.721278383 lightningd(905):BROKEN: backtrace: lightningd/subd.c:509 (sd_msg_read) 0x4290c5
+19165.721440199 lightningd(905):BROKEN: backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x45fc4b
+19165.721465990 lightningd(905):BROKEN: backtrace: ccan/ccan/io/io.c:387 (do_plan) 0x460722
+19165.721481195 lightningd(905):BROKEN: backtrace: ccan/ccan/io/io.c:397 (io_ready) 0x460760
+19165.721608380 lightningd(905):BROKEN: backtrace: ccan/ccan/io/poll.c:310 (io_loop) 0x4620aa
+19165.721798959 lightningd(905):BROKEN: backtrace: lightningd/lightningd.c:405 (main) 0x413b46
+19165.721819089 lightningd(905):BROKEN: backtrace: (null):0 ((null)) 0x7f7c4e61782f
+19165.721833030 lightningd(905):BROKEN: backtrace: (null):0 ((null)) 0x4033f8
+19165.721859482 lightningd(905):BROKEN: backtrace: (null):0 ((null)) 0xffffffffffffffff

I now synced to dace9bf to include @rustyrussell's latest fixes and trying again.

@sadilek
Copy link
Author

sadilek commented Mar 5, 2018

After syncing to dace9bf, the daemon crashed again the same way as in my last update:

+19795.642475301 lightningd(5439):DEBUG: Adding block 512160: 0000000000000000001f4c10971f2c8dbefb7594d078f08d8ae3d79fa8823155
+19800.372119800 lightning_gossipd(5447):DEBUG: Connected out for 0271d98f7dfe66198f045690f552a9126abb0aa1585f0061854e780b7e08e6dccd
+19800.372207823 lightning_gossipd(5447):DEBUG: peer_out WIRE_INIT
+19800.372278631 lightning_gossipd(5447):DEBUG: peer_in WIRE_INIT
+19800.372348859 lightning_gossipd(5447):DEBUG: UPDATE WIRE_GOSSIP_PEER_CONNECTED
+19800.372436702 lightning_gossipd(5447):DEBUG: UPDATE WIRE_GOSSIP_PEER_CONNECTED
+19800.372902555 lightningd(5439):DEBUG: Client: Received message 1 from client
+19800.372919207 lightningd(5439):DEBUG: Client: Received message 9 from client
+19800.598728610 lightningd(5439):DEBUG: sendrawtransaction: 020000000001013e5269a1410afe868da93922f34331ee0b39e84447d31bd385e053359d4cb0300100000000f4bde
e80013e41020000000000160014755bbf1f2f2d8f2b156ff0889abd10d52c91ab5f0400483045022100ded544124dbbb824034e99955c4d077be56cc07067630eb3b776e0d7727f375c02200ea
98bef98cc042f579ee069672e4351b5b1cfe19619defc2d3c79814c74effe0147304402203b7caa18d20c6f2ec4841712fe6893d47c255bbf248b86ac8cda7bae99e7225f022031475265316d8
cd87c9bf4873faf82653725aaf2c80274a205f0a78ac59949320147522102d4da6b235a04f172a8c8c4cb1246300e0d9b8cbba209376e49cb338fb9a2d4ed2103f42fae489b1cf98c031452803
3b0b2ace9ab13c5a8af830d8a8696962f4c6ee552ae8b1c2c20
+19800.617760776 lightning_gossipd(5447):DEBUG: Forgetting remote peer 0271d98f7dfe66198f045690f552a9126abb0aa1585f0061854e780b7e08e6dccd
+19800.628963992 lightningd(5439):INFO: lightning_channeld-0271d98f7dfe66198f045690f552a9126abb0aa1585f0061854e780b7e08e6dccd chan #33: Peer connection lo
st
+19800.629188985 lightningd(5439):INFO: 0271d98f7dfe66198f045690f552a9126abb0aa1585f0061854e780b7e08e6dccd chan #33: Peer transient failure in CHANNELD_NO
RMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
+19800.630829625 lightning_gossipd(5447):DEBUG: req: type WIRE_GOSSIPCTL_REACH_PEER len 35
+19800.632649205 lightningd(5439):BROKEN: FATAL SIGNAL 11 RECEIVED
+19800.648411942 lightningd(5439):BROKEN: backtrace: lightningd/log.c:504 (log_crash) 0x415442
+19800.648483842 lightningd(5439):BROKEN: backtrace: (null):0 ((null)) 0x7f5eee6c24af
+19800.648891817 lightningd(5439):BROKEN: backtrace: lightningd/pay.c:388 (payment_failed) 0x41de70
+19800.649296428 lightningd(5439):BROKEN: backtrace: lightningd/peer_htlcs.c:811 (onchain_failed_our_htlc) 0x4259ec
+19800.649484015 lightningd(5439):BROKEN: backtrace: lightningd/onchain_control.c:202 (handle_onchain_htlc_timeout) 0x417401
+19800.649505513 lightningd(5439):BROKEN: backtrace: lightningd/onchain_control.c:266 (onchain_msg) 0x4176b1
+19800.649774431 lightningd(5439):BROKEN: backtrace: lightningd/subd.c:509 (sd_msg_read) 0x4290c5
+19800.649932062 lightningd(5439):BROKEN: backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x45fc4b
+19800.649952220 lightningd(5439):BROKEN: backtrace: ccan/ccan/io/io.c:387 (do_plan) 0x460722
+19800.649972991 lightningd(5439):BROKEN: backtrace: ccan/ccan/io/io.c:397 (io_ready) 0x460760
+19800.650120226 lightningd(5439):BROKEN: backtrace: ccan/ccan/io/poll.c:310 (io_loop) 0x4620aa
+19800.650309609 lightningd(5439):BROKEN: backtrace: lightningd/lightningd.c:405 (main) 0x413b46
+19800.650334842 lightningd(5439):BROKEN: backtrace: (null):0 ((null)) 0x7f5eee6ad82f
+19800.650351569 lightningd(5439):BROKEN: backtrace: (null):0 ((null)) 0x4033f8
+19800.650376234 lightningd(5439):BROKEN: backtrace: (null):0 ((null)) 0xffffffffffffffff

@ZmnSCPxj
Copy link
Contributor

ZmnSCPxj commented Mar 6, 2018

payment_failed assumes that there was indeed a payment with the same payment_hash as the given struct htlc_out object.

However, payment seems to be either corrupted or NULL at this point:

if (!payment->route_channels) {

We need to determine what payment is here. Can you run under gdb and inform us content of payment at this point when it crashes?

If payment is NULL then it means we have a database inconsistency --- we saved the HTLC but not the payment that triggered the creation of the HTLC.

rustyrussell added a commit to rustyrussell/lightning that referenced this issue Mar 6, 2018
The root cause of ElementsProject#1114 was that the restarted onchaind created a
different proposal to the one which had previously been mined:

	2018-03-01T09:41:08.884Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: STATUS_FAIL_INTERNAL_ERROR: THEIR_UNILATERAL/OUR_HTLC spent with weird witness 3

After the previous patches which fixed the output address difference,
we could identify proposals by their outputs, but during the
transition (onchaind started with old buggy version, restarted now)
that wouldn't be right, so we match the inputs, discarding signatures
which will be different.  This works for all current cases.

Closes: ElementsProject#1114
Signed-off-by: Rusty Russell <[email protected]>
rustyrussell added a commit to rustyrussell/lightning that referenced this issue Mar 6, 2018
…fees.

This was revealed in ElementsProject#1114; onchaind isn't actually completely idempotent
due to fee changes (and the now-fixed change in keys used).

This triggers the bug by restarting with different fees, resulting in
onchaind not recognizing its own proposal:

	2018-03-05T09:38:15.550Z lightningd(23076): lightning_onchaind-022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59 chan #1: STATUS_FAIL_INTERNAL_ERROR: THEIR_UNILATERAL/OUR_HTLC spent with weird witness 3

Signed-off-by: Rusty Russell <[email protected]>
rustyrussell added a commit to rustyrussell/lightning that referenced this issue Mar 6, 2018
The root cause of ElementsProject#1114 was that the restarted onchaind created a
different proposal to the one which had previously been mined:

	2018-03-01T09:41:08.884Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: STATUS_FAIL_INTERNAL_ERROR: THEIR_UNILATERAL/OUR_HTLC spent with weird witness 3

After the previous patches which fixed the output address difference,
we could identify proposals by their outputs, but during the
transition (onchaind started with old buggy version, restarted now)
that wouldn't be right, so we match the inputs, discarding signatures
which will be different.  This works for all current cases.

Closes: ElementsProject#1114
Signed-off-by: Rusty Russell <[email protected]>
rustyrussell added a commit to rustyrussell/lightning that referenced this issue Mar 6, 2018
…fees.

This was revealed in ElementsProject#1114; onchaind isn't actually completely idempotent
due to fee changes (and the now-fixed change in keys used).

This triggers the bug by restarting with different fees, resulting in
onchaind not recognizing its own proposal:

	2018-03-05T09:38:15.550Z lightningd(23076): lightning_onchaind-022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59 chan #1: STATUS_FAIL_INTERNAL_ERROR: THEIR_UNILATERAL/OUR_HTLC spent with weird witness 3

Signed-off-by: Rusty Russell <[email protected]>
rustyrussell added a commit to rustyrussell/lightning that referenced this issue Mar 7, 2018
The root cause of ElementsProject#1114 was that the restarted onchaind created a
different proposal to the one which had previously been mined:

	2018-03-01T09:41:08.884Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: STATUS_FAIL_INTERNAL_ERROR: THEIR_UNILATERAL/OUR_HTLC spent with weird witness 3

After the previous patches which fixed the output address difference,
we could identify proposals by their outputs, but during the
transition (onchaind started with old buggy version, restarted now)
that wouldn't be right, so we match the inputs, discarding signatures
which will be different.  This works for all current cases.

Closes: ElementsProject#1114
Signed-off-by: Rusty Russell <[email protected]>
rustyrussell added a commit to rustyrussell/lightning that referenced this issue Mar 7, 2018
…fees.

This was revealed in ElementsProject#1114; onchaind isn't actually completely idempotent
due to fee changes (and the now-fixed change in keys used).

This triggers the bug by restarting with different fees, resulting in
onchaind not recognizing its own proposal:

	2018-03-05T09:38:15.550Z lightningd(23076): lightning_onchaind-022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59 chan #1: STATUS_FAIL_INTERNAL_ERROR: THEIR_UNILATERAL/OUR_HTLC spent with weird witness 3

Signed-off-by: Rusty Russell <[email protected]>
cdecker pushed a commit to rustyrussell/lightning that referenced this issue Mar 7, 2018
The root cause of ElementsProject#1114 was that the restarted onchaind created a
different proposal to the one which had previously been mined:

	2018-03-01T09:41:08.884Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: STATUS_FAIL_INTERNAL_ERROR: THEIR_UNILATERAL/OUR_HTLC spent with weird witness 3

After the previous patches which fixed the output address difference,
we could identify proposals by their outputs, but during the
transition (onchaind started with old buggy version, restarted now)
that wouldn't be right, so we match the inputs, discarding signatures
which will be different.  This works for all current cases.

Closes: ElementsProject#1114
Signed-off-by: Rusty Russell <[email protected]>
cdecker pushed a commit to rustyrussell/lightning that referenced this issue Mar 7, 2018
…fees.

This was revealed in ElementsProject#1114; onchaind isn't actually completely idempotent
due to fee changes (and the now-fixed change in keys used).

This triggers the bug by restarting with different fees, resulting in
onchaind not recognizing its own proposal:

	2018-03-05T09:38:15.550Z lightningd(23076): lightning_onchaind-022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59 chan #1: STATUS_FAIL_INTERNAL_ERROR: THEIR_UNILATERAL/OUR_HTLC spent with weird witness 3

Signed-off-by: Rusty Russell <[email protected]>
cdecker pushed a commit that referenced this issue Mar 7, 2018
The root cause of #1114 was that the restarted onchaind created a
different proposal to the one which had previously been mined:

	2018-03-01T09:41:08.884Z lightningd(1): lightning_onchaind-020d3d5995a973c878e3f6e5f59da54078304c537f981d7dcef73367ecbea0e90e chan #1: STATUS_FAIL_INTERNAL_ERROR: THEIR_UNILATERAL/OUR_HTLC spent with weird witness 3

After the previous patches which fixed the output address difference,
we could identify proposals by their outputs, but during the
transition (onchaind started with old buggy version, restarted now)
that wouldn't be right, so we match the inputs, discarding signatures
which will be different.  This works for all current cases.

Closes: #1114
Signed-off-by: Rusty Russell <[email protected]>
cdecker pushed a commit that referenced this issue Mar 7, 2018
…fees.

This was revealed in #1114; onchaind isn't actually completely idempotent
due to fee changes (and the now-fixed change in keys used).

This triggers the bug by restarting with different fees, resulting in
onchaind not recognizing its own proposal:

	2018-03-05T09:38:15.550Z lightningd(23076): lightning_onchaind-022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59 chan #1: STATUS_FAIL_INTERNAL_ERROR: THEIR_UNILATERAL/OUR_HTLC spent with weird witness 3

Signed-off-by: Rusty Russell <[email protected]>
@sadilek
Copy link
Author

sadilek commented Mar 7, 2018

I didn't have time yet to help with further debugging this. But I just synced to f45f18c and am trying again...

@sadilek
Copy link
Author

sadilek commented Mar 8, 2018

Still failing:

+22830.223728664 lightningd(25109):DEBUG: Adding block 512614: 0000000000000000000074578eada69a12eba816631bd71701c90983d27af3b5
+22833.481983676 lightningd(25109):DEBUG: sendrawtransaction: 020000000001013e5269a1410afe868da93922f34331ee0b39e84447d31bd385e053359d4cb0300100000000
f4bdee80013e41020000000000160014755bbf1f2f2d8f2b156ff0889abd10d52c91ab5f0400483045022100ded544124dbbb824034e99955c4d077be56cc07067630eb3b776e0d7727f37
5c02200ea98bef98cc042f579ee069672e4351b5b1cfe19619defc2d3c79814c74effe0147304402203b7caa18d20c6f2ec4841712fe6893d47c255bbf248b86ac8cda7bae99e7225f0220
31475265316d8cd87c9bf4873faf82653725aaf2c80274a205f0a78ac59949320147522102d4da6b235a04f172a8c8c4cb1246300e0d9b8cbba209376e49cb338fb9a2d4ed2103f42fae48
9b1cf98c0314528033b0b2ace9ab13c5a8af830d8a8696962f4c6ee552ae8b1c2c20
+22833.514940459 lightningd(25109):BROKEN: FATAL SIGNAL 11 RECEIVED
+22833.529209282 lightningd(25109):BROKEN: backtrace: lightningd/log.c:504 (log_crash) 0x41560a
+22833.529273288 lightningd(25109):BROKEN: backtrace: (null):0 ((null)) 0x7f36098014af
+22833.529541365 lightningd(25109):BROKEN: backtrace: lightningd/pay.c:389 (payment_failed) 0x41e25f
+22833.529929806 lightningd(25109):BROKEN: backtrace: lightningd/peer_htlcs.c:812 (onchain_failed_our_htlc) 0x425f9a
+22833.530113512 lightningd(25109):BROKEN: backtrace: lightningd/onchain_control.c:203 (handle_onchain_htlc_timeout) 0x4175c9
+22833.530136063 lightningd(25109):BROKEN: backtrace: lightningd/onchain_control.c:269 (onchain_msg) 0x417891
+22833.530412959 lightningd(25109):BROKEN: backtrace: lightningd/subd.c:509 (sd_msg_read) 0x429673
+22833.530608108 lightningd(25109):BROKEN: backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x461312
+22833.530629232 lightningd(25109):BROKEN: backtrace: ccan/ccan/io/io.c:387 (do_plan) 0x461de9
+22833.530643897 lightningd(25109):BROKEN: backtrace: ccan/ccan/io/io.c:397 (io_ready) 0x461e27
+22833.530771988 lightningd(25109):BROKEN: backtrace: ccan/ccan/io/poll.c:310 (io_loop) 0x463771
+22833.530957608 lightningd(25109):BROKEN: backtrace: lightningd/lightningd.c:405 (main) 0x413d0e
+22833.530977591 lightningd(25109):BROKEN: backtrace: (null):0 ((null)) 0x7f36097ec82f
+22833.530990687 lightningd(25109):BROKEN: backtrace: (null):0 ((null)) 0x4033f8
+22833.531002117 lightningd(25109):BROKEN: backtrace: (null):0 ((null)) 0xffffffffffffffff

This is the same way as in my last two updates but different from my original report. Do you want me to file this as a new issue?

@sadilek
Copy link
Author

sadilek commented Mar 22, 2018

I kept pulling the latest changes, and since I'm synced to 35e85ab, my node doesn't crash any longer after having caught-up. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
8 participants