-
Notifications
You must be signed in to change notification settings - Fork 911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
No next lookup via gossipd #3547
No next lookup via gossipd #3547
Conversation
= tal_dup_talarr(cbdata, u8, failmsg_needs_update); | ||
subd_req(cbdata, hin->key.channel->peer->ld->gossip, | ||
take(towire_gossip_get_stripped_cupdate(NULL, failmsg_scid)), | ||
-1, 0, failmsg_update_reply, cbdata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like your gossip
daemon doesnt exist in all cases? failing test_restart_many_payments
EBUG:root:lightningd-1: 2020-02-25T06:08:23.095Z DEBUG 0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199-chan#3: HTLC in 1 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: FATAL SIGNAL 11 (version v0.8.1-36-g03045e1-modded)
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: common/daemon.c:44 (send_backtrace) 0x5634d3e4c1b0
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: common/daemon.c:52 (crashdump) 0x5634d3e4c200
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7fb9d3ca4f1f
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: lightningd/subd.c:750 (subd_send_msg) 0x5634d3e43132
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: lightningd/subd.c:770 (subd_req_) 0x5634d3e431f6
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: lightningd/peer_htlcs.c:268 (local_fail_in_htlc_needs_update) 0x5634d3e35438
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: lightningd/peer_htlcs.c:296 (fail_out_htlc) 0x5634d3e3558d
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: lightningd/peer_htlcs.c:516 (destroy_hout_subd_died) 0x5634d3e35c86
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: ccan/ccan/tal/tal.c:240 (notify) 0x5634d3eb2472
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: ccan/ccan/tal/tal.c:402 (del_tree) 0x5634d3eb2961
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: ccan/ccan/tal/tal.c:486 (tal_free) 0x5634d3eb2ced
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: lightningd/peer_htlcs.c:2107 (free_htlcs) 0x5634d3e3a85a
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: lightningd/lightningd.c:520 (shutdown_subdaemons) 0x5634d3e1a7a1
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: lightningd/lightningd.c:940 (main) 0x5634d3e1b217
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7fb9d3c87b96
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x5634d3e01b09
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0xffffffffffffffff
DEBUG:root:Received response for stop call: {'error': 'Connection to RPC server lost.'}
----------------------------- Captured stderr call -----------------------------
lightningd: FATAL SIGNAL 11 (version v0.8.1-36-g03045e1-modded)
0x5634d3e4c15a send_backtrace
common/daemon.c:39
0x5634d3e4c200 crashdump
common/daemon.c:52
0x7fb9d3ca4f1f ???
???:0
0x5634d3e43132 subd_send_msg
lightningd/subd.c:750
0x5634d3e431f6 subd_req_
lightningd/subd.c:770
0x5634d3e35438 local_fail_in_htlc_needs_update
lightningd/peer_htlcs.c:268
0x5634d3e3558d fail_out_htlc
lightningd/peer_htlcs.c:296
0x5634d3e35c86 destroy_hout_subd_died
lightningd/peer_htlcs.c:516
0x5634d3eb2472 notify
ccan/ccan/tal/tal.c:240
0x5634d3eb2961 del_tree
ccan/ccan/tal/tal.c:402
0x5634d3eb2ced tal_free
ccan/ccan/tal/tal.c:486
0x5634d3e3a85a free_htlcs
lightningd/peer_htlcs.c:2107
0x5634d3e1a7a1 shutdown_subdaemons
lightningd/lightningd.c:520
0x5634d3e1b217 main
lightningd/lightningd.c:940
0x7fb9d3c87b96 ???
???:0
0x5634d3e01b09 ???
???:0
0xffffffffffffffff ???
???:0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think this is best addressed by freeing HTLCs and peer daemons first, then gossipd. Thanks!
} | ||
|
||
if (!topology_synced(out->peer->ld->topology)) { | ||
log_info(out->log, "Attempt to send HTLC but still syncing" | ||
" with bitcoin network"); | ||
return towire_temporary_channel_failure(ctx, | ||
out->stripped_update); | ||
return towire_temporary_node_failure(ctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this change is causing test_lightningd_still_loading
to fail with the wrong error message?
<class 'AssertionError'>
Pattern 'TEMPORARY_NODE_FAILURE' not found in "RPC call failed: method: sendpay, payload: {'route': [{'msatoshi': 1000, 'id': '022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59', 'delay': 5, 'channel': '1x1x1'}], 'payment_hash': '55a61a634cd0dbfdad2461676acae93774a0576ff4e5d61cb6728b3078744af1'}, error: {'code': 204, 'message': 'failed: WIRE_TEMPORARY_CHANNEL_FAILURE (First peer not ready)', 'data': {'erring_index': 0, 'failcode': 4103, 'failcodename': 'WIRE_TEMPORARY_CHANNEL_FAILURE', 'erring_node': '0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518', 'erring_channel': '1x1x1', 'erring_direction': 1}}"
[<TracebackEntry /home/travis/build/ElementsProject/lightning/tests/test_misc.py:189>]
test_lightningd_still_loading failed; it passed 0 out of the required 1 times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, no. That error is complaining that it's expecting a temp_node_fail, but it's getting a temp_chan_fail. And it doesn't happen for me here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it does under VALGRIND. Turns out test is flaky, fixing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple of test failures; otherwise concept ACK
55b75cf
to
7439faf
Compare
…use it. The idea is that gossipd can give us the cupdate we need for an error, and we wire things up so that we ask for it (async) just before we send the error to the subdaemon. I tried many other things, but they were all too high-risk. 1. We need to ask gossipd every time, since it produces these lazily (in particular, it doesn't actually generate an offline update unless the channel is used). 2. We can't do async calls in random places, since we'll end up with an HTLC in limbo. What if another path tries to fail it at the same time? 3. This allows us to use a temporary_node_failure error, and upgrade it when gossipd replies. This doesn't change any existing assumptions. Signed-off-by: Rusty Russell <[email protected]>
Instead of saving a stripped_update, we use the new local_fail_in_htlc_needs_update. One minor change: we return the more correct towire_temporary_channel_failure when the node is still syncing. Signed-off-by: Rusty Russell <[email protected]>
7439faf
to
46eaef9
Compare
Rebased, addressed @niftynei feedback. |
Even without optimization, it's faster to walk all the channels than ping another daemon and wait for the response. Changelog-Changed: Forwarding messages is now much faster (less inter-daemon traffic) Signed-off-by: Rusty Russell <[email protected]>
46eaef9
to
b17cd23
Compare
Restarted due to flaky |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a minor clarification, otherwise LGTM
ACK b17cd23
* but it's v. unlikely */ | ||
if (!fromwire_gossip_get_stripped_cupdate_reply(msg, msg, | ||
&stripped_update) | ||
|| !tal_count(stripped_update)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't we using tal_bytelen
for u8*
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a line-ball, IMHO. Especially here, where it will actually (due to our explicit fromwire_ generation policy) be NULL instead of empty.
|
||
subd_send_msg(hin->key.channel->owner, | ||
take(towire_channel_fail_htlc(NULL, failed_htlc))); | ||
tell_channeld_htlc_failed(hin, failed_htlc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice simplification 👍
(Based on #3546, so ignore first three commits).
Every time we need to forward, we ask gossipd for where we should forward it. That's unnecessary, and slow and weird. But that also gives us a chance to get the latest channel_update, in case we need to send an error.
This does a local lookup, and if we need an update it starts with a temporary_node_failure then asks gossipd so it can make the real error. This is robust against crashing and other corner cases (such as channeld manipulating the failed HTLC) since there's never an HTLC in limbo.