-
Notifications
You must be signed in to change notification settings - Fork 376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race between outbound messages and peer disconnection #2663
Fix race between outbound messages and peer disconnection #2663
Conversation
Codecov ReportAll modified lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #2663 +/- ##
==========================================
+ Coverage 89.04% 89.05% +0.01%
==========================================
Files 112 112
Lines 87419 87417 -2
Branches 87419 87417 -2
==========================================
+ Hits 77841 77850 +9
+ Misses 7336 7328 -8
+ Partials 2242 2239 -3
☔ View full report in Codecov by Sentry. |
f389626
to
f1e97f9
Compare
lightning/src/ln/peer_handler.rs
Outdated
|
||
#[test] | ||
#[cfg(feature = "std")] | ||
fn test_rapid_connect_events_order_multithreaded() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran this about 100x with your change reverted and wasn't able to get it to fail :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, yea, sorry, this test isnt even close to right. I'll spend some more time on this but the issue is pretty hard to repro.
f1e97f9
to
2c33d6d
Compare
For now simply removed the test. I think we should land this as-is because writing a generic stress test that isn't super targeted is not trivial and we need to ship this. |
lightning/src/ln/peer_handler.rs
Outdated
return Err(PeerHandleError { }.into()); | ||
} | ||
if let Err(()) = self.message_handler.custom_message_handler.peer_connected(&their_node_id, &msg, peer_lock.inbound_connection) { | ||
log_debug!(self.logger, "Custom Message Handler decided we couldn't communicate with peer {}", log_pubkey!(their_node_id)); | ||
self.message_handler.onion_message_handler.peer_disconnected(&their_node_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about route_handler
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't have a peer_disconnected.
lightning-custom-message/src/lib.rs
Outdated
)* | ||
if should_disconnect { | ||
$( | ||
self.$field.peer_disconnected(their_node_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ok that peer_disconnected
may be called for some handlers that did not get a peer_connected
first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All handlers get a peer_connected here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is somewhat awkward that we have different semantics on the custom message handler from the other message handlers - the others dont get a peer_disconnected if they return an error from peer_connected, custom does (here). Let me fix that.
Previously, outbound messages held in `process_events` could race with peer disconnection, allowing a message intended for a peer before disconnection to be sent to the same peer after disconnection. The fix is simple - hold the peers read lock while we fetch pending messages from peers (as we disconnect with the write lock).
2c33d6d
to
4366369
Compare
Dropped all the commits but the real fix, since those commits were mostly just for the test that didnt work anyway. I'll get a PR up with the changes and a test at some point in a followup. |
0.0.118 - Oct 23, 2023 - "Just the Twelve Sinks" API Updates =========== * BOLT12 sending and receiving is now supported as an alpha feature. You may run into unexpected issues and will need to have a direct connection with the offer's blinded path introduction points as messages are not yet routed. We are seeking feedback from early testers (lightningdevkit#2578, lightningdevkit#2039). * `ConfirmationTarget` has been rewritten to provide information about the specific use LDK needs the feerate estimate for, rather than the generic low-, medium-, and high-priority estimates. This allows LDK users to more accurately target their feerate estimates (lightningdevkit#2660). For those wishing to retain their existing behavior, see the table below for conversion. * `ChainHash` is now used in place of `BlockHash` where it represents the genesis block (lightningdevkit#2662). * `lightning-invoice` payment utilities now take a `Deref` to `AChannelManager` (lightningdevkit#2652). * `peel_onion` is provided to statelessly decode an `OnionMessage` (lightningdevkit#2599). * `ToSocketAddrs` + `Display` are now impl'd for `SocketAddress` (lightningdevkit#2636, lightningdevkit#2670) * `Display` is now implemented for `OutPoint` (lightningdevkit#2649). * `Features::from_be_bytes` is now provided (lightningdevkit#2640). For those moving to the new `ConfirmationTarget`, the new variants in terms of the old mempool/low/medium/high priorities are as follows: * `OnChainSweep` = `HighPriority` * `MaxAllowedNonAnchorChannelRemoteFee` = `max(25 * 250, HighPriority * 10)` * `MinAllowedAnchorChannelRemoteFee` = `MempoolMinimum` * `MinAllowedNonAnchorChannelRemoteFee` = `Background - 250` * `AnchorChannelFee` = `Background` * `NonAnchorChannelFee` = `Normal` * `ChannelCloseMinimum` = `Background` Bug Fixes ========= * Calling `ChannelManager::close_channel[_with_feerate_and_script]` on a channel which did not exist would immediately hang holding several key `ChannelManager`-internal locks (lightningdevkit#2657). * Channel information updates received from a failing HTLC are no longer applied to our `NetworkGraph`. This prevents a node which we attempted to route a payment through from being able to learn the sender of the payment. In some rare cases, this may result in marginally reduced payment success rates (lightningdevkit#2666). * Anchor outputs are now properly considered when calculating the amount available to send in HTLCs. This can prevent force-closes in anchor channels when sending payments which overflow the available balance (lightningdevkit#2674). * A peer that sends an `update_fulfill_htlc` message for a forwarded HTLC, then reconnects prior to sending a `commitment_signed` (thus retransmitting their `update_fulfill_htlc`) may result in the channel stalling and being unable to make progress (lightningdevkit#2661). * In exceedingly rare circumstances, messages intended to be sent to a peer prior to reconnection can be sent after reconnection. This could result in undefined channel state and force-closes (lightningdevkit#2663). Backwards Compatibility ======================= * Creating a blinded path to receive a payment then downgrading to LDK prior to 0.0.117 may result in failure to receive the payment (lightningdevkit#2413). * Calling `ChannelManager::pay_for_offer` or `ChannelManager::create_refund_builder` may prevent downgrading to LDK prior to 0.0.118 until the payment times out and has been removed (lightningdevkit#2039). Node Compatibility ================== * LDK now sends a bogus `channel_reestablish` message to peers when they ask to resume an unknown channel. This should cause LND nodes to force-close and broadcast the latest channel state to the chain. In order to trigger this when we wish to force-close a channel, LDK now disconnects immediately after sending a channel-closing `error` message. This should result in cooperative peers also working to confirm the latest commitment transaction when we wish to force-close (lightningdevkit#2658). Security ======== 0.0.118 expands mitigations against transaction cycling attacks to non-anchor channels, though note that no mitigations which exist today are considered robust to prevent the class of attacks. * In order to mitigate against transaction cycling attacks, non-anchor HTLC transactions are now properly re-signed before broadcasting (lightningdevkit#2667). In total, this release features 61 files changed, 3470 insertions, 1503 deletions in 85 commits from 12 authors, in alphabetical order: * Antonio Yang * Elias Rohrer * Evan Feenstra * Fedeparma74 * Gursharan Singh * Jeffrey Czyz * Matt Corallo * Sergi Delgado Segura * Vladimir Fomene * Wilmer Paulino * benthecarman * slanesuke
Previously, outbound messages held in
process_events
could racewith peer disconnection, allowing a message intended for a peer
before disconnection to be sent to the same peer after
disconnection.
The fix is simple - hold the peers read lock while we fetch
pending messages from peers (as we disconnect with the write lock).