Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race between outbound messages and peer disconnection #2663

Merged

Conversation

TheBlueMatt
Copy link
Collaborator

Previously, outbound messages held in process_events could race
with peer disconnection, allowing a message intended for a peer
before disconnection to be sent to the same peer after
disconnection.

The fix is simple - hold the peers read lock while we fetch
pending messages from peers (as we disconnect with the write lock).

@TheBlueMatt TheBlueMatt added this to the 0.0.118 milestone Oct 13, 2023
@codecov-commenter
Copy link

codecov-commenter commented Oct 14, 2023

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (22a0bfc) 89.04% compared to head (4366369) 89.05%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2663      +/-   ##
==========================================
+ Coverage   89.04%   89.05%   +0.01%     
==========================================
  Files         112      112              
  Lines       87419    87417       -2     
  Branches    87419    87417       -2     
==========================================
+ Hits        77841    77850       +9     
+ Misses       7336     7328       -8     
+ Partials     2242     2239       -3     
Files Coverage Δ
lightning/src/ln/peer_handler.rs 58.68% <100.00%> (+<0.01%) ⬆️

... and 5 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@TheBlueMatt TheBlueMatt force-pushed the 2023-10-peer-race-send-discon branch from f389626 to f1e97f9 Compare October 14, 2023 00:51

#[test]
#[cfg(feature = "std")]
fn test_rapid_connect_events_order_multithreaded() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran this about 100x with your change reverted and wasn't able to get it to fail :/

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, yea, sorry, this test isnt even close to right. I'll spend some more time on this but the issue is pretty hard to repro.

@TheBlueMatt TheBlueMatt force-pushed the 2023-10-peer-race-send-discon branch from f1e97f9 to 2c33d6d Compare October 18, 2023 15:22
@TheBlueMatt
Copy link
Collaborator Author

For now simply removed the test. I think we should land this as-is because writing a generic stress test that isn't super targeted is not trivial and we need to ship this.

return Err(PeerHandleError { }.into());
}
if let Err(()) = self.message_handler.custom_message_handler.peer_connected(&their_node_id, &msg, peer_lock.inbound_connection) {
log_debug!(self.logger, "Custom Message Handler decided we couldn't communicate with peer {}", log_pubkey!(their_node_id));
self.message_handler.onion_message_handler.peer_disconnected(&their_node_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about route_handler?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't have a peer_disconnected.

)*
if should_disconnect {
$(
self.$field.peer_disconnected(their_node_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok that peer_disconnected may be called for some handlers that did not get a peer_connected first?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All handlers get a peer_connected here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is somewhat awkward that we have different semantics on the custom message handler from the other message handlers - the others dont get a peer_disconnected if they return an error from peer_connected, custom does (here). Let me fix that.

Previously, outbound messages held in `process_events` could race
with peer disconnection, allowing a message intended for a peer
before disconnection to be sent to the same peer after
disconnection.

The fix is simple - hold the peers read lock while we fetch
pending messages from peers (as we disconnect with the write lock).
@TheBlueMatt
Copy link
Collaborator Author

Dropped all the commits but the real fix, since those commits were mostly just for the test that didnt work anyway. I'll get a PR up with the changes and a test at some point in a followup.

@TheBlueMatt TheBlueMatt merged commit 0357caf into lightningdevkit:main Oct 18, 2023
15 checks passed
k0k0ne pushed a commit to bitlightlabs/rust-lightning that referenced this pull request Sep 30, 2024
0.0.118 - Oct 23, 2023 - "Just the Twelve Sinks"

API Updates
===========

 * BOLT12 sending and receiving is now supported as an alpha feature. You may
   run into unexpected issues and will need to have a direct connection with
   the offer's blinded path introduction points as messages are not yet routed.
   We are seeking feedback from early testers (lightningdevkit#2578, lightningdevkit#2039).
 * `ConfirmationTarget` has been rewritten to provide information about the
   specific use LDK needs the feerate estimate for, rather than the generic
   low-, medium-, and high-priority estimates. This allows LDK users to more
   accurately target their feerate estimates (lightningdevkit#2660). For those wishing to
   retain their existing behavior, see the table below for conversion.
 * `ChainHash` is now used in place of `BlockHash` where it represents the
   genesis block (lightningdevkit#2662).
 * `lightning-invoice` payment utilities now take a `Deref` to
   `AChannelManager` (lightningdevkit#2652).
 * `peel_onion` is provided to statelessly decode an `OnionMessage` (lightningdevkit#2599).
 * `ToSocketAddrs` + `Display` are now impl'd for `SocketAddress` (lightningdevkit#2636, lightningdevkit#2670)
 * `Display` is now implemented for `OutPoint` (lightningdevkit#2649).
 * `Features::from_be_bytes` is now provided (lightningdevkit#2640).

For those moving to the new `ConfirmationTarget`, the new variants in terms of
the old mempool/low/medium/high priorities are as follows:
 * `OnChainSweep` = `HighPriority`
 * `MaxAllowedNonAnchorChannelRemoteFee` = `max(25 * 250, HighPriority * 10)`
 * `MinAllowedAnchorChannelRemoteFee` = `MempoolMinimum`
 * `MinAllowedNonAnchorChannelRemoteFee` = `Background - 250`
 * `AnchorChannelFee` = `Background`
 * `NonAnchorChannelFee` = `Normal`
 * `ChannelCloseMinimum` = `Background`

Bug Fixes
=========

 * Calling `ChannelManager::close_channel[_with_feerate_and_script]` on a
   channel which did not exist would immediately hang holding several key
   `ChannelManager`-internal locks (lightningdevkit#2657).
 * Channel information updates received from a failing HTLC are no longer
   applied to our `NetworkGraph`. This prevents a node which we attempted to
   route a payment through from being able to learn the sender of the payment.
   In some rare cases, this may result in marginally reduced payment success
   rates (lightningdevkit#2666).
 * Anchor outputs are now properly considered when calculating the amount
   available to send in HTLCs. This can prevent force-closes in anchor channels
   when sending payments which overflow the available balance (lightningdevkit#2674).
 * A peer that sends an `update_fulfill_htlc` message for a forwarded HTLC,
   then reconnects prior to sending a `commitment_signed` (thus retransmitting
   their `update_fulfill_htlc`) may result in the channel stalling and being
   unable to make progress (lightningdevkit#2661).
 * In exceedingly rare circumstances, messages intended to be sent to a peer
   prior to reconnection can be sent after reconnection. This could result in
   undefined channel state and force-closes (lightningdevkit#2663).

Backwards Compatibility
=======================

 * Creating a blinded path to receive a payment then downgrading to LDK prior to
   0.0.117 may result in failure to receive the payment (lightningdevkit#2413).
 * Calling `ChannelManager::pay_for_offer` or
   `ChannelManager::create_refund_builder` may prevent downgrading to LDK prior
   to 0.0.118 until the payment times out and has been removed (lightningdevkit#2039).

Node Compatibility
==================

 * LDK now sends a bogus `channel_reestablish` message to peers when they ask to
   resume an unknown channel. This should cause LND nodes to force-close and
   broadcast the latest channel state to the chain. In order to trigger this
   when we wish to force-close a channel, LDK now disconnects immediately after
   sending a channel-closing `error` message. This should result in cooperative
   peers also working to confirm the latest commitment transaction when we wish
   to force-close (lightningdevkit#2658).

Security
========

0.0.118 expands mitigations against transaction cycling attacks to non-anchor
channels, though note that no mitigations which exist today are considered robust
to prevent the class of attacks.
 * In order to mitigate against transaction cycling attacks, non-anchor HTLC
   transactions are now properly re-signed before broadcasting (lightningdevkit#2667).

In total, this release features 61 files changed, 3470 insertions, 1503
deletions in 85 commits from 12 authors, in alphabetical order:
 * Antonio Yang
 * Elias Rohrer
 * Evan Feenstra
 * Fedeparma74
 * Gursharan Singh
 * Jeffrey Czyz
 * Matt Corallo
 * Sergi Delgado Segura
 * Vladimir Fomene
 * Wilmer Paulino
 * benthecarman
 * slanesuke
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants