Don't remove nodes if there's no channel_update for a temp failure #2220

Merged: 3 commits, Apr 24, 2023

TheBlueMatt

Previously, we required any `UPDATE` onion error to include a `channel_update`, as the spec mandates[1]. If we saw an onion error that was missing one, we treated the sender as a misbehaving node that wasn't following the spec and simply removed the node.

Sadly, it appears at least some versions of CLN are such nodes, and opt to not include `channel_update` at all if they're returning a `temporary_channel_failure`. This caused us to completely remove CLN nodes from our graph after they failed to forward our HTLC.

While CLN is violating the spec here, there's little reason not to tolerate it, so we do so here, treating it like any other failure and letting the scorer handle it.

[1] The spec says `Please note that the channel_update field is mandatory in messages whose failure_code includes the UPDATE flag`, but it doesn't repeat this in the requirements section, so it's not crazy that someone missed it when implementing.
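
A minimal sketch of the decision described above, with the assumptions flagged: `GraphAction`, `UpdateField`, and `action_for_update_error` are hypothetical names used only for illustration and are not LDK's API. Only the `UPDATE` flag value (`0x1000`) and `temporary_channel_failure` (`UPDATE|7`) are taken from BOLT 4. The unreadable case follows the comment in `process_onion_failure`, which treats a present but unparseable `channel_update` as a total node failure; the real handling of a parsed update involves further checks that are omitted here.

```rust
/// BOLT 4 failure-code flag for errors that are expected to carry a `channel_update`.
const UPDATE: u16 = 0x1000;
/// `temporary_channel_failure` as defined in BOLT 4 (UPDATE|7).
const TEMPORARY_CHANNEL_FAILURE: u16 = UPDATE | 7;

/// Hypothetical summary of what the sender does with its graph after a failure.
#[derive(Debug, PartialEq)]
enum GraphAction {
    /// A `channel_update` was present and readable: apply it (simplified).
    ApplyUpdate,
    /// No `channel_update` at all: leave the graph alone, the scorer handles it.
    LeaveToScorer,
    /// A `channel_update` was present but unreadable: treat as a node failure.
    RemoveNode,
}

/// Hypothetical summary of the `channel_update` field found in the onion error.
enum UpdateField {
    /// Zero-length field, as some CLN versions send.
    Missing,
    /// Non-zero length, but it could not be parsed.
    Unreadable,
    /// Non-zero length and parsed successfully.
    Parsed,
}

/// Returns the graph action for an `UPDATE` error, or `None` if the failure
/// code does not carry the `UPDATE` flag (those codes are out of scope here).
fn action_for_update_error(failure_code: u16, update: UpdateField) -> Option<GraphAction> {
    if failure_code & UPDATE == 0 {
        return None;
    }
    Some(match update {
        UpdateField::Parsed => GraphAction::ApplyUpdate,
        UpdateField::Missing => GraphAction::LeaveToScorer,
        UpdateField::Unreadable => GraphAction::RemoveNode,
    })
}
```

In this sketch, a `temporary_channel_failure` with `UpdateField::Missing` maps to `LeaveToScorer`, where the old behavior would have corresponded to `RemoveNode`.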

@TheBlueMatt TheBlueMatt added this to the 0.0.115 milestone Apr 23, 2023

codecov-commenter commented Apr 23, 2023

Codecov Report

Patch coverage: 84.00% and project coverage change: -0.02 ⚠️

Comparison is base (bc54441) 91.57% compared to head (95ec48a) 91.56%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2220      +/-   ##
==========================================
- Coverage   91.57%   91.56%   -0.02%     
==========================================
  Files         104      104              
  Lines       51553    51559       +6     
  Branches    51553    51559       +6     
==========================================
- Hits        47212    47210       -2     
- Misses       4341     4349       +8     
| Impacted Files | Coverage Δ |
|---|---|
| lightning/src/ln/onion_utils.rs | 90.82% <81.81%> (-0.77%) ⬇️ |
| lightning/src/ln/monitor_tests.rs | 97.86% <100.00%> (-0.31%) ⬇️ |
| lightning/src/routing/gossip.rs | 89.93% <100.00%> (+0.03%) ⬆️ |

wpaulino previously approved these changes Apr 24, 2023
@TheBlueMatt

Went ahead and squashed the fixups:

$ git diff-tree -U1 06ceacffe998b103dff4bf2fd478d778562cb8c2 a22227bf1ad9a46403eb9b751630aa6c74a2fc49
diff --git a/lightning/src/ln/onion_utils.rs b/lightning/src/ln/onion_utils.rs
index ac76a88ff..54b6ecdee 100644
--- a/lightning/src/ln/onion_utils.rs
+++ b/lightning/src/ln/onion_utils.rs
@@ -547,3 +547,3 @@ pub(super) fn process_onion_failure<T: secp256k1::Signing, L: Deref>(secp_ctx: &
 										// If the channel_update had a non-zero length (i.e. was
-										// present) but we coulnd't read it, treat it as a total
+										// present) but we couldn't read it, treat it as a total
 										// node failure.
diff --git a/lightning/src/routing/gossip.rs b/lightning/src/routing/gossip.rs
index 7e0788cb8..cc256b167 100644
--- a/lightning/src/routing/gossip.rs
+++ b/lightning/src/routing/gossip.rs
@@ -214,3 +214,3 @@ pub enum NetworkUpdate {
 	/// An error indicating that a channel failed to route a payment, which should be applied via
-	/// [`NetworkGraph::channel_failed`].
+	/// [`NetworkGraph::channel_failed_permanent`] if permanent.
 	ChannelFailure {
@@ -354,5 +354,6 @@ impl<L: Deref> NetworkGraph<L> where L::Target: Logger {
 			NetworkUpdate::ChannelFailure { short_channel_id, is_permanent } => {
-				let action = if is_permanent { "Removing" } else { "Not touching" };
-				log_debug!(self.logger, "{} channel graph entry for {} due to a payment failure.", action, short_channel_id);
-				self.channel_failed(short_channel_id, is_permanent);
+				if is_permanent {
+					log_debug!(self.logger, "Removing channel graph entry for {} due to a payment failure.", short_channel_id);
+					self.channel_failed_permanent(short_channel_id);
+				}
 			},
@@ -1634,10 +1635,6 @@ impl<L: Deref> NetworkGraph<L> where L::Target: Logger {
 
-	/// Marks a channel in the graph as failed if a corresponding HTLC fail was sent.
-	///
-	/// If permanent, removes a channel from the local storage.
-	/// May cause the removal of nodes too, if this was their last channel.
+	/// Marks a channel in the graph as failed permanently.
 	///
-	/// If not permanent, no action is taken as such a failure likely indicates the node simply
-	/// lacked liquidity and your scorer should handle this instead.
-	pub fn channel_failed(&self, short_channel_id: u64, is_permanent: bool) {
+	/// The channel and any node for which this was their last channel are removed from the graph.
+	pub fn channel_failed_permanent(&self, short_channel_id: u64) {
 		#[cfg(feature = "std")]
@@ -1647,20 +1644,14 @@ impl<L: Deref> NetworkGraph<L> where L::Target: Logger {
 
-		self.channel_failed_with_time(short_channel_id, is_permanent, current_time_unix)
+		self.channel_failed_permanent_with_time(short_channel_id, current_time_unix)
 	}
 
-	/// Marks a channel in the graph as failed if a corresponding HTLC fail was sent.
+	/// Marks a channel in the graph as failed permanently.
 	///
-	/// If permanent, removes a channel from the local storage.
-	/// May cause the removal of nodes too, if this was their last channel.
-	///
-	/// If not permanent, no action is taken as such a failure likely indicates the node simply
-	/// lacked liquidity and your scorer should handle this instead.
-	fn channel_failed_with_time(&self, short_channel_id: u64, is_permanent: bool, current_time_unix: Option<u64>) {
+	/// The channel and any node for which this was their last channel are removed from the graph.
+	fn channel_failed_permanent_with_time(&self, short_channel_id: u64, current_time_unix: Option<u64>) {
 		let mut channels = self.channels.write().unwrap();
-		if is_permanent {
-			if let Some(chan) = channels.remove(&short_channel_id) {
-				let mut nodes = self.nodes.write().unwrap();
-				self.removed_channels.lock().unwrap().insert(short_channel_id, current_time_unix);
-				Self::remove_channel_in_nodes(&mut nodes, &chan, short_channel_id);
-			}
+		if let Some(chan) = channels.remove(&short_channel_id) {
+			let mut nodes = self.nodes.write().unwrap();
+			self.removed_channels.lock().unwrap().insert(short_channel_id, current_time_unix);
+			Self::remove_channel_in_nodes(&mut nodes, &chan, short_channel_id);
 		}
@@ -2600,3 +2591,3 @@ pub(crate) mod tests {
 			// and all of the entries will be tracked as removed.
-			network_graph.channel_failed_with_time(short_channel_id, true, Some(tracking_time));
+			network_graph.channel_failed_permanent_with_time(short_channel_id, Some(tracking_time));
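
To illustrate the semantics of the renamed method, here is a toy model rather than LDK's `NetworkGraph`: `ToyGraph`, its fields, and `add_channel` are hypothetical, and the locking used in the real code is omitted. It only sketches the documented behavior, where the channel is removed, the removal is recorded with an optional timestamp, and any node for which this was the last channel is dropped as well.

```rust
use std::collections::HashMap;

/// Toy stand-in for a network graph (hypothetical, not LDK's `NetworkGraph`).
struct ToyGraph {
    /// short_channel_id -> ids of the two endpoint nodes
    channels: HashMap<u64, (u32, u32)>,
    /// node id -> short_channel_ids of that node's channels
    nodes: HashMap<u32, Vec<u64>>,
    /// removed short_channel_id -> removal time, if one is known
    removed_channels: HashMap<u64, Option<u64>>,
}

impl ToyGraph {
    fn new() -> Self {
        ToyGraph {
            channels: HashMap::new(),
            nodes: HashMap::new(),
            removed_channels: HashMap::new(),
        }
    }

    /// Registers a channel between nodes `a` and `b`.
    fn add_channel(&mut self, scid: u64, a: u32, b: u32) {
        self.channels.insert(scid, (a, b));
        self.nodes.entry(a).or_default().push(scid);
        self.nodes.entry(b).or_default().push(scid);
    }

    /// Removes the channel and any endpoint node left without channels.
    /// Unknown channel ids are ignored.
    fn channel_failed_permanent(&mut self, scid: u64, now: Option<u64>) {
        if let Some((a, b)) = self.channels.remove(&scid) {
            self.removed_channels.insert(scid, now);
            for node_id in [a, b] {
                // Drop the channel from the node, then check if any remain.
                let now_empty = match self.nodes.get_mut(&node_id) {
                    Some(chans) => {
                        chans.retain(|c| *c != scid);
                        chans.is_empty()
                    }
                    None => false,
                };
                if now_empty {
                    self.nodes.remove(&node_id);
                }
            }
        }
    }
}
```

With the `is_permanent` flag gone, a caller that sees a non-permanent failure simply does not call into the graph at all, which matches the `if is_permanent { ... }` guard added in the `NetworkUpdate::ChannelFailure` arm.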
 

@valentinewallace

There's a stray call to the previous method name in fuzzing

@TheBlueMatt

Fixed two stray refs:

$ git diff-tree -U1 a22227bf 67ad6c40f
diff --git a/fuzz/src/router.rs b/fuzz/src/router.rs
index 568dcdf02..fe6f1647f 100644
--- a/fuzz/src/router.rs
+++ b/fuzz/src/router.rs
@@ -229,3 +229,3 @@ pub fn do_test<Out: test_logger::Output>(data: &[u8], out: Out) {
 				let short_channel_id = slice_to_be64(get_slice!(8));
-				net_graph.channel_failed(short_channel_id, false);
+				net_graph.channel_failed_permanent(short_channel_id);
 			},
diff --git a/lightning/src/routing/gossip.rs b/lightning/src/routing/gossip.rs
index cc256b167..e5f5e63c9 100644
--- a/lightning/src/routing/gossip.rs
+++ b/lightning/src/routing/gossip.rs
@@ -2624,3 +2624,3 @@ pub(crate) mod tests {
 			// and all of the entries will be tracked as removed.
-			network_graph.channel_failed(short_channel_id, true);
+			network_graph.channel_failed_permanent(short_channel_id);
 

@TheBlueMatt TheBlueMatt merged commit c89fd38 into lightningdevkit:main Apr 24, 2023
k0k0ne pushed a commit to bitlightlabs/rust-lightning that referenced this pull request Sep 30, 2024
0.0.115 - Apr 24, 2023 - "Rebroadcast the Bugfixes"

API Updates
===========

 * The MSRV of the main LDK crates has been increased to 1.48 (lightningdevkit#2107).
 * Attempting to claim an un-expired payment on a channel which has closed no
   longer fails. The expiry time of payments is exposed via
   `PaymentClaimable::claim_deadline` (lightningdevkit#2148).
 * `payment_metadata` is now supported in `Invoice` deserialization, sending,
   and receiving (via a new `RecipientOnionFields` struct) (lightningdevkit#2139, lightningdevkit#2127).
 * `Event::PaymentFailed` now exposes a failure reason (lightningdevkit#2142).
 * BOLT12 messages now support stateless generation and validation (lightningdevkit#1989).
 * The `NetworkGraph` is now pruned of stale data after RGS processing (lightningdevkit#2161).
 * Max inbound HTLCs in-flight can be changed in the handshake config (lightningdevkit#2138).
 * `lightning-transaction-sync` feature `esplora-async-https` was added (lightningdevkit#2085).
 * A `ChannelPending` event is now emitted after the initial handshake (lightningdevkit#2098).
 * `PaymentForwarded::outbound_amount_forwarded_msat` was added (lightningdevkit#2136).
 * `ChannelManager::list_channels_by_counterparty` was added (lightningdevkit#2079).
 * `ChannelDetails::feerate_sat_per_1000_weight` was added (lightningdevkit#2094).
 * `Invoice::fallback_addresses` was added to fetch `bitcoin` types (lightningdevkit#2023).
 * The offer/refund description is now exposed in `Invoice{,Request}` (lightningdevkit#2206).

Backwards Compatibility
=======================

 * Payments sent with the legacy `*_with_route` methods on LDK 0.0.115+ will no
   longer be retryable via the LDK 0.0.114- `retry_payment` method (lightningdevkit#2139).
 * `Event::PaymentPathFailed::retry` was removed and will always be `None` for
   payments initiated on 0.0.115 which fail on an earlier version (lightningdevkit#2063).
 * `Route`s and `PaymentParameters` with blinded path information will not be
   readable on prior versions of LDK. Such objects are not currently constructed
   by LDK, but may be when processing BOLT12 data in a coming release (lightningdevkit#2146).
 * Providing `ChannelMonitorUpdate`s generated by LDK 0.0.115 to a
   `ChannelMonitor` on 0.0.114 or before may panic (lightningdevkit#2059). Note that this is
   in general unsupported, and included here only for completeness.

Bug Fixes
=========

 * Fixed a case where `process_events_async` may `poll` a `Future` which has
   already completed (lightningdevkit#2081).
 * Fixed deserialization of `u16` arrays. This bug may have previously corrupted
   the historical buckets in a `ProbabilisticScorer`. Users relying on the
   historical buckets may wish to wipe their scorer on upgrade to remove corrupt
   data rather than waiting on it to decay (lightningdevkit#2191).
 * The `process_events_async` task is now `Send` and can thus be polled on a
   multi-threaded runtime (lightningdevkit#2199).
 * Fixed a missing macro export causing
   `impl_writeable_tlv_based_enum{,_upgradable}` calls to not compile (lightningdevkit#2091).
 * Fixed compilation of `lightning-invoice` with both `no-std` and serde (lightningdevkit#2187).
 * Fix an issue where the `background-processor` would not wake when a
   `ChannelMonitorUpdate` completed asynchronously, causing delays (lightningdevkit#2090).
 * Fix an issue where `process_events_async` would exit immediately (lightningdevkit#2145).
 * `Router` calls from the `ChannelManager` now call `find_route_with_id` rather
   than `find_route`, as was intended and described in the API (lightningdevkit#2092).
 * Ensure `process_events_async` always exits if any sleep future returns true,
   not just if all sleep futures repeatedly return true (lightningdevkit#2145).
 * `channel_update` messages no longer set the disable bit unless the peer has
   been disconnected for some time. This should resolve cases where channels are
   disabled for extended periods of time (lightningdevkit#2198).
 * We no longer remove CLN nodes from the network graph for violating the BOLT
   spec in some cases after failing to pay through them (lightningdevkit#2220).
 * Fixed a debug assertion which may panic under heavy load (lightningdevkit#2172).
 * `CounterpartyForceClosed::peer_msg` is now wrapped in `UntrustedString` (lightningdevkit#2114).
 * Fixed a potential deadlock in `funding_transaction_generated` (lightningdevkit#2158).

Security
========

 * Transaction re-broadcasting is now substantially more aggressive, including a
   new regular rebroadcast feature called on a timer from the
   `background-processor` or from `ChainMonitor::rebroadcast_pending_claims`.
   This should substantially increase transaction confirmation reliability
   without relying on downstream `TransactionBroadcaster` implementations for
   rebroadcasting (lightningdevkit#2203, lightningdevkit#2205, lightningdevkit#2208).
 * Implemented the changes from BOLT PRs #1031, #1032, and #1040 which resolve a
   privacy vulnerability which allows an intermediate node on the path to
   discover the final destination for a payment (lightningdevkit#2062).

In total, this release features 110 files changed, 11928 insertions, 6368
deletions in 215 commits from 21 authors, in alphabetical order:
 * Advait
 * Alan Cohen
 * Alec Chen
 * Allan Douglas R. de Oliveira
 * Arik Sosman
 * Elias Rohrer
 * Evan Feenstra
 * Jeffrey Czyz
 * John Cantrell
 * Lucas Soriano del Pino
 * Marc Tyndel
 * Matt Corallo
 * Paul Miller
 * Steven
 * Steven Williamson
 * Steven Zhao
 * Tony Giorgio
 * Valentine Wallace
 * Wilmer Paulino
 * benthecarman
 * munjesi