-
Notifications
You must be signed in to change notification settings - Fork 376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework auto-retry send errors #2014
Rework auto-retry send errors #2014
Conversation
@TheBlueMatt I'm wondering if generating |
22698fc
to
8f1a5db
Compare
Codecov ReportBase: 87.20% // Head: 87.89% // Increases project coverage by
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more Additional details and impacted files@@ Coverage Diff @@
## main #2014 +/- ##
==========================================
+ Coverage 87.20% 87.89% +0.68%
==========================================
Files 101 101
Lines 44175 49323 +5148
Branches 44175 49323 +5148
==========================================
+ Hits 38525 43351 +4826
- Misses 5650 5972 +322
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
8f1a5db
to
c331f58
Compare
9946121
to
6513398
Compare
lightning/src/util/events.rs
Outdated
(5, path, vec_type), | ||
(7, short_channel_id, option), | ||
(9, retry, option), | ||
(11, payment_id, option), | ||
(13, failure_opt, ignorable), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the ignorable field imply that if its missing we'll return None
for the full event if its missing? Also a bug for the network_update
field, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expanded macro code looks good to me. I think we'll only return None
for the full event if the MaybeReadable::read
returns Ok(None)
, which only happens if the TLV type for the field is present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, hmm, I was thinking of the ignorable
case in _init_tlv_based_struct_field
. This is confusing - we handle ignorable
in the higher-level macros as "implement MaybeReadable
and return Ok(None)
" but in the lower-level macros as just a regular ol' option
. Can we make it consistent - use the new ignorable_option
as a regular ol' option but make the ignorable
be a required
field in the lower-level macros as well, returning Ok(None)
in _check_missing_tlv
(which implies we're in a MaybeReadable
, and maybe renaming the ignorable
to ignorable_no_read
or whatever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe renaming the ignorable to ignorable_no_read or whatever?
That all sounds good, though I'd also be in favor of renaming ignorable_*
to upgradable_*
. Because I'd like to name the above variant ignorable_required
, but that sounds like an oxymoron.
lightning/src/util/events.rs
Outdated
pub enum Failure { | ||
/// We failed to initially send the payment and no HTLC was committed to. Contains a relevant | ||
/// error message. | ||
InitialSend(String), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in ~every case we have exactly one APIError
to put here, why not that instead of a String
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't implement writeable
at the moment, though that wouldn't be too hard to add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, let's just do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I think we'll need to change the APIError::InvalidRoute
to contain a String
. I don't think we can implement Readable
for a &'static str
because it references data owned by the read
function.
Given where other things are it seems we have time? Its not exactly critical, but if I'm tracking failures/successes I'd like to see a failure for every |
Yeah agreed, that comment was out of date sorry |
Useful for generating Payment(Path)Failed events in this method
Useful for generating Payment(Path)Failed events in this method
1f272c0
to
8492847
Compare
Tacked on a small commit to fix #1969. This PR ended up taking on some unrelated changes, so I'd be happy to split it up if reviewers prefer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, let's definitely try to split this up, maybe everything after Fix outdated PendingOutboundPayment::Abandoned docs
can go into separate (probably two?) PRs?
lightning/src/util/events.rs
Outdated
@@ -1199,27 +1225,27 @@ impl MaybeReadable for Event { | |||
let mut payment_hash = PaymentHash([0; 32]); | |||
let mut payment_failed_permanently = false; | |||
let mut network_update = None; | |||
let mut all_paths_failed = Some(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the backwards compat approach when this was added was that we we didnt support MPP at all so we could assume this was always true. For forwards compat, though, this doesn't make sense - if someone downgrades to 113 from 114 and looks at this field it will always report that all HTLCs have failed, which is obviously nonsense. Its been long enough since we added the abandon flow that we can break forwards compat here, I think, but its a bit safer to write Some(false)
rather than letting old clients assume true
by writing None
. We also need to add a draft release note here that says that the field has been removed and old readers will always see false
, and users wishing to support downgrade who still rely on the field should instead first migrate to the abandon-based approach and look at PaymentFailed
events and then migrate to 114.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will address this in #2043
Edit: addressed
107e6e1
to
2b14430
Compare
Split up the PR, see #2043 |
2b14430
to
ed1ef50
Compare
lightning/src/ln/outbound_payment.rs
Outdated
self.retry_payment_internal(payment_id, route_params, router, first_hops, inflight_htlcs, entropy_source, node_signer, best_block_height, logger, pending_events, send_payment_along_path); | ||
}, | ||
PaymentSendFailure::PartialFailure { failed_paths_retry: Some(retry), results, .. } => { | ||
Self::push_payment_path_failed_evs(payment_id, payment_hash, route.paths, results.into_iter(), pending_events); | ||
Self::push_payment_path_failed_evs(payment_id, payment_hash, None, route.paths, results.into_iter(), pending_events); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why None
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could push the failed scids here, but they won't be used because we just use the PartialFailure::failed_paths_retry
for resend, which already contains them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But shouldn't the event that we surface also reflect this? Why would we exclude the scid
here but include it for the AllFailedResendSafe
case?
More generally, I guess it's hard to follow where previously_failed_channels
is populated. In this function, it is a side effect of calling push_payment_path_failed_evs
, but only for AllFailedResendSafe
. Then when looking at fail_htlc
for a non-immediate failure, it looks to be added twice. Once when calling insert_previously_failed_scid
and another time when determining the failure was not a probe. Is that expected or am I misreading that? Maybe the second insert is necessary for the backwards compatibility case? Just thinking if there s there any way we could have it added in a single place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when looking at fail_htlc for a non-immediate failure, it looks to be added twice. Once when calling insert_previously_failed_scid and another time when determining the failure was not a probe.
So the second add will go away when we get rid of PaymentPathFailed::retry
, which as you pointed out should be removed.
But shouldn't the event that we surface also reflect this? Why would we exclude the scid here but include it for the AllFailedResendSafe case?
We never include failed scids in the event surfaced in handle_pay_route_err
, PaymentPathFailed::retry
is always None
, if I understand you correctly. The scids are being pushed into the route params for immediate retry purposes only.
On a related note, on second glance my statement that we push failed scids into failed_paths_retry
on PartialFailure
was not correct. I think I thought that because the issue says it: #1969 but it isn't in the code. Fixing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, there was also a rebase error, push_payment_path_failed_evs
should be renamed to push_path_failed_evs_and_scids
. Will also fix this
Edit: fixed this and above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We never include failed scids in the event surfaced in
handle_pay_route_err
,PaymentPathFailed::retry
is alwaysNone
, if I understand you correctly. The scids are being pushed into the route params for immediate retry purposes only.
Ah, right, I missed that. Was thinking it was added from the parameter.
/// If a payment fails to send with [`ChannelManager::send_payment`], it can be in one of several | ||
/// states. This enum is returned as the Err() type describing which state the payment is in, see | ||
/// the description of individual enum states for more. | ||
/// | ||
/// [`ChannelManager::send_payment`]: crate::ln::channelmanager::ChannelManager::send_payment | ||
#[derive(Clone, Debug)] | ||
pub enum PaymentSendFailure { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we plan to deprecate ChannelManager::send_payment
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we've fully decided since it doesn't hurt, but it would definitely clean up this module a bit and the manual route case is already supported by the Router
trait anyway.
ed1ef50
to
7698078
Compare
lightning/src/ln/outbound_payment.rs
Outdated
self.retry_payment_internal(payment_id, route_params, router, first_hops, inflight_htlcs, entropy_source, node_signer, best_block_height, logger, pending_events, send_payment_along_path); | ||
}, | ||
PaymentSendFailure::PartialFailure { failed_paths_retry: Some(retry), results, .. } => { | ||
Self::push_payment_path_failed_evs(payment_id, payment_hash, route.paths, results.into_iter(), pending_events); | ||
Self::push_payment_path_failed_evs(payment_id, payment_hash, None, route.paths, results.into_iter(), pending_events); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We never include failed scids in the event surfaced in
handle_pay_route_err
,PaymentPathFailed::retry
is alwaysNone
, if I understand you correctly. The scids are being pushed into the route params for immediate retry purposes only.
Ah, right, I missed that. Was thinking it was added from the parameter.
fea92d5
to
d086918
Compare
Feel free to squash, IMO. |
Same! LGTM |
InvalidRoute is reserved for malformed routes, not routes where a channel or its peer is unavailable
d086918
to
dc05c6d
Compare
Squashed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Largely looks good minus a couple things that I noticed.
&self, payment_id: PaymentId, payment_hash: PaymentHash, payment_secret: &Option<PaymentSecret>, | ||
keysend_preimage: Option<PaymentPreimage>, retry_strategy: Retry, route_params: RouteParameters, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parameters are a little inconsistent here. Should we pass payment_secret
by value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's preexisting and I think it would be better to fix it all in one go, since ChannelManager::send_payment
and friends also need this fix
dc05c6d
to
6b4ce57
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed to the best of my ability with limited understanding of ldk/rust.
Nice refinement of payment states for retries!
return Err(PaymentSendFailure::ParameterError(APIError::APIMisuseError { | ||
err: format!("Invoice expired for payment id {}", log_bytes!(payment_id.0)), | ||
})) | ||
return Err(RetryableSendFailure::PaymentExpired) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this check means that the invoice is expired, does it make more sense to rename PaymentExpired
-> InvoiceExpired
? Assuming that there's also a sender-set time limit on how long we attempt a payment for, I'd think that time running out is a payment expiry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PaymentExpired
is definitely confusing since we also have Retry::Timeout
which is a different thing. Since the PaymentParameters::expiry_time
technically doesn't need to be from an invoice (it could even be set for spontaneous payments), I ended up clarifying the docs on this error variant and on Retry::Timeout
Some(&first_hops.iter().collect::<Vec<_>>()), &inflight_htlcs() | ||
).map_err(|_| RetryableSendFailure::RouteNotFound)?; | ||
|
||
let onion_session_privs = self.add_new_pending_payment(payment_hash, *payment_secret, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If add_new_pending_payment
errors are always mapped to DuplicatePayment
, can it be updated to return a RetryableSendFailure
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it isn't always mapped to that, but yeah that would be nicer.
Prior to this, we returned PaymentSendFailure from auto retry send payment methods. This implied that we might return a PartialFailure from them, which has never been the case. So it makes sense to rework the errors to be a better fit for the methods. We're taking error handling in a totally different direction now to make it more asynchronous, see send_payment_internal for more information.
Since it's only used one place now
Previously, we could have tried the same failed channels over and over until retries are exhausted.
a6e9123
6b4ce57
to
a6e9123
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix in a followup.
debug_assert_eq!(paths.len(), path_results.len()); | ||
for (path, path_res) in paths.into_iter().zip(path_results) { | ||
if let Err(e) = path_res { | ||
let failed_scid = if let APIError::InvalidRoute { .. } = e { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to not push a failure event if the error is APIError::MonitorUpdateInProgress
.
0.0.114 - Mar 3, 2023 - "Faster Async BOLT12 Retries" API Updates =========== * `InvoicePayer` has been removed and its features moved directly into `ChannelManager`. As such it now requires a simplified `Router` and supports `send_payment_with_retry` (and friends). `ChannelManager::retry_payment` was removed in favor of the automated retries. Invoice payment utilities in `lightning-invoice` now call the new code (lightningdevkit#1812, lightningdevkit#1916, lightningdevkit#1929, lightningdevkit#2007, etc). * `Sign`/`BaseSign` has been renamed `ChannelSigner`, with `EcdsaChannelSigner` split out in anticipation of future schnorr/taproot support (lightningdevkit#1967). * The catch-all `KeysInterface` was split into `EntropySource`, `NodeSigner`, and `SignerProvider`. `KeysManager` implements all three (lightningdevkit#1910, lightningdevkit#1930). * `KeysInterface::get_node_secret` is now `KeysManager::get_node_secret_key` and is no longer required for external signers (lightningdevkit#1951, lightningdevkit#2070). * A `lightning-transaction-sync` crate has been added which implements keeping LDK in sync with the chain via an esplora server (lightningdevkit#1870). Note that it can only be used on nodes that *never* ran a previous version of LDK. * `Score` is updated in `BackgroundProcessor` instead of via `Router` (lightningdevkit#1996). * `ChainAccess::get_utxo` (now `UtxoAccess`) can now be resolved async (lightningdevkit#1980). * BOLT12 `Offer`, `InvoiceRequest`, `Invoice` and `Refund` structs as well as associated builders have been added. Such invoices cannot yet be paid due to missing support for blinded path payments (lightningdevkit#1927, lightningdevkit#1908, lightningdevkit#1926). * A `lightning-custom-message` crate has been added to make combining multiple custom messages into one enum/handler easier (lightningdevkit#1832). * `Event::PaymentPathFailure` is now generated for failure to send an HTLC over the first hop on our local channel (lightningdevkit#2014, lightningdevkit#2043). * `lightning-net-tokio` no longer requires an `Arc` on `PeerManager` (lightningdevkit#1968). * `ChannelManager::list_recent_payments` was added (lightningdevkit#1873). * `lightning-background-processor` `std` is now optional in async mode (lightningdevkit#1962). * `create_phantom_invoice` can now be used in `no-std` (lightningdevkit#1985). * The required final CLTV delta on inbound payments is now configurable (lightningdevkit#1878) * bitcoind RPC error code and message are now surfaced in `block-sync` (lightningdevkit#2057). * Get `historical_estimated_channel_liquidity_probabilities` was added (lightningdevkit#1961). * `ChannelManager::fail_htlc_backwards_with_reason` was added (lightningdevkit#1948). * Macros which implement serialization using TLVs or straight writing of struct fields are now public (lightningdevkit#1823, lightningdevkit#1976, lightningdevkit#1977). Backwards Compatibility ======================= * Any inbound payments with a custom final CLTV delta will be rejected by LDK if you downgrade prior to receipt (lightningdevkit#1878). * `Event::PaymentPathFailed::network_update` will always be `None` if an 0.0.114-generated event is read by a prior version of LDK (lightningdevkit#2043). * `Event::PaymentPathFailed::all_paths_removed` will always be false if an 0.0.114-generated event is read by a prior version of LDK. Users who rely on it to determine payment retries should migrate to `Event::PaymentFailed`, in a separate release prior to upgrading to LDK 0.0.114 if downgrading is supported (lightningdevkit#2043). Performance Improvements ======================== * Channel data is now stored per-peer and channel updates across multiple peers can be operated on simultaneously (lightningdevkit#1507). * Routefinding is roughly 1.5x faster (lightningdevkit#1799). * Deserializing a `NetworkGraph` is roughly 6x faster (lightningdevkit#2016). * Memory usage for a `NetworkGraph` has been reduced substantially (lightningdevkit#2040). * `KeysInterface::get_secure_random_bytes` is roughly 200x faster (lightningdevkit#1974). Bug Fixes ========= * Fixed a bug where a delay in processing a `PaymentSent` event longer than the time taken to persist a `ChannelMonitor` update, when occurring immediately prior to a crash, may result in the `PaymentSent` event being lost (lightningdevkit#2048). * Fixed spurious rejections of rapid gossip sync data when the graph has been updated by other means between gossip syncs (lightningdevkit#2046). * Fixed a panic in `KeysManager` when the high bit of `starting_time_nanos` is set (lightningdevkit#1935). * Resolved an issue where the `ChannelManager::get_persistable_update_future` future would fail to wake until a second notification occurs (lightningdevkit#2064). * Resolved a memory leak when using `ChannelManager::send_probe` (lightningdevkit#2037). * Fixed a deadlock on some platforms at least when using async `ChannelMonitor` updating (lightningdevkit#2006). * Removed debug-only assertions which were reachable in threaded code (lightningdevkit#1964). * In some cases when payment sending fails on our local channel retries no longer take the same path and thus never succeed (lightningdevkit#2014). * Retries for spontaneous payments have been fixed (lightningdevkit#2002). * Return an `Err` if `lightning-persister` fails to read the directory listing rather than panicing (lightningdevkit#1943). * `peer_disconnected` will now never be called without `peer_connected` (lightningdevkit#2035) Security ======== 0.0.114 fixes several denial-of-service vulnerabilities which are reachable from untrusted input from channel counterparties or in deployments accepting inbound connections or channels. It also fixes a denial-of-service vulnerability in rare cases in the route finding logic. * The number of pending un-funded channels as well as peers without funded channels is now limited to avoid denial of service (lightningdevkit#1988). * A second `channel_ready` message received immediately after the first could lead to a spurious panic (lightningdevkit#2071). This issue was introduced with 0conf support in LDK 0.0.107. * A division-by-zero issue was fixed in the `ProbabilisticScorer` if the amount being sent (including previous-hop fees) is equal to a channel's capacity while walking the graph (lightningdevkit#2072). The division-by-zero was introduced with historical data tracking in LDK 0.0.112. In total, this release features 130 files changed, 21457 insertions, 10113 deletions in 343 commits from 18 authors, in alphabetical order: * Alec Chen * Allan Douglas R. de Oliveira * Andrei * Arik Sosman * Daniel Granhão * Duncan Dean * Elias Rohrer * Jeffrey Czyz * John Cantrell * Kurtsley * Matt Corallo * Max Fang * Omer Yacine * Valentine Wallace * Viktor Tigerström * Wilmer Paulino * benthecarman * jurvis
Partially addresses #1932. Fixes #1969.
Previously we would return a
PaymentSendFailure
from auto-retry send_payment methods. This doesn't make sense because they will never return thePartialFailure
variant, so we add a new error enum for them here.We now also generate
PaymentPathFailed
events for paths that fail on the first hop before they can commit to the htlc (eg if we tried to send more than we have in the channel).Based on #2008