
Allow async events processing without holding total_consistency_lock #2199

Merged

Conversation

@tnull (Contributor) commented Apr 18, 2023

Fixes #2003.

Unfortunately, the RAII guard types used by RwLock are not Send, which is why they can't be held across await boundaries. In order to allow asynchronous event processing in multi-threaded environments, we here allow processing events without holding the total_consistency_lock. We do so by cloning the events and only draining and persisting the queue after they have been processed successfully.
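
For background, a minimal sketch (not from the PR; `yield_point` and `require_send` are illustrative helpers) of why such a guard can't live across an `.await`:

```rust
use std::sync::RwLock;

async fn yield_point() {}

// `std::sync::RwLockReadGuard` is `!Send`, so holding it across an `.await`
// makes the whole future `!Send`, and multi-threaded executors refuse it.
async fn process_holding_guard(lock: &RwLock<u32>) {
    let _read_guard = lock.read().unwrap();
    yield_point().await; // the guard is still live across this await point
}

#[allow(dead_code)]
fn require_send<T: Send>(_: T) {}

fn main() {
    let lock = RwLock::new(0);
    // Uncommenting the next line fails to compile, because the future captures
    // a `RwLockReadGuard`, which "cannot be sent between threads safely":
    // require_send(process_holding_guard(&lock));
    let _ = process_holding_guard(&lock);
}
```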

The first commit reverts a prior commit from #2177, as we now want the behavior of the two process_event methods to diverge, i.e., we want to avoid cloning in the sync case.

I tried to be minimally invasive, as the event processing will receive a general overhaul with #2167 and follow-ups, and any more substantial changes would likely only make sense after those have landed.

@tnull (Contributor, Author) commented Apr 18, 2023

Currently fails due to a previously-silent panic in BP tests that, due to the behavior of the tokio runtime, wasn't surfaced and caught before. Looking into that.

@codecov-commenter commented Apr 18, 2023

Codecov Report

Patch coverage: 94.11%; project coverage change: +1.04% 🎉

Comparison: base (2ebbe6f) 91.34% vs. head (a5358d0) 92.38%.

❗ Current head a5358d0 differs from the pull request's most recent head f2453b7. Consider uploading reports for commit f2453b7 to get more accurate results.


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2199      +/-   ##
==========================================
+ Coverage   91.34%   92.38%   +1.04%     
==========================================
  Files         102      104       +2     
  Lines       50470    61358   +10888     
  Branches    50470    61358   +10888     
==========================================
+ Hits        46103    56688   +10585     
- Misses       4367     4670     +303     
Impacted Files Coverage Δ
lightning/src/ln/channelmanager.rs 91.65% <75.00%> (+2.48%) ⬆️
lightning-background-processor/src/lib.rs 83.51% <100.00%> (+6.40%) ⬆️
lightning-net-tokio/src/lib.rs 78.41% <100.00%> (ø)

... and 64 files with indirect coverage changes


@tnull force-pushed the 2023-04-fix-async-event-processing branch 3 times, most recently from 9d6077b to 5467f97 (Apr 18, 2023, 14:36)
@tnull (Contributor, Author) commented Apr 18, 2023

> Currently fails due to a previously-silent panic in BP tests that, due to the behavior of the tokio runtime, wasn't surfaced and caught before. Looking into that.

Correction: after fixing the CI script, this should now really fail until we fix the bug.

Review threads on lightning/src/ln/channelmanager.rs (outdated, resolved).
@TheBlueMatt added this to the 0.0.115 milestone (Apr 18, 2023).
Just two trivial compiler warnings that are unrelated to the changes
made here.
Currently the BP `futures` tests rely on `std`. In order to actually
have them run, we should enable `std`, i.e., remove
`--no-default-features`.
@tnull force-pushed the 2023-04-fix-async-event-processing branch from 5467f97 to c9cfd20 (Apr 19, 2023, 09:13)
@TheBlueMatt (Collaborator) left a comment:

Sorry, can you not break out the macro? Not because it's wrong here, but because there's a lot more complexity coming in there in a follow-up PR, and we'll just have to add it again.

@tnull force-pushed the 2023-04-fix-async-event-processing branch from c9cfd20 to dd48d55 (Apr 20, 2023, 10:43)
@tnull (Contributor, Author) commented Apr 20, 2023

> Sorry, can you not break out the macro? Not because it's wrong here, but because there's a lot more complexity coming in there in a follow-up PR, and we'll just have to add it again.

Alright, dropped the revert commit and now also cloning in the sync case.

@tnull force-pushed the 2023-04-fix-async-event-processing branch from dd48d55 to d7de357 (Apr 20, 2023, 10:46)
@TheBlueMatt (Collaborator) left a comment:

LGTM.

Review thread on lightning/src/ln/channelmanager.rs (outdated, resolved):
let mut pending_events = $self.pending_events.lock().unwrap();
pending_events.drain(..num_events);
processed_all_events = pending_events.is_empty();
$self.pending_events_processor.store(false, Ordering::Release);
Collaborator:

Should this only happen if !processed_all_events? Not a big deal either way, I think.

Contributor (Author):

You mean if we processed all events? Yeah, I think I'd leave it as is.

Collaborator:

Oh, no, I mean literally just move the setter here into a check for whether we're about to go around again.

Contributor (Author):

Ah, yes, I had understood as much, but we definitely need to reset it in the case where we leave the method. We could have moved the compare_exchange out of the loop and only reset the flag on exit, but given that it's a rare edge case anyway, I thought it made sense to leave it as is.
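
To make the trade-off above concrete, a hypothetical reduction (not the PR's actual macro body; `run_one_iteration` and `more_events_pending` are illustrative names):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Returns `true` if the caller should go around the loop again.
fn run_one_iteration(processor_flag: &AtomicBool) -> bool {
    if processor_flag
        .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
        .is_err()
    {
        return false; // another processor is active and will handle our events
    }

    // ... clone, process, and drain the currently-queued events here ...
    let more_events_pending = false; // stand-in for `!pending_events.is_empty()`

    // The reset is unconditional: gating it on looping again would leave the
    // flag set forever on the exit path, blocking all future processors.
    processor_flag.store(false, Ordering::Release);
    more_events_pending
}

fn main() {
    let flag = AtomicBool::new(false);
    while run_one_iteration(&flag) {}
}
```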

Unfortunately, the RAII types used by `RwLock` are not `Send`, which is
why they can't be held across `await` boundaries. In order to allow
asynchronous event processing in multi-threaded environments, we here
allow processing events without holding the `total_consistency_lock`.
@tnull force-pushed the 2023-04-fix-async-event-processing branch from a5358d0 to f2453b7 (Apr 21, 2023, 16:05)
-sender.send(()).unwrap();
+match sender.send(()) {
+	Ok(()) => {},
+	Err(std::sync::mpsc::SendError(())) => println!("Persister failed to notify as receiver went away."),
+}
Collaborator:

Wait, why?

@tnull (Contributor, Author) commented Apr 21, 2023:

Because we're shutting the other task down after the first send. However, we also persist again on shutdown, which triggers a second send, which would panic as the receiver is already gone at that point.

Contributor:

I think a comment for why this is ok would be helpful.
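
A minimal, self-contained sketch of the failure mode being handled (plain `std::sync::mpsc`, matching the `SendError` type in the snippet above; the comments map it onto the scenario tnull describes):

```rust
use std::sync::mpsc;

fn main() {
    let (sender, receiver) = mpsc::channel::<()>();

    // First persist: the receiving side is still alive, so this succeeds.
    sender.send(()).unwrap();
    assert!(receiver.recv().is_ok());

    // The test then shuts the receiving task down...
    drop(receiver);

    // ...so the persist-on-shutdown's second send returns an error rather than
    // delivering; calling unwrap() here would panic, hence the match in the PR.
    assert!(sender.send(()).is_err());
}
```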

@alecchendev (Contributor) left a comment:

LGTM, I think! Making sure I'm getting this right: do types need to implement Send across await boundaries because, in a multi-threaded environment, a task waiting on a future to complete may be moved to execute on another thread?
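
For background on the question above: on a multi-threaded runtime a future may indeed be resumed on a different worker thread after any `.await`, which is why `tokio::spawn` bounds its argument by `Send + 'static`. A minimal sketch (illustrative names; `spawn_like` mirrors `tokio::spawn`'s bound, assuming a tokio multi-thread runtime as the discussion implies):

```rust
use std::future::Future;

// Mirrors the bound tokio::spawn places on spawned futures: the future (and
// thus any state it holds across `.await` points) must be `Send`.
fn spawn_like<F>(fut: F)
where
    F: Future<Output = ()> + Send + 'static,
{
    // With a real runtime this would be `tokio::spawn(fut);`.
    drop(fut);
}

async fn event_processing_task() { /* ... */ }

fn main() {
    // Compiles only because the task's future is `Send`: it holds no `!Send`
    // state (such as an `RwLockReadGuard`) across an await point.
    spawn_like(event_processing_task());
}
```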

// we can be sure no other persists happen while processing events.
let _read_guard = $self.total_consistency_lock.read().unwrap();
let mut processed_all_events = false;
while !processed_all_events {
@alecchendev (Contributor) commented Apr 21, 2023:

How come this is all run in a while loop? IIUC there may be other events added to pending_events by other async tasks while handling the events, which is how we end up not having processed all events. But why do we keep processing until pending_events is empty, as opposed to just processing the events that were present when we first called this function? I guess, does it make much of a difference, or is it more just that we might as well do it while we're here?

Collaborator:

Because we no longer allow multiple processors to run at the same time: if one process_events call starts and makes some progress, then an event is generated, causing a second process_events call to happen, the second call might return early, but there are events there that the user expects to have been processed. Thus, we need to make sure the first process_events goes around again and processes the remaining events.
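
A simplified, hypothetical sketch of that interaction (the `EventQueue` type and names are illustrative, not the PR's actual macro): a second caller that loses the `compare_exchange` race returns early, and the first caller's loop picks up whatever the second caller enqueued:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

struct EventQueue {
    processor_active: AtomicBool,
    pending_events: Mutex<Vec<String>>,
}

impl EventQueue {
    fn process_events(&self, handler: impl Fn(&str)) {
        let mut processed_all_events = false;
        while !processed_all_events {
            // Only one processor runs at a time; a caller that loses this race
            // returns early and relies on the winner's loop for its events.
            if self
                .processor_active
                .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_err()
            {
                return;
            }

            // Clone so no lock is held while the (possibly async) handler runs.
            let events: Vec<String> = self.pending_events.lock().unwrap().clone();
            for event in &events {
                handler(event);
            }

            // Drain exactly the events we processed; new ones may have arrived
            // in the meantime, in which case we go around again.
            let mut pending = self.pending_events.lock().unwrap();
            pending.drain(..events.len());
            processed_all_events = pending.is_empty();
            self.processor_active.store(false, Ordering::Release);
        }
    }
}

fn main() {
    let queue = EventQueue {
        processor_active: AtomicBool::new(false),
        pending_events: Mutex::new(vec!["payment_claimable".to_string()]),
    };
    queue.process_events(|event| println!("handling {event}"));
}
```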

Review thread on lightning-background-processor/src/lib.rs (resolved).

Review thread on lightning/src/ln/channelmanager.rs (resolved).
@TheBlueMatt merged commit 5f96d13 into lightningdevkit:main (Apr 22, 2023).
k0k0ne pushed a commit to bitlightlabs/rust-lightning that referenced this pull request Sep 30, 2024
0.0.115 - Apr 24, 2023 - "Rebroadcast the Bugfixes"

API Updates
===========

 * The MSRV of the main LDK crates has been increased to 1.48 (lightningdevkit#2107).
 * Attempting to claim an un-expired payment on a channel which has closed no
   longer fails. The expiry time of payments is exposed via
   `PaymentClaimable::claim_deadline` (lightningdevkit#2148).
 * `payment_metadata` is now supported in `Invoice` deserialization, sending,
   and receiving (via a new `RecipientOnionFields` struct) (lightningdevkit#2139, lightningdevkit#2127).
 * `Event::PaymentFailed` now exposes a failure reason (lightningdevkit#2142).
 * BOLT12 messages now support stateless generation and validation (lightningdevkit#1989).
 * The `NetworkGraph` is now pruned of stale data after RGS processing (lightningdevkit#2161).
 * Max inbound HTLCs in-flight can be changed in the handshake config (lightningdevkit#2138).
 * `lightning-transaction-sync` feature `esplora-async-https` was added (lightningdevkit#2085).
 * A `ChannelPending` event is now emitted after the initial handshake (lightningdevkit#2098).
 * `PaymentForwarded::outbound_amount_forwarded_msat` was added (lightningdevkit#2136).
 * `ChannelManager::list_channels_by_counterparty` was added (lightningdevkit#2079).
 * `ChannelDetails::feerate_sat_per_1000_weight` was added (lightningdevkit#2094).
 * `Invoice::fallback_addresses` was added to fetch `bitcoin` types (lightningdevkit#2023).
 * The offer/refund description is now exposed in `Invoice{,Request}` (lightningdevkit#2206).

Backwards Compatibility
=======================

 * Payments sent with the legacy `*_with_route` methods on LDK 0.0.115+ will no
   longer be retryable via the LDK 0.0.114- `retry_payment` method (lightningdevkit#2139).
 * `Event::PaymentPathFailed::retry` was removed and will always be `None` for
    payments initiated on 0.0.115 which fail on an earlier version (lightningdevkit#2063).
 * `Route`s and `PaymentParameters` with blinded path information will not be
   readable on prior versions of LDK. Such objects are not currently constructed
   by LDK, but may be when processing BOLT12 data in a coming release (lightningdevkit#2146).
 * Providing `ChannelMonitorUpdate`s generated by LDK 0.0.115 to a
   `ChannelMonitor` on 0.0.114 or before may panic (lightningdevkit#2059). Note that this is
   in general unsupported, and included here only for completeness.

Bug Fixes
=========

 * Fixed a case where `process_events_async` may `poll` a `Future` which has
   already completed (lightningdevkit#2081).
 * Fixed deserialization of `u16` arrays. This bug may have previously corrupted
   the historical buckets in a `ProbabilisticScorer`. Users relying on the
   historical buckets may wish to wipe their scorer on upgrade to remove corrupt
   data rather than waiting on it to decay (lightningdevkit#2191).
 * The `process_events_async` task is now `Send` and can thus be polled on a
   multi-threaded runtime (lightningdevkit#2199).
 * Fixed a missing macro export causing
   `impl_writeable_tlv_based_enum{,_upgradable}` calls to not compile (lightningdevkit#2091).
 * Fixed compilation of `lightning-invoice` with both `no-std` and serde (lightningdevkit#2187).
 * Fix an issue where the `background-processor` would not wake when a
   `ChannelMonitorUpdate` completed asynchronously, causing delays (lightningdevkit#2090).
 * Fix an issue where `process_events_async` would exit immediately (lightningdevkit#2145).
 * `Router` calls from the `ChannelManager` now call `find_route_with_id` rather
   than `find_route`, as was intended and described in the API (lightningdevkit#2092).
 * Ensure `process_events_async` always exits if any sleep future returns true,
   not just if all sleep futures repeatedly return true (lightningdevkit#2145).
 * `channel_update` messages no longer set the disable bit unless the peer has
   been disconnected for some time. This should resolve cases where channels are
   disabled for extended periods of time (lightningdevkit#2198).
 * We no longer remove CLN nodes from the network graph for violating the BOLT
   spec in some cases after failing to pay through them (lightningdevkit#2220).
 * Fixed a debug assertion which may panic under heavy load (lightningdevkit#2172).
 * `CounterpartyForceClosed::peer_msg` is now wrapped in `UntrustedString` (lightningdevkit#2114).
 * Fixed a potential deadlock in `funding_transaction_generated` (lightningdevkit#2158).

Security
========

 * Transaction re-broadcasting is now substantially more aggressive, including a
   new regular rebroadcast feature called on a timer from the
   `background-processor` or from `ChainMonitor::rebroadcast_pending_claims`.
   This should substantially increase transaction confirmation reliability
   without relying on downstream `TransactionBroadcaster` implementations for
   rebroadcasting (lightningdevkit#2203, lightningdevkit#2205, lightningdevkit#2208).
 * Implemented the changes from BOLT PRs #1031, #1032, and #1040, which resolve
   a privacy vulnerability that allows an intermediate node on the path to
   discover the final destination of a payment (lightningdevkit#2062).

In total, this release features 110 files changed, 11928 insertions, 6368
deletions in 215 commits from 21 authors, in alphabetical order:
 * Advait
 * Alan Cohen
 * Alec Chen
 * Allan Douglas R. de Oliveira
 * Arik Sosman
 * Elias Rohrer
 * Evan Feenstra
 * Jeffrey Czyz
 * John Cantrell
 * Lucas Soriano del Pino
 * Marc Tyndel
 * Matt Corallo
 * Paul Miller
 * Steven
 * Steven Williamson
 * Steven Zhao
 * Tony Giorgio
 * Valentine Wallace
 * Wilmer Paulino
 * benthecarman
 * munjesi
Merging this pull request closes: Switch total_consistency_lock to a Send RwLock variant (#2003).