Never store more than one StdWaker per live Future #2894

Merged
3 commits merged into lightningdevkit:main on Feb 16, 2024

Conversation

TheBlueMatt
Collaborator

When an `std::future::Future` is `poll()`ed, we're only supposed to
use the latest `Waker` provided. However, we currently push an
`StdWaker` onto our callback list every time `poll` is called,
waking every `Waker` but also using more and more memory until the
`Future` itself is woken.

Here we fix this by removing any `StdWaker`s stored for a given
`Future` when it is `drop`ped or prior to pushing a new `StdWaker`
onto the list when `poll`ed.
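
A minimal sketch of the deduplication described above, assuming a simplified `State` type that models only the waker list. The field name `std_future_callbacks` and its `(usize, StdWaker)` element type appear later in this PR's diff; `State`, `register_waker`, `forget_future` and `NoopWake` are hypothetical names used for illustration and are not LDK's API:

```rust
use std::sync::Arc;
use std::task::{Wake, Waker};

/// No-op waker, used only so there is something to store (hypothetical helper).
struct NoopWake;
impl Wake for NoopWake {
	fn wake(self: Arc<Self>) {}
}

/// Simplified stand-in for the shared state; only the waker list is modelled.
#[derive(Default)]
struct State {
	/// (index of the owning future, most recent waker that future was polled with)
	std_future_callbacks: Vec<(usize, Waker)>,
}

impl State {
	/// What `poll` would do: drop any older waker for this future, then store the new one.
	fn register_waker(&mut self, future_idx: usize, waker: Waker) {
		self.std_future_callbacks.retain(|(idx, _)| *idx != future_idx);
		self.std_future_callbacks.push((future_idx, waker));
	}

	/// What `Drop` would do: forget whatever waker this future left behind.
	fn forget_future(&mut self, future_idx: usize) {
		self.std_future_callbacks.retain(|(idx, _)| *idx != future_idx);
	}
}
```

Calling `register_waker` repeatedly with the same index keeps the list at one entry for that future, which is the property the PR title describes.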

Sadly, the introduction of a `Drop` impl for `Future` means we
can't trivially destructure the struct any longer, causing a few
methods to need to take `Future`s by reference rather than
ownership and `clone` a few `Arc`s.
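
To illustrate the constraint: Rust rejects moving fields out of a value whose type implements `Drop` (error E0509), so code that previously destructured the struct has to borrow it and clone the shared pointer instead. The sketch below uses a hypothetical `Handle` type standing in for `Future`; the field names and the `shared_state` helper are assumptions for illustration only:

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a future that registers itself in shared state.
struct Handle {
	state: Arc<Mutex<Vec<usize>>>,
	idx: usize,
}

impl Drop for Handle {
	fn drop(&mut self) {
		// Remove this handle's entry from the shared list when it goes away.
		let idx = self.idx;
		self.state.lock().unwrap().retain(|i| *i != idx);
	}
}

// With the `Drop` impl above, `let Handle { state, .. } = handle;` no longer
// compiles (E0509: cannot move out of a type that implements `Drop`).
// Taking the handle by reference and cloning the `Arc` works instead.
fn shared_state(handle: &Handle) -> Arc<Mutex<Vec<usize>>> {
	Arc::clone(&handle.state)
}
```

The extra `Arc::clone` costs one atomic reference-count increment, which is the trade-off the description refers to.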

Fixes #2874

@TheBlueMatt TheBlueMatt added this to the 0.0.122 milestone Feb 13, 2024

coderabbitai bot commented Feb 13, 2024

Walkthrough

The update changes how futures and wakers are handled so that repeated polling no longer grows memory. Each future is tagged with an index, wakers from `poll` are tracked separately from other callbacks, and any stale waker for a future is removed when that future is polled again or dropped.

Changes

Files | Change Summary
lightning-background-processor/src/lib.rs | Pass references to futures instead of direct calls, optimizing future handling and efficiency.
lightning/src/util/wakers.rs | Overhaul future management: introduce self_idx, manage callbacks and wakers, enhance polling.

Assessment against linked issues

Objective Addressed Explanation
<Future as std::..::Future>::poll() always allocates, growing until we're woken (#2874)

Poem

In the realm of code where changes brew,
A rabbit danced, its tasks anew.
Futures and wakers now aligned,
Memory growth no longer unconfined.
Efficiency shines in lines of code,
As the rabbit's work in the system flowed.
🌟 "Optimization achieved," it declared with glee,
Leaving behind a better codebase for all to see.

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 73da722 and 60bff36.
Files selected for processing (2)
  • lightning-background-processor/src/lib.rs (1 hunks)
  • lightning/src/util/wakers.rs (15 hunks)
Additional comments: 8
lightning/src/util/wakers.rs (6)
  • 59-70: The logic for managing self_idx and resetting the future state when callbacks_made is true appears sound. However, ensure that the logic for resetting lock.0 to false and handling next_idx is thoroughly tested, especially in concurrent scenarios where multiple futures might be interacting with the same Notifier.
  • 78-82: Adding std_future_callbacks to track standard wakers is a critical improvement. Ensure that the vector is efficiently managed, especially in terms of memory usage and performance when futures are frequently polled and woken.
  • 124-128: The management of std_future_callbacks with a tuple of (usize, StdWaker) is appropriate for ensuring that only the most recent waker is used. However, consider the performance implications of frequently modifying this vector, especially in high-concurrency scenarios.
  • 207-208: The Drop implementation for Future correctly ensures that the std_future_callbacks vector does not retain entries for dropped futures. This is crucial for preventing memory leaks. Verify that this logic works as expected in scenarios where futures are dropped before being completed.
  • 225-226: The logic to retain only the last waker by removing the current Future's waker before adding a new one is a good approach to prevent unnecessary wake-ups. Ensure this mechanism is thoroughly tested to confirm it behaves as expected under various polling scenarios.
  • 730-747: The test case multi_poll_stores_single_waker effectively demonstrates the intended behavior of storing only the last waker and cleaning up upon Future drop. Ensure comprehensive testing covers edge cases, such as rapid polling and dropping of futures in a multi-threaded environment.
lightning-background-processor/src/lib.rs (2)
  • 857-858: The changes involve passing references to futures obtained from channel_manager.get_event_or_persistence_needed_future() and chain_monitor.get_update_future() within Sleeper::from_two_futures() instead of directly calling these functions. This adjustment is part of a broader strategy to manage futures and their lifecycles more efficiently.
  • 857-858: Ensure that the changes to pass references to futures instead of directly calling the functions are correctly implemented across all usages. This approach should improve efficiency by avoiding unnecessary calls and managing futures' lifecycles more effectively.

@codecov-commenter

codecov-commenter commented Feb 13, 2024

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (73da722) 89.13% compared to head (8157c01) 89.39%.
Report is 8 commits behind head on main.

Files | Patch % | Lines
lightning/src/util/wakers.rs | 98.85% | 1 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2894      +/-   ##
==========================================
+ Coverage   89.13%   89.39%   +0.26%     
==========================================
  Files         115      115              
  Lines       94179    96441    +2262     
  Branches    94179    96441    +2262     
==========================================
+ Hits        83944    86211    +2267     
- Misses       7761     7807      +46     
+ Partials     2474     2423      -51     

tnull
tnull previously approved these changes Feb 14, 2024
Contributor

@tnull tnull left a comment

LGTM

Grr, I recently even stumbled across the one line in the Future docs that mentions this but didn't think through the implications for LDK. FWIW, it would be nice if they would highlight the potential leakage and/or mention it somewhere besides this one line though.

Given how invasive the changes are, I think this could use a second reviewer.

lock.1.take();
lock.0 = false;
} else {
self_idx = locked.next_idx;
Contributor

nit: 'fetch and add' could be a method so that (in future, no pun intended) we would never forget to increase the counter?

Collaborator Author

I played with a constructor a bit trying to make it more robust but didn't really see a decent way to do it without double-locking everywhere. Just defining a method to fetch-and-increment the index doesn't seem like it'll actually prevent a bug cause we'll just forget to use it :)
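
For reference, a sketch of what the suggested fetch-and-increment helper might look like. This is not code from the PR, which keeps the increment inline; `IdxAllocator` and `fetch_and_increment` are hypothetical names, and only the `next_idx` counter from the snippet above is modelled:

```rust
/// Hypothetical slice of the shared state: only the index counter is modelled.
#[derive(Default)]
struct IdxAllocator {
	next_idx: usize,
}

impl IdxAllocator {
	/// Returns the current index and advances the counter in one step, so a
	/// caller that goes through this method cannot skip the increment.
	fn fetch_and_increment(&mut self) -> usize {
		let idx = self.next_idx;
		self.next_idx += 1;
		idx
	}
}
```

As the reply notes, a helper like this only helps when every call site uses it; nothing stops new code from reading `next_idx` directly.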

Contributor

@wpaulino wpaulino left a comment

LGTM

Contributor

@shaavan shaavan left a comment

LGTM modulo comments above! 🚀


@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 3fd4b39 and 5a5a8d1.
Files selected for processing (2)
  • lightning-background-processor/src/lib.rs (1 hunks)
  • lightning/src/util/wakers.rs (15 hunks)
Files skipped from review as they are similar to previous changes (2)
  • lightning-background-processor/src/lib.rs
  • lightning/src/util/wakers.rs

@wpaulino
Contributor

Feel free to squash

In the next commit we'll fix a memory leak due to keeping too many
`std::task::Waker` callbacks in `FutureState` from redundant `poll`
calls, but first we need to split handling of `StdWaker`-based
future wake callbacks from normal ones, which we do here.

When an `std::future::Future` is `poll()`ed, we're only supposed to
use the latest `Waker` provided. However, we currently push an
`StdWaker` onto our callback list every time `poll` is called,
waking every `Waker` but also using more and more memory until the
`Future` itself is woken.

Here we take a step towards fixing this by giving each `Future` a
unique index and storing which `Future` an `StdWaker` came from in
the callback list. This sets us up to deduplicate `StdWaker`s by
`Future`s in the next commit.

When an `std::future::Future` is `poll()`ed, we're only supposed to
use the latest `Waker` provided. However, we currently push an
`StdWaker` onto our callback list every time `poll` is called,
waking every `Waker` but also using more and more memory until the
`Future` itself is woken.

Here we fix this by removing any `StdWaker`s stored for a given
`Future` when it is `drop`ped or prior to pushing a new `StdWaker`
onto the list when `poll`ed.

Sadly, the introduction of a `Drop` impl for `Future` means we
can't trivially destructure the struct any longer, causing a few
methods to need to take `Future`s by reference rather than
ownership and `clone` a few `Arc`s.

Fixes lightningdevkit#2874

@TheBlueMatt
Collaborator Author

Squashed without further changes, diff from yesterday:

$ git diff-tree -U2 60bff36 8157c01e
diff --git a/lightning/src/util/wakers.rs b/lightning/src/util/wakers.rs
index 28cea2624..b2c9d21b9 100644
--- a/lightning/src/util/wakers.rs
+++ b/lightning/src/util/wakers.rs
@@ -118,10 +118,11 @@ define_callback!();

 pub(crate) struct FutureState {
-	// When we're tracking whether a callback counts as having woken the user's code, we check the
-	// first bool - set to false if we're just calling a Waker, and true if we're calling an actual
-	// user-provided function.
+	// `callbacks` count as having woken the users' code (as they go direct to the user), but
+	// `std_future_callbacks` and `callbacks_with_state` do not (as the first just wakes a future,
+	// we only count it after another `poll()` and the second wakes a `Sleeper` which handles
+	// setting `callbacks_made` itself).
 	callbacks: Vec<Box<dyn FutureCallback>>,
 	std_future_callbacks: Vec<(usize, StdWaker)>,
-	callbacks_with_state: Vec<(bool, Box<dyn Fn(&Arc<Mutex<FutureState>>) -> () + Send>)>,
+	callbacks_with_state: Vec<Box<dyn Fn(&Arc<Mutex<FutureState>>) -> () + Send>>,
 	complete: bool,
 	callbacks_made: bool,
@@ -139,7 +140,6 @@ fn complete_future(this: &Arc<Mutex<FutureState>>) -> bool {
 		waker.0.wake_by_ref();
 	}
-	for (counts_as_call, callback) in state.callbacks_with_state.drain(..) {
+	for callback in state.callbacks_with_state.drain(..) {
 		(callback)(this);
-		state.callbacks_made |= counts_as_call;
 	}
 	state.complete = true;
@@ -267,8 +267,8 @@ impl Sleeper {
 					break;
 				}
-				notifier.callbacks_with_state.push((false, Box::new(move |notifier_ref| {
+				notifier.callbacks_with_state.push(Box::new(move |notifier_ref| {
 					*notified_fut_ref.lock().unwrap() = Some(Arc::clone(notifier_ref));
 					cv_ref.notify_all();
-				})));
+				}));
 			}
 		}
@@ -745,4 +745,15 @@ mod tests {
 		mem::drop(future_b);
 		assert_eq!(future_state.lock().unwrap().std_future_callbacks.len(), 0);
+
+		// Further, after polling a future twice, if the notifier is woken all Wakers are dropped.
+		let mut future_a = notifier.get_future();
+		assert_eq!(Pin::new(&mut future_a).poll(&mut Context::from_waker(&create_waker().1)), Poll::Pending);
+		assert_eq!(future_state.lock().unwrap().std_future_callbacks.len(), 1);
+		assert_eq!(Pin::new(&mut future_a).poll(&mut Context::from_waker(&create_waker().1)), Poll::Pending);
+		assert_eq!(future_state.lock().unwrap().std_future_callbacks.len(), 1);
+		notifier.notify();
+		assert_eq!(future_state.lock().unwrap().std_future_callbacks.len(), 0);
+		assert_eq!(Pin::new(&mut future_a).poll(&mut Context::from_waker(&create_waker().1)), Poll::Ready(()));
+		assert_eq!(future_state.lock().unwrap().std_future_callbacks.len(), 0);
 	}
 }
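
The behaviour the added test checks can be reproduced with a std-only toy, sketched below under stated assumptions. `ToyState`, `ToyFuture`, `new_future`, `notify` and `NoopWake` are hypothetical names; only `std_future_callbacks`, `self_idx`, `next_idx` and `complete` mirror identifiers that appear in this PR, and the real `FutureState` carries more fields and logic than shown here:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// No-op waker so the example needs no async runtime (hypothetical helper).
struct NoopWake;
impl Wake for NoopWake {
	fn wake(self: Arc<Self>) {}
}

#[derive(Default)]
struct ToyState {
	complete: bool,
	next_idx: usize,
	std_future_callbacks: Vec<(usize, Waker)>,
}

struct ToyFuture {
	state: Arc<Mutex<ToyState>>,
	self_idx: usize,
}

/// Hands out a future with a fresh index taken from the shared counter.
fn new_future(state: &Arc<Mutex<ToyState>>) -> ToyFuture {
	let mut lock = state.lock().unwrap();
	let self_idx = lock.next_idx;
	lock.next_idx += 1;
	ToyFuture { state: Arc::clone(state), self_idx }
}

/// Marks the state complete and wakes (and thereby drops) every stored waker.
fn notify(state: &Arc<Mutex<ToyState>>) {
	let mut lock = state.lock().unwrap();
	lock.complete = true;
	for (_, waker) in lock.std_future_callbacks.drain(..) {
		waker.wake();
	}
}

impl Future for ToyFuture {
	type Output = ();
	fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
		let idx = self.self_idx;
		let mut lock = self.state.lock().unwrap();
		if lock.complete {
			return Poll::Ready(());
		}
		// Keep only the latest waker for this future.
		lock.std_future_callbacks.retain(|(i, _)| *i != idx);
		lock.std_future_callbacks.push((idx, cx.waker().clone()));
		Poll::Pending
	}
}

impl Drop for ToyFuture {
	fn drop(&mut self) {
		// A dropped future must not leave its waker behind.
		let idx = self.self_idx;
		self.state.lock().unwrap().std_future_callbacks.retain(|(i, _)| *i != idx);
	}
}
```

Polling a `ToyFuture` twice leaves one stored waker, `notify` empties the list, and dropping a pending future removes its entry, matching the sequence of assertions in the test above.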


@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 3fd4b39 and 8157c01.
Files selected for processing (2)
  • lightning-background-processor/src/lib.rs (1 hunks)
  • lightning/src/util/wakers.rs (15 hunks)
Files skipped from review as they are similar to previous changes (2)
  • lightning-background-processor/src/lib.rs
  • lightning/src/util/wakers.rs

@tnull tnull merged commit e32020c into lightningdevkit:main Feb 16, 2024
11 of 15 checks passed
k0k0ne pushed a commit to bitlightlabs/rust-lightning that referenced this pull request Sep 30, 2024
v0.0.123 - May 08, 2024 - "BOLT12 Dust Sweeping"

API Updates
===========

 * To reduce risk of force-closures and improve HTLC reliability the default
   dust exposure limit has been increased to
   `MaxDustHTLCExposure::FeeRateMultiplier(10_000)`. Users with existing
   channels might want to consider using
   `ChannelManager::update_channel_config` to apply the new default (lightningdevkit#3045).
 * `ChainMonitor::archive_fully_resolved_channel_monitors` is now provided to
   remove from memory `ChannelMonitor`s that have been fully resolved on-chain
   and are now not needed. It uses the new `Persist::archive_persisted_channel`
   to inform the storage layer that such a monitor should be archived (lightningdevkit#2964).
 * An `OutputSweeper` is now provided which will automatically sweep
   `SpendableOutputDescriptor`s, retrying until the sweep confirms (lightningdevkit#2825).
 * After initiating an outbound channel, a peer disconnection no longer results
   in immediate channel closure. Rather, if the peer is reconnected before the
   channel times out LDK will automatically retry opening it (lightningdevkit#2725).
 * `PaymentPurpose` now has separate variants for BOLT12 payments, which
   include fields from the `invoice_request` as well as the `OfferId` (lightningdevkit#2970).
 * `ChannelDetails` now includes a list of in-flight HTLCs (lightningdevkit#2442).
 * `Event::PaymentForwarded` now includes `skimmed_fee_msat` (lightningdevkit#2858).
 * The `hashbrown` dependency has been upgraded and the use of `ahash` as the
   no-std hash table hash function has been removed. As a consequence, LDK's
   `Hash{Map,Set}`s no longer feature several constructors when LDK is built
   with no-std; see the `util::hash_tables` module instead. On platforms that
   `getrandom` supports, setting the `possiblyrandom/getrandom` feature flag
   will ensure hash tables are resistant to HashDoS attacks, though the
   `possiblyrandom` crate should detect most common platforms (lightningdevkit#2810, lightningdevkit#2891).
 * `ChannelMonitor`-originated requests to the `ChannelSigner` can now fail and
   be retried using `ChannelMonitor::signer_unblocked` (lightningdevkit#2816).
 * `SpendableOutputDescriptor::to_psbt_input` now includes the `witness_script`
   where available as well as new proprietary data which can be used to
   re-derive some spending keys from the base key (lightningdevkit#2761, lightningdevkit#3004).
 * `OutPoint::to_channel_id` has been removed in favor of
   `ChannelId::v1_from_funding_outpoint` in preparation for v2 channels with a
   different `ChannelId` derivation scheme (lightningdevkit#2797).
 * `PeerManager::get_peer_node_ids` has been replaced with `list_peers` and
   `peer_by_node_id`, which provide more details (lightningdevkit#2905).
 * `Bolt11Invoice::get_payee_pub_key` is now provided (lightningdevkit#2909).
 * `Default[Message]Router` now take an `entropy_source` argument (lightningdevkit#2847).
 * `ClosureReason::HTLCsTimedOut` has been separated out from
   `ClosureReason::HolderForceClosed` as it is the most common case (lightningdevkit#2887).
 * `ClosureReason::CooperativeClosure` is now split into
   `{Counterparty,Locally}Initiated` variants (lightningdevkit#2863).
 * `Event::ChannelPending::channel_type` is now provided (lightningdevkit#2872).
 * `PaymentForwarded::{prev,next}_user_channel_id` are now provided (lightningdevkit#2924).
 * Channel init messages have been refactored towards V2 channels (lightningdevkit#2871).
 * `BumpTransactionEvent` now contains the channel and counterparty (lightningdevkit#2873).
 * `util::scid_utils` is now public, with some trivial utilities to examine
   short channel ids (lightningdevkit#2694).
 * `DirectedChannelInfo::{source,target}` are now public (lightningdevkit#2870).
 * Bounds in `lightning-background-processor` were simplified by using
   `AChannelManager` (lightningdevkit#2963).
 * The `Persist` impl for `KVStore` no longer requires `Sized`, allowing for
   the use of `dyn KVStore` as `Persist` (lightningdevkit#2883, lightningdevkit#2976).
 * `From<PaymentPreimage>` is now implemented for `PaymentHash` (lightningdevkit#2918).
 * `NodeId::from_slice` is now provided (lightningdevkit#2942).
 * `ChannelManager` deserialization may now fail with `DangerousValue` when
   LDK's persistence API was violated (lightningdevkit#2974).

Bug Fixes
=========

 * Excess fees on counterparty commitment transactions are now included in the
   dust exposure calculation. This lines behavior up with some cases where
   transaction fees can be burnt, making them effectively dust exposure (lightningdevkit#3045).
 * A `Future` used as an `std::...::Future` could grow in size unbounded if it
   was never woken. For those not using async persistence and using the async
   `lightning-background-processor`, this could cause a memory leak in the
   `ChainMonitor` (lightningdevkit#2894).
 * Inbound channel requests that fail in
   `ChannelManager::accept_inbound_channel` would previously have stalled from
   the peer's perspective as no `error` message was sent (lightningdevkit#2953).
 * Blinded path construction has been tuned to select paths more likely to
   succeed, improving BOLT12 payment reliability (lightningdevkit#2911, lightningdevkit#2912).
 * After a reorg, `lightning-transaction-sync` could have failed to follow a
   transaction that LDK needed information about (lightningdevkit#2946).
 * `RecipientOnionFields`' `custom_tlvs` are now propagated to recipients when
   paying with blinded paths (lightningdevkit#2975).
 * `Event::ChannelClosed` is now properly generated and peers are properly
   notified for all channels that as a part of a batch channel open fail to be
   funded (lightningdevkit#3029).
 * In cases where user event processing is substantially delayed such that we
   complete multiple round-trips with our peers before a `PaymentSent` event is
   handled and then restart without persisting the `ChannelManager` after having
   persisted a `ChannelMonitor[Update]`, on startup we may have `Err`d trying to
   deserialize the `ChannelManager` (lightningdevkit#3021).
 * If a peer has relatively high latency, `PeerManager` may have failed to
   establish a connection (lightningdevkit#2993).
 * `ChannelUpdate` messages broadcasted for our own channel closures are now
   slightly more robust (lightningdevkit#2731).
 * Deserializing malformed BOLT11 invoices may have resulted in an integer
   overflow panic in debug builds (lightningdevkit#3032).
 * In exceedingly rare cases (no cases of this are known), LDK may have created
   an invalid serialization for a `ChannelManager` (lightningdevkit#2998).
 * Message processing latency handling BOLT12 payments has been reduced (lightningdevkit#2881).
 * Latency in processing `Event::SpendableOutputs` may be reduced (lightningdevkit#3033).

Node Compatibility
==================

 * LDK's blinded paths were inconsistent with other implementations in several
   ways, which have been addressed (lightningdevkit#2856, lightningdevkit#2936, lightningdevkit#2945).
 * LDK's messaging blinded paths now support the latest features which some
   nodes may begin relying on soon (lightningdevkit#2961).
 * LDK's BOLT12 structs have been updated to support some last-minute changes to
   the spec (lightningdevkit#3017, lightningdevkit#3018).
 * CLN v24.02 requires the `gossip_queries` feature for all peers, however LDK
   by default does not set it for those not using a `P2PGossipSync` (e.g. those
   using RGS). This change was reverted in CLN v24.02.2 however for now LDK
   always sets the `gossip_queries` feature. This change is expected to be
   reverted in a future LDK release (lightningdevkit#2959).

Security
========
0.0.123 fixes a denial-of-service vulnerability which we believe to be reachable
from untrusted input when parsing invalid BOLT11 invoices containing non-ASCII
characters.
 * BOLT11 invoices with non-ASCII characters in the human-readable-part may
   cause an out-of-bounds read attempt leading to a panic (lightningdevkit#3054). Note that all
   BOLT11 invoices containing non-ASCII characters are invalid.

In total, this release features 150 files changed, 19307 insertions, 6306
deletions in 360 commits since 0.0.121 from 17 authors, in alphabetical order:

 * Arik Sosman
 * Duncan Dean
 * Elias Rohrer
 * Evan Feenstra
 * Jeffrey Czyz
 * Keyue Bao
 * Matt Corallo
 * Orbital
 * Sergi Delgado Segura
 * Valentine Wallace
 * Willem Van Lint
 * Wilmer Paulino
 * benthecarman
 * jbesraa
 * olegkubrakov
 * optout
 * shaavan
Development

Successfully merging this pull request may close these issues.

<Future as std::..::Future>::poll() always allocates, growing until we're woken
5 participants