Never store more than one StdWaker per live Future #2894

TheBlueMatt · 2024-02-13T22:47:46Z

When an std::future::Future is poll()ed, we're only supposed to
use the latest Waker provided. However, we currently push an
StdWaker onto our callback list every time poll is called,
waking every Waker but also using more and more memory until the
Future itself is woken.

Here we fix this by removing any StdWakers stored for a given
Future when it is dropped or prior to pushing a new StdWaker
onto the list when polled.

Sadly, the introduction of a Drop impl for Future means we
can't trivially destructure the struct any longer, causing a few
methods to need to take Futures by reference rather than
ownership and clone a few Arcs.

Fixes #2874
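
The fix described above can be illustrated with a minimal sketch. It is not the code from this PR: `SharedState`, `Handle`, `register`, `NoopWake`, and `stored_wakers_after_polls` are hypothetical names, and only the `std_future_callbacks`, `next_idx`, and `self_idx` field names are borrowed from the discussion. The idea shown is that each future carries an index, and any waker previously stored under that index is removed before a new one is pushed and again when the future is dropped.

```rust
use std::sync::{Arc, Mutex};
use std::task::{Wake, Waker};

/// Hypothetical, simplified stand-in for the state shared with a notifier.
#[derive(Default)]
struct SharedState {
    /// (index of the future that registered it, latest waker from that future)
    std_future_callbacks: Vec<(usize, Waker)>,
    /// Index handed to the next future that is created.
    next_idx: usize,
}

/// Hypothetical future handle that remembers the index it was given.
struct Handle {
    state: Arc<Mutex<SharedState>>,
    self_idx: usize,
}

impl Handle {
    fn new(state: &Arc<Mutex<SharedState>>) -> Self {
        let mut locked = state.lock().unwrap();
        let self_idx = locked.next_idx;
        locked.next_idx += 1;
        Handle { state: Arc::clone(state), self_idx }
    }

    /// What a `poll()` that returns `Pending` would do: replace, never accumulate.
    fn register(&self, waker: &Waker) {
        let self_idx = self.self_idx;
        let mut locked = self.state.lock().unwrap();
        locked.std_future_callbacks.retain(|(idx, _)| *idx != self_idx);
        locked.std_future_callbacks.push((self_idx, waker.clone()));
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        // A dropped future must not leave its waker behind.
        let self_idx = self.self_idx;
        self.state.lock().unwrap().std_future_callbacks.retain(|(idx, _)| *idx != self_idx);
    }
}

/// Waker that does nothing; it only exists so there is something to store.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Registers a waker `polls` times and reports how many wakers are stored.
fn stored_wakers_after_polls(polls: usize) -> usize {
    let state = Arc::new(Mutex::new(SharedState::default()));
    let handle = Handle::new(&state);
    let waker = Waker::from(Arc::new(NoopWake));
    for _ in 0..polls {
        handle.register(&waker);
    }
    // Bind the result first so the lock guard is released before `handle` drops.
    let stored = state.lock().unwrap().std_future_callbacks.len();
    stored
}
```

In this sketch the vector holds at most one entry per live handle however often `register` is called, which is the property the PR title asks for.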

coderabbitai · 2024-02-13T22:47:56Z

Walkthrough

The update introduces enhancements to the handling of futures and wakers, focusing on optimizing memory usage and improving the efficiency of future polling. It involves changes in how futures are managed, specifically addressing issues with memory allocation by refining the callback and waker mechanisms. The modifications aim to prevent unnecessary memory growth and ensure more efficient future completion tracking, particularly in busy environments.

Changes

Files	Change Summary
`lightning-background-processor/src/lib.rs`	Pass references to futures instead of direct calls, optimizing future handling and efficiency.
`lightning/src/util/wakers.rs`	Overhaul future management: introduce `self_idx`, manage callbacks and wakers, enhance polling.

Assessment against linked issues

Objective	Addressed	Explanation
`<Future as std::..::Future>::poll()` always allocates, growing until we're woken (#2874)	✅

Poem

In the realm of code where changes brew,
A rabbit danced, its tasks anew.
Futures and wakers now aligned,
Memory growth no longer unconfined.
Efficiency shines in lines of code,
As the rabbit's work in the system flowed.
🌟 "Optimization achieved," it declared with glee,
Leaving behind a better codebase for all to see.

coderabbitai

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 73da722 and 60bff36.

Files selected for processing (2)

lightning-background-processor/src/lib.rs (1 hunks)
lightning/src/util/wakers.rs (15 hunks)

Additional comments: 8

lightning/src/util/wakers.rs (6)

59-70: The logic for managing self_idx and resetting the future state when callbacks_made is true appears sound. However, ensure that the logic for resetting lock.0 to false and handling next_idx is thoroughly tested, especially in concurrent scenarios where multiple futures might be interacting with the same Notifier.

78-82: Adding std_future_callbacks to track standard wakers is a critical improvement. Ensure that the vector is efficiently managed, especially in terms of memory usage and performance when futures are frequently polled and woken.

124-128: The management of std_future_callbacks with a tuple of (usize, StdWaker) is appropriate for ensuring that only the most recent waker is used. However, consider the performance implications of frequently modifying this vector, especially in high-concurrency scenarios.

207-208: The Drop implementation for Future correctly ensures that the std_future_callbacks vector does not retain entries for dropped futures. This is crucial for preventing memory leaks. Verify that this logic works as expected in scenarios where futures are dropped before being completed.

225-226: The logic to retain only the last waker by removing the current Future's waker before adding a new one is a good approach to prevent unnecessary wake-ups. Ensure this mechanism is thoroughly tested to confirm it behaves as expected under various polling scenarios.

730-747: The test case multi_poll_stores_single_waker effectively demonstrates the intended behavior of storing only the last waker and cleaning up upon Future drop. Ensure comprehensive testing covers edge cases, such as rapid polling and dropping of futures in a multi-threaded environment.

lightning-background-processor/src/lib.rs (2)

857-858: The changes involve passing references to futures obtained from channel_manager.get_event_or_persistence_needed_future() and chain_monitor.get_update_future() within Sleeper::from_two_futures() instead of directly calling these functions. This adjustment is part of a broader strategy to manage futures and their lifecycles more efficiently.

857-858: Ensure that the changes to pass references to futures instead of directly calling the functions are correctly implemented across all usages. This approach should improve efficiency by avoiding unnecessary calls and managing futures' lifecycles more effectively.

codecov-commenter · 2024-02-13T22:55:29Z

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (73da722) 89.13% compared to head (8157c01) 89.39%.
Report is 8 commits behind head on main.

Files	Patch %	Lines
lightning/src/util/wakers.rs	98.85%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2894      +/-   ##
==========================================
+ Coverage   89.13%   89.39%   +0.26%     
==========================================
  Files         115      115              
  Lines       94179    96441    +2262     
  Branches    94179    96441    +2262     
==========================================
+ Hits        83944    86211    +2267     
- Misses       7761     7807      +46     
+ Partials     2474     2423      -51

tnull

LGTM

Grr, I recently even stumbled across the one line in the Future docs that mentions this but didn't think through the implications for LDK. FWIW, it would be nice if they would highlight the potential leakage and/or mention it somewhere besides this one line though.

Given how invasive the changes are, I think this could use a second reviewer.

tnull · 2024-02-14T17:32:49Z

lightning/src/util/wakers.rs

 				lock.1.take();
 				lock.0 = false;
+			} else {
+				self_idx = locked.next_idx;


nit: 'fetch and add' could be a method so that (in future, no pun intended) we would never forget to increase the counter?

I played with a constructor a bit trying to make it more robust but didn't really see a decent way to do it without double-locking everywhere. Just defining a method to fetch-and-increment the index doesn't seem like it'll actually prevent a bug, because we'll just forget to use it :)
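
For reference, the 'fetch and add' helper suggested in the nit might look like the sketch below. `IdxState` and `take_next_idx` are hypothetical names, not part of the PR; only the `next_idx` field name comes from the diff. As the reply points out, such a helper only helps at call sites that remember to use it.

```rust
/// Hypothetical, pared-down state; only the counter matters for this sketch.
struct IdxState {
    next_idx: usize,
}

impl IdxState {
    /// Returns the current index and advances the counter in one step, so a
    /// caller of this method cannot read the index and forget to increment it.
    fn take_next_idx(&mut self) -> usize {
        let idx = self.next_idx;
        self.next_idx += 1;
        idx
    }
}
```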

wpaulino

LGTM

lightning/src/util/wakers.rs

shaavan

LGTM modulo comments above! 🚀

lightning/src/util/wakers.rs

coderabbitai

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 3fd4b39 and 5a5a8d1.

Files selected for processing (2)

lightning-background-processor/src/lib.rs (1 hunks)
lightning/src/util/wakers.rs (15 hunks)

Files skipped from review as they are similar to previous changes (2)

lightning-background-processor/src/lib.rs
lightning/src/util/wakers.rs

wpaulino · 2024-02-15T21:03:35Z

Feel free to squash

In the next commit we'll fix a memory leak due to keeping too many `std::task::Waker` callbacks in `FutureState` from redundant `poll` calls, but first we need to split handling of `StdWaker`-based future wake callbacks from normal ones, which we do here.

When an `std::future::Future` is `poll()`ed, we're only supposed to use the latest `Waker` provided. However, we currently push an `StdWaker` onto our callback list every time `poll` is called, waking every `Waker` but also using more and more memory until the `Future` itself is woken. Here we take a step towards fixing this by giving each `Future` a unique index and storing which `Future` an `StdWaker` came from in the callback list. This sets us up to deduplicate `StdWaker`s by `Future`s in the next commit.

When an `std::future::Future` is `poll()`ed, we're only supposed to use the latest `Waker` provided. However, we currently push an `StdWaker` onto our callback list every time `poll` is called, waking every `Waker` but also using more and more memory until the `Future` itself is woken. Here we fix this by removing any `StdWaker`s stored for a given `Future` when it is `drop`ped or prior to pushing a new `StdWaker` onto the list when `poll`ed. Sadly, the introduction of a `Drop` impl for `Future` means we can't trivially destructure the struct any longer, causing a few methods to need to take `Future`s by reference rather than ownership and `clone` a few `Arc`s. Fixes lightningdevkit#2874
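
The destructuring limitation mentioned in the last commit message is a general Rust rule: fields cannot be moved out of a value whose type implements `Drop` (compiler error E0509). A minimal sketch follows, with hypothetical `State`, `Fut`, and `state_of` names rather than LDK's actual types, showing the by-reference, `Arc`-cloning alternative.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical shared state.
struct State {
    complete: bool,
}

/// Hypothetical future type holding shared state and its own index.
struct Fut {
    state: Arc<Mutex<State>>,
    self_idx: usize,
}

impl Drop for Fut {
    fn drop(&mut self) {
        // Cleanup keyed on `self.self_idx` would go here.
    }
}

// Once `Fut` implements `Drop`, moving a field out no longer compiles (E0509):
//
// fn into_state(fut: Fut) -> Arc<Mutex<State>> {
//     let Fut { state, .. } = fut;
//     state
// }

/// Instead, borrow the future and clone the `Arc` (a reference-count bump).
fn state_of(fut: &Fut) -> Arc<Mutex<State>> {
    Arc::clone(&fut.state)
}
```

The clone costs one atomic increment, which is the price the commit message alludes to when it says a few `Arc`s now have to be cloned.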

TheBlueMatt · 2024-02-15T21:52:57Z

Squashed without further changes, diff from yesterday:

$ git diff-tree -U2 60bff36 8157c01e
diff --git a/lightning/src/util/wakers.rs b/lightning/src/util/wakers.rs
index 28cea2624..b2c9d21b9 100644
--- a/lightning/src/util/wakers.rs
+++ b/lightning/src/util/wakers.rs
@@ -118,10 +118,11 @@ define_callback!();

 pub(crate) struct FutureState {
-	// When we're tracking whether a callback counts as having woken the user's code, we check the
-	// first bool - set to false if we're just calling a Waker, and true if we're calling an actual
-	// user-provided function.
+	// `callbacks` count as having woken the users' code (as they go direct to the user), but
+	// `std_future_callbacks` and `callbacks_with_state` do not (as the first just wakes a future,
+	// we only count it after another `poll()` and the second wakes a `Sleeper` which handles
+	// setting `callbacks_made` itself).
 	callbacks: Vec<Box<dyn FutureCallback>>,
 	std_future_callbacks: Vec<(usize, StdWaker)>,
-	callbacks_with_state: Vec<(bool, Box<dyn Fn(&Arc<Mutex<FutureState>>) -> () + Send>)>,
+	callbacks_with_state: Vec<Box<dyn Fn(&Arc<Mutex<FutureState>>) -> () + Send>>,
 	complete: bool,
 	callbacks_made: bool,
@@ -139,7 +140,6 @@ fn complete_future(this: &Arc<Mutex<FutureState>>) -> bool {
 		waker.0.wake_by_ref();
 	}
-	for (counts_as_call, callback) in state.callbacks_with_state.drain(..) {
+	for callback in state.callbacks_with_state.drain(..) {
 		(callback)(this);
-		state.callbacks_made |= counts_as_call;
 	}
 	state.complete = true;
@@ -267,8 +267,8 @@ impl Sleeper {
 					break;
 				}
-				notifier.callbacks_with_state.push((false, Box::new(move |notifier_ref| {
+				notifier.callbacks_with_state.push(Box::new(move |notifier_ref| {
 					*notified_fut_ref.lock().unwrap() = Some(Arc::clone(notifier_ref));
 					cv_ref.notify_all();
-				})));
+				}));
 			}
 		}
@@ -745,4 +745,15 @@ mod tests {
 		mem::drop(future_b);
 		assert_eq!(future_state.lock().unwrap().std_future_callbacks.len(), 0);
+
+		// Further, after polling a future twice, if the notifier is woken all Wakers are dropped.
+		let mut future_a = notifier.get_future();
+		assert_eq!(Pin::new(&mut future_a).poll(&mut Context::from_waker(&create_waker().1)), Poll::Pending);
+		assert_eq!(future_state.lock().unwrap().std_future_callbacks.len(), 1);
+		assert_eq!(Pin::new(&mut future_a).poll(&mut Context::from_waker(&create_waker().1)), Poll::Pending);
+		assert_eq!(future_state.lock().unwrap().std_future_callbacks.len(), 1);
+		notifier.notify();
+		assert_eq!(future_state.lock().unwrap().std_future_callbacks.len(), 0);
+		assert_eq!(Pin::new(&mut future_a).poll(&mut Context::from_waker(&create_waker().1)), Poll::Ready(()));
+		assert_eq!(future_state.lock().unwrap().std_future_callbacks.len(), 0);
 	}
 }
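
The added test relies on a `create_waker()` helper that is not shown in this diff. As an assumption about what such a helper could look like (not the helper from the LDK test module), here is a sketch built on the standard `std::task::Wake` trait. `WakeCounter` and `counting_waker` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};

/// Counts how many times the waker built from it has been woken.
struct WakeCounter(AtomicUsize);

impl Wake for WakeCounter {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Returns the shared counter and a `Waker` backed by it.
fn counting_waker() -> (Arc<WakeCounter>, Waker) {
    let counter = Arc::new(WakeCounter(AtomicUsize::new(0)));
    let waker = Waker::from(Arc::clone(&counter));
    (counter, waker)
}
```

A test can pass `&counting_waker().1` to `Context::from_waker` the same way the diff passes `&create_waker().1`, and keep the counter around when it also wants to check that a wake-up happened.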

coderabbitai

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 3fd4b39 and 8157c01.

Files selected for processing (2)

lightning-background-processor/src/lib.rs (1 hunks)
lightning/src/util/wakers.rs (15 hunks)

Files skipped from review as they are similar to previous changes (2)

lightning-background-processor/src/lib.rs
lightning/src/util/wakers.rs

v0.0.123 - May 08, 2024 - "BOLT12 Dust Sweeping"

API Updates
===========

* To reduce risk of force-closures and improve HTLC reliability the default dust exposure limit has been increased to `MaxDustHTLCExposure::FeeRateMultiplier(10_000)`. Users with existing channels might want to consider using `ChannelManager::update_channel_config` to apply the new default (lightningdevkit#3045).
* `ChainMonitor::archive_fully_resolved_channel_monitors` is now provided to remove from memory `ChannelMonitor`s that have been fully resolved on-chain and are now not needed. It uses the new `Persist::archive_persisted_channel` to inform the storage layer that such a monitor should be archived (lightningdevkit#2964).
* An `OutputSweeper` is now provided which will automatically sweep `SpendableOutputDescriptor`s, retrying until the sweep confirms (lightningdevkit#2825).
* After initiating an outbound channel, a peer disconnection no longer results in immediate channel closure. Rather, if the peer is reconnected before the channel times out LDK will automatically retry opening it (lightningdevkit#2725).
* `PaymentPurpose` now has separate variants for BOLT12 payments, which include fields from the `invoice_request` as well as the `OfferId` (lightningdevkit#2970).
* `ChannelDetails` now includes a list of in-flight HTLCs (lightningdevkit#2442).
* `Event::PaymentForwarded` now includes `skimmed_fee_msat` (lightningdevkit#2858).
* The `hashbrown` dependency has been upgraded and the use of `ahash` as the no-std hash table hash function has been removed. As a consequence, LDK's `Hash{Map,Set}`s no longer feature several constructors when LDK is built with no-std; see the `util::hash_tables` module instead. On platforms that `getrandom` supports, setting the `possiblyrandom/getrandom` feature flag will ensure hash tables are resistant to HashDoS attacks, though the `possiblyrandom` crate should detect most common platforms (lightningdevkit#2810, lightningdevkit#2891).
* `ChannelMonitor`-originated requests to the `ChannelSigner` can now fail and be retried using `ChannelMonitor::signer_unblocked` (lightningdevkit#2816).
* `SpendableOutputDescriptor::to_psbt_input` now includes the `witness_script` where available as well as new proprietary data which can be used to re-derive some spending keys from the base key (lightningdevkit#2761, lightningdevkit#3004).
* `OutPoint::to_channel_id` has been removed in favor of `ChannelId::v1_from_funding_outpoint` in preparation for v2 channels with a different `ChannelId` derivation scheme (lightningdevkit#2797).
* `PeerManager::get_peer_node_ids` has been replaced with `list_peers` and `peer_by_node_id`, which provide more details (lightningdevkit#2905).
* `Bolt11Invoice::get_payee_pub_key` is now provided (lightningdevkit#2909).
* `Default[Message]Router` now take an `entropy_source` argument (lightningdevkit#2847).
* `ClosureReason::HTLCsTimedOut` has been separated out from `ClosureReason::HolderForceClosed` as it is the most common case (lightningdevkit#2887).
* `ClosureReason::CooperativeClosure` is now split into `{Counterparty,Locally}Initiated` variants (lightningdevkit#2863).
* `Event::ChannelPending::channel_type` is now provided (lightningdevkit#2872).
* `PaymentForwarded::{prev,next}_user_channel_id` are now provided (lightningdevkit#2924).
* Channel init messages have been refactored towards V2 channels (lightningdevkit#2871).
* `BumpTransactionEvent` now contains the channel and counterparty (lightningdevkit#2873).
* `util::scid_utils` is now public, with some trivial utilities to examine short channel ids (lightningdevkit#2694).
* `DirectedChannelInfo::{source,target}` are now public (lightningdevkit#2870).
* Bounds in `lightning-background-processor` were simplified by using `AChannelManager` (lightningdevkit#2963).
* The `Persist` impl for `KVStore` no longer requires `Sized`, allowing for the use of `dyn KVStore` as `Persist` (lightningdevkit#2883, lightningdevkit#2976).
* `From<PaymentPreimage>` is now implemented for `PaymentHash` (lightningdevkit#2918).
* `NodeId::from_slice` is now provided (lightningdevkit#2942).
* `ChannelManager` deserialization may now fail with `DangerousValue` when LDK's persistence API was violated (lightningdevkit#2974).

Bug Fixes
=========

* Excess fees on counterparty commitment transactions are now included in the dust exposure calculation. This lines behavior up with some cases where transaction fees can be burnt, making them effectively dust exposure (lightningdevkit#3045).
* `Future`s used as an `std::...::Future` could grow in size unbounded if it was never woken. For those not using async persistence and using the async `lightning-background-processor`, this could cause a memory leak in the `ChainMonitor` (lightningdevkit#2894).
* Inbound channel requests that fail in `ChannelManager::accept_inbound_channel` would previously have stalled from the peer's perspective as no `error` message was sent (lightningdevkit#2953).
* Blinded path construction has been tuned to select paths more likely to succeed, improving BOLT12 payment reliability (lightningdevkit#2911, lightningdevkit#2912).
* After a reorg, `lightning-transaction-sync` could have failed to follow a transaction that LDK needed information about (lightningdevkit#2946).
* `RecipientOnionFields`' `custom_tlvs` are now propagated to recipients when paying with blinded paths (lightningdevkit#2975).
* `Event::ChannelClosed` is now properly generated and peers are properly notified for all channels that as a part of a batch channel open fail to be funded (lightningdevkit#3029).
* In cases where user event processing is substantially delayed such that we complete multiple round-trips with our peers before a `PaymentSent` event is handled and then restart without persisting the `ChannelManager` after having persisted a `ChannelMonitor[Update]`, on startup we may have `Err`d trying to deserialize the `ChannelManager` (lightningdevkit#3021).
* If a peer has relatively high latency, `PeerManager` may have failed to establish a connection (lightningdevkit#2993).
* `ChannelUpdate` messages broadcasted for our own channel closures are now slightly more robust (lightningdevkit#2731).
* Deserializing malformed BOLT11 invoices may have resulted in an integer overflow panic in debug builds (lightningdevkit#3032).
* In exceedingly rare cases (no cases of this are known), LDK may have created an invalid serialization for a `ChannelManager` (lightningdevkit#2998).
* Message processing latency handling BOLT12 payments has been reduced (lightningdevkit#2881).
* Latency in processing `Event::SpendableOutputs` may be reduced (lightningdevkit#3033).

Node Compatibility
==================

* LDK's blinded paths were inconsistent with other implementations in several ways, which have been addressed (lightningdevkit#2856, lightningdevkit#2936, lightningdevkit#2945).
* LDK's messaging blinded paths now support the latest features which some nodes may begin relying on soon (lightningdevkit#2961).
* LDK's BOLT12 structs have been updated to support some last-minute changes to the spec (lightningdevkit#3017, lightningdevkit#3018).
* CLN v24.02 requires the `gossip_queries` feature for all peers, however LDK by default does not set it for those not using a `P2PGossipSync` (e.g. those using RGS). This change was reverted in CLN v24.02.2 however for now LDK always sets the `gossip_queries` feature. This change is expected to be reverted in a future LDK release (lightningdevkit#2959).

Security
========

0.0.123 fixes a denial-of-service vulnerability which we believe to be reachable from untrusted input when parsing invalid BOLT11 invoices containing non-ASCII characters.

* BOLT11 invoices with non-ASCII characters in the human-readable-part may cause an out-of-bounds read attempt leading to a panic (lightningdevkit#3054). Note that all BOLT11 invoices containing non-ASCII characters are invalid.
In total, this release features 150 files changed, 19307 insertions, 6306 deletions in 360 commits since 0.0.121 from 17 authors, in alphabetical order:

* Arik Sosman
* Duncan Dean
* Elias Rohrer
* Evan Feenstra
* Jeffrey Czyz
* Keyue Bao
* Matt Corallo
* Orbital
* Sergi Delgado Segura
* Valentine Wallace
* Willem Van Lint
* Wilmer Paulino
* benthecarman
* jbesraa
* olegkubrakov
* optout
* shaavan

TheBlueMatt added this to the 0.0.122 milestone Feb 13, 2024

coderabbitai bot reviewed Feb 13, 2024

tnull previously approved these changes Feb 14, 2024

wpaulino reviewed Feb 15, 2024

shaavan reviewed Feb 15, 2024

TheBlueMatt dismissed tnull’s stale review via 5a5a8d1 February 15, 2024 19:23

TheBlueMatt force-pushed the 2024-02-future-poll-leak branch from 60bff36 to 5a5a8d1 February 15, 2024 19:23

coderabbitai bot reviewed Feb 15, 2024

TheBlueMatt added 3 commits February 15, 2024 21:52

TheBlueMatt force-pushed the 2024-02-future-poll-leak branch from 5a5a8d1 to 8157c01 February 15, 2024 21:52

coderabbitai bot reviewed Feb 15, 2024

wpaulino approved these changes Feb 15, 2024

tnull approved these changes Feb 16, 2024

tnull merged commit e32020c into lightningdevkit:main Feb 16, 2024
11 of 15 checks passed
