Support async `ChannelMonitorUpdate` persist for claims against closed channels #3414

TheBlueMatt · 2024-11-18T20:00:13Z

Based on #3413, this finally implements async ChannelMonitorUpdate persistence against closed channels. After all the prep, its...not all that bad. Would still like to write an additional test for the last second-to-commit though the changes in that commit do show it has some coverage.

Makes `test_durable_preimages_on_closed_channel` more robust against changes to the order in which transactions are broadcast.

When deciding if we should remove a `PeerState` entry we want to ensure we don't remove if there are pending updates in `in_flight_monitor_updates`. Previously this was done with a simple `in_flight_monitor_updates.is_empty()`, however this can prevent removal of `PeerState` entries if a channel had an update at some point (leaving an entry in the map) but the update was ultimately completed. Instead, we need to iterate over the entries in `in_flight_monitor_updates` and decline to remove `PeerState`s only if there is an entry for a pending update still in-flight.

On startup, if we have a channel which was closed immediately before shutdown such that the `ChannelMonitorUpdate` marking the channel as closed is still in-flight, it doesn't make sense to generate a fresh `ChannelMonitorUpdate` marking the channel as closed immediately after the existing in-flight one. Here we detect this case and drop the extra update, though its not all that harmful it does avoid some test changes in the coming commits.

During block connection, we cannot apply `ChannelMonitorUpdate`s if we're running during the startup sequence (i.e. before the user has called any methods outside of block connection). We previously handled this by simply always pushing any `ChannelMonitorUpdate`s generated during block connection into the `pending_background_events` queue. However, this results in `ChannelMonitorUpdate`s going through the queue when we could just push them immediately. Here we explicitly check `background_events_processed_since_startup` and use that to decide whether to push updates through the background queue instead.

In the coming commits we'll start handling `ChannelMonitorUpdate`s during channel closure in-line rather than after dropping locks via `finish_close_channel`. In order to make that easy, here we add a new `REMAIN_LOCKED_UPDATE_ACTIONS_PROCESSED_LATER` variant to `handle_new_monitor_update!` which can attempt to apply an update without dropping the locks and processing `MonitorUpdateCompletionAction`s immediately.

Closing channels requires a two step process - first `update_maps_on_chan_removal` is called while holding the same per-peer lock under which the channel reached the terminal state, then after dropping the same lock(s), `finish_close_channel` is called. Because the channel is closed and thus no further `ChannelMonitorUpdate`s are generated for the off-chain state, we'd previously applied the `ChannelMonitorUpdate` in `finish_close_channel`. This was tweaked somewhat in c99d3d7 when we stopped using `u64::MAX` for any updates after closure. However, we worked around the races that implied by setting the `update_id` only when we go to apply the `ChannelMonitorUpdate`, rather than when we create it. In a coming commit, we'll need to have an `update_id` immediately upon creation (to track in-flight updates that haven't reached application yet). This implies that we can no longer apply closure `ChannelMonitorUpdate`s after dropping the per-peer lock(s), as the updates must be well-ordered with any later updates to the same channel, even after it has been closed. Thus, here, we add `ChannelMonitorUpdate` handling to `update_maps_on_chan_removal`, renaming it `locked_close_channel` to better capture its new purpose.

c99d3d7 updated `ChannelMonitorUpdate::update_id` to continue counting up even after the channel is closed. It, however, accidentally updated the `ChannelMonitorUpdate` application logic to skip testing that `ChannelMonitorUpdate`s are well-ordered after the channel has been closed (in an attempt to ensure other checks in the same conditional block were applied). This fixes that oversight.

When we handle a `ChannelMonitorUpdate` completion we always complete everything that was waiting on any updates to the same channel all at once. Thus, we need to skip all updates if there's pending updates besides the one that was just completed. We handled this correctly for open channels, but the shortcut for closed channels ignored any other pending updates entirely. Here we fix this, which is ultimately required for tests which are added in a few commits to pass.

Now that we track the latest `ChannelMonitorUpdate::update_id` for each closed channel in `PeerState::closed_channel_monitor_update_ids`, we should always have a `PeerState` entry for the channel counterparty any time we go to claim an HTLC on a channel, even if its closed. Here we make this a hard assertion as we'll need to access that `PeerState` in the coming commits to track in-flight updates against closed channels.

In c99d3d7 we added a new `apply_post_close_monitor_update` method which takes a `ChannelMonitorUpdate` (possibly) for a channel which has been closed, sets the `update_id` to the right value to keep our updates well-ordered, and then applies it. Setting the `update_id` at application time here is fine - updates don't really have an order after the channel has been closed, they can be applied in any order - and was done for practical reasons as calculating the right `update_id` at generation time takes a bit more work on startup, and was impossible without new assumptions during claim. In the previous commit we added exactly the new assumption we need at claiming (as it's required for the next few commits anyway), so now the only thing stopping us is the extra complexity. In the coming commits, we'll move to tracking post-close `ChannelMonitorUpdate`s as in-flight like any other updates, which requires having an `update_id` at generation-time so that we know what updates are still in-flight. Thus, we go ahead and eat the complexity here, creating `update_id`s when the `ChannelMonitorUpdate`s are generated for closed-channel updates, like we do for channels which are still live.

In the coming commits we'll start handling `ChannelMonitorUpdate`s during channel closure in-line rather than after dropping locks via `finish_close_channel`. In order to make that easy, here we add a new `REMAIN_LOCKED_UPDATE_ACTIONS_PROCESSED_LATER` variant to `handle_new_monitor_update!` which can attempt to apply an update without dropping the locks and processing `MonitorUpdateCompletionAction`s immediately. In the coming commits we'll start handling `ChannelMonitorUpdate`s "like normal" for updates against closed channels. Here we set up the first step by adding a new `POST_CHANNEL_CLOSE` variant on `handle_new_monitor_update!` which attempts to handle the `ChannelMonitorUpdate` and handles completion actions if it finishes immediately, just like the pre-close variant.

One of the largest gaps in our async persistence functionality has been preimage (claim) updates to closed channels. Here we finally implement support for this (for updates at runtime). Thanks to all the work we've built up over the past many commits, this is a well-contained patch within `claim_mpp_part`, pushing the generated `ChannelMonitorUpdate`s through the same pipeline we use for open channels. Sadly we can't use the `handle_new_monitor_update` macro wholesale as it handles the `Channel` resumption as well which we don't do here.

On startup, we walk the preimages and payment HTLC sets on all our `ChannelMonitor`s, re-claiming all payments which we recently claimed. This ensures all HTLCs in any claimed payments are claimed across all channels. In doing so, we expect to see the same payment multiple times, after all it may have been received as multiple HTLCs across multiple channels. In such cases, there's no reason to redundantly claim the same set of HTLCs again and again. In the current code, doing so may lead to redundant `PaymentClaimed` events, and in a coming commit will instead cause an assertion failure.

One of the largest gaps in our async persistence functionality has been preimage (claim) updates to closed channels. Here we finally implement support for this (for updates which are generated during startup). Thanks to all the work we've built up over the past many commits, this is a fairly straightforward patch, removing the immediate-completion logic from `claim_mpp_part` and adding the required in-flight tracking logic to `apply_post_close_monitor_update`. Like in the during-runtime case in the previous commit, we sadly can't use the `handle_new_monitor_update` macro wholesale as it handles the `Channel` resumption as well which we don't do here.

This moves the common `if during_startup { push background event } else { apply ChannelMonitorUpdate }` pattern by simply inlining it in `handle_new_monitor_update`. It also ensures we always insert `ChannelMonitorUpdate`s in the pending updates set when we push the background event, avoiding a race where we push an update as a background event, then while its processing another update finishes and the post-update actions get run.

codecov · 2024-11-18T20:08:53Z

Codecov Report

Attention: Patch coverage is 90.05102% with 39 lines in your changes missing coverage. Please review.

Project coverage is 89.60%. Comparing base (8da30df) to head (8fca379).

Files with missing lines	Patch %	Lines
lightning/src/ln/channelmanager.rs	86.36%	21 Missing and 15 partials ⚠️
lightning/src/ln/chanmon_update_fail_tests.rs	98.13%	1 Missing and 1 partial ⚠️
lightning/src/chain/channelmonitor.rs	80.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3414      +/-   ##
==========================================
- Coverage   89.60%   89.60%   -0.01%     
==========================================
  Files         129      129              
  Lines      105506   105646     +140     
  Branches   105506   105646     +140     
==========================================
+ Hits        94540    94664     +124     
- Misses       8212     8220       +8     
- Partials     2754     2762       +8

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚨 Try these New Features:

Flaky Tests Detection - Detect and resolve failed and flaky tests

TheBlueMatt added 15 commits November 18, 2024 17:36

Make test_durable_preimages_on_closed_channel more robust

4744cce

Makes `test_durable_preimages_on_closed_channel` more robust against changes to the order in which transactions are broadcast.

TheBlueMatt added this to the 0.1 milestone Nov 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support async `ChannelMonitorUpdate` persist for claims against closed channels #3414

Support async `ChannelMonitorUpdate` persist for claims against closed channels #3414

TheBlueMatt commented Nov 18, 2024

codecov bot commented Nov 18, 2024 •

edited

Loading

Support async ChannelMonitorUpdate persist for claims against closed channels #3414

Are you sure you want to change the base?

Support async ChannelMonitorUpdate persist for claims against closed channels #3414

Conversation

TheBlueMatt commented Nov 18, 2024

codecov bot commented Nov 18, 2024 • edited Loading

Codecov Report

Support async `ChannelMonitorUpdate` persist for claims against closed channels #3414

Support async `ChannelMonitorUpdate` persist for claims against closed channels #3414

codecov bot commented Nov 18, 2024 •

edited

Loading