
Don't rebroadcast redundant closing TX upon restart #2001

Closed

Conversation

tnull
Contributor

@tnull tnull commented Feb 1, 2023

As of #1922, we don't rebroadcast the latest holder commitment transactions in ChannelMonitor::update_monitor() anymore if we had previously seen a closing tx on-chain. However, we still do so upon restarting / deserialization of ChannelManager.

With this change, we now also refrain from rebroadcasting upon restarting.

Let me know if there is a reason we always need to rebroadcast immediately.

While we don't rebroadcast the latest holder commitment transactions in
`ChannelMonitor::update_monitor()` anymore if we had previously seen a
closing tx on-chain, we still do so upon restarting / deserialization of
`ChannelManager`.

With this change, we now also refrain from rebroadcasting upon
restarting.
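
For illustration, here's a minimal, self-contained sketch of the restart-time guard this PR describes. All names here (`MonitorState`, `maybe_rebroadcast_on_restart`) are hypothetical stand-ins; the actual change lives in `ChannelManager`'s deserialization path and `ChannelMonitor`, which track far more state.

```rust
// Hypothetical, heavily simplified stand-in for the relevant `ChannelMonitor`
// state; the real type tracks far more than a single flag.
struct MonitorState {
    /// Whether we have already seen a transaction spending the funding output
    /// confirm on-chain.
    funding_spend_seen: bool,
}

/// Restart-time guard: only rebroadcast the holder commitment transaction if
/// we have not already seen the channel close on-chain.
fn maybe_rebroadcast_on_restart(monitor: &MonitorState, broadcast: impl Fn()) {
    // Before this PR the broadcast here was unconditional; with the change it
    // is skipped when it would clearly be redundant.
    if !monitor.funding_spend_seen {
        broadcast();
    }
}

fn main() {
    let monitor = MonitorState { funding_spend_seen: true };
    maybe_rebroadcast_on_restart(&monitor, || println!("broadcasting holder commitment tx"));
    // Prints nothing: the closing tx was already observed on-chain.
}
```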
@tnull tnull requested a review from wpaulino February 1, 2023 17:20
@tnull tnull changed the title from "Don't rebroadcast closing TX upon restart" to "Don't rebroadcast redundant closing TX upon restart" Feb 1, 2023
@BitcoinZavior

Thanks @tnull, this should resolve the issue when initialising a node from an existing config after closing the channel.

@codecov-commenter

Codecov Report

Base: 90.90% // Head: 90.88% // Decreases project coverage by -0.03% ⚠️

Coverage data is based on head (b7462fd) compared to base (c59150a).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head b7462fd differs from pull request most recent head 65a1cfa. Consider uploading reports for the commit 65a1cfa to get more accurate results


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2001      +/-   ##
==========================================
- Coverage   90.90%   90.88%   -0.03%     
==========================================
  Files          99       99              
  Lines       52570    52576       +6     
==========================================
- Hits        47789    47783       -6     
- Misses       4781     4793      +12     
Impacted Files Coverage Δ
lightning/src/chain/channelmonitor.rs 91.04% <100.00%> (+0.02%) ⬆️
lightning/src/ln/channelmanager.rs 87.28% <100.00%> (ø)
lightning/src/chain/onchaintx.rs 94.92% <0.00%> (-0.82%) ⬇️
lightning/src/ln/functional_tests.rs 96.95% <0.00%> (-0.13%) ⬇️


Contributor

@wpaulino wpaulino left a comment


LGTM, just one minor comment

lightning/src/ln/channelmanager.rs (review thread; outdated, resolved)
@TheBlueMatt
Collaborator

Because this broadcast is on init, it's likely we have a stale view of the chain. What happens (and maybe it's worth adding a test) if we start thinking a channel's funding output is spent (with 1 conf) but then after running for a second we see a reorg and now it's not spent? We'd need to rebroadcast our local commitment tx in that case.

@TheBlueMatt
Collaborator

Thanks @tnull, this should resolve the issue when initialising a node from an existing config after closing the channel.

I'm curious what this issue is; transactions may fail to broadcast for any number of reasons, and it shouldn't be an error to see a transaction which fails to enter the mempool.

@tnull
Contributor Author

tnull commented Feb 1, 2023

Because this broadcast is on init, it's likely we have a stale view of the chain.

Agreed.

What happens (and maybe it's worth adding a test) if we start thinking a channel's funding output is spent (with 1 conf) but then after running for a second we see a reorg and now it's not spent? We'd need to rebroadcast our local commitment tx in that case.

We def. need to rebroadcast the commitment tx in case the original closing tx is reorged out, as even before #1922 our redundant broadcast would have failed. Relying on rebroadcasting on restart doesn't work, as it may be a long time before the node is restarted. I hope we do this right anyways, but currently fail to see where we do it. Will see to include a fix and test.

I'm curious what this issue is; transactions may fail to broadcast for any number of reasons, and it shouldn't be an error to see a transaction which fails to enter the mempool.

Ah, when restarting LDK Node nodes we'd hit an unreachable codepath due to the runtime not being available when ChannelManager gets read. I now simply removed the panic as we currently ignore broadcast failures anyways, and we'll look into caching+rebroadcasting strategies soon.
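
For context, a rough sketch of the broadcaster-side handling described above, assuming a hypothetical, simplified LDK Node-style broadcaster; only `tokio::runtime::Handle::try_current()` is a real API here. It avoids panicking when no async runtime is available yet (as during `ChannelManager` reads) and otherwise ignores the failed hand-off.

```rust
use tokio::runtime::Handle;

/// Hypothetical, simplified broadcaster; the real LDK Node implementation
/// implements LDK's `BroadcasterInterface` and hands transactions to a chain
/// backend.
struct Broadcaster;

impl Broadcaster {
    fn broadcast_transaction(&self, raw_tx: Vec<u8>) {
        match Handle::try_current() {
            Ok(handle) => {
                let _ = handle.spawn(async move {
                    // Hand the transaction to the chain backend here; failures
                    // are logged and otherwise ignored.
                    let _ = raw_tx;
                });
            }
            Err(_) => {
                // Previously an `unreachable!()` sat here and was hit when
                // `ChannelManager::read` broadcast before the runtime existed.
                // Now the broadcast is simply dropped; caching and periodic
                // rebroadcasting are noted above as future work.
            }
        }
    }
}
```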

@wpaulino
Contributor

wpaulino commented Feb 1, 2023

What happens (and maybe it's worth adding a test) if we start thinking a channel's funding output is spent (with 1 conf) but then after running for a second we see a reorg and now it's not spent? We'd need to rebroadcast our local commitment tx in that case.

Ah, good point. I guess here we'd only want to avoid the broadcast once we've seen a commitment onchain with >= 6 confs, i.e., once funding_spend_confirmed is set, since we assume reorgs will not be longer than that.
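
A minimal sketch of the condition proposed here, assuming simplified stand-in types; `funding_spend_confirmed` and the 6-block `ANTI_REORG_DELAY` are real `ChannelMonitor` concepts, but everything else is illustrative.

```rust
/// LDK treats reorgs deeper than this as implausible (mirrors `ANTI_REORG_DELAY`).
const ANTI_REORG_DELAY: u32 = 6;

/// Simplified stand-in for the relevant `ChannelMonitor` state.
struct MonitorState {
    /// Txid of a funding-spending transaction, only set once it has reached
    /// `ANTI_REORG_DELAY` confirmations.
    funding_spend_confirmed: Option<[u8; 32]>,
}

/// Skip the restart-time rebroadcast only once the closing tx is buried deeply
/// enough that we no longer expect it to be reorged out.
fn should_skip_rebroadcast(monitor: &MonitorState) -> bool {
    monitor.funding_spend_confirmed.is_some()
}
```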

@tnull
Contributor Author

tnull commented Feb 1, 2023

Ah, good point. I guess here we'd only want to avoid the broadcast once we've seen a commitment onchain with >= 6 confs, i.e., once funding_spend_confirmed is set, since we assume reorgs will not be longer than that.

Not sure why the restart case would be that different from #1922? After the restart we sync to chain tip anyways and then need to react to any reorgs encountered?

@TheBlueMatt
Collaborator

We def. need to rebroadcast the commitment tx in case the original closing tx is reorged out, as even before #1922 our redundant broadcast would have failed.

Wait, why would it have failed? The bitcoind we're sending to should be synced, hopefully, even if we aren't.

Relying on rebroadcasting on restart doesn't work, as it may be a long time before the node is restarted. I hope we do this right anyways, but currently fail to see where we do it. Will see to include a fix and test.

Yea, that's true. We should handle this explicitly, no complaints here.

Ah, when restarting LDK Node nodes we'd hit an unreachable codepath due to the runtime not being available when ChannelManager gets read. I now simply removed the panic as we currently ignore broadcast failures anyways, and we'll look into caching+rebroadcasting strategies soon.

Heh, right, can't do that :). I guess if we fix this properly we can change ChannelManager to not broadcast during read, which would be nice for issues like this.

@tnull
Contributor Author

tnull commented Feb 1, 2023

Wait, why would it have failed? The bitcoind we're sending to should be synced, hopefully, even if we aren't.

For one, broadcasting to anything besides a local bitcoind instance's mempool could be pretty unreliable. My main point, however, was that any redundant broadcasts would just fail, and broadcasting once more on restart really doesn't protect us at all from a reorg happening just after having read ChannelManager: in both cases (avoiding redundancy or just having it fail), we'd need to be able to react to reorgs ASAP.

@wpaulino
Contributor

wpaulino commented Feb 1, 2023

Ah, good point. I guess here we'd only want to avoid the broadcast once we've seen a commitment onchain with >= 6 confs, i.e., once funding_spend_confirmed is set, since we assume reorgs will not be longer than that.

Not sure why the restart case would be that different from #1922? After the restart we sync to chain tip anyways and then need to react to any reorgs encountered?

Just noting we could avoid the broadcast on restart if the commitment is confirmed past our expectation for reorgs. However, I agree we should avoid broadcasting at all during restart and should only broadcast if it's not confirmed anymore after we detect a reorg and we're synced.
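
As a rough illustration of the "only broadcast after we detect a reorg and are synced" behavior discussed here, with hypothetical, simplified types (the real logic would live in the monitor's block-disconnection handling):

```rust
/// Simplified stand-in for the monitor state relevant to reorg handling.
struct MonitorState {
    /// Height at which we saw a closing (funding-spending) tx confirm, if any.
    closing_tx_conf_height: Option<u32>,
}

/// If the block that confirmed the closing tx is disconnected, the holder
/// commitment becomes relevant again and should be (re)broadcast once synced.
fn block_disconnected(monitor: &mut MonitorState, disconnected_height: u32, rebroadcast: impl Fn()) {
    if monitor.closing_tx_conf_height == Some(disconnected_height) {
        monitor.closing_tx_conf_height = None;
        rebroadcast();
    }
}
```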

@TheBlueMatt
Collaborator

So it sounds like we're all on the same page. IMO we should hold this until we can remove the broadcast-during-deser entirely with a complete fix on the monitor side.

@tnull
Contributor Author

tnull commented Feb 2, 2023

So it sounds like we're all on the same page. IMO we should hold this until we can remove the broadcast-during-deser entirely with a complete fix on the monitor side.

After having a look at the code and speaking to @wpaulino offline I was under the impression that our current rebroadcasting logic in OnchainTxHandler should handle this just fine, as it would do the right thing on block_{connected, disconnected}, i.e., that the main thing missing here would be to add a test case that asserts this. Any thoughts on that?

@TheBlueMatt
Collaborator

After having a look at the code and speaking to @wpaulino offline I was under the impression that our current rebroadcasting logic in OnchainTxHandler should handle this just fine

Hmm, we have a bunch of logic under cfg(anchors) to bump commitment txn, but I don't immediately see where we rebroadcast commitment txn for channels that are closed? If we do, we should just remove the during-deser broadcast entirely, replacing it with a close after we get going.

@wpaulino
Contributor

wpaulino commented Feb 6, 2023

Hmm, we have a bunch of logic under cfg(anchors) to bump commitment txn, but I don't immediately see where we rebroadcast commitment txn for channels that are closed?

When generating the claim for an untractable package, we just broadcast the transaction as shown below:

return Some((None, 0, OnchainClaim::Tx(tx)));

broadcaster.broadcast_transaction(&tx);

broadcaster.broadcast_transaction(&bump_tx);

broadcaster.broadcast_transaction(&bump_tx);

@TheBlueMatt
Collaborator

Ah, okay, sorry, I had missed where we do HolderFundingOutput packages. So ISTM we shouldn't be changing the logic to broadcast_if_unspent here, but instead just generate a ChannelMonitorUpdateStep::ChannelForceClosed for processing later.
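
A small sketch of the direction suggested here, with simplified stand-in types; only the variant name mirrors LDK's actual `ChannelMonitorUpdateStep::ChannelForceClosed`:

```rust
/// Mirrors the relevant variant of LDK's `ChannelMonitorUpdateStep`; the rest
/// of the surrounding machinery is a simplified stand-in.
enum MonitorUpdateStep {
    ChannelForceClosed { should_broadcast: bool },
}

/// Updates queued during `ChannelManager::read` for later processing, instead
/// of broadcasting inline while we may still have a stale view of the chain.
struct PendingMonitorUpdates(Vec<MonitorUpdateStep>);

fn queue_force_close_on_read(pending: &mut PendingMonitorUpdates) {
    pending
        .0
        .push(MonitorUpdateStep::ChannelForceClosed { should_broadcast: true });
}
```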

@wpaulino
Contributor

wpaulino commented Feb 6, 2023

The channel's already closed at this point though, so hasn't one already been generated?

@tnull
Contributor Author

tnull commented Feb 6, 2023

Ah, okay, sorry, I had missed where we do HolderFundingOutput packages. So ISTM we shouldn't be changing the logic to broadcast_if_unspent here, but instead just generate a ChannelMonitorUpdateStep::ChannelForceClosed for processing later.

Mh, that def. makes more sense to me, but don't we still only want to create a redundant ChannelForceClosed if we hadn't seen a closing tx on-chain?

Nvm, checking that it's not redundant would happen anyways in this case. So the question would indeed be whether we really need to regenerate the update step at all?

@TheBlueMatt
Collaborator

The channel's already closed at this point though, so hasn't one already been generated?

I'm not sure? The point of this check is to be a belt-and-suspenders check that the monitor knows the channel is closed if the manager thinks it is. I'm not at all confident there isn't some race case on write order that would break that.

@wpaulino
Contributor

wpaulino commented Feb 6, 2023

After reviewing #1897, I see there's a chance of this happening if the ChannelManager is persisted, but the ChannelMonitorUpdate isn't and we crash. As of that PR and its follow-up work, we'll queue and persist the ChannelMonitorUpdates within each Channel, so once we apply those updates we should be safe then?

@TheBlueMatt
Collaborator

I don't think so? We could have a channel be created, persist the monitor, update the channel once or twice, force-close the channel, fail to update the monitor, and crash, all without having persisted the ChannelManager. In that case we probably still need this to work.

@tnull
Contributor Author

tnull commented Mar 7, 2023

Closing this as superseded by #2059.

@tnull tnull closed this Mar 7, 2023