
[bug]: Channel is stuck in pending state. #8028

Closed
ziggie1984 opened this issue Sep 24, 2023 · 23 comments
Labels
bug Unintended code behaviour force closes P2 should be fixed if one has time utxo sweeping

Comments

@ziggie1984
Collaborator

Reported by several node runners: channels whose on-chain contracts are all resolved are still stuck in the pending state.

The problem stems from the behaviour of the sweeper, especially when a sweep is not resolved (pending in the mempool) and lnd restarts.

Example: a force close leads to an output (a sweeper input) which is registered with the sweeper engine. So far so good. Let's say the fee is not sufficient and the node runner restarts the node after a while. lnd does not remember the old sweeps, so on restart it relaunches the contract resolvers, including the output from the force close. Very likely lnd tries to publish a sweep for this input again, but it is rejected by the mempool because the old sweep of this output is already there. The problem is that lnd will keep trying to sweep this input until MaxSweepAttempts is reached, at which point it removes the spend notifier for this input. Now imagine the old sweep tx confirms: lnd is not able to register this spend, and the channel stays in the pending state forever (the user can still abandon the channel, but that's another topic).

The problem lies here: https://github.com/lightningnetwork/lnd/blob/master/sweep/sweeper.go#L1373-L1382 => we basically remove the channel that notifies about a spend because we remove the input altogether.

So we could fix this quick and dirty, but I think this should be taken care of as part of the sweeper refactoring, wdyt @yyforyongyu? I think the best strategy for bitcoind backends is to check whether the input is already spent when registering it with the sweeper (the gettxout RPC call from bitcoind, which accounts for spends in the mempool). For other backends I still have to think about a solution.

I am wondering why the rescan when registering the input with the sweeper did not signal "already spent" when calling waitForSpend; maybe the look-ahead is too short?
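The gettxout-based check proposed above could look roughly like the following sketch. This is illustrative only, not lnd's actual code: `StubRPC` and `input_already_spent` are hypothetical names, and `rpc` stands in for any bitcoind JSON-RPC client. The key property used here is that bitcoind's `gettxout` returns null for an output that is already spent, including spends that are only in the mempool when `include_mempool` is true.

```python
def input_already_spent(rpc, txid: str, vout: int) -> bool:
    """Return True if the outpoint is spent on-chain or in the mempool.

    gettxout returns None (JSON null) for spent outputs; with
    include_mempool=True, outputs spent by mempool transactions also
    come back as None.
    """
    result = rpc.gettxout(txid, vout, True)  # include_mempool=True
    return result is None


class StubRPC:
    """Minimal stand-in for a real bitcoind RPC client (illustration only)."""

    def __init__(self, unspent_outputs):
        self.unspent = unspent_outputs

    def gettxout(self, txid, vout, include_mempool):
        # A real client would issue a JSON-RPC call here.
        return self.unspent.get((txid, vout))


rpc = StubRPC({("aa" * 32, 0): {"value": 0.001}})
print(input_already_spent(rpc, "aa" * 32, 0))  # still unspent -> False
print(input_already_spent(rpc, "bb" * 32, 1))  # spent/unknown -> True
```

A sweeper doing this check at input-registration time could skip the publish-retry loop entirely for inputs whose old sweep is already in the mempool.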

@ziggie1984 ziggie1984 added bug Unintended code behaviour needs triage labels Sep 24, 2023
@bitromortac bitromortac added recovery Related to the backup/restoration of LND data (e.g. wallet seeds) bug Unintended code behaviour needs triage force closes and removed bug Unintended code behaviour needs triage recovery Related to the backup/restoration of LND data (e.g. wallet seeds) labels Sep 25, 2023
@saubyk saubyk added P2 should be fixed if one has time and removed needs triage labels Sep 26, 2023
@guggero
Collaborator

guggero commented Sep 26, 2023

Not sure if this is related or not, but I noticed this from the log in #8007:

Using height hint 809349 retrieved from cache for outpoint=7e26f401a44a5398529032ac50e9a6182a5719fdc0f5630451c2aa7c6c49d585:0, script=0 6eec80ba4a7d32058575c31c24a29946ca03a46c4829be5801c6f5a6b88ae7bd instead of 724383 for spend subscription

This is a channel whose force-close TX has 50k confirmations on-chain, but lnd still sees it as pending. Could it be that our own sweep TX "poisons" our height hint cache and causes us to miss spend notifications?

I see that we try to re-publish the force close TX which gets refused:

2023-09-25 16:47:31.217 [INF] CNCT: Re-publishing force close tx(286d1a06df7ed99dae48636f42ad9bee7d1fcd15e0e31db09517bf68b634e2d1) for channel 7e26f401a44a5398529032ac50e9a6182a5719fdc0f5630451c2aa7c6c49d585:0
...
2023-09-25 16:47:31.232 [INF] LNWL: Inserting unconfirmed transaction 286d1a06df7ed99dae48636f42ad9bee7d1fcd15e0e31db09517bf68b634e2d1
2023-09-25 16:47:31.304 [INF] LNWL: 286d1a06df7ed99dae48636f42ad9bee7d1fcd15e0e31db09517bf68b634e2d1: tx already confirmed
...
2023-09-25 16:47:31.435 [INF] NTFN: New confirmation subscription: conf_id=2, txid=286d1a06df7ed99dae48636f42ad9bee7d1fcd15e0e31db09517bf68b634e2d1, num_confs=6 height_hint=809351
..
2023-09-25 16:47:31.449 [DBG] NTFN: Dispatching historical confirmation rescan for txid=286d1a06df7ed99dae48636f42ad9bee7d1fcd15e0e31db09517bf68b634e2d1

@sanjay-shah

Thanks @guggero and @ziggie1984 for your help in resolving this issue. If this gets implemented in any pre-release branch, I can test it in my current environment with pending channels and let you know the results.

@yyforyongyu
Collaborator

Based on the description above, there are a few questions here.

Now Lnd is relaunching the contract resolvers including the Output from the Force-Close.

lnd shouldn't relaunch the resolvers if they are resolved on-chain. So the sweep tx should still live in the mempool.

I am wondering why the rescan when registering the input with the sweeper did not signal "already spent" when calling waitforspend maybe the look ahead is too short ?

Since the tx is not confirmed, waitForSpend won't return. I think the look ahead we use here is the broadcast height so it should be fine? Do you have logs showing the output that's causing the issue?

On the sweeper front, we store the sweep tx via NotifyPublishTx so I think we should leverage the sweeper store during startup to reconstruct the pending inputs state.
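The idea of reconstructing pending-input state from the sweeper store could be sketched as below. All names here are hypothetical (this is not lnd's actual store API): the point is simply to index every previously published sweep by the outpoints it spends, so that on startup a re-offered input can be matched to the sweep tx that already spends it instead of being re-broadcast.

```python
def index_sweeps(published_sweeps):
    """Build a map from input outpoint -> sweep txid that spends it.

    published_sweeps: list of (txid, [outpoints]) records, standing in
    for whatever the sweeper store persisted via NotifyPublishTx.
    """
    spent_by = {}
    for txid, outpoints in published_sweeps:
        for op in outpoints:
            spent_by[op] = txid
    return spent_by


# Hypothetical store contents: two sweeps published before the restart.
store = [
    ("sweep_tx_1", [("chan_point_a", 0)]),
    ("sweep_tx_2", [("chan_point_b", 1), ("chan_point_c", 0)]),
]
index = index_sweeps(store)
print(index[("chan_point_b", 1)])  # -> sweep_tx_2
```

On startup the sweeper could then wait for confirmation of the known sweep tx rather than burning through MaxSweepAttempts on a doomed re-publish.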

@ziggie1984
Collaborator Author

lnd shouldn't relauch the resolvers if they are resolved onchain. So the sweep tx should still live in the mempool.

lnd will restart the resolvers as long as they are not confirmed, so when restarting lnd those outpoints will be re-offered. Maybe I did not 100% understand the question?

Since the tx is not confirmed, waitForSpend won't return. I think the look ahead we use here is the broadcast height so it should be fine? Do you have logs showing the output that's causing the issue?

You can see in the logs in #8007 that, for example, the force-close transaction is already confirmed, but the height hint got updated and now lnd will not rescan/find the spend. So, as guggero said, something is poisoning our height hint.

On the sweeper front, we store the sweep tx via NotifyPublishTx so I think we should leverage the sweeper store during startup to reconstruct the pending inputs state.

Good point, I think we could do that, or maybe restructure the db so that we have a kind of map from an outpoint to its potential spending transaction.

@ziggie1984
Collaborator Author

@guggero I think a quick fix for @sanjay-shah would be to add a command in chantools that drops the spend and conf height hint caches, so that a restart would trigger re-resolving all channels? Not a long-term fix, but something that can be done quickly?

@ziggie1984
Collaborator Author

Tried to analyse why our height hint cache is filled with garbage data, but could not find anything interesting that could cause this behaviour. Normally the height hint is only updated when a historical rescan does not reveal the transaction in the blockchain, or when the transaction confirms.

So my suggestion is to also reset the spend/conf height hint caches when starting lnd with --reset-wallet-transactions=true, so that we can at least recover the funds if we run into this situation again. Currently we only drop the transactions, so I think it makes sense to include the caches as well.

@ziggie1984
Collaborator Author

@sanjay-shah while fixing another bug I found the following setting already present in lnd:

--height-hint-cache-query-disable, try starting lnd with this setting, and your funds should be recovered.

@sanjay-shah

sanjay-shah commented Oct 25, 2023

@ziggie1984 I ran lnd --height-hint-cache-query-disable.

I did see one transaction being sent to the mempool, and it's also confirmed. It was related to an expired HTLC of a channel closed around one year ago: https://mempool.space/tx/0ce6fe4890977aa4e253e0c3eb2ef3288b5b7ca028eef906c98e5223348a3535

However, my lnd has been crashing since I ran it with the --height-hint-cache-query-disable option.
Now I can't even run lnd normally. Please see the attached lnd.log. And please help urgently, as we were scheduled to release our app tomorrow :(

lnd.log

After this log I get a runtime error that is not logged in the log file:
panic: runtime error: index out of range [1] with length 0

goroutine 4051 [running]:
github.com/lightningnetwork/lnd/contractcourt.isPreimageSpend(...)
github.com/lightningnetwork/lnd/contractcourt/htlc_timeout_resolver.go:255
github.com/lightningnetwork/lnd/contractcourt.(*htlcTimeoutResolver).Resolve(0xc002c6b000)
github.com/lightningnetwork/lnd/contractcourt/htlc_timeout_resolver.go:281 +0x2a9
github.com/lightningnetwork/lnd/contractcourt.(*ChannelArbitrator).resolveContract(0xc001da0840, {0x2258fe0?, 0xc002c6b000})
github.com/lightningnetwork/lnd/contractcourt/channel_arbitrator.go:2371 +0x2a9
created by github.com/lightningnetwork/lnd/contractcourt.(*ChannelArbitrator).launchResolvers
github.com/lightningnetwork/lnd/contractcourt/channel_arbitrator.go:1376 +0xd2


@ziggie1984
Collaborator Author

Try updating to lnd 0.17.0, which has a bugfix for this case.

@ziggie1984
Collaborator Author

Ok, will take a look; it also crashes with 0.17.0.

@ziggie1984
Collaborator Author

ziggie1984 commented Oct 25, 2023

I do not remember exactly, but why do we not also check for robustness (the length of the witness) here? Maybe you could help, @yyforyongyu: https://github.com/lightningnetwork/lnd/blob/master/contractcourt/htlc_timeout_resolver.go#L377

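The robustness check discussed here amounts to never indexing into a witness stack without first verifying its length, which is exactly what the panic above (index out of range [1] with length 0) violated. A hedged sketch, not lnd's actual isPreimageSpend logic; the element position and 32-byte preimage size are illustrative of the non-taproot HTLC path only:

```python
def looks_like_preimage_spend(witness: list[bytes]) -> bool:
    """Guess whether a witness spends via preimage reveal.

    Checks the stack length before touching witness[1], so a witness
    that was stripped by the backend (length 0) cannot cause an
    index-out-of-range crash.
    """
    if len(witness) < 2:
        # Witness data missing or malformed: report "no" instead of
        # panicking; the caller can treat this as a critical error.
        return False
    # On the assumed script layout, a 32-byte second element is the preimage.
    return len(witness[1]) == 32


print(looks_like_preimage_spend([]))                       # empty stack -> False
print(looks_like_preimage_spend([b"\x01", b"\x00" * 32]))  # 32-byte elem -> True
```

The same guard pattern applies to every witness index the resolver reads, since a backend can hand back transactions without witness data (as turned out to be the case here).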

@ziggie1984
Collaborator Author

Evaluating the situation on his bitcoind node reveals the following tx information:

bitcoin-cli getrawtransaction 0ce6fe4890977aa4e253e0c3eb2ef3288b5b7ca028eef906c98e5223348a3535
0200000002a72b35f9bcd46bb78c0cf63c6cd2bdc467044d4837ea27f8b905b51d20a6b2be020000000001000000f3f51c7750cbb11033e3b61e6de295ba6f1f0846fd256038e4dcb3c5cf26889f00000000000000000002127a000000000000220020ead078680d77a590ac3f065ce64ce8d9dbe4462b857857fe39d2f6df5ddf7c61410e30000000000022512089a3a0047f3733310a43fa77c55acd855bcec63040d090a055a28ef3377e8acedf960b00

It's missing all the witness data.

My full node, and also mempool.space, shows this instead:

02000000000102a72b35f9bcd46bb78c0cf63c6cd2bdc467044d4837ea27f8b905b51d20a6b2be020000000001000000f3f51c7750cbb11033e3b61e6de295ba6f1f0846fd256038e4dcb3c5cf26889f00000000000000000002127a000000000000220020ead078680d77a590ac3f065ce64ce8d9dbe4462b857857fe39d2f6df5ddf7c61410e30000000000022512089a3a0047f3733310a43fa77c55acd855bcec63040d090a055a28ef3377e8ace0500483045022100a1fe9f370ac1fde977ec6a147d8e32f07c8b6f6548667bcc0f6cf5b6a10e4dde02202484ab1a9ec46a075c209c3f1127948e0baa26137b0501a6a51df64ba1cfebbd8347304402204a46f33d8fbe7e51ce462d0099caf66fa2c4a94128d7f0e5d4828de25d13b7f002205ab9a9f8d997033e500ce81bdf6ca21a07262311517266f61e6e452992e91ff401008876a9145a9526206556cafdda02fc05c1da0aa49d2f17bb8763ac67210332bdfbea33e7be2d3d8873b144b401c02aedb13ee2211138406c4d574dc4685c7c820120876475527c210334888d0231807688c2cdb942fb61605b460789605c734539e7d749b652527bbf52ae67a91453d453c8befad357c7de27a98858165e2b7a152388ac6851b27568014011de1eab981fc10d5efcb0543890b9b80851eeba7627226601d2a2ad9675542abad51eb8b1f22b8140eb9db9eccae9a0190e17fb9656b14b8cb7c0c86e98be85df960b00

So we also need to add more robustness for the case where it's a local commitment (non-taproot).
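The two hex dumps above differ right after the 4-byte version field: the segwit serialization (BIP 144) inserts a 0x00 marker byte and 0x01 flag byte before the input count, while the witness-stripped dump goes straight to the input count. A quick sanity check on a raw transaction hex string (the truncated prefixes below are taken from the dumps above):

```python
def has_witness_serialization(raw_tx_hex: str) -> bool:
    """Detect the BIP 144 marker/flag bytes in a raw transaction hex dump.

    Bytes 4 and 5 (hex chars 8..12) are 0x00 0x01 in the segwit
    encoding; a legacy encoding has a nonzero input-count byte there.
    """
    return raw_tx_hex[8:12] == "0001"


print(has_witness_serialization("0200000002a72b35f9bcd46b"))    # legacy -> False
print(has_witness_serialization("02000000000102a72b35f9bc"))    # segwit -> True
```

A check like this on getrawtransaction output could flag a backend that strips witness data before the resolvers ever try to index into an empty witness stack.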

@guggero
Collaborator

guggero commented Oct 25, 2023

Doesn't the lack of witness data indicate an issue with the bitcoind node itself? I agree we shouldn't panic, but this would still warrant a critical error (meaning it would trigger a safe shutdown of lnd).

@ziggie1984
Collaborator Author

Agree with you (we should definitely shut down, but shut down safely). His bitcoind is behaving very strangely. He is updating now (he was on 21, and the node is not pruned); it will be interesting to see whether an update fixes the issue:

bitcoin-cli getblockchaininfo
{
  "chain": "main",
  "blocks": 813751,
  "headers": 813751,
  "bestblockhash": "00000000000000000002240d0ee6598bae540d6c2a2552967cf3a586fa9edc26",
  "difficulty": 61030681983175.59,
  "mediantime": 1698220005,
  "verificationprogress": 0.9999996602748938,
  "initialblockdownload": false,
  "chainwork": "00000000000000000000000000000000000000005a5a43f64d82a9739fd4cf18",
  "size_on_disk": 591392393331,
  "pruned": false,
  "softforks": {
    "bip34": {
      "type": "buried",
      "active": true,
      "height": 227931
    },
    "bip66": {
      "type": "buried",
      "active": true,
      "height": 363725
    },
    "bip65": {
      "type": "buried",
      "active": true,
      "height": 388381
    },
    "csv": {
      "type": "buried",
      "active": true,
      "height": 419328
    },
    "segwit": {
      "type": "buried",
      "active": true,
      "height": 481824
    },
    "taproot": {
      "type": "bip9",
      "bip9": {
        "status": "active",
        "start_time": 1619222400,
        "timeout": 1628640000,
        "since": 709632,
        "min_activation_height": 709632
      },
      "height": 709632,
      "active": true
    }
  },
  "warnings": ""
}

@guggero
Collaborator

guggero commented Oct 25, 2023

I think we've seen this before with a node but couldn't find the issue anymore. I think if the update doesn't work there might be a need to reindex the chain. Not sure if -reindex or -reindex-chainstate is enough, worst case a full reset and starting from an empty data directory for bitcoind might be needed.

@yyforyongyu
Collaborator

#7811 catches this case and gracefully shuts down lnd, as it's been reported before that occasionally bitcoind won't return the witness.

@sanjay-shah

I think we've seen this before with a node but couldn't find the issue anymore. I think if the update doesn't work there might be a need to reindex the chain. Not sure if -reindex or -reindex-chainstate is enough, worst case a full reset and starting from an empty data directory for bitcoind might be needed.

Thanks @guggero. After updating to 25.1, I'm still seeing the same, i.e. no witness data.

I will just spin up a new v25.1 bitcoind full node and see if that fixes the problem.

@sanjay-shah

bitcoind debug.log from while lnd crashed:

debug.log

@ziggie1984
Collaborator Author

With the help of bitcoin-core dev @maflcko, we were able to identify the issue.

When the bitcoind config option rpcserialversion=0 is set, the RPC command only provides the non-segwit serialization of the transaction, which leads to the behaviour above (no witness data).

According to maflcko, this setting will be deprecated in the next release (26), but we will definitely need to highlight it in the docs, or even check for it if that's possible.
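For anyone hitting this, the fix on the bitcoind side is to remove the offending option (or set it explicitly to the witness-capable value) in bitcoin.conf and restart bitcoind:

```
# bitcoin.conf: make getrawtransaction return the full segwit
# serialization, including witness data. This is also the default
# when the option is left unset, so simply deleting any
# rpcserialversion=0 line works too.
rpcserialversion=1
```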

@guggero
Collaborator

guggero commented Oct 26, 2023

Aaah, that makes a lot of sense... Thank you so much @ziggie1984 for digging into this! Should we close this issue, or do you want to link it to the PR that updates the docs and then close it when that's merged?

@ziggie1984
Collaborator Author

Yes let's wait until docs are updated.

@ziggie1984
Collaborator Author

I am not sure whether we can close this issue, though. Sanjay's main problem was that the height hint cache was updated even though the funds were not recovered. I looked into the sweeper code to see whether it somehow poisons the height hint cache, but could not reproduce this behaviour. I have a feeling, though, that the --height-hint-cache-query-disable setting was introduced because of exactly such behaviour in the past.

Maybe close this issue and open a more narrow one describing the problem with the height hint cache?

@guggero
Collaborator

guggero commented Oct 27, 2023

Okay, closing this issue then. @ziggie1984 would you mind creating the issue for the height hint cache? It sounds to me like you might have the most up-to-date context on that. I did a quick scan over our existing issues and there doesn't seem to be one for it yet.
