[bug]: Channel is stuck in pending state. #8028
Comments
Not sure if this is related or not, but I noticed this from the log in #8007:
This is a channel that on-chain has 50k confirmations for the force-close TX but I see that we try to re-publish the force close TX which gets refused:
|
Thanks @guggero and @ziggie1984 for your help in resolving this issue. If this gets implemented in any pre-release branch, I can test it in my current environment with pending channels and let you know the results. |
Based on the description above, there are a few questions here:
lnd shouldn't relaunch the resolvers if they are resolved onchain. So the sweep tx should still live in the mempool.
Since the tx is not confirmed, on the sweeper front we store the sweep tx via |
lnd will restart the resolvers as long as they are not confirmed, so when lnd restarts those outpoints will be reoffered. Maybe I did not fully understand the question?
You can see in the logs in #8007 that, for example, the force-close transaction is already confirmed, but the height hint got updated and now lnd will not rescan and find the spend. So, as guggero said, something is poisoning our height_hint.
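To make the height hint problem concrete, here is a toy sketch in Go. It is not lnd's notifier code: `findSpendHeight` and all heights are made up for illustration, and the only assumption is that a rescan starts at the cached hint and walks forward to the tip.

```go
package main

import "fmt"

// findSpendHeight is a hypothetical stand-in for a historical rescan: it walks
// blocks from startHeight up to tip and reports the height at which the
// spending transaction is found. spendHeight is where the spend actually
// confirmed on chain.
func findSpendHeight(startHeight, tip, spendHeight uint32) (uint32, bool) {
	for h := startHeight; h <= tip; h++ {
		if h == spendHeight {
			return h, true
		}
	}
	return 0, false
}

func main() {
	// Illustrative numbers only: the spend has roughly 50k confirmations.
	const tip, spendHeight = 800_000, 750_000

	// A hint at or below the spend height lets the rescan find the spend.
	h, ok := findSpendHeight(749_990, tip, spendHeight)
	fmt.Println("good hint:", h, ok)

	// A hint that was bumped past the spend height makes the rescan skip the
	// block that contains the spend, so the spend is never reported.
	h, ok = findSpendHeight(799_000, tip, spendHeight)
	fmt.Println("poisoned hint:", h, ok)
}
```

If the real behaviour matches this model, dropping the cached hint (so the rescan starts from a safe lower bound again) would be enough to rediscover the spend.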
Good point. I think we could do that, or maybe restructure the db so that we have a kind of map from an outpoint to its potential spending transaction. |
@guggero I think a quick fix for @sanjay-shah would be to add a command in chantools that drops the spend and conf height hint caches, so that a restart would trigger the resolving of all channels. Not a long-term fix, but still something that can be done quickly? |
I tried to analyse why our HeightHint cache is filled with garbage data but could not find anything that would cause this behaviour. Normally the HeightHint is updated either when a historical rescan did not reveal the transaction in the blockchain or when the transaction is confirmed. So my suggestion is to also reset the spend/conf height hint cache when starting up lnd with --reset-wallet-transactions=true, so that we can at least recover the funds if we run into this situation again. Currently we only drop the transactions, so I think it makes sense to include it. |
@sanjay-shah while fixing another bug I found the following setting already present in lnd:
|
@ziggie1984 I ran I did see one transaction being sent to the mempool, and it is also confirmed. It was related to an expired HTLC of one channel closed around one year ago: https://mempool.space/tx/0ce6fe4890977aa4e253e0c3eb2ef3288b5b7ca028eef906c98e5223348a3535 However, my lnd has been crashing since I ran it with After this log I get a runtime error that is not logged in the log file: goroutine 4051 [running]: |
Try updating to lnd 17.0, this has a bugfix for this case. |
Ok will take a look, crashes also with 17.0. |
I do not remember exactly, but why do we not also check for robustness (length of the witness) here? Maybe you could help, @yyforyongyu: https://github.com/lightningnetwork/lnd/blob/master/contractcourt/htlc_timeout_resolver.go#L377 |
Evaluating the situation, his bitcoind node reveals the following tx information:
It's missing all the witness data. My full node and mempool.space show this instead:
So we also need to add more robustness for the case where it's a local commitment (non-taproot). |
Doesn't the lack of witness data indicate an issue with the |
Agree with you (we should definitely shut down safely, but shut down). His bitcoind is behaving very strangely. He is updating now (he was on 21), but he was not pruned or anything. It will be interesting to see whether an update fixes the issue:
|
I think we've seen this before with a node but couldn't find the issue anymore. I think if the update doesn't work there might be a need to reindex the chain. Not sure if |
#7811 catches this case and gracefully shuts down |
Thanks @guggero. After updating to 25.1, I'm still seeing the same, i.e. no witness data. I will just spin up a new v25.1 bitcoind full node and see if that fixes the problem. |
bitcoind debug.log while lnd crashed. |
With the help of Bitcoin Core dev @maflcko we were able to identify the issue. When having the bitcoind config option set: According to maflcko this setting will be deprecated in the next release (26), but we will definitely need to highlight it in the docs or even check for it, if that's possible. |
Aaah, that makes a lot of sense... Thank you so much @ziggie1984 for digging into this! Should we close this issue or do you want to link it to the PR that updates the docs and then close it when merging that? |
Yes let's wait until docs are updated. |
I am not sure whether we can close this issue, though, because sanjay's main problem was that the height hint cache was updated although the funds were not recovered. I looked into the sweeper code to see whether it somehow poisons the height_hint cache but could not reproduce this behaviour. But I have the feeling that this setting Maybe close this issue and open a narrower one which describes the problem with the height hint cache? |
Okay, closing this issue then. @ziggie1984 would you mind creating the issue for the height hint cache? It sounds to me like you might have the most up-to-date context on that. I did a quick scan over our existing issues and there doesn't seem to be one for it yet. |
Reported by several node runners: channels which have all their contracts resolved onchain are still stuck in the pending state.
The problem stems from the behaviour of the sweeper, especially when a sweep is not yet resolved (pending in the mempool) and lnd restarts.
Example: A force-close leads to an output (a sweeper input) which is registered with the sweeper engine. So far so good. Let's say the fee is not sufficient and the node runner restarts the node after a while. lnd does not remember the old sweeps. Now lnd relaunches the contract resolvers, including the output from the force-close. Very likely lnd tries to publish a sweep of this input again, but it is rejected by the mempool because the output was already swept before. The problem is that lnd will keep trying to sweep this input until MaxSweepAttempts is reached, at which point the spend notifier for this input is removed. Now let's imagine the old sweep tx is confirmed. lnd will not be able to register this spend and the channel will stay in the Pending state forever (the user can still abandon the channel, but that's another topic).
Problem lies here: https://github.com/lightningnetwork/lnd/blob/master/sweep/sweeper.go#L1373-L1382 => we basically remove the channel which notifies about a spend because we remove the input altogether.
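The failure mode described above can be sketched as a toy model in Go. This is not the sweeper's real code: `toySweeper`, its methods, and the value of `maxSweepAttempts` are hypothetical and only mirror the described behaviour, where an input is dropped (together with its spend notification) once publishing has failed too often.

```go
package main

import "fmt"

// maxSweepAttempts is an illustrative value standing in for MaxSweepAttempts.
const maxSweepAttempts = 3

// toySweeper is a hypothetical model that tracks failed publish attempts per
// input. An input that is no longer tracked cannot receive spend updates.
type toySweeper struct {
	attempts map[string]int
}

func newToySweeper() *toySweeper {
	return &toySweeper{attempts: make(map[string]int)}
}

// register starts tracking an input, identified here by a plain string.
func (s *toySweeper) register(op string) {
	s.attempts[op] = 0
}

// publishFailed records one rejected publish. After maxSweepAttempts failures
// the input is removed altogether, which is the step the issue points at.
func (s *toySweeper) publishFailed(op string) {
	if _, ok := s.attempts[op]; !ok {
		return
	}
	s.attempts[op]++
	if s.attempts[op] >= maxSweepAttempts {
		delete(s.attempts, op)
	}
}

// notifySpend reports whether a confirmed spend could still be delivered,
// i.e. whether the input is still tracked.
func (s *toySweeper) notifySpend(op string) bool {
	if _, ok := s.attempts[op]; !ok {
		return false
	}
	delete(s.attempts, op)
	return true
}

func main() {
	s := newToySweeper()
	s.register("forceclose:0")

	// After the restart every re-publish is rejected, because the old sweep
	// already spends the output in the mempool.
	for i := 0; i < maxSweepAttempts; i++ {
		s.publishFailed("forceclose:0")
	}

	// The old sweep confirms, but nobody is listening anymore.
	fmt.Println("spend delivered:", s.notifySpend("forceclose:0"))
}
```

In this model the spend is lost only because removal of the input and removal of the notification are coupled; keeping the spend subscription alive after giving up on publishing would avoid the stuck state.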
So we could fix this quick and dirty, but I think this should be taken care of as part of refactoring the sweeper, wdyt @yyforyongyu? I think the best strategy for bitcoind backends is to check whether the input is already spent when registering it with the sweeper (gettxout RPC call to bitcoind, including spends in the mempool). For other backends I still have to think about a solution.
I am wondering why the rescan when registering the input with the sweeper did not signal "already spent" when calling waitforspend. Maybe the look-ahead is too short?