failed verifying HTLC signatures: UpdateAddHtlc - infinity loop #8191
Comments
Thank you for the report. Do we know what lightning node implementation (and ideally version) the counterparty ( I've filtered your log lines for the interesting bits and added
It is interesting that the remote does not send a commit sig after we send I have a hunch that the commit sig being delayed and covering multiple updates is related to the sig mismatch. I've tried to reproduce this behaviour with lnd, using I've also tried to reproduce with c-lightning, using I can't find a comparable option for eclair. In any case, I wonder how it can happen that the remote does not send a commit sig for our fee update for 16 minutes... Would they have ever sent a commit sig had it not been for the update_add_htlc... |
Any chance you have logs for a session after you restarted electrum? |
From start until two crashes, then it repeats until force-close (at the end)
|
I am a programmer, but my knowledge of how LN works is only at the "user experience" level, so I can only guess what happened. The main goal was to open a channel to a well-known public node, Bitfinex bfx-lnd1 (https://ln.bitfinex.com/), and then swap the liquidity back out to the chain (through fixedfloat). The 15 minutes of inactivity happened because I opened the channel manually and did something else while waiting for 6 confirmations, so I did not return at the exact time, as there is no notification. The very next action was to create an LN invoice and let it be paid. The channel then started to switch between OPENED and DISCONNECTED repeatedly. I might have done something wrong out of laziness: when opening the channel, I reused the channel ID from a previously closed channel to the same node instead of taking the official ID from their page; the name of the node was resolved correctly. It is also possible that the problem is on Bitfinex's side. That can happen, things often don't work correctly; I was just surprised by Electrum's reaction: an unstoppable payment, locked assets, and no available action except "force close", which freezes the assets for over 2000 blocks. Another solution would be to ask the counterparty to force-close their side of the channel, as there were no assets on their side, but this option was unavailable (because while DISCONNECTED, the only option is FORCE CLOSE). |
I guess you mean you copied the node ID. That is perfectly fine.
I notice that you opened the channel for max satoshis, and tried to send a payment that is practically the full channel capacity. Have you tried this before with the bitfinex node, with another channel you had with that node? (or even another node) |
I chose the BFX node because it allows much bigger channels than the recommended maximum (and has good capacity). This constraint should be removed in Electrum (as I remember, but I still mostly use the recommended maximum). However, I currently have a lot of assets allocated outside of this wallet and my current balance is low; add to that the fact that the whole capacity is currently frozen for 14 days. So I did not try to repeat the attempt. I can try with a smaller amount (later today). It is still possible that this is a rare bug which is hard to reproduce. What about a failed internet connection, a lost packet, etc.? |
Right, there is already an issue about removing the legacy max channel capacity: #8165
I meant if you perhaps tried this already in the past (the large amounts - as in, (close to) max capacity) with this node. (especially as you said you already had a channel in this node before)
If you (or someone else) do(es) end up reproducing the bug, if possible, please do not force-close the channel, just write here, and then we can try modifying the code a bit, mainly to log more stuff. (but I am considering adding more logging anyway)
Yes, indeed. I suspect it is not easy to reproduce. :/
There was no reconnection to the counterparty in your original log. The network connection uses TCP, so packets couldn't really have gotten lost or reordered either. Btw thanks for the second log. Everything looks as expected there (so all the interesting bits are in the original log). |
OK, I tried to open a channel with the same node (bfx-lnd1), but I could open it only for a small amount (0.05 BTC). This time, no problem happened. Do you want to see a log? I will continue to use it as usual, and if a similar issue occurs, I will not force-close the channel; instead I will reopen the issue here. (I also trade on OTC exchanges through the LN network where it is supported, because it is fast and there is no need to wait for confirmations when the rate is optimal for the trade, so I often open a channel for a single exchange.) |
Yes, please. I would like to see whether they delay signing a commitment after we send update_fee. |
The first test payment was from one of my wallets to my other wallet, so you can probably see two channels in the log. The other channel is connected to a different node. The important channel is the one being FUNDED (9688946346...), which leads to bfx-lnd1 (03cde60a63...) (the other one had already been open for a long time).
|
I just compared these logs (by eye) and found that the first log doesn't contain the event "on_funding_locked". So I opened the original version of the log and was able to find this event before the channel switched to the FUNDED state: The message FUNDING_LOCKED is received while Electrum still doesn't see sufficient depth. Can this be an issue?
|
AUTOMATIC FORCE CLOSE!!!!
My suggestion: a rare bug in the signature calculation. Please focus on this, run some tests, try a lot of signatures; perhaps some combination of bytes or calculation state leads to a different signature. A different SSL library? |
Thanks for the new log. Technically it looks somewhat different from the original issue (on the surface), but the root cause might very well be the same.
Based on the error message, the remote ( The funding tx outpoint is 5312520f35070e7c353377ac84483ffac5fe9d08a006fb2d40d73e64e594f203:1. Note that we are updating the feerate:
abs fee for the commitment tx (msat):
>>> 3755 * 724 // 1000 * 1000
2718000
>>> 3756 * 724 // 1000 * 1000
2719000
The
>>> fee=16777215-16774497
>>> fee
2718
It looks like the remote is incorrectly still calculating with the old feerate. The sig that we sent is for using the new feerate, it validates if I manually modify the output value sat of the commit tx lnd expects:
>>> from electrum.crypto import sha256d
>>> from electrum import ecc
>>> from electrum.transaction import tx_from_any, PartialTransaction
>>>
>>> pubkey1 = bytes.fromhex("02d39abcc2e4a4496bc664e2c1c15e9960bc76c8bff7a5bc68bd79c9eb2f6476c6")
>>> pubkey2 = bytes.fromhex("03b4149f5f33e7a893e8ed189ac359b655deb7052baee79e1df001a8d305e2d09c")
>>>
>>> tx = tx_from_any("020000000103f294e5643ed7402dfb06a0089dfec5fa3f4884ac7733357c0e07350f52125301000000007268e7800161f5ff0000000000160014aec8d8fd4e9bcb41de8f241c4eef3d2a9811e5ece6fa9d20")
>>> tx = PartialTransaction.from_tx(tx)
>>> txin = tx._inputs[0]
>>> txin.script_type = 'p2wsh'
>>> txin.pubkeys = sorted([pubkey1, pubkey2])
>>> txin.num_sig = 2
>>> txin._trusted_value_sats = 16777215 # funding_sat
>>>
>>> tx._outputs[0].value = tx._outputs[0].value - 1 # hack to change tx to use new feerate
>>>
>>> sig = bytes.fromhex("304402205ffed5b844d11c1b8f5e4be6dfb17287265eef80ffd975195c76ee6a9ff424250220173cd11940a49db06fe524d24fbf7eb9180f93b8117536c78eb7db8904823304")
>>> sig = ecc.sig_string_from_der_sig(sig)
>>>
>>> preimage_hex = tx.serialize_preimage(0)
>>> pre_hash = sha256d(bytes.fromhex(preimage_hex))
>>>
>>> ecc.verify_signature(pubkey1, sig, pre_hash)
False
>>> ecc.verify_signature(pubkey2, sig, pre_hash)
True
The issue seems to be similar to ElementsProject/lightning#3341. I will try to open a bug report for lnd. |
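The fee arithmetic above can be cross-checked without Electrum by reading the output value straight out of the raw commitment tx quoted in this comment. This is only a minimal sketch: the helper name `single_output_value_sat` is made up for illustration, and the parser assumes exactly the shape of this one tx (legacy serialization, one input, counts and script lengths small enough to fit in a single byte). The weight 724 and the two feerates are the ones used in the calculation above.

```python
import struct

# Raw commitment tx that lnd expects, copied from the snippet above.
RAW_TX_HEX = "020000000103f294e5643ed7402dfb06a0089dfec5fa3f4884ac7733357c0e07350f52125301000000007268e7800161f5ff0000000000160014aec8d8fd4e9bcb41de8f241c4eef3d2a9811e5ece6fa9d20"

FUNDING_SAT = 16_777_215         # value of the funding output being spent
COMMIT_WEIGHT_NO_HTLC = 724      # commitment tx weight without HTLCs


def single_output_value_sat(raw_hex: str) -> int:
    """Return the value of output 0 (hypothetical helper, not an Electrum API).

    Assumes legacy serialization, exactly one input, and single-byte
    varints, which is enough for the tx quoted in this thread.
    """
    raw = bytes.fromhex(raw_hex)
    pos = 4                       # skip the 4-byte version
    assert raw[pos] == 1          # input count
    pos += 1
    pos += 32 + 4                 # skip prevout txid + output index
    script_len = raw[pos]         # scriptSig length (0 here)
    pos += 1 + script_len
    pos += 4                      # skip sequence
    assert raw[pos] >= 1          # output count
    pos += 1
    (value,) = struct.unpack_from("<Q", raw, pos)  # 8-byte little-endian amount
    return value


def commit_fee_sat(feerate_per_kw: int) -> int:
    """Commitment tx fee in sat, rounded down."""
    return feerate_per_kw * COMMIT_WEIGHT_NO_HTLC // 1000


implied_fee_sat = FUNDING_SAT - single_output_value_sat(RAW_TX_HEX)
print("fee implied by lnd's tx:", implied_fee_sat)
print("fee at old feerate 3755:", commit_fee_sat(3755))
print("fee at new feerate 3756:", commit_fee_sat(3756))
```

The implied fee equals the fee at the old feerate and is one satoshi below the fee at the new feerate, which is the same one-satoshi adjustment the `value - 1` hack above applies.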
I opened a ticket with Bitfinex support and am waiting for a reply. I hope I will get some information about the node. |
note that in the feb12 log, we send an update_fee (bumping the feerate 3755->3756 sat/kw),
lnd seemingly ignores the update_fee, and this leads to the sig mismatch
>>> 3755 * 724 // 1000 * 1000
2718000
>>> 3756 * 724 // 1000 * 1000
2719000
in the feb5 log, the same thing might have happened, except it manifested later due to the rounding:
First, we send a commit sig (1), where there is no htlc yet, just the fee update. Everything validates as the rounding happens to silence the difference.
>>> 4144 * 724 // 1000 * 1000
3000000
>>> 4145 * 724 // 1000 * 1000
3000000
Then, we add the htlc and send a commit sig (2). For this commit tx (theirs), the child htlc tx is the htlc-success case, where the rounding again happens to match:
>>> 4144 * 703 // 1000 * 1000
2913000
>>> 4145 * 703 // 1000 * 1000
2913000
Then, we receive a commit sig (3) from them. For this commit tx (ours), the child htlc tx is the htlc-timeout case, and here the rounding happens to no longer silence the disagreement:
>>> 4144 * 663 // 1000 * 1000
2747000
>>> 4145 * 663 // 1000 * 1000
2748000
hence the htlc sig we receive does not validate against the htlc tx we created
(magic numbers for fee calc from bolt-03)
based on this, I believe the |
So the other node rounds differently? This should be easy to reproduce. I still don't have a response from Bitfinex (just a general promise to send a response later). |
Well, not really - the issue is not the rounding itself. That is just my explanation that the feb5 log and the feb12 log are likely due to the same bug. AFAICT lnd is ignoring (in these cases) the update_fee message we sent, which clearly looks like a bug. However, I cannot reproduce on regtest locally. |
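To illustrate why the same ignored update_fee could surface at different points in the two logs, here is a small sketch that compares the rounded-down fee at the old and new feerate for each of the three weights quoted above (724, 703, 663). The function and dictionary names are made up for illustration; only the numbers come from this thread.

```python
# Weights quoted in this thread (pre-anchor values from bolt-03).
WEIGHTS = {
    "commitment (no HTLCs)": 724,
    "htlc-success": 703,
    "htlc-timeout": 663,
}


def fee_sat(feerate_per_kw: int, weight: int) -> int:
    """Fee in satoshis, rounded down."""
    return feerate_per_kw * weight // 1000


def disagreements(old_feerate: int, new_feerate: int) -> dict:
    """Return the tx kinds whose fee differs between the two feerates.

    A peer that ignored the update_fee keeps computing with old_feerate,
    so only the tx kinds listed here would fail signature validation.
    """
    result = {}
    for name, weight in WEIGHTS.items():
        old_fee = fee_sat(old_feerate, weight)
        new_fee = fee_sat(new_feerate, weight)
        if old_fee != new_fee:
            result[name] = (old_fee, new_fee)
    return result


feb5 = disagreements(4144, 4145)
feb12 = disagreements(3755, 3756)
print("feb5 :", feb5)
print("feb12:", feb12)
```

With the feb5 feerates only the htlc-timeout fee differs, so the mismatch stays hidden until an HTLC is in flight; with the feb12 feerates the plain commitment fee already differs, so the very first commit sig after the fee update fails.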
My suggestion for a workaround and an improvement of the user experience: after establishing the channel, wait for some defined time (1 minute), during which the channel is in an intermediate "establishment" state in which it cannot be controlled and cannot send or receive payments, as a workaround for the LND problem. In terms of UI
In this state:
When trying to reconnect, the application should check if both parties have the same commitment, then it is possible to restart the connection, but I wouldn't do it more often than once a minute. As I understand it, this was purely about one party thinking that the other party received a commitment, there is no confirmation that this actually happened. |
The root cause here was clearly a bug in lnd, which has since been fixed (lightningnetwork/lnd#7401). Re your suggestions for more general improvements, it's true there is a lot of room for improvement, however in this case the nature of the bug kind of invalidates the suggestions: the remote peer sent us an error for the channel, and bolt-01 clearly says that in that case we are supposed to force-close. If the remote only sent a warning instead of an error, we would have wiggle-room. |
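The distinction between an error and a warning can be sketched as follows. This is a deliberately simplified illustration and not Electrum's actual handling: the enum, the function name and the returned strings are all hypothetical, and the only facts taken from bolt-01 are the two message type numbers (warning is 1, error is 17).

```python
from enum import IntEnum


class Bolt1MsgType(IntEnum):
    """The two bolt-01 message types relevant here."""
    WARNING = 1
    ERROR = 17


def react_to_peer_message(msg_type: int) -> str:
    """Hypothetical policy: what to do with the channel named in the message."""
    if msg_type == Bolt1MsgType.ERROR:
        # An error from the remote means the channel is to be failed,
        # i.e. force-closed, as described in the comment above.
        return "force_close"
    if msg_type == Bolt1MsgType.WARNING:
        # A warning leaves room to keep the channel and, for example,
        # log the problem and reconnect later.
        return "log_and_keep_channel"
    return "not_handled_here"


print(react_to_peer_message(17))
print(react_to_peer_message(1))
```

In this issue the remote sent an error, so only the first branch applies.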
This bug happened after a channel was opened with "bfx-lnd1" (0958ee8fe66a0845149a40418bc707d9f5f87d2f49e2dfed2f42b9958d74ca60) and while the first invoice was being paid.
Resolved by force-closing; however, please look into it and try to find a better solution for how Electrum should handle such a situation. Cycling in an infinite loop is not the best solution (a restart did not help).
(log file / partially anonymized / from FUNDED to crash
(4.3.3 / 4.3.4)