
Channel reinit fails: "Could not restore HTLCs"; add_htlc(…) returning CHANNEL_ERR_TOO_MANY_HTLCS #5636

Closed · whitslack opened this issue Sep 27, 2022 · 1 comment · Fixed by #5640

@whitslack (Collaborator)

Issue and Steps to Reproduce

I have an open channel that cannot be reinitialized because it currently has more outstanding outgoing HTLCs than the configured limit.

2022-09-27T01:03:47.116Z **BROKEN** 02…a8-channeld-chan#248802: out HTLC 28629 failed error 7
2022-09-27T01:03:47.355Z **BROKEN** 02…a8-channeld-chan#248802: Could not restore HTLCs (version 0.11.2-gentoo-r105)
2022-09-27T01:03:47.355Z **BROKEN** 02…a8-channeld-chan#248802: backtrace: common/status.c:221 (status_failed) 0x55…5a
2022-09-27T01:03:47.355Z **BROKEN** 02…a8-channeld-chan#248802: backtrace: channeld/channeld.c:3917 (init_channel) 0x55…bc
2022-09-27T01:03:47.355Z **BROKEN** 02…a8-channeld-chan#248802: backtrace: channeld/channeld.c:3992 (main) 0x55…bc
2022-09-27T01:03:47.355Z **BROKEN** 02…a8-channeld-chan#248802: backtrace: ../sysdeps/nptl/libc_start_call_main.h:58 (__libc_start_call_main) 0x7f…89
2022-09-27T01:03:47.356Z **BROKEN** 02…a8-channeld-chan#248802: backtrace: ../csu/libc-start.c:381 (__libc_start_main_impl) 0x7f…44
2022-09-27T01:03:47.356Z **BROKEN** 02…a8-channeld-chan#248802: backtrace: (null):0 ((null)) 0x55…80
2022-09-27T01:03:47.356Z **BROKEN** 02…a8-channeld-chan#248802: backtrace: (null):0 ((null)) 0xff…ff
2022-09-27T01:03:47.356Z **BROKEN** 02…a8-channeld-chan#248802: STATUS_FAIL_INTERNAL_ERROR: Could not restore HTLCs

In channeld/channeld.c (init_channel), we called status_failed from here:

	if (!channel_force_htlcs(peer->channel,
			 cast_const2(const struct existing_htlc **, htlcs)))
		status_failed(STATUS_FAIL_INTERNAL_ERROR,
			      "Could not restore HTLCs");

Immediately preceding this, in channeld/full_channel.c (channel_force_htlcs), we had hit this status_broken:

		e = add_htlc(channel, htlcs[i]->state,
			     htlcs[i]->id, htlcs[i]->amount,
			     htlcs[i]->cltv_expiry,
			     &htlcs[i]->payment_hash,
			     htlcs[i]->onion_routing_packet,
			     htlcs[i]->blinding,
			     &htlc, false, NULL, false);
		if (e != CHANNEL_ERR_ADD_OK) {
			status_broken("%s HTLC %"PRIu64" failed error %u",
				     htlc_state_owner(htlcs[i]->state) == LOCAL
				     ? "out" : "in", htlcs[i]->id, e);
			return false;
		}
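
For reference, the failing check inside `add_htlc` has roughly this shape (a paraphrase of the code in `channeld/full_channel.c`; variable names are approximate). Unlike the value-in-flight check, it does not consult the `enforce_aggregate_limits` parameter, so it fires even when `channel_force_htlcs` is only restoring HTLCs that were already irrevocably committed:

	/* Paraphrased sketch of the pre-fix count check in add_htlc();
	 * names approximate.  Because this check ignores the
	 * enforce_aggregate_limits parameter, it also rejects HTLCs
	 * that are merely being restored at channel reinitialization. */
	if (tal_count(committed) - tal_count(removing) + tal_count(adding)
	    > max_accepted_htlcs(channel, recipient))
		return CHANNEL_ERR_TOO_MANY_HTLCS;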

And the error code is 7, which channeld/full_channel_error.h defines as CHANNEL_ERR_TOO_MANY_HTLCS.

Indeed, if we examine the channel, we find that there are currently more outstanding outbound HTLCs (6) than the channel allows (5):

{
   "peers": [
      {
         "id": "02…a8",
         "connected": false,
         "channels": [
            {
               "state": "CHANNELD_NORMAL",
               "scratch_txid": "2c…22",
               "last_tx_fee": "3847000msat",
               "last_tx_fee_msat": "3847000msat",
               "feerate": {
                  "perkw": 2427,
                  "perkb": 9708
               },
               "short_channel_id": "…74x…87x1",
               "direction": 0,
               "channel_id": "bc…57",
               "funding_txid": "56…bc",
               "funding_outnum": 1,
               "close_to_addr": "bc1qcf0wsg9xuqyts42hzv3nky64yfgged3rgcdyc8",
               "close_to": "00…23",
               "private": false,
               "opener": "remote",
               "closer": null,
               "features": [
                  "option_static_remotekey"
               ],
               "funding_allocation_msat": {
                  "02…a8": 5000000000,
                  "02…2f": 0
               },
               "funding_msat": {
                  "02…a8": "5000000000msat",
                  "02…2f": "0msat"
               },
               "funding": {
                  "local_msat": "0msat",
                  "remote_msat": "5000000000msat",
                  "pushed_msat": "0msat"
               },
               "msatoshi_to_us": 4280415434,
               "to_us_msat": "4280415434msat",
               "msatoshi_to_us_min": 0,
               "min_to_us_msat": "0msat",
               "msatoshi_to_us_max": 4932575284,
               "max_to_us_msat": "4932575284msat",
               "msatoshi_total": 5000000000,
               "total_msat": "5000000000msat",
               "fee_base_msat": "0msat",
               "fee_proportional_millionths": 2,
               "dust_limit_satoshis": 546,
               "dust_limit_msat": "546000msat",
               "max_htlc_value_in_flight_msat": 18446744073709551615,
               "max_total_htlc_in_msat": "18446744073709551615msat",
               "their_channel_reserve_satoshis": 50000,
               "their_reserve_msat": "50000000msat",
               "our_channel_reserve_satoshis": 50000,
               "our_reserve_msat": "50000000msat",
               "spendable_msatoshi": 2757131566,
               "spendable_msat": "2757131566msat",
               "receivable_msatoshi": 608548815,
               "receivable_msat": "608548815msat",
               "htlc_minimum_msat": 0,
               "minimum_htlc_in_msat": "0msat",
               "minimum_htlc_out_msat": "1msat",
               "maximum_htlc_out_msat": "4950000000msat",
               "their_to_self_delay": 432,
               "our_to_self_delay": 600,
               "max_accepted_htlcs": 5,
               "state_changes": [
                  {
                     "timestamp": "2022-06-19T01:22:28.067Z",
                     "old_state": "CHANNELD_AWAITING_LOCKIN",
                     "new_state": "CHANNELD_NORMAL",
                     "cause": "remote",
                     "message": "Lockin complete"
                  }
               ],
               "status": [
                  "CHANNELD_NORMAL:Will attempt reconnect in 1 seconds"
               ],
               "in_payments_offered": 39374,
               "in_msatoshi_offered": 12266250804137,
               "in_offered_msat": "12266250804137msat",
               "in_payments_fulfilled": 229,
               "in_msatoshi_fulfilled": 51492636816,
               "in_fulfilled_msat": "51492636816msat",
               "out_payments_offered": 28635,
               "out_msatoshi_offered": 4098428816263,
               "out_offered_msat": "4098428816263msat",
               "out_payments_fulfilled": 481,
               "out_msatoshi_fulfilled": 47212221382,
               "out_fulfilled_msat": "47212221382msat",
               "htlcs": [
                  {
                     "direction": "in",
                     "id": 39373,
                     "msatoshi": 50007751,
                     "amount_msat": "50007751msat",
                     "expiry": 755990,
                     "payment_hash": "c0…66",
                     "state": "SENT_ADD_REVOCATION"
                  },
                  {
                     "direction": "out",
                     "id": 28634,
                     "msatoshi": 586018518,
                     "amount_msat": "586018518msat",
                     "expiry": 756273,
                     "payment_hash": "c4…8b",
                     "state": "SENT_ADD_COMMIT"
                  },
                  {
                     "direction": "out",
                     "id": 28631,
                     "msatoshi": 200053202,
                     "amount_msat": "200053202msat",
                     "expiry": 755910,
                     "payment_hash": "2a…1a",
                     "state": "SENT_ADD_ACK_REVOCATION"
                  },
                  {
                     "direction": "out",
                     "id": 28630,
                     "msatoshi": 103034018,
                     "amount_msat": "103034018msat",
                     "expiry": 755939,
                     "payment_hash": "6b…7f",
                     "state": "SENT_ADD_ACK_REVOCATION"
                  },
                  {
                     "direction": "out",
                     "id": 28633,
                     "msatoshi": 98480672,
                     "amount_msat": "98480672msat",
                     "expiry": 756094,
                     "payment_hash": "01…5e",
                     "state": "SENT_ADD_ACK_REVOCATION"
                  },
                  {
                     "direction": "out",
                     "id": 28632,
                     "msatoshi": 435697356,
                     "amount_msat": "435697356msat",
                     "expiry": 756129,
                     "payment_hash": "ed…94",
                     "state": "SENT_ADD_ACK_REVOCATION"
                  },
                  {
                     "direction": "out",
                     "id": 28629,
                     "msatoshi": 50000102,
                     "amount_msat": "50000102msat",
                     "expiry": 756097,
                     "payment_hash": "f3…b3",
                     "state": "SENT_REMOVE_ACK_COMMIT"
                  }
               ]
            }
         ]
      }
   ]
}
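
Counting the `htlcs` array above confirms the mismatch: one HTLC has `"direction": "in"` and six have `"direction": "out"` (ids 28629 through 28634), so the outbound count is 6, one more than the `max_accepted_htlcs` value of 5.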

Is it possible that we somehow exceeded the outbound HTLC limit, then the node was restarted, and then the channel could no longer be restored because the number of outstanding HTLCs exceeds the limit?

getinfo output

{
   "id": "02…2f",
   "alias": "…",
   "color": "…",
   "num_peers": …,
   "num_pending_channels": 1,
   "num_active_channels": …,
   "num_inactive_channels": 13,
   "address": [
      {
         "type": "ipv4",
         "address": "…",
         "port": 9735
      },
      {
         "type": "ipv6",
         "address": "…",
         "port": 9735
      }
   ],
   "binding": [
      {
         "type": "ipv6",
         "address": "::",
         "port": 9735
      },
      {
         "type": "ipv4",
         "address": "0.0.0.0",
         "port": 9735
      }
   ],
   "version": "0.11.2-gentoo-r105",
   "blockheight": 755869,
   "network": "bitcoin",
   "msatoshi_fees_collected": …,
   "fees_collected_msat": "…msat",
   "lightning-dir": "/var/lib/lightning/bitcoin",
   "our_features": {
      "init": "080269a2",
      "node": "800000080269a2",
      "channel": "",
      "invoice": "02000000024100"
   }
}
whitslack added a commit to whitslack/lightning that referenced this issue Sep 28, 2022
If a channel goes offline while the count of outstanding outgoing HTLCs
exceeds the limit that we enforce against the peer, then the channel
could never be brought online again because `add_htlc` called by
`channel_force_htlcs` in `channeld/full_channel.c` would return
`CHANNEL_ERR_TOO_MANY_HTLCS`. The protocol specification actually does
allow us to exceed the limits that we are enforcing against the peer;
we are only prohibited from exceeding the limits that the peer is
enforcing against us. `add_htlc` takes an `enforce_aggregate_limits`
parameter that appears to have been intended for `channel_force_htlcs`
to exempt the local node from obeying the limits that it is enforcing
against the peer, but this parameter was only being respected for the
total HTLC value-in-flight check but not for the HTLC count check. This
commit respects the parameter for the HTLC count check as well and
resolves the problem of "Could not restore HTLCs".

Fixes: ElementsProject#5636
Changelog-Fixed: channeld: Channel reinitialization no longer fails when the number of outstanding outgoing HTLCs exceeds `max_accepted_htlcs`.
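
The shape of the fix is then minimal: gate the count check on the same `enforce_aggregate_limits` parameter that the value-in-flight check already honors, and which `channel_force_htlcs` appears to pass as `false` in the call quoted above. A paraphrased sketch (the authoritative diff is in the linked pull request):

	/* Paraphrased sketch of the fix; see the actual diff in #5640.
	 * channel_force_htlcs passes enforce_aggregate_limits = false,
	 * so restoring already-committed HTLCs now skips this check. */
	if (enforce_aggregate_limits
	    && tal_count(committed) - tal_count(removing) + tal_count(adding)
	    > max_accepted_htlcs(channel, recipient))
		return CHANNEL_ERR_TOO_MANY_HTLCS;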
@whitslack (Collaborator, Author)

With the patch in the linked pull request, my node was able to reinitialize this channel and has since resolved all of the outstanding HTLCs. The channel, which has been open since 19 June 2022, is healthy again and with any luck will survive to forward many more payments. If I had not made this ninja fix in the nick of time, an HTLC would have hit its deadline and forced the channel closed.

cdecker pushed a commit that referenced this issue Sep 28, 2022