Peer backup storage (feature 40/41/42/43) #881

Open: wants to merge 1 commit into base: master

Conversation

@t-bast (Collaborator) commented Jun 28, 2021

Nodes can offer to altruistically store small amounts of data on behalf of their channel peers. It complements option_data_loss_protect and can let nodes that lost data fully recover their channels without any on-chain transaction.

The contents of the peer_backup are left unspecified to offer wide flexibility to implementations. It can for example contain an encrypted version of your internal channel state, or a mix of important data from several channels; it's really up to implementations to go #reckless.

Please note that a drawback is that this uses a lot of space in the tlv stream of existing messages, which means there won't be a lot of additional space available if we want to enrich these messages with new tlv fields. We should choose the maximum size of backups carefully if we don't want to paint ourselves into a corner.
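
For illustration, a backup blob would simply be appended to a message's extension TLV stream; here's a rough Python sketch of the encoding, using BOLT 1 BigSize rules (the TLV type number below is a placeholder, not a value assigned by this proposal):

```python
# Rough sketch of attaching a backup blob as a TLV record (BOLT 1 encoding).
# PEER_BACKUP_TLV_TYPE is a placeholder, not the type assigned by this PR.
PEER_BACKUP_TLV_TYPE = 99

def encode_bigsize(n: int) -> bytes:
    """BOLT 1 BigSize (big-endian, CompactSize-like) encoding."""
    if n < 0xfd:
        return n.to_bytes(1, "big")
    if n <= 0xffff:
        return b"\xfd" + n.to_bytes(2, "big")
    if n <= 0xffffffff:
        return b"\xfe" + n.to_bytes(4, "big")
    return b"\xff" + n.to_bytes(8, "big")

def encode_tlv_record(tlv_type: int, value: bytes) -> bytes:
    """type || length || value, as appended to a message's TLV stream."""
    return encode_bigsize(tlv_type) + encode_bigsize(len(value)) + value

def append_backup(message_payload: bytes, encrypted_backup: bytes) -> bytes:
    """Append an (already encrypted) backup blob to an existing message payload."""
    return message_payload + encode_tlv_record(PEER_BACKUP_TLV_TYPE, encrypted_backup)
```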

@SomberNight (Contributor) commented Jun 29, 2021

Thank you for writing this up!
In general I like this proposal very much, and we would definitely want to implement this (or whatever it evolves into) in Electrum.


On first read I had assumed that the backup blobs are per-peer but then realised they are actually per-channel.

One thing I would like to note is the use case where, after restoring from seed, wallet software might be able to find some of its channel peers but not the channels themselves, i.e. Alice's wallet might know she has some channels with Bob but not know any of the channel ids (funding outpoints). In such a case Alice can establish a transport with Bob and then wait for Bob to send one or more channel_reestablish messages. As there is no way for Alice to "request" the backup blobs (and she does not even know what the channel ids are), she must rely on Bob sending the channel_reestablish messages (regardless of who initiated the transport).


One thing I am not clear on is the (a)symmetry of this feature. I think this needs to be cleared up.
The PR text talks about "storage providers" and suggests an asymmetric relationship where e.g. a forwarding node offers to store backups for a light client. However, negotiating a feature such as peer_backup_storage is inherently symmetric with the current BOLT-09. So what exactly does this look like?

Let's say there is a light client who wants to store backups with their peer (Carol), and let's say the light client does not want to store backups for the other party (or does it need to? it could, I guess). Would the light client set peer_backup_storage in e.g. the init message? Would Carol set it? How would Carol know that she is the "service provider" in this relationship?
Or is it that they both must store blobs for the other one, making the relationship symmetric?

I am also wondering if the feature can even be used in a symmetric way.
If the two counterparties of a channel, Alice and Bob, want to store backups with each other, and Alice loses state, then she will not be able to send the last backup sent by Bob. So, assuming the backup blobs are used in a Phoenix-like use case (resuming channel operation after restoring from seed): if the feature was used symmetrically (Alice storing blobs at Bob, and Bob storing blobs at Alice), then if either party loses state, the channel would get closed by the other party, I guess(?).

@t-bast (Collaborator, Author) commented Jun 30, 2021

Thanks @SomberNight this is very good feedback. I've started with the minimal approach we use in Phoenix, which is simpler than the general case because Phoenix is always fundee and initiates reconnections, but this PR is a good opportunity to offer a backup mechanism that supports more scenarios.

I think it would make sense to have two distinct types of backups, which would address the scenario described in your first paragraph:

  • channel backups (one per channel, as described in d873416), which may be updated at every commitment change
  • a node backup which can contain data about channels and peers

We could store in the node backup the list of our peers and the channel IDs we have with each peer (encrypted of course). That node backup could be sent to all of our peers: this way we only need to remember and reconnect to one of them to discover the others and the list of our channels.

We would need a new message to ask our peers to store an updated version of our node_backup_data. We would send that message for example whenever we've opened a new channel, to add the new channel ID to our backup. Our storage provider peers would send us back the latest version of that node_backup_data in init. At a high-level, does that sound like it would address your first comments?
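
To make this concrete, here is a rough sketch (Python, assuming a 32-byte backup key derived from the wallet seed, using the `cryptography` library) of what such an encrypted node backup could contain; the serialization layout is purely illustrative:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

def build_node_backup(backup_key: bytes, peers: dict[bytes, list[bytes]]) -> bytes:
    """Serialize {peer_node_id (33 bytes): [channel_id (32 bytes), ...]} and encrypt it.

    The record layout and the key derivation (backup_key must be 32 bytes,
    e.g. derived from the wallet seed) are illustrative assumptions,
    not part of the proposal.
    """
    plaintext = len(peers).to_bytes(2, "big")
    for node_id, channel_ids in peers.items():
        plaintext += node_id + len(channel_ids).to_bytes(2, "big") + b"".join(channel_ids)
    nonce = os.urandom(12)
    ciphertext = ChaCha20Poly1305(backup_key).encrypt(nonce, plaintext, None)
    return nonce + ciphertext  # opaque blob sent to every storage provider peer

def read_node_backup(backup_key: bytes, blob: bytes) -> bytes:
    """Decrypt a node backup blob returned by a storage provider."""
    nonce, ciphertext = blob[:12], blob[12:]
    return ChaCha20Poly1305(backup_key).decrypt(nonce, ciphertext, None)
```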

Regarding the asymmetry of this feature, I think it is both desired and representative of real-world scenarios.
Let's detail for example the scenario where a mobile wallet Alice connects to a storage provider Bob.
Alice would not turn on the peer_backup_storage feature, because Alice doesn't offer to store people's data.
Bob would turn on the peer_backup_storage feature, which tells Alice that she can send her backups.

You're completely right to highlight that the case where both peers store each other's backups doesn't work: if one of them loses data, that will force the other to close channels. I don't think this feature is meant to be used between routing nodes running on servers; this type of node should implement DB redundancy internally to ensure they never lose data, instead of relying on their peers for that. I think this feature should only be used by mobile wallets (or more generally light clients) connecting to "service providers".

But your comment made me realize that if two service providers connect to each other, they will both have the peer_backup_storage feature activated but should not send backups to each other. I'm wondering if this should simply be covered by implementations (only mobile wallet implementations would actually send their backup data; server implementations would never send them nor expect peers to store their data), or if we should use distinct feature bits for "I can store your data" and "I'd like you to store my data" (with the somewhat complex restriction that two peers cannot both turn on both at the same time, otherwise they'd run into the issue you mention if one of them loses data). What do you think?

@SomberNight (Contributor) commented:

We would need a new message to ask our peers to store an updated version of our node_backup_data. We would send that message for example whenever we've opened a new channel, to add the new channel ID to our backup. Our storage provider peers would send us back the latest version of that node_backup_data in init. At a high-level, does that sound like it would address your first comments?

Yes, that sounds good. I think it might not be strictly needed, as a workaround could be the light client always waiting for the remote to send channel_reestablish first; and this kind of data (list of your peers and list of channel ids) could be stored as part of the per-channel blob. However, that is a bit hackish, and I think it would be worth it to introduce this new message for per-peer backups; IMHO it would even simplify implementations overall. Also, having both per-channel and per-peer backups could enable more use cases in the future: for example, not tying some backups to a specific channel could allow a market where nodes pay peers they don't have a channel with to store data for them.

if two service providers connect to each other, they will both have the peer_backup_storage feature activated but should not send backups to each other. I'm wondering if this should simply be covered by implementations (only mobile wallet implementations would actually send their backup data; server implementations would never send them nor expect peers to store their data), or if we should use distinct feature bits for "I can store your data" and "I'd like you to store my data" (with the somewhat complex restriction that two peers cannot both turn on both at the same time, otherwise they'd run into the issue you mention if one of them loses data). What do you think?

I think one feature bit is sufficient. It could behave as follows, given a transport between Alice and Bob:

  • if Alice has the feature set and Bob does not: Bob can send backups to be stored at Alice, but Alice must not
  • if Alice and Bob both have the feature set: neither of them is allowed to send backups to be stored by the other

So the feature bit means "I am willing to store your backups", but if both parties set it then neither can utilise it. Light clients would never set the feature, "service providers" could always set it (if they implement it); and if two service providers peer, they would recognise the situation and not send backups to each other (see the sketch after this list).
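
A minimal sketch of that decision rule (Python; the function names are illustrative):

```python
def may_send_backups(local_has_peer_backup_storage: bool,
                     remote_has_peer_backup_storage: bool) -> bool:
    """True if we are allowed to send backups for the peer to store.

    Rule sketched above: only the side that does NOT set the feature may send
    backups, and only if the other side sets it. If both (or neither) set it,
    nobody sends backups.
    """
    return remote_has_peer_backup_storage and not local_has_peer_backup_storage

def must_store_backups(local_has_peer_backup_storage: bool,
                       remote_has_peer_backup_storage: bool) -> bool:
    """Mirror image: we offer storage only if we set the feature and the peer doesn't."""
    return local_has_peer_backup_storage and not remote_has_peer_backup_storage
```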

One thing to note here is that light clients might want to "require" this feature but there isn't really a way to communicate that fact. Maybe that is fine as they could simply not peer with nodes that do not set it.


Another point is about deleting backups. If Alice ("light client") and Bob ("service provider") have a single channel, Alice stores backups with Bob, and Bob e.g. force-closes the channel, would Bob now delete Alice's backup? I think Bob could keep the backups for some time or maybe even indefinitely (on a best-effort basis), as even then the service seems hard to DoS: opening channels is probably costly enough even if they end up being closed. I suggest the spec include a recommendation about what to do when there are no longer any open channels with a peer but the peer has backups stored with us.


One higher-level point re use cases: I am wondering what the assumptions are when using this for "resumable channels", e.g. restoring from seed, discovering your existing channels, keeping them open and continuing to use them. Phoenix does this of course, but in that case the clients probably already have some trust in the ACINQ node... What do you think would be the implications of using it with arbitrary nodes? I think it might only work well if there is at least some trust. Unfortunately I don't think the game theory works out well: if the channel funder is the light client (user), there is virtually no risk for the remote in sending an old blob to the client on reestablish. They only risk missing out on future forwarding fees... their potential gain compared to that is huge :/

@t-bast (Collaborator, Author) commented Jun 30, 2021

I think one feature bit is sufficient.

Yes, I think so as well; adding a second one wouldn't be a very clean "fix".

One thing to note here is that light clients might want to "require" this feature but there isn't really a way to communicate that fact. Maybe that is fine as they could simply not peer with nodes that do not set it.

Exactly, I think that instead of requiring it, light clients would just scan node_announcements and connect only to nodes that advertised the feature (if they need peers that can store their backups).

I suggest the spec include a recommendation about what to do if there no longer are any open channels with a peer but the peer has stored backups with us.

This is a good point: the spec should mention that storage providers should (best effort) keep the latest backups stored even after closing the channel. That's what eclair currently does, as it doesn't cost much space (and, as you mention, it's not subject to DoS since the size of backups is limited by the message size and there is an economic cost to channel creation).

I'll add a section mentioning that.

What do you think would be the implications of using it with arbitrary nodes? I think it might only work well if there is at least some trust. Unfortunately I don't think the game theory works out well: if the channel funder is the light client (user), there is virtually no risk for the remote for sending an old blob to the client on reestablish. They only risk missing out on future forwarding fees...

I agree with you, this backup mechanism isn't completely bullet-proof. You can check that your peer behaves correctly at each channel_reestablish and init, but if they misbehave they won't be directly economically penalized (there's no penalty fee that you can automatically claim). As you mention, the only drawback for a cheating storage provider is that you won't use their services, and will probably tell your friends to avoid them as well. Is that a good enough incentive? I don't know, but I'm not sure we can do better...
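
For example, a light client could remember digests of the last few backups it sent and compare them against what the peer returns on reconnection; a rough sketch (all names hypothetical):

```python
import hashlib

class BackupVerifier:
    """Track digests of backups we sent, so we can audit what the peer returns."""

    def __init__(self) -> None:
        self._sent_digests: list[bytes] = []

    def record_sent(self, backup_blob: bytes) -> None:
        self._sent_digests.append(hashlib.sha256(backup_blob).digest())
        # Keep a small window: the peer may legitimately return a slightly
        # older blob, since there is no explicit ack.
        self._sent_digests = self._sent_digests[-5:]

    def check_returned(self, returned_blob: bytes) -> bool:
        """True if the returned backup matches one of the recent blobs we sent."""
        return hashlib.sha256(returned_blob).digest() in self._sent_digests
```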

I'll work on adding a commit to include the node_backup_data and the new associated message.

@t-bast (Collaborator, Author) commented Jul 7, 2021

@SomberNight I added the node_backup and several other requirements and explanations in 18bd2af, let me know if that's better.

@t-bast t-bast mentioned this pull request Jul 9, 2021
@t-bast t-bast changed the title Peer backup storage (feature 36/37) Peer backup storage (feature 36/37/38/39) Jul 12, 2021
@rustyrussell (Collaborator) commented Jul 19, 2021

I like this! But I think we should downgrade it to strictly "best effort", at least for peer_backup.

  1. If they give you an old state, don't get upset. This can happen anyway, since there's no ack.
  2. Require only best effort; don't require them to perform a db commit on every new update.
  3. Importantly, this lets peers provide it for each other, without fear.
  4. Perhaps we should have an "append" flag to reduce data traffic? (The total size must still be less than 32000 bytes?)

The way I would use this feature is to store the commitment number in every peer_backup, and update at least three peers on every send of revoke_and_ack. That is the critical piece of information so we don't accidentally go backwards on restore.

Then have a rule on restart that insists on a threshold of init msgs before sending reestablish messages.
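
Roughly, that could look like the following sketch (Python; the blob layout, the three-peer fan-out and the 2-of-3 threshold are illustrative choices, not requirements):

```python
# Sketch of the "commitment number in every peer_backup" strategy.
# The blob layout, the number of peers updated (3) and the threshold (2)
# are illustrative choices, not requirements from this PR.

def make_backup_blob(channel_id: bytes, commitment_number: int) -> bytes:
    return channel_id + commitment_number.to_bytes(8, "big")

def on_send_revoke_and_ack(channel_id: bytes, commitment_number: int, peers: list) -> None:
    blob = make_backup_blob(channel_id, commitment_number)
    for peer in peers[:3]:               # update at least three peers
        peer.send_peer_backup(blob)      # hypothetical transport call

def restore_commitment_number(channel_id: bytes, returned_blobs: list[bytes],
                              threshold: int = 2) -> int:
    """On restart, wait until `threshold` peers have echoed a blob in init,
    then take the highest commitment number so we never go backwards."""
    matching = [b for b in returned_blobs if b[:32] == channel_id]
    if len(matching) < threshold:
        raise RuntimeError("not enough init responses yet, delay channel_reestablish")
    return max(int.from_bytes(b[32:40], "big") for b in matching)
```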

I also think it would be informative (perhaps in a new doc) to indicate the format implementations used: a TLV seems to fit pretty well here, IMHO, but details of encryption matter too.

@rustyrussell (Collaborator) commented:

Features clash with #759 BTW!

@t-bast t-bast changed the title Peer backup storage (feature 36/37/38/39) Peer backup storage (feature 40/41/42/43) Aug 10, 2021
@t-bast (Collaborator, Author) commented Aug 10, 2021

I like this! But I think we should downgrade it to strictly "best effort", at least for peer_backup.
If they give you an old state, don't get upset. This can happen anyway, since there's no ack.

You're right, I made it best effort in 3129979.

Perhaps we should have an "append" flag to reduce data traffic?

It's worth discussing; if we think it makes sense, we should include it now in the backup tlv (to avoid adding a new tlv for it later).
I'm afraid it's useless with encrypted backups, so it only makes sense for unencrypted data (an array of (channel_id, commitment_number) pairs), but I personally would only send encrypted data to avoid leaking anything (just in case).

I also think it would be informative (perhaps in a new doc) to indicate the format implementations used: a TLV seems to fit pretty well here, IMHO, but details of encryption matter too.

I agree, bLIPs are likely a good place for that.

Features clash with #759 BTW!

Fixed in 42e68e0


I realized that my calculation for the maximum size for backups is completely wrong...
I considered a commitment_signed message with 483 htlc signatures, but it can actually contain two times that since there can be 483 incoming htlcs and 483 outgoing htlcs 🤦 (even though nodes may configure a lower max_accepted_htlcs to ensure there is some space available for backup data).

When that happens, there isn't much space available in the commitment_signed message for backup data (especially if we want to leave some space for future additional tlv fields). Maybe that's ok as you will soon follow up with a revoke_and_ack where you'll have enough space to send your backup data...
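
For reference, a rough worked calculation of that worst case, using the BOLT 1/2 field sizes (2-byte type, 32-byte channel_id, 64-byte signatures, u16 HTLC count) and the 65535-byte message limit:

```python
MAX_MESSAGE_SIZE = 65535          # BOLT 1 / BOLT 8 message size limit
MAX_HTLCS_PER_SIDE = 483

# commitment_signed: type(2) + channel_id(32) + signature(64)
#                    + num_htlcs(2) + num_htlcs * 64 (htlc_signature)
num_htlcs = 2 * MAX_HTLCS_PER_SIDE                          # 966: incoming + outgoing
commitment_signed_size = 2 + 32 + 64 + 2 + num_htlcs * 64   # = 61924 bytes

space_left_for_tlvs = MAX_MESSAGE_SIZE - commitment_signed_size
print(space_left_for_tlvs)        # 3611 bytes for the whole TLV stream
```

That leaves only about 3.6 kB for the entire TLV stream of that message, backup record included.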

Should we still define an arbitrary maximum size for backup data (e.g. 32768 bytes, powers of two are nice) and add a requirement that the sender must not include the backup if that would cause the message to be bigger than the 65535 bytes message size limit? Or should we completely drop the maximum size for backup data and let it be bounded by the message size limit (65535 bytes)?

@akumaigorodski commented Aug 23, 2021

@t-bast

We could store in the node backup the list of our peers and the channel IDs we have with each peer (encrypted of course). That node backup could be sent to all of our peers: this way we only need to remember and reconnect to one of them to discover the others and the list of our channels.

It seems to me this is actually the most important part, at least for as long as we need to store the revoked commitment history for each channel. That history can't be attached to the TLV stream of channel messages, and restoring and then continuing to use channels without it is obviously dangerous; such channels had better be closed ASAP.

OTOH an encrypted list of peers is small, and the wallet user won't need to remember all the peers they had channels with when recovering the wallet, which is very useful.

@rustyrussell (Collaborator) commented:

Don't add TLVs everywhere, simply use a separate message? That increases the limit (to its natural amount), as well.
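
For comparison, a rough sketch of what such a dedicated message could look like (the name, type number and length prefix below are made up, not values from this PR):

```python
# Hypothetical dedicated message carrying a backup blob, instead of piggybacking
# on existing messages' TLV streams. Name and type number are made up.
PEER_BACKUP_STORE_TYPE = 40000   # placeholder type number

def encode_peer_backup_store(blob: bytes) -> bytes:
    # type (u16) || length (u16) || blob; bounded only by the 65535-byte message limit
    return (PEER_BACKUP_STORE_TYPE.to_bytes(2, "big")
            + len(blob).to_bytes(2, "big")
            + blob)

def decode_peer_backup_store(message: bytes) -> bytes:
    assert int.from_bytes(message[:2], "big") == PEER_BACKUP_STORE_TYPE
    length = int.from_bytes(message[2:4], "big")
    return message[4:4 + length]
```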
