
Peer storage #2888

Open · wants to merge 5 commits into master
Conversation

@thomash-acinq (Member) commented Jul 25, 2024

@t-bast (Member) left a comment

I thought this was an easy PR, but as I dived into it I found many subtle aspects that have a big impact on scenarios where users restore from seed. That's why I have multiple non-trivial comments, for which we need to decide how we'd like eclair to behave.

@codecov-commenter

Codecov Report

Attention: Patch coverage is 67.85714% with 36 lines in your changes missing coverage. Please review.

Project coverage is 86.01%. Comparing base (f02c98b) to head (103fc65).
Report is 5 commits behind head on master.

Files with missing lines | Patch % | Lines
...cala/fr/acinq/eclair/db/sqlite/SqlitePeersDb.scala | 69.69% | 10 Missing ⚠️
...c/main/scala/fr/acinq/eclair/db/pg/PgPeersDb.scala | 70.00% | 9 Missing ⚠️
.../main/scala/fr/acinq/eclair/db/DualDatabases.scala | 0.00% | 6 Missing ⚠️
.../scala/fr/acinq/eclair/db/PeerStorageCleaner.scala | 0.00% | 5 Missing ⚠️
...-core/src/main/scala/fr/acinq/eclair/io/Peer.scala | 79.16% | 5 Missing ⚠️
...ir-core/src/main/scala/fr/acinq/eclair/Setup.scala | 50.00% | 1 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2888      +/-   ##
==========================================
- Coverage   86.16%   86.01%   -0.15%     
==========================================
  Files         224      227       +3     
  Lines       20073    20198     +125     
  Branches      813      829      +16     
==========================================
+ Hits        17295    17373      +78     
- Misses       2778     2825      +47     
Files with missing lines | Coverage Δ
...core/src/main/scala/fr/acinq/eclair/Features.scala | 100.00% <100.00%> (ø)
...re/src/main/scala/fr/acinq/eclair/NodeParams.scala | 91.78% <100.00%> (+0.04%) ⬆️
.../eclair/wire/protocol/LightningMessageCodecs.scala | 99.47% <100.00%> (+0.01%) ⬆️
...q/eclair/wire/protocol/LightningMessageTypes.scala | 99.14% <ø> (ø)
...fr/acinq/eclair/wire/protocol/PeerStorageTlv.scala | 100.00% <100.00%> (ø)
...ir-core/src/main/scala/fr/acinq/eclair/Setup.scala | 75.00% <50.00%> (-0.30%) ⬇️
.../scala/fr/acinq/eclair/db/PeerStorageCleaner.scala | 0.00% <0.00%> (ø)
...-core/src/main/scala/fr/acinq/eclair/io/Peer.scala | 89.09% <79.16%> (-0.54%) ⬇️
.../main/scala/fr/acinq/eclair/db/DualDatabases.scala | 9.60% <0.00%> (-0.34%) ⬇️
...c/main/scala/fr/acinq/eclair/db/pg/PgPeersDb.scala | 80.43% <70.00%> (-5.28%) ⬇️
... and 1 more

... and 19 files with indirect coverage changes


@t-bast (Member) left a comment

I'm a bit afraid of the DoS risks associated with this feature: our node will be a very likely target (everyone will want to store some data with us), so we must be very careful that this cannot be exploited.

@@ -517,6 +527,14 @@ class Peer(val nodeParams: NodeParams,
d.peerConnection forward unknownMsg
stay()

case Event(store: PeerStorageStore, d: ConnectedData) if nodeParams.features.hasFeature(Features.ProvideStorage) && d.channels.nonEmpty =>
startSingleTimer("peer-storage-write", WritePeerStorage, nodeParams.peerStorageWriteDelay)
Member:

I'm not sure this works as you intended: since startSingleTimer replaces a previously scheduled timer with the same key, if our peer sends PeerStorageStore more frequently than nodeParams.peerStorageWriteDelay, we will never persist their backup.
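To illustrate the concern, here is a minimal standalone sketch using plain akka Timers (this is not the eclair Peer code; the Debouncer actor and the Store/Flush messages are made up for the example). Because startSingleTimer with the same key cancels and re-arms the pending timer, a steady stream of writes keeps postponing the flush indefinitely:

import akka.actor.{Actor, ActorSystem, Props, Timers}
import scala.concurrent.duration._

object DebounceDemo extends App {
  case object Store
  case object Flush

  class Debouncer extends Actor with Timers {
    override def receive: Receive = {
      case Store =>
        // same key => the pending timer (if any) is cancelled and restarted from zero
        timers.startSingleTimer("flush", Flush, 1.second)
      case Flush =>
        println("flushed to disk")
    }
  }

  val system = ActorSystem("debounce-demo")
  val debouncer = system.actorOf(Props(new Debouncer))
  import system.dispatcher
  // send Store every 500 ms: Flush never fires while the writes keep coming,
  // which is the behaviour described above for a peer spamming PeerStorageStore
  system.scheduler.scheduleWithFixedDelay(0.millis, 500.millis, debouncer, Store)
}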

Member:

This is fixed in #2945, along with some other nits.

Member (Author):

This was on purpose: if a peer updates the storage all the time, there is no need to persist every intermediate version; it's better to wait for the updates to stop.

connect(remoteNodeId, peer, peerConnection2, switchboard, channels = Set(ChannelCodecsSpec.normal), sendInit = false, peerStorage = Some(hex"0123456789"))
peerConnection2.send(peer, PeerStorageStore(hex"1111"))
connect(remoteNodeId, peer, peerConnection3, switchboard, channels = Set(ChannelCodecsSpec.normal), sendInit = false, peerStorage = Some(hex"1111"))
assert(nodeParams.db.peers.getStorage(remoteNodeId).contains(hex"abcdef")) // Because of the delayed writes, the original value hasn't been updated yet.
Member:

This can be flaky: depending on machine timing, this could contain any of the peer storage versions. We only need the eventually condition IMO.
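For reference, a sketch of keeping only the eventual assertion (reusing the identifiers from the quoted test; the ScalaTest Eventually import and the hex"1111" expectation, i.e. the last blob stored in this scenario, are assumptions):

import org.scalatest.concurrent.Eventually._

// sketch only: drop the assert on the intermediate value and keep a single eventual
// assertion, so the test no longer depends on when the delayed write happens to land
eventually {
  assert(nodeParams.db.peers.getStorage(remoteNodeId).contains(hex"1111"))
}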

Member:

This is fixed in #2945.

Comment on lines +112 to +116
using(pg.prepareStatement("UPDATE local.peer_storage SET removed_peer_at = ? WHERE node_id = ?")) { statement =>
statement.setTimestamp(1, TimestampSecond.now().toSqlTimestamp)
statement.setString(2, nodeId.value.toHex)
statement.executeUpdate()
}
Member:

Doing the storage removal this way creates the following bug:

  • we close our channel with a peer, thus setting removed_peer_at
  • our peer then opens a new channel with us
  • they keep using this active channel
  • after 30 days, we will delete their data from our DB even though we now have an active channel

I think this should be done differently:

  • remove the removed_peer_at column from peer_storage
  • create a dedicated table local.peer_storage_cleanup (node_id TEXT NOT NULL PRIMARY KEY, peer_removed_at TIMESTAMP WITH TIME ZONE NOT NULL)
  • in removePeer add a row in local.peer_storage_cleanup
  • in the PeerStorageCleaner, once per day:
    • list all entries in local.peer_storage_cleanup
    • for each node_id in this list:
      • go through the Switchboard to obtain the peer actor (if it exists) and request GetPeerChannels
      • if we have a NORMAL channel, remove this node_id from the peer_storage_cleanup table
      • otherwise, if we've exceeded removalDelay, remove this node_id from the peer_storage_cleanup table and remove the associated peer storage from local.peer_storage

This guarantees that we don't delete storage from peers with whom we re-open a channel. It is a bit cumbersome to do; if you can think of a better way to avoid this issue, feel free to implement that! A rough sketch of the schema side follows below.
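For illustration only, a rough sketch of the schema side of this proposal, mirroring the prepared-statement style of the PgPeersDb snippet quoted above (the exact statements, including the ON CONFLICT clause, are assumptions rather than the final implementation; the NORMAL-channel check itself lives in the PeerStorageCleaner actor):

// dedicated clean-up table, replacing the removed_peer_at column on local.peer_storage
statement.executeUpdate("CREATE TABLE local.peer_storage_cleanup (node_id TEXT NOT NULL PRIMARY KEY, peer_removed_at TIMESTAMP WITH TIME ZONE NOT NULL)")

// in removePeer: schedule the peer's storage for clean-up instead of timestamping it in place
using(pg.prepareStatement("INSERT INTO local.peer_storage_cleanup (node_id, peer_removed_at) VALUES (?, ?) ON CONFLICT (node_id) DO NOTHING")) { statement =>
  statement.setString(1, nodeId.value.toHex)
  statement.setTimestamp(2, TimestampSecond.now().toSqlTimestamp)
  statement.executeUpdate()
}

// in the daily PeerStorageCleaner pass, only once the peer has no NORMAL channel and
// removalDelay has elapsed since peer_removed_at:
using(pg.prepareStatement("DELETE FROM local.peer_storage WHERE node_id = ?")) { statement =>
  statement.setString(1, nodeId.value.toHex)
  statement.executeUpdate()
}
using(pg.prepareStatement("DELETE FROM local.peer_storage_cleanup WHERE node_id = ?")) { statement =>
  statement.setString(1, nodeId.value.toHex)
  statement.executeUpdate()
}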

@@ -517,6 +527,14 @@ class Peer(val nodeParams: NodeParams,
d.peerConnection forward unknownMsg
stay()

case Event(store: PeerStorageStore, d: ConnectedData) if nodeParams.features.hasFeature(Features.ProvideStorage) && d.channels.nonEmpty =>
Member:

Checking d.channels.nonEmpty isn't enough; it exposes us to the following DoS attack:

  • Mallory creates a dummy node_id, sends open_channel and peer_storage_store
  • We store her blob of data in our DB
  • Mallory disconnects, aborting the channel open, so it didn't cost her anything
  • She does the same thing with other node_ids in a loop

We end up storing a large amount of data in our peer_storage DB that we will only clean up after the removalDelay. I think we should only store our peer's data when we have an active channel (in the NORMAL state): this ensures that some on-chain fees have been paid, which protects against DoS. This matches our current behavior, where we only store Phoenix data as part of an active channel's commitments object.

This is a bit cumbersome to do because checking that we have an active channel is asynchronous: we need to send a message to our d.channels actors and wait for their responses (we can use the PeerChannelsCollector for that). I'm not sure yet exactly how we can rework the code to achieve this without introducing too much complexity; how do you think we should do this? It probably makes sense to brainstorm this before actually implementing it. A rough sketch of one possible shape is included below.
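Purely as a starting point for that brainstorm, a hedged sketch of one possible shape (the queryPeerChannels helper is hypothetical, and the PeerChannels / state / NORMAL / blob names only approximate eclair's GetPeerChannels plumbing, so treat all of them as assumptions): buffer the blob in the FSM data and only schedule the DB write once at least one channel is confirmed to be in the NORMAL state.

case Event(store: PeerStorageStore, d: ConnectedData) if nodeParams.features.hasFeature(Features.ProvideStorage) =>
  // hypothetical helper: spawns a PeerChannelsCollector for this peer and has it
  // reply to this actor with the aggregated channel states (PeerChannels)
  queryPeerChannels(d)
  // keep the latest blob in memory only; nothing is written to the DB yet
  stay() using d.copy(peerStorage = Some(store.blob))

case Event(channels: PeerChannels, d: ConnectedData) if d.peerStorage.nonEmpty =>
  // only persist if at least one channel has reached NORMAL, i.e. on-chain fees have
  // been paid, which is the anti-DoS condition described above
  if (channels.channels.exists(_.state == NORMAL)) {
    startSingleTimer("peer-storage-write", WritePeerStorage, nodeParams.peerStorageWriteDelay)
  }
  stay()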

@t-bast mentioned this pull request on Nov 22, 2024