Peer storage #2888
base: master
I thought this was an easy PR, but as I dived into it I realized there are many subtle aspects that have a big impact on scenarios where users restore from seed. That's why I have several non-trivial comments, for which we need to decide how we'd like eclair to behave.
Codecov Report
@@            Coverage Diff             @@
##           master    #2888      +/-   ##
==========================================
- Coverage   86.16%   86.01%   -0.15%
==========================================
  Files         224      227       +3
  Lines       20073    20198     +125
  Branches      813      829      +16
==========================================
+ Hits        17295    17373      +78
- Misses       2778     2825      +47
I'm a bit afraid of the DoS risks associated with this feature: our node will be a very likely target (everyone will want to store some data with us), so we must be very careful that this cannot be exploited.
@@ -517,6 +527,14 @@ class Peer(val nodeParams: NodeParams,
      d.peerConnection forward unknownMsg
      stay()

    case Event(store: PeerStorageStore, d: ConnectedData) if nodeParams.features.hasFeature(Features.ProvideStorage) && d.channels.nonEmpty =>
      startSingleTimer("peer-storage-write", WritePeerStorage, nodeParams.peerStorageWriteDelay)
I'm not sure this works as you intended: since `startSingleTimer` replaces a previous timer, if our peer sends `PeerStorageStore` more frequently than `nodeParams.peerStorageWriteDelay`, we will never persist any of their backup.
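To illustrate (a sketch only, not code from this PR or from #2945, using the identifiers that appear in the diff above plus Akka FSM's `isTimerActive`): one way to avoid pushing the write back on every message would be to arm the timer only when no write is already pending:

```scala
// Illustrative sketch: arm the timer once and let it fire, instead of resetting it on every
// PeerStorageStore (which starves the write if the peer sends updates more often than the delay).
case Event(store: PeerStorageStore, d: ConnectedData) if nodeParams.features.hasFeature(Features.ProvideStorage) && d.channels.nonEmpty =>
  if (!isTimerActive("peer-storage-write")) {
    // No write scheduled yet: schedule one; later PeerStorageStore messages won't postpone it.
    startSingleTimer("peer-storage-write", WritePeerStorage, nodeParams.peerStorageWriteDelay)
  }
  // ...keep track of the latest received blob exactly as the current code does (omitted here)...
  stay()
```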
This is fixed in #2945 with some other nits.
This was on purpose: if a peer updates the storage all the time, there is no need to persist every version; it's better to wait for the updates to stop.
connect(remoteNodeId, peer, peerConnection2, switchboard, channels = Set(ChannelCodecsSpec.normal), sendInit = false, peerStorage = Some(hex"0123456789"))
peerConnection2.send(peer, PeerStorageStore(hex"1111"))
connect(remoteNodeId, peer, peerConnection3, switchboard, channels = Set(ChannelCodecsSpec.normal), sendInit = false, peerStorage = Some(hex"1111"))
assert(nodeParams.db.peers.getStorage(remoteNodeId).contains(hex"abcdef")) // Because of the delayed writes, the original value hasn't been updated yet.
This can be flaky: depending on machine timing, this could contain any of the peer storage versions. We only need the `eventually` condition IMO.
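For example (a sketch, assuming the test mixes in ScalaTest's `Eventually` support and that the value expected to be persisted in the end is the last blob sent, `hex"1111"`):

```scala
import org.scalatest.concurrent.Eventually._

// Sketch: only assert on the eventually-persisted value, so the test doesn't depend on
// how quickly the delayed write fires on a given machine.
eventually {
  assert(nodeParams.db.peers.getStorage(remoteNodeId).contains(hex"1111"))
}
```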
This is fixed in #2945
using(pg.prepareStatement("UPDATE local.peer_storage SET removed_peer_at = ? WHERE node_id = ?")) { statement =>
  statement.setTimestamp(1, TimestampSecond.now().toSqlTimestamp)
  statement.setString(2, nodeId.value.toHex)
  statement.executeUpdate()
}
Doing the storage removal this way creates the following bug:

- we close our channel with a peer, thus setting `removed_peer_at`
- our peer then opens a new channel with us
- they keep using this active channel
- after 30 days, we will delete their data from our DB even though we now have an active channel

I think this should be done differently:

- remove the `removed_peer_at` column from `peer_storage`
- create a dedicated table `local.peer_storage_cleanup (node_id TEXT NOT NULL PRIMARY KEY, peer_removed_at TIMESTAMP WITH TIME ZONE NOT NULL)`
- in `removePeer`, add a row in `local.peer_storage_cleanup`
- in the `PeerStorageCleaner`, once per day:
  - list all entries in `local.peer_storage_cleanup`
  - for each `node_id` in this list:
    - go through the `Switchboard` to obtain the peer actor (if it exists) and request `GetPeerChannels`
    - if we have a `NORMAL` channel, remove this `node_id` from the `peer_storage_cleanup` table
    - otherwise, if we've exceeded `removalDelay`, remove this `node_id` from the `peer_storage_cleanup` table and remove the associated peer storage from `local.peer_storage`

This guarantees that we don't delete storage from peers with whom we re-open a channel. This is a bit cumbersome to do; if you can think of a better way to avoid this issue, feel free to implement that!
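To make the `removePeer` side of this suggestion concrete, here is a rough sketch (not code from this PR; it reuses the `using`/`pg.prepareStatement` helpers visible in the snippet quoted above, and the `schedulePeerStorageCleanup` name is made up for illustration):

```scala
// Hypothetical sketch: instead of setting removed_peer_at on the storage row, record the
// removal in a dedicated clean-up table; the PeerStorageCleaner decides later whether the
// storage can actually be deleted.
def schedulePeerStorageCleanup(pg: java.sql.Connection, nodeId: PublicKey): Unit = {
  using(pg.prepareStatement(
    "INSERT INTO local.peer_storage_cleanup (node_id, peer_removed_at) VALUES (?, ?) " +
      "ON CONFLICT (node_id) DO UPDATE SET peer_removed_at = EXCLUDED.peer_removed_at")) { statement =>
    statement.setString(1, nodeId.value.toHex)
    statement.setTimestamp(2, TimestampSecond.now().toSqlTimestamp)
    statement.executeUpdate()
  }
}
```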
@@ -517,6 +527,14 @@ class Peer(val nodeParams: NodeParams,
      d.peerConnection forward unknownMsg
      stay()

    case Event(store: PeerStorageStore, d: ConnectedData) if nodeParams.features.hasFeature(Features.ProvideStorage) && d.channels.nonEmpty =>
Checking `d.channels.nonEmpty` isn't enough; this exposes us to the following DoS attack:

- Mallory creates a dummy `node_id`, sends `open_channel` and `peer_storage_store`
- we store her blob of data in our DB
- Mallory disconnects, aborting the channel open, so it didn't cost her anything
- she does the same thing with other `node_id`s in a loop

We end up storing a large amount of data in our `peer_storage` DB that we will only clean up after the `removalDelay`. I think we should only store our peer's data when we have an active channel (in the `NORMAL` state): this ensures that some on-chain fees have been paid, which protects against DoS. This matches our current behavior, where we only store Phoenix data as part of an active channel's commitments object.

This is a bit cumbersome to do because checking that we have an active channel is asynchronous: we need to send a message to our `d.channels` actors and wait for their response (we can use the `PeerChannelsCollector` for that). I'm not sure yet exactly how we can rework the code to achieve this without introducing too much complexity; how do you think we should do this? It probably makes sense to brainstorm this before actually implementing it.
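Purely for illustration (a sketch of one possible shape, not a concrete proposal; the names flagged as hypothetical below do not exist in eclair, and the real `PeerChannelsCollector`/`GetPeerChannels` APIs will look different): the peer could keep the blob in memory and only schedule the DB write once an asynchronous channel-state check confirms a `NORMAL` channel:

```scala
// Hypothetical sketch with simplified/assumed APIs and field names.
case Event(store: PeerStorageStore, d: ConnectedData) if nodeParams.features.hasFeature(Features.ProvideStorage) =>
  // Don't persist yet: first ask our channel actors for their state (this is asynchronous).
  requestPeerChannels(d.channels) // hypothetical helper wrapping PeerChannelsCollector / GetPeerChannels
  stay() using d.copy(pendingPeerStorage = Some(store)) // hypothetical field holding the unconfirmed blob

case Event(states: PeerChannelStates, d: ConnectedData) => // hypothetical response message
  d.pendingPeerStorage match {
    case Some(_) if states.channelStates.contains(NORMAL) =>
      // At least one NORMAL channel: on-chain fees have been paid, safe to schedule the DB write.
      startSingleTimer("peer-storage-write", WritePeerStorage, nodeParams.peerStorageWriteDelay)
    case _ => () // no active channel yet: keep the blob in memory only (or drop it)
  }
  stay()
```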
Implements lightning/bolts#1110