multi: implement new safe static channel backup and recovery scheme, RPCs, and cli commands #2313
chanbackup/backupfile.go
// UpdateAndSwap will attempt write a new temporary backup file to disk with
Suggested change:
- // UpdateAndSwap will attempt write a new temporary backup file to disk with
+ // UpdateAndSwap will attempt to write a new temporary backup file to disk with
"First, the easiest method for backup+recovery. After this PR, lnd will maintain a channels.backup file in the same location that we store all the other files. .." Can a dedicated folder be used? If it is mounted with sshfs or nfs, the channels.backup and channel.db files can be kept on different machines.
I don't see why not. We can add a config flag for the backup file location.
Alrighty, I've broken this PR up into 5 distinct PRs. Each new PR depends on the prior PR. As a result, they can go in one by one and be reviewed in smaller units, rather than waiting for the final dependents of this larger PR to be finalized. I'll keep this one as is though, as it has the full description and also builds, allowing users to experiment with the set of commands. Once the final PR is ready for review (as all the prior PRs have been merged), I'll rebase this on top of that, so everyone can use this as a central point of end to end testing.
84e91ad to f31163d
Pushed up a rebased version as all the dependent PRs have been merged. Once #1988 is in, I'll start the final push to getting this merged!
f31163d to 747b1c2
747b1c2 to 861c6fb
Pushed up a new version that maintains the backup file on disk and modifies it based on new/closed channels. Will push up the integration tests next, and after that it's ready for review.
In this commit, we modify the main `closeObserver` dispatch loop to only look for the local force close if we didn't recover the channel. We do this because it isn't possible for us to force close a recovered channel.
…recovered chan In this commit, we modify the `closeObserver` to fast path the DLP dispatch case if we detect that the channel has been restored. We do this as otherwise, we may inadvertently enter one of the other cases erroneously, causing us to not properly look up their DLP commitment point.
In this commit, we convert the server's Start/Stop methods to use `sync.Once`. We do this in order to fix concurrency issues that would allow certain queries to be sent to the server before it has actually fully started up. Before this commit, we would set started to 1 at the very top of the method, allowing certain queries to pass before the rest of the daemon had started up. In order to fix this issue, we've converted the server to using a `sync.Once`, and two new atomic variables for clients to query to see if the server has fully started up, or is in the process of stopping.
During the restore process, it may be possible that we have already heard about our prior edge from a node on the network (or our channel peers). As a result, we shouldn't exit if this happens, and instead should continue with the rest of the restoration process.
In this commit, we modify the `RestoreNodeWithSeed` and `RestartNode` methods to also accept an SCB. This will be useful in new integration tests to properly exercise the various restore/restart scenarios using static channel backups.
In this commit, we update all uses of the `getChanPointFundingTxid` to match the new function signature. We no longer need to convert to a chainhash.Hash, as the method does so underneath now.
…t to new func In this commit, we modify the core testDataLossProtection test to extract the primary DLP assertion logic into a new function. We do this, as the upcoming SCB tests will fall back to this test after some initial set up.
In this commit, we add 4 new itests for exercising the SCB restore process via 4 primary scenarios: recover from backup using RPC, recover from file using RPC, recover channels during init/creation, recover channels during unlock. With all fields populated there're a total of 24 new scenarios to cover. At the time of authoring of this commit, the other scenarios (bits are: initiator, updates, private) have been left out for now, as they increased the run time of the integration tests significantly.
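The `sync.Once` change to the server's Start/Stop methods described in the commits above can be sketched roughly as follows. This is a minimal illustration of the pattern, not the code from the commit: the `server` type, its field names, and `startSubsystems` are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// server is a hypothetical stand-in for the daemon's server struct.
type server struct {
	start sync.Once
	stop  sync.Once

	active   int32 // set to 1 only after start-up has fully completed
	stopping int32 // set to 1 as soon as shutdown begins
}

// Start runs the start-up logic at most once. The active flag is only set
// after every sub-system has started, so Started() cannot report true early.
// Note that only the first call can return a start-up error.
func (s *server) Start() error {
	var startErr error
	s.start.Do(func() {
		if startErr = s.startSubsystems(); startErr != nil {
			return
		}
		atomic.StoreInt32(&s.active, 1)
	})
	return startErr
}

// Stop runs the shutdown logic at most once and flags the server as stopping
// before any sub-system is torn down.
func (s *server) Stop() {
	s.stop.Do(func() {
		atomic.StoreInt32(&s.stopping, 1)
		// Sub-system shutdown would go here.
	})
}

// Started reports whether start-up has fully completed.
func (s *server) Started() bool { return atomic.LoadInt32(&s.active) != 0 }

// Stopped reports whether shutdown has begun.
func (s *server) Stopped() bool { return atomic.LoadInt32(&s.stopping) != 0 }

// startSubsystems is a placeholder for the real start-up work.
func (s *server) startSubsystems() error { return nil }

func main() {
	s := &server{}
	fmt.Println("started before Start:", s.Started())
	if err := s.Start(); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println("started after Start:", s.Started())
	s.Stop()
	fmt.Println("stopping after Stop:", s.Stopped())
}
```

The point of the design is that the flag clients query is written at the end of start-up rather than at the top of the method.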
fc8f85c to f216027
Tested on a testnet node that has been running with |
@molxyz at runtime, |
In that case, you wouldn't actually be able to decrypt the SCB unless you read out the private data of the database. |
Any chance you can include what the commands are to restore (exact syntax), and what the expected outputs would be (just an example)? Considering how important this is, just guessing and 'trying to figure it out' may not be the best idea. From my understanding, this isn't actually a "back up" of the channels, and is instead a "channel funds recovery mechanism". Correct? If you restored using this, you'd have a node with zero channels, and would have to start opening channels from scratch. Correct?
Check out the PR description, more docs will be provided later.
"Check out the PR description" "more docs will be provided later." |
Would it be possible to create a "backup" manually, in the scenario where a node is lost and the original channels.backup file is no longer accessible? Put another way, given that you know the
Overview
In this PR, we implement a new safe scheme for static channel backups (SCBs) for `lnd`. We say safe, as care has been taken to ensure that there are no foot guns in this method of backing up channels, vs doing things like `rsync`ing or copying the `channel.db` file periodically. Those methods can be dangerous as one never knows if they have the latest state of a channel or not. Instead, we aim to provide a simple, safe method to allow users to recover the settled funds in their channels in the case of partial or complete data loss. The backups themselves are encrypted using a key derived from the user's seed. This way we protect the privacy of the user's channels in the backed-up state, and ensure that a random node can't attempt to import another user's channels.
Once this PR is merged, given their seed and the latest backup file, the user will be able to recover both their on-chain funds and the funds that are fully settled within their channels. By "fully settled" we mean funds that are in the base commitment outputs, and not HTLCs. We can only restore these funds because, right after the channel is created, we have all the data required to make a backup. In contrast, in order to resolve HTLCs, we would also need to update the backup state with each new channel update, which is tricky to do without additional infrastructure. This infrastructure will be built out in the near future, but until then we have this scheme, which will also be a fallback in the scenario that any higher level mechanisms fail.
At a later point, we also plan to propose this backup scheme as an addition to the spec, as even with the change to make the "to self" outputs static, we still need this SCB information in order to restore user funds. Additionally, the current serialization format is a bit up in the air. Atm, we use the same "codec" as we do within the wire protocol for the BOLT specs. However, we'll likely move to a TLV (type-length-value) format as it's extremely flexible and allows us to add/remove fields in the future once we gain new channel types, or modifications are made in the protocol that warrant a change to the backup format. Most importantly, if `aezeed` and this `chanbackup` scheme are added to the spec, then it will be possible to write a simple program that, given a seed+backup from any of the implementations, will be able to recover all funds (sweep to an address) and then shut down.
Recovery Flow
Skipping the backup flow for a second, given their 24-word `aezeed` seed, and a special `channels.backup` file, the recovery flow would be something like the following:
1. The user uses `lncli create` or the gRPC `WalletUnlocker.Init` call to input their seed and fully serialized backups.
2. `lnd` boots up and the wallet performs a rescan from the wallet's birthday (encoded in their `aezeed`) to restore all on-chain funds. Once this process is complete, the main `lnd` server will start up.
3. Given the set of channels to recover, the server will then (using the new `chanbackup` package) insert a series of "channel shells" into the database. These contain only the information required to initiate the DLP (data loss protection) protocol and nothing more. As a result, they're marked as "recovered" channels in the database, and we'll disallow trying to use them for any other process.
4. Once the channel is recovered, the `chanbackup` package will attempt to insert a `LinkNode` that contains all prior addresses that we were able to reach the peer at. During the process, we'll also insert the edge for that channel (only our outgoing direction) into the database as well.
5. `lnd` will then start up, and as usual attempt to establish connections to all peers that we have channels open with.
6. Once we connect with a peer, we'll then initiate the DLP protocol. The remote peer will discover that we've lost data, and then immediately force close their channel. Before they do though, they'll send over their latest unrevoked commitment point which we need to derive keys (will be fixed in BOLT 1.1 by making the key static) to sweep our funds.
7. Once the commitment transaction confirms, given information within the `SCB` we'll re-derive all keys we need, and then sweep the funds.
Backup + Recovery Methods
This PR exposes multiple safe ways to back up and recover a channel. We expect only one of them to be used primarily by unsophisticated end users, but have provided other mechanisms for more advanced users and businesses that already script `lnd` via the gRPC system.
First, the easiest method for backup+recovery. After this PR, `lnd` will maintain a `channels.backup` file in the same location that we store all the other files. Users will at any time be able to safely copy and back up this file. Each time a channel is opened or closed, `lnd` will update this file with the latest channel state. Users can use scripts to detect changes to the file, and upload them to their backup location. Something like `fsnotify` can notify a script each time the file changes so that it can be backed up once again. The file is encrypted using an AEAD scheme, so it can safely be stored plainly in cloud storage, your SD card, etc. The file uses a special format and can be imported via any of the recovery methods described below.
The second mechanism is via the new `SubscribeChanBackups` streaming gRPC method. Each time a channel is opened or closed, you'll get a new notification with all the `chanbackup.Single` files (described below), and a single `chanbackup.Multi` that contains all the information for all channels.
Finally, users are able to request a backup of a single channel, or all the channels via the cli and RPC methods. Here's an example of a few ways users can obtain backups, see the PR for full details:
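For the file-based method, a script that reacts to a change in `channels.backup` could be as small as the sketch below. This is not the example referred to above: `copyIfChanged` and `demo` are hypothetical helpers, and the sketch compares SHA-256 digests on demand instead of subscribing to `fsnotify` events, so that it needs only the standard library.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"os"
	"path/filepath"
)

// copyIfChanged copies src to dst when the SHA-256 digest of src differs
// from lastSum. It returns the current digest and whether a copy was made.
func copyIfChanged(src, dst string, lastSum []byte) ([]byte, bool, error) {
	data, err := os.ReadFile(src)
	if err != nil {
		return nil, false, err
	}
	sum := sha256.Sum256(data)
	if bytes.Equal(sum[:], lastSum) {
		return sum[:], false, nil
	}

	// Write to a temporary file and rename it into place, so that a reader
	// of dst never observes a half-written copy.
	tmp := dst + ".tmp"
	if err := os.WriteFile(tmp, data, 0600); err != nil {
		return nil, false, err
	}
	if err := os.Rename(tmp, dst); err != nil {
		return nil, false, err
	}
	return sum[:], true, nil
}

// demo writes a placeholder backup file into a temporary directory and calls
// copyIfChanged twice without modifying the file in between.
func demo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "scb-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "channels.backup")
	dst := filepath.Join(dir, "channels.backup.copy")
	if err := os.WriteFile(src, []byte("opaque encrypted blob"), 0600); err != nil {
		return false, false, err
	}

	sum, first, err := copyIfChanged(src, dst, nil)
	if err != nil {
		return false, false, err
	}
	_, second, err := copyIfChanged(src, dst, sum)
	return first, second, err
}

func main() {
	first, second, err := demo()
	fmt.Println("copied on first call:", first)
	fmt.Println("copied on second call:", second)
	fmt.Println("error:", err)
}
```

Because the file is already encrypted, the script can treat it as an opaque blob and never needs access to any key material.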
Static Channel Backup Scheme
Crypto
For encryption, we utilize `chacha20poly1305` with a random 24 byte nonce. We use a larger nonce size as this can be safely generated via a CSPRNG without fear of frequency collisions between nonces generated. To encrypt a blob, we then use this nonce as the AD (associated data) and prepend the nonce to the front of the ciphertext package.
For key generation, in order to ensure the user only needs their passphrase and the backup file, we utilize the existing keychain to derive a private key. In order to ensure that we don't force any hardware signer to be aware of our crypto operations, we instead opt to utilize a public key that will be hashed to derive our private key. The assumption here is that this key will only be exposed to this software, and never derived as a public facing address.
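The packaging described above (hash a public key to get the symmetric key, use a random nonce as the associated data, and prepend the nonce to the ciphertext) can be sketched as follows. Two assumptions are made to keep the example within the standard library: the hash is assumed to be SHA-256, and AES-GCM (12 byte nonce) stands in for `chacha20poly1305`. The helpers are written against the `cipher.AEAD` interface, so passing the AEAD returned by `chacha20poly1305.NewX` from `golang.org/x/crypto` would give the 24 byte nonce with no other change. All function names here are hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
)

// deriveKey hashes a serialized public key into a 32-byte symmetric key.
// SHA-256 is an assumption; the text only says the public key is hashed.
func deriveKey(serializedPubKey []byte) [32]byte {
	return sha256.Sum256(serializedPubKey)
}

// newAEAD builds the stand-in AEAD (AES-256-GCM) from the derived key.
func newAEAD(key [32]byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key[:])
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext under a fresh random nonce, authenticates the
// nonce as associated data, and returns nonce || ciphertext.
func seal(aead cipher.AEAD, plaintext []byte) ([]byte, error) {
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	ciphertext := aead.Seal(nil, nonce, plaintext, nonce)
	return append(nonce, ciphertext...), nil
}

// open splits the nonce off the front of blob and decrypts the remainder,
// again passing the nonce as associated data.
func open(aead cipher.AEAD, blob []byte) ([]byte, error) {
	n := aead.NonceSize()
	if len(blob) < n {
		return nil, errors.New("blob shorter than nonce")
	}
	nonce, ciphertext := blob[:n], blob[n:]
	return aead.Open(nil, nonce, ciphertext, nonce)
}

func main() {
	// Placeholder bytes standing in for a serialized public key.
	key := deriveKey([]byte{0x02, 0x01, 0x02, 0x03})
	aead, err := newAEAD(key)
	if err != nil {
		fmt.Println("aead:", err)
		return
	}
	blob, err := seal(aead, []byte("serialized channel backup"))
	if err != nil {
		fmt.Println("seal:", err)
		return
	}
	plaintext, err := open(aead, blob)
	fmt.Println(string(plaintext), err)
}
```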
chanbackup.Single
The SCB contains all information required to initiate the data loss protection protocol once we restore the channel and connect to the remote channel peer.
The primary way outside callers will interact with this package is via the Pack and Unpack methods. Packing means writing a serialized+encrypted version of the SCB to an io.Writer. Unpacking does the opposite.
The encoding format itself uses the same encoding as we do on the wire within Lightning. Each encoded backup begins with a version so we can easily add or modify the serialization format in the future, if new channel types appear, or we need to add/remove fields. The backup contains:
- `chanPoint` of the channel.
- `shortChanID` of the channel.
- `keychain.KeyLocator` that allows us to re-derive the payment base point we need to sweep our funds.
- `keychain.KeyDescriptor` that we need in order to re-derive our `shachain` root to validate the information the remote party gives us during the DLP protocol (see the next section for the complications that arose here).
chanbackup.Multi
Multi is a series of static channel backups. This type of backup can contain ALL the channel backup state in a single packed blob. This is suitable for storing on your file system, cloud storage, etc. Systems will be in place within lnd to ensure that one can easily obtain the latest version of the Multi for the node, and also that it will be kept up to date if channel state changes.
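To make the relationship between the two backup types concrete, here is a rough sketch of a version-prefixed Single and a Multi built as a series of Singles. The struct layout, field names, field widths, and the count prefix are all assumptions made for illustration and do not reproduce the actual `chanbackup` encoding. Encryption is left out so that only the serialization is shown.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// singleVersion0 is a hypothetical version tag for the layout below.
const singleVersion0 uint8 = 0

// single is a trimmed-down, hypothetical stand-in for chanbackup.Single.
type single struct {
	FundingTxid     [32]byte // chanPoint: funding transaction ID
	OutputIndex     uint32   // chanPoint: funding output index
	ShortChanID     uint64
	KeyFamily       uint32   // key locator: family
	KeyIndex        uint32   // key locator: index
	ShaChainRootPub [33]byte // public key used to find the shachain root
}

// packSingle writes the version byte followed by the fixed-size fields.
func packSingle(w io.Writer, s *single) error {
	if err := binary.Write(w, binary.BigEndian, singleVersion0); err != nil {
		return err
	}
	return binary.Write(w, binary.BigEndian, s)
}

// unpackSingle reads the version first, then decodes the matching layout.
func unpackSingle(r io.Reader) (*single, error) {
	var version uint8
	if err := binary.Read(r, binary.BigEndian, &version); err != nil {
		return nil, err
	}
	switch version {
	case singleVersion0:
		var s single
		if err := binary.Read(r, binary.BigEndian, &s); err != nil {
			return nil, err
		}
		return &s, nil
	default:
		return nil, errors.New("unknown single backup version")
	}
}

// packMulti writes a count followed by each packed single.
func packMulti(w io.Writer, singles []single) error {
	if err := binary.Write(w, binary.BigEndian, uint32(len(singles))); err != nil {
		return err
	}
	for i := range singles {
		if err := packSingle(w, &singles[i]); err != nil {
			return err
		}
	}
	return nil
}

// unpackMulti reads the count and then that many singles.
func unpackMulti(r io.Reader) ([]single, error) {
	var n uint32
	if err := binary.Read(r, binary.BigEndian, &n); err != nil {
		return nil, err
	}
	var out []single
	for i := uint32(0); i < n; i++ {
		s, err := unpackSingle(r)
		if err != nil {
			return nil, err
		}
		out = append(out, *s)
	}
	return out, nil
}

// roundTrip packs the given singles into a buffer and unpacks them again.
func roundTrip(in []single) ([]single, error) {
	var buf bytes.Buffer
	if err := packMulti(&buf, in); err != nil {
		return nil, err
	}
	return unpackMulti(&buf)
}

func main() {
	in := []single{{ShortChanID: 1}, {ShortChanID: 2, KeyFamily: 5, KeyIndex: 7}}
	out, err := roundTrip(in)
	fmt.Println(len(out), err)
}
```

Reading the version before anything else is what lets a later format change the remaining fields without breaking older backups.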
Implementation Complications and Open Questions
The main complication that arose during the implementation was that I realized late in development that we also need to back up the details w.r.t. how we derive our `shachain` root. We got a bit lucky here as we store the private key we use as the root, and not the public key itself. In order to derive the `shachain` roots, we use a special `keychain.KeyFamily`. However, we don't store the `keychain.KeyLocator` information, which is a two-tuple that allows us to derive a key w/o knowing the public key or having any state in the wallet. Instead, within the backup, we're forced to store the entire public key and not just the key locator information. As a result, I needed to modify `keychain.SecretKeyRing.DerivePrivKey` to support a brute force scan to allow us to derive the key. In the future, we'll want to do a migration to also store the key locator information so we don't need to always do this brute force. In order to ensure we don't scan to infinity if we don't actually know the public key, I've added a cap on the max number of iterations.
As a result of the case above, it's now the case that any future hardware signers need to be aware of the `shachain` protocol, in order to generate and validate any points we receive.
The one other section that we maybe want to modify is the way we derive the key we use for encryption. We made an attempt to ensure that any future hardware signers don't actually need to understand our encryption protocol. So instead what we do is use a public point with the assumption that it will never be used for an address and be unveiled to the outside world. One alternative that I had (but scrapped, idk why TBH) is to use a point, but then have the hardware signer provide us with an ECDH of that point and another. This would ensure that the key is derived from secret data, but allow us to not store any private data in the backup.
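The capped brute force scan mentioned in this section can be illustrated with the sketch below. It is not the modified `DerivePrivKey`: the derivation here is a hypothetical stand-in (SHA-256 over seed, family, and index, fed into Ed25519 from the standard library) rather than the wallet's real key derivation, and `deriveKey`, `publicKeyOf`, `findPrivKey`, and the cap value are invented for the example.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

var errKeyNotFound = errors.New("key not found within scan limit")

// deriveKey is a hypothetical deterministic derivation that maps
// (seed, family, index) to a private key. It stands in for the wallet's
// real key derivation.
func deriveKey(seed []byte, family, index uint32) ed25519.PrivateKey {
	var path [8]byte
	binary.BigEndian.PutUint32(path[:4], family)
	binary.BigEndian.PutUint32(path[4:], index)

	material := append(append([]byte{}, seed...), path[:]...)
	h := sha256.Sum256(material)
	return ed25519.NewKeyFromSeed(h[:])
}

// publicKeyOf returns the public half of a derived key.
func publicKeyOf(priv ed25519.PrivateKey) ed25519.PublicKey {
	return priv.Public().(ed25519.PublicKey)
}

// findPrivKey walks indexes 0..maxScan-1 within one key family and returns
// the private key whose public key matches target. The cap guarantees that
// the scan terminates even if the target was never derived from this seed.
func findPrivKey(seed []byte, family uint32, target ed25519.PublicKey,
	maxScan uint32) (ed25519.PrivateKey, uint32, error) {

	for i := uint32(0); i < maxScan; i++ {
		priv := deriveKey(seed, family, i)
		if target.Equal(priv.Public()) {
			return priv, i, nil
		}
	}
	return nil, 0, errKeyNotFound
}

func main() {
	seed := []byte("example seed")

	// Only the public key is known to the backup; the index is not.
	target := publicKeyOf(deriveKey(seed, 5, 42))

	_, index, err := findPrivKey(seed, 5, target, 1000)
	fmt.Println("found at index:", index, "err:", err)

	_, _, err = findPrivKey(seed, 5, target, 10)
	fmt.Println("with a cap of 10:", err)
}
```

Storing the key locator alongside the public key, as the planned migration suggests, would replace the loop with a single derivation.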
TODO's
- write integration tests
- write additional unit tests in `channeldb`
- real world recovery attempts
- update docs on how to use the recovery tools
- after rpc: Add SubscribeChannels RPC. #1988 is in, finish hooking up the `chanbackup.SubSwapper` so we can auto update the backup file on disk
Fixes #175