
[feature]: backup and recovery #426

Open
guggero opened this issue Aug 2, 2023 · 4 comments
Labels: mainnet preparedness (goal issues to be completed before mainnet release), recovery

Comments

guggero (Member) commented Aug 2, 2023

Background

As a prerequisite for officially supporting taproot assets on Bitcoin mainnet, we need to think carefully about how we approach backup and recovery of tapd data. Not only the assets might be at stake but also the BTC of the anchoring transaction output: you can't spend the BTC that carries assets without being able to reconstruct the full asset tree.

This issue serves as a collection/brainstorm issue around everything related to data safety, backups and recovery procedures.

Documentation

Similar to the lnd Operational Safety Guidelines document, we'll want a doc that describes the different data sources, what they are used for, and how best to prevent their loss.
The document should (at least) describe the following key items:

  • What is the relationship between asset public/private keys (e.g. script_keys) and lnd's wallet/seed?
  • What data is required in order to recover both the assets and the BTC of a taproot asset output?
  • What data is stored where (tapd's database, lnd's wallet database, lnd's channel database)?
  • Where does tapd store its files and which files need to be backed up regularly?
  • How can the tapd database be set up in a production ready manner?
  • Recoverability when using a public universe vs. using a private one? (See further below).

How to prevent database loss

As long as the tapd database is fully intact and the seed for the lnd wallet is known, all funds are SAFU.
So keeping a replicated (or at least regularly backed-up) copy of the DB should be the highest priority.
We should test and then document the following ways of setting up a database cluster or streaming replication:

  • Using a Postgres database cluster as a database backend: This is already possible and is the recommended way of running tapd in a production environment. We'll want to document some setup recommendations and best practices around this though.
  • Add support for low-level SQLite replication, perhaps using something like https://github.com/benbjohnson/litestream.

How to recover from full database loss

Even though keeping the tapd database intact should always be the highest priority, the reality is that users often don't realize that uninstalling and re-installing an app on platforms like Umbrel deletes all data. Because we want to ship tapd as part of Lightning Terminal, which is available on such platforms, we need a strategy for basic recovery of assets and BTC in case the full tapd database is lost.

Possible approaches:

  • Keep a single file (similar to the SCB file used in lnd for static channel information) around that is updated on every mint, send and receive and keeps track of the latest on-chain output and proof chain, as well as the universe information. The file would basically contain all the information to be able to recover the asset and BTC funds, but not the transaction history. Then mobile and other platform apps would only need to make sure to create an off-device backup of that file whenever it changes.
  • When using public multiverses, the information available in lnd could be enough to query them for the data required to recover access to asset and BTC funds. This requires the lnd wallet database to be fully intact, though, since some of this information is added to the wallet DB by tapd and is not recoverable through a simple lnd wallet restore from seed.
    • Query the lnd wallet for unspent p2tr outputs that aren't BIP-0086, then look up the multiverse for assets related to those outpoints (this will only work if the asset anchoring transaction has a change output that goes back to the lnd wallet, because the actual asset anchoring output will not be recognized as "belonging" to the lnd wallet). This will work for asset mints and asset change outputs.
    • Query the lnd wallet for any specifically registered tapscript addresses, then look those outpoints up in the multiverse to recover asset proofs. This will work for assets received through taproot asset addresses (non-interactive receives). The tapscript addresses aren't directly derived from the seed, however, so if the lnd wallet was recovered from seed, this won't be possible.
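The first lookup above can be sketched as a simple filter: keep only the wallet's p2tr UTXOs whose output key is not a plain BIP-86 tweak of any wallet internal key, then query the multiverse for those outpoints. Everything here is a hypothetical stand-in, not a tapd or lnd API, and the real check uses the BIP-341 taproot tweak; a hash merely marks the idea.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fakeBIP86Key stands in for the real BIP-86 output key derivation
// (taproot tweak of the internal key with an empty script root). Do not
// use in production; BIPs 341/86 define the actual tweak.
func fakeBIP86Key(internalKey string) string {
	h := sha256.Sum256([]byte("bip86/" + internalKey))
	return hex.EncodeToString(h[:])
}

type utxo struct {
	outpoint  string // "txid:vout"
	outputKey string
}

// nonBIP86 returns the UTXOs whose output key matches no BIP-86
// derivation of the given internal keys; these outpoints would then be
// looked up in the multiverse for attached asset proofs.
func nonBIP86(utxos []utxo, internalKeys []string) []utxo {
	bip86 := make(map[string]struct{})
	for _, k := range internalKeys {
		bip86[fakeBIP86Key(k)] = struct{}{}
	}
	var out []utxo
	for _, u := range utxos {
		if _, ok := bip86[u.outputKey]; !ok {
			out = append(out, u)
		}
	}
	return out
}

func main() {
	keys := []string{"ik0", "ik1"}
	utxos := []utxo{
		{"tx0:0", fakeBIP86Key("ik0")}, // plain wallet change output
		{"tx1:1", "deadbeef"},          // possible asset anchor output
	}
	for _, u := range nonBIP86(utxos, keys) {
		fmt.Println(u.outpoint) // only the non-BIP-86 candidate
	}
}
```

As the bullet notes, this only finds anchors reachable via a change output back to the wallet; the anchor output itself is not recognized as belonging to lnd.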

New universe RPCs required for multiverse proof lookup

To allow some of the multiverse lookups described above, we might need additional indexes into the universe/multiverse tree structure:

  • Today we have assetID => outpoint || scriptKey
  • Might also need outpoint => assetID || scriptKey and scriptKey => assetID || outpoint
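To illustrate the lookup directions, the current and proposed indexes could be modeled like this. This is an in-memory sketch with hypothetical names only; the real tapd multiverse is a Merkle-sum sparse merkle tree keyed quite differently.

```go
package main

import "fmt"

// leaf holds the three identifiers a universe leaf relates.
type leaf struct {
	assetID   string
	outpoint  string // "txid:vout"
	scriptKey string
}

type universeIndex struct {
	byAssetID   map[string]leaf // today: assetID => outpoint || scriptKey
	byOutpoint  map[string]leaf // proposed: outpoint => assetID || scriptKey
	byScriptKey map[string]leaf // proposed: scriptKey => assetID || outpoint
}

func newUniverseIndex() *universeIndex {
	return &universeIndex{
		byAssetID:   make(map[string]leaf),
		byOutpoint:  make(map[string]leaf),
		byScriptKey: make(map[string]leaf),
	}
}

// insert registers a leaf under all three keys, so recovery can start
// from whichever identifier survived the data loss.
func (u *universeIndex) insert(l leaf) {
	u.byAssetID[l.assetID] = l
	u.byOutpoint[l.outpoint] = l
	u.byScriptKey[l.scriptKey] = l
}

func main() {
	u := newUniverseIndex()
	u.insert(leaf{assetID: "aaaa", outpoint: "txid0:1", scriptKey: "02ab"})

	// A recovering node that only knows a script key can now find the
	// anchoring outpoint.
	fmt.Println(u.byScriptKey["02ab"].outpoint)
}
```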
@guggero added the mainnet preparedness and recovery labels Aug 2, 2023
@Roasbeef Roasbeef added this to the v0.4 milestone Nov 6, 2023
dstadulis (Collaborator) commented:

add another index to the universe tree

https://github.com/benbjohnson/litestream is a critical component

guggero (Member, Author) commented Dec 14, 2023

Okay, I thought a lot about backups and also discussed things with @jharveyb and @dstadulis. Thanks for the inputs and ideas, I'll try to incorporate those here.

As I see it, we can attack this in two phases.

Phase 1 - File based backup

As described above, this would involve a single file on disk (and maybe a corresponding RPC that returns the same content on request) that contains all data required to represent the daemon's current unspent asset outputs.

Benefits/rationale

The main benefits of using a file based backup are:

  • It's reasonably straightforward to implement
  • It's easy for the user to set up (e.g. in the default case the user doesn't have to do anything other than make sure the file generated is backed up to some safe place)
  • It's easy for the user to know what needs to be backed up and when the backup needs to be updated (e.g. using a file watcher to detect changes to the backup file)
  • The backup copy can just be replaced with the newest state, there is no need to keep a history of backed up files

Implementation

The following steps need to be taken to implement this phase:

  • Implement an internal function that takes an incomplete proof suffix (an asset proof from an unconfirmed send that is missing the on-chain block header part), checks for the transaction on-chain and, if it is confirmed, finishes the proof and imports it normally as an asset into the local DB
  • Make sure that we can correctly import a proof for an asset that belongs to the local node (meaning we correctly detect that the script key in the proof is a key that the backing lnd node can derive; this might require some active scanning)
  • Implement a function that can assemble the following information from the local DB:
    • All TAP addresses ever generated
    • The proofs for all currently un-spent asset outputs
    • The incomplete proof suffixes for current in-flight, unconfirmed transfer outputs (both the change and the receiver proof)
  • Come up with and implement a serialization format that can encode the information listed above into a single binary blob
  • Create a notification system that allows a subscriber to be notified about any event related to backup state:
    • Notify on new addresses created
    • Notify on new confirmed inbound transfer
    • Notify on new unconfirmed outbound transfer
    • Notify on finished (confirmed) outbound transfer
  • Create a backup subsystem that subscribes to the above notifications and updates a file on disk whenever a new event comes in
  • Allow the full file system path of the above mentioned file to be configured as a config/CLI flag (so it could be on a different file system, like a mounted network file system)
  • Create an RPC that takes a backup file and inserts all the information in it, resulting in the addresses/assets/transfers being fully restored in an empty node
  • (optional) Create a new RPC that on demand returns the current content of the backup file as a binary blob
  • (optional) Create a new streaming RPC that emits an event whenever the backup notification service signals that the backup file was updated
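The notification-driven backup writer at the core of these steps could look roughly like the following. All names are hypothetical, not actual tapd APIs, and JSON stands in for the real binary serialization format: a subscriber applies each backup-relevant event to an in-memory state and atomically rewrites a single backup file whenever the state changes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// eventType mirrors the four notifications listed above.
type eventType int

const (
	newAddress eventType = iota
	confirmedInbound
	unconfirmedOutbound
	confirmedOutbound
)

type backupEvent struct {
	kind eventType
	blob string // serialized address, proof, or proof suffix
}

// backupState mirrors the data listed above: all addresses ever
// generated, proofs for unspent outputs, and pending proof suffixes.
type backupState struct {
	Addresses       []string `json:"addresses"`
	UnspentProofs   []string `json:"unspent_proofs"`
	PendingSuffixes []string `json:"pending_suffixes"`
}

func (s *backupState) apply(ev backupEvent) {
	switch ev.kind {
	case newAddress:
		s.Addresses = append(s.Addresses, ev.blob)
	case confirmedInbound:
		s.UnspentProofs = append(s.UnspentProofs, ev.blob)
	case unconfirmedOutbound:
		s.PendingSuffixes = append(s.PendingSuffixes, ev.blob)
	case confirmedOutbound:
		// The completed proof replaces the pending suffix.
		s.PendingSuffixes = nil
		s.UnspentProofs = append(s.UnspentProofs, ev.blob)
	}
}

// writeBackupFile replaces the backup file atomically (write to a temp
// file, then rename), so a crash never leaves a half-written backup.
func writeBackupFile(path string, s *backupState) error {
	data, err := json.Marshal(s)
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0600); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

func main() {
	dir, _ := os.MkdirTemp("", "backup")
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "tapd.backup")

	var state backupState
	events := []backupEvent{
		{newAddress, "taptb1..."},
		{unconfirmedOutbound, "proof-suffix-1"},
		{confirmedOutbound, "full-proof-1"},
	}
	for _, ev := range events {
		state.apply(ev)
		if err := writeBackupFile(path, &state); err != nil {
			fmt.Println("backup failed:", err)
			return
		}
	}
	fmt.Println(len(state.UnspentProofs), len(state.PendingSuffixes))
}
```

Because the whole state is rewritten on every event, the off-device copy can simply replace its previous version, matching the "no history needed" benefit above.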

Phase 2 - Personal backup universe based backup

There are multiple reasons why public universes shouldn't be relied upon for crucial financial data retention and availability. That's why we're introducing the concept of a personal backup universe here. A personal backup universe is just a tapd operated by the same user/entity, running on a different machine (and availability zone), that acts as a passive data receiver.

Benefits/rationale

The main benefits of using a backup universe compared to a backup file are:

  • Universes store proof chains more efficiently while the file based approach contains a lot of duplicate data (due to each file containing the full provenance proof chain)
  • A universe can be synced more efficiently while a single binary file cannot be backed up in an incremental manner
  • Multiple "active" (user-facing) tapd nodes can use the same backup universe server

Implementation

The following steps need to be taken to implement this phase:

  • Make sure change outputs from transfers are also pushed to the local universe
  • Make sure completed inbound transfers are also pushed to the local universe
  • Add a new config/CLI flag that allows us to add a backup universe (e.g. --backupuniverse=host:port)
  • A backup universe is a normal federation member but we would by default turn on full issuance and transfer proof sync
  • Use the notification service implemented in phase 1 and trigger a sync whenever a notification comes in; at minimum we would do a full sync with all backup universes, though it might make sense to sync with all universes
  • Allow incomplete proof suffixes (unconfirmed outbound transfers) to be stored to the local universe and synced to other universes, and make sure they can be updated/replaced once the full proof is available after on-chain confirmation
  • Add additional indexes (see above for more details) that allow us to look up proof leaves in a universe based on script keys
  • Create a recovery mechanism that executes the following steps:
    • Loop through keys in lnd for the TAP key family
    • For each key, form a BIP-86 script key and query all universe servers (ideally including a personal backup universe) for a proof leaf for that script key
    • If a proof is found, import it into the local proof archive and universe
    • Check if the asset in the proof spends any previously imported assets and if it does, set them to spent
    • If a proof is incomplete (e.g. missing the on-chain block header information), it might stem from a transfer that was unconfirmed at the time; look up the transaction on-chain and, if it is confirmed, update the block header part accordingly, then upsert the proof locally
    • Once a reasonably high key index has been reached (likely several thousand, to be safe) and no new proof leaves for script keys have been found within a certain number of keys (similar to the "gap limit" concept used by other wallets, though we'll want to choose a relatively high number here as well), stop the recovery process
  • (optional): Allow a tapd node to be configured in a way to act as a backup universe that doesn't need to be connected to an lnd node to simplify the setup. For example a --backupuniverseonly flag that would turn off the requirement of connecting to an lnd node and would therefore also skip block header verification of proofs (and automatically turn on --universe.public-access and --allow-public-uni-proof-courier)
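The gap-limit scan in the recovery steps above can be sketched as follows. Key derivation and the universe lookup are abstracted into function values here; the real implementation would derive BIP-86 script keys from lnd's TAP key family and query universe servers. All names are illustrative.

```go
package main

import "fmt"

// gapLimit is illustrative; the text above suggests several thousand.
const gapLimit = 2000

// recoverScriptKeys walks key indexes, asking the (backup) universe for
// a proof leaf per derived script key, importing every hit, and stops
// once gapLimit consecutive indexes yield no proof.
func recoverScriptKeys(
	lookup func(keyIndex uint32) (proof string, ok bool),
	importProof func(proof string),
) int {

	found, gap := 0, 0
	for idx := uint32(0); gap < gapLimit; idx++ {
		if proof, ok := lookup(idx); ok {
			importProof(proof)
			found++
			gap = 0 // reset the gap counter on every hit
		} else {
			gap++
		}
	}
	return found
}

func main() {
	// Simulated universe: proofs exist at key indexes 0 and 3; index
	// 2500 is beyond the gap limit from index 3, so it is not found.
	proofs := map[uint32]string{0: "p0", 3: "p3", 2500: "p2500"}
	lookup := func(idx uint32) (string, bool) {
		p, ok := proofs[idx]
		return p, ok
	}

	var imported []string
	n := recoverScriptKeys(lookup, func(p string) {
		imported = append(imported, p)
	})
	fmt.Println(n, imported) // prints "2 [p0 p3]"
}
```

The simulation also shows the trade-off the last bullet mentions: any leaf past the gap limit is silently missed, which is why a generous limit is suggested.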

Questions / brainstorm required

The main item that is not covered by the mechanism outlined above is the list of TAP addresses generated by a node. Because we need to know all addresses in order to detect future transfers (meaning transfers that happen after a node was recovered from backup), those addresses should be backed up as well when using a personal backup universe.
The following options present themselves at the moment:

  • Instruct users to also back up the backup file as described in phase 1 as that contains the TAP addresses (among everything else, so not ideal due to the potential size of the file)
  • Create a new universe and multiverse tree for addresses and sync them as well (turned off by default for public universes and only turned on for backup universes)
  • Create a secondary backup file that just contains addresses

@dstadulis dstadulis assigned Roasbeef and unassigned guggero Jan 9, 2024
dstadulis (Collaborator) commented:

#343 (comment) elucidated an additional element of necessary data to scope for backup and recovery: the full tapscript tree.

E.g. when a user adds scripts to encumber their taproot assets, that data will need to be persisted/backed up to ensure asset access recovery/spendability.

@dstadulis dstadulis modified the milestones: v0.4, v0.4.1 May 20, 2024
@Roasbeef Roasbeef modified the milestones: v0.4.2, v0.5 Aug 19, 2024
Roasbeef (Member) commented:

One other thing: the channel_reestablish message should include the current balance distribution (assetID+val) to allow sweeping after a remote party force close.
