From 90079f02454a63c32078ab23727068fd12bd2730 Mon Sep 17 00:00:00 2001
From: ZmnSCPxj jxPCSnmZ
Date: Thu, 14 Oct 2021 12:01:55 +0800
Subject: [PATCH] doc/BACKUP.md: Reorder `litestream` section, add warnings about using it.

---
 doc/BACKUP.md | 131 ++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 101 insertions(+), 30 deletions(-)

diff --git a/doc/BACKUP.md b/doc/BACKUP.md
index f339a2ac8993..2dfa5e8b31cf 100644
--- a/doc/BACKUP.md
+++ b/doc/BACKUP.md
@@ -242,31 +242,6 @@ three or four storage devices.
 BTRFS would probably work better if you were purchasing an entire set
 of new storage devices to set up a new node.
 
-## SQLite Litestream Replication
-`/!\` WHO SHOULD DO THIS: Casual users
-
-One of the simpler things on any system is to use Litestream to replicate the SQLite database.
-It continuously streams SQLite changes to file or external storage - the cloud storage option
-should not be used.
-Backups/replication should not be on the same disk as the original SQLite DB.
-
-/etc/litestream.yml :
-
-    dbs:
-    - path: /home/bitcoin/.lightning/bitcoin/lightningd.sqlite3
-      replicas:
-        - path: /media/storage/lightning_backup
-
- and start the service using systemctl:
-
-    $ sudo systemctl start litestream
-
-Restore:
-
-    $ litestream restore -o /media/storage/lightning_backup /home/bitcoin/restore_lightningd.sqlite3
-
-
-
 ## PostgreSQL Cluster
 
 `/!\` WHO SHOULD DO THIS: Enterprise users, whales.
@@ -459,8 +434,104 @@ strategy should still be a last resort; recovery of all funds is
 still not assured with this backup strategy.
 
 You might be tempted to use `sqlite3` `.dump` or `VACUUM INTO`.
-Unfortunately, these commands exclusive-lock the database.
-A race condition between your `.dump` or `VACUUM INTO` and
-`lightningd` accessing the database can cause `lightningd` to
-crash, so you might as well just cleanly shut down `lightningd`
-and copy the file at rest.
+Unfortunately, these commands exclusive-lock the database; see the
+next section for more information.
+
+## Backup Methods That Can Crash `lightningd`
+
+`/!\` WHO SHOULD DO THIS: *VERY* casual users who already have at
+least one of the other backup methods.
+
+> `/!\` CAUTION ADVISED `/!\`
+
+`sqlite3` uses file-level locking to implement SQL transactions.
+Thus, backup processes can lock the `sqlite3` database in order
+to have a "clean" state of the database, then copy the database
+to create a backup.
+
+However, if the backup process locks the file, `lightningd` will
+stall when it wants to write to the `sqlite3` database.
+This stall has a timeout, defaulting to 5 seconds, and if your
+backup process keeps the file locked for more than 5 seconds,
+`lightningd` will abort with a `database is locked` error.
+
+These methods only actually work on version 0.10.2 or later ---
+on older versions the timeout is 0 seconds, i.e. if C-lightning
+0.10.1 or older sees the database locked by these backup methods,
+it will crash immediately.
+
+On fast media with no system load, the default 5 seconds will
+usually be enough to prevent the abort, but otherwise on e.g.
+slow HDDs you may need to increase this default timeout.
+Use the `--sqlite3-busy-timeout` option, specifying an integer
+number of seconds of timeout.
+This can reduce the probability of crashes.
+
+`lightningd` and the Lightning Network protocol are resilient to
+unclean shutdowns of the node software, and provided you have a
+mechanism to automatically restart `lightningd` (e.g. a systemd
+unit) then the crashes are benign but annoying.
+However, software you run on top of `lightningd` might not handle
+the unclean shutdown of `lightningd` gracefully, so only use
+these backup methods if you are running mostly-vanilla `lightningd`
+or have extensively tested your system at high load to confirm that your
+`--sqlite3-busy-timeout` setting is sufficient to prevent crashes.
+
+### SQLite Litestream Replication
+
+One of the simpler things on any system is to use Litestream to replicate the SQLite database.
+It continuously streams SQLite changes to file or external storage - the cloud storage option
+should not be used.
+Backups/replication should not be on the same disk as the original SQLite DB.
+
+You will need to set your database file to WAL mode.
+To do so, stop `lightningd`, then:
+
+    $ sqlite3 lightningd.sqlite3
+    sqlite> PRAGMA journal_mode=WAL;
+    sqlite> .quit
+
+Then just restart `lightningd`.
+
+/etc/litestream.yml :
+
+    dbs:
+    - path: /home/bitcoin/.lightning/bitcoin/lightningd.sqlite3
+      replicas:
+        - path: /media/storage/lightning_backup
+
+ and start the service using systemctl:
+
+    $ sudo systemctl start litestream
+
+Restore:
+
+    $ litestream restore -o /home/bitcoin/restore_lightningd.sqlite3 /home/bitcoin/.lightning/bitcoin/lightningd.sqlite3
+
+Note that `litestream` uses its own timer, so there is a
+tiny (but non-negligible) probability that `lightningd`
+updates the database, then irrevocably commits to that update
+by sending revocation keys to the counterparty, and *then*
+your media crashes before `litestream` can replicate the
+most recent update, thus leading to an incomplete recovery
+later.
+Treat this as a superior version of "Database File Backups"
+above (i.e. try to recover via other backup methods first),
+and only on 0.10.2 or later, and make sure to check if 5
+seconds is a sufficient timeout on your system to prevent
+excessive crash-restart cycles.
+
+### `sqlite3` `.dump` Or `VACUUM INTO`
+
+You can simply run a `crontab` entry to regularly open the
+`lightningd.sqlite3` in `sqlite3`, then feed it either
+`.output "${BACKUP_LOCATION}"` followed by `.dump`, or a
+`VACUUM INTO '${BACKUP_LOCATION}';` command.
+
+However, note that this will take a fair amount of time
+even on very fast storage media due to copying the entire
+database.
+You will almost definitely need to change the
+`--sqlite3-busy-timeout` setting.
+The above `litestream` replication is strictly superior to this,
+as it only needs to replicate changes.