doc/BACKUP.md: Reorder litestream section, add warnings about using it.
ZmnSCPxj committed Oct 15, 2021
1 parent 1300a6c commit 90079f0
Showing 1 changed file with 101 additions and 30 deletions.
131 changes: 101 additions & 30 deletions doc/BACKUP.md
@@ -242,31 +242,6 @@ three or four storage devices.
BTRFS would probably work better if you were purchasing an entire set
of new storage devices to set up a new node.

## PostgreSQL Cluster

`/!\` WHO SHOULD DO THIS: Enterprise users, whales.
@@ -459,8 +434,104 @@ strategy should still be a last resort; recovery of all funds is
still not assured with this backup strategy.

You might be tempted to use `sqlite3` `.dump` or `VACUUM INTO`.
Unfortunately, these commands exclusive-lock the database; see the
next section for more information.

## Backup Methods That Can Crash `lightningd`

`/!\` WHO SHOULD DO THIS: *VERY* casual users who already have at
least one of the other backup methods.

> `/!\` CAUTION ADVISED `/!\`
`sqlite3` uses file-level locking to implement SQL transactions.
Thus, backup processes can lock the `sqlite3` database in order
to have a "clean" state of the database, then copy the database
to create a backup.

However, if the backup process locks the file, `lightningd` will
stall when it wants to write to the `sqlite3` database.
This stall has a timeout, defaulting to 5 seconds, and if your
backup process keeps the file locked for more than 5 seconds,
`lightningd` will abort with a `database is locked` error.
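The lock conflict can be reproduced in a few lines with Python's standard `sqlite3` module. This is a minimal sketch, not how `lightningd` or any backup tool is implemented: one connection holds an exclusive lock (standing in for a backup process) while a second connection (standing in for `lightningd`) tries to write with a short busy timeout. The helper `demo_locked_write` and the throwaway database are hypothetical, for illustration only.

```python
import os
import sqlite3
import tempfile


def demo_locked_write(busy_timeout_s: float) -> str:
    """Try to write while another connection holds an exclusive lock.

    Returns "ok" if the write succeeded, otherwise the SQLite error text.
    """
    path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")

    # isolation_level=None puts the connection in autocommit mode, so
    # transactions are only opened by explicit BEGIN statements.
    setup = sqlite3.connect(path, isolation_level=None)
    setup.execute("CREATE TABLE t (x INTEGER)")
    setup.close()

    # Stand-in for the backup process: take the file lock and keep it.
    backup = sqlite3.connect(path, isolation_level=None)
    backup.execute("BEGIN EXCLUSIVE")

    # Stand-in for lightningd: `timeout` is SQLite's busy timeout, in seconds.
    node = sqlite3.connect(path, timeout=busy_timeout_s, isolation_level=None)
    try:
        node.execute("INSERT INTO t VALUES (1)")
        return "ok"
    except sqlite3.OperationalError as err:
        return str(err)
    finally:
        node.close()
        backup.execute("ROLLBACK")  # release the simulated backup's lock
        backup.close()


print(demo_locked_write(0.1))  # database is locked
```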

These methods only actually work on version 0.10.2 or later;
on older versions the timeout is 0 seconds, i.e. if C-lightning
0.10.1 or older finds the database locked by one of these backup
methods, it will crash immediately.

On fast media with little system load, the default 5 seconds will
usually be enough to prevent the abort, but on slower setups (e.g.
on slow HDDs) you may need to increase this default timeout.
Use the `--sqlite3-busy-timeout` option, which takes the timeout as
an integer number of seconds.
A longer timeout reduces the probability of crashes.
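A second sketch, under the same assumptions, shows why a longer busy timeout helps: a background thread holds an exclusive lock for a fixed time (simulating a slow backup), and the write only succeeds if its timeout outlasts the lock. In `lightningd` the relevant knob is `--sqlite3-busy-timeout`; Python's `timeout=` argument is used here merely as an analogue, and `write_during_backup` is a hypothetical helper.

```python
import os
import sqlite3
import tempfile
import threading
import time


def write_during_backup(lock_hold_s: float, busy_timeout_s: float) -> bool:
    """Return True if a write succeeds while a simulated backup holds
    an exclusive lock for lock_hold_s seconds."""
    path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")
    setup = sqlite3.connect(path, isolation_level=None)
    setup.execute("CREATE TABLE t (x INTEGER)")
    setup.close()

    locked = threading.Event()

    def backup_job() -> None:
        # Each thread opens its own connection; sqlite3 connections
        # are not shared across threads by default.
        conn = sqlite3.connect(path, isolation_level=None)
        conn.execute("BEGIN EXCLUSIVE")
        locked.set()             # signal that the lock is now held
        time.sleep(lock_hold_s)  # simulate a slow copy
        conn.execute("COMMIT")   # release the lock
        conn.close()

    worker = threading.Thread(target=backup_job)
    worker.start()
    locked.wait(timeout=10)

    # The busy timeout makes this write retry until the lock is released
    # or busy_timeout_s elapses, whichever comes first.
    node = sqlite3.connect(path, timeout=busy_timeout_s, isolation_level=None)
    try:
        node.execute("INSERT INTO t VALUES (1)")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        node.close()
        worker.join()


# Timeout much shorter than the lock hold.
print(write_during_backup(lock_hold_s=1.0, busy_timeout_s=0.05))
# Timeout comfortably longer than the lock hold.
print(write_during_backup(lock_hold_s=1.0, busy_timeout_s=5.0))
```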

`lightningd` and the Lightning Network protocol are resilient to
unclean shutdowns of the node software, so provided you have a
mechanism to automatically restart `lightningd` (e.g. a systemd
unit), the crashes are benign, if annoying.
However, software you run on top of `lightningd` might not handle
an unclean shutdown of `lightningd` gracefully, so only use
these backup methods if you are running a mostly-vanilla `lightningd`,
or if you have tested extensively, under high load, that your
`--sqlite3-busy-timeout` setting is sufficient to prevent crashes.

### SQLite Litestream Replication

One of the simpler options on any system is to use Litestream to replicate the SQLite database.
It continuously streams SQLite changes to a file or to external storage; do not use the cloud
storage option.
Keep the replica on a different disk from the original SQLite database.

You will need to set your database file to WAL mode.
To do so, stop `lightningd`, then:

    $ sqlite3 lightningd.sqlite3
    sqlite> PRAGMA journal_mode=WAL;
    sqlite> .quit

Then just restart `lightningd`.

Put the following in `/etc/litestream.yml`:

    dbs:
      - path: /home/bitcoin/.lightning/bitcoin/lightningd.sqlite3
        replicas:
          - path: /media/storage/lightning_backup

and start the service using `systemctl`:

    $ sudo systemctl start litestream

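To restore, give `litestream restore` the output file with `-o`,
followed by the database path from `/etc/litestream.yml`
(Litestream finds the replica through its config):

    $ litestream restore -o /home/bitcoin/restore_lightningd.sqlite3 /home/bitcoin/.lightning/bitcoin/lightningd.sqlite3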

Note that `litestream` replicates on its own timer, so there is a
tiny (but non-negligible) probability that `lightningd`
updates the database, then irrevocably commits to that update
by sending revocation keys to the counterparty, and *then*
your storage media fails before `litestream` can replicate the
most recent update, leading to an incomplete recovery later.
Treat this as a superior version of "Database File Backups"
above (i.e. try to recover via other backup methods first),
use it only on 0.10.2 or later, and check whether the 5-second
default timeout is sufficient on your system to prevent
excessive crash-restart cycles.

### `sqlite3` `.dump` Or `VACUUM INTO`

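You can simply set up a `crontab` entry that regularly opens
`lightningd.sqlite3` in `sqlite3` and feeds it either
`.output "${BACKUP_LOCATION}"` followed by `.dump`, or
`VACUUM INTO '${BACKUP_LOCATION}';`.
`.dump` writes to the current output (standard output by default),
hence the `.output` redirect, and `VACUUM INTO` expects the
filename as a quoted string.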

However, note that this takes a fair amount of time even on very
fast storage media, since it copies the entire database.
You will almost certainly need to raise the
`--sqlite3-busy-timeout` setting.
The `litestream` replication described above is strictly superior
to this, as it only needs to replicate changes.
