Backup tool leveraging BTRFS for incremental backups. Key features:
- Atomic Copy: No need to shut your servers down for backup
- Incremental Snapshots: Take incremental snapshots, freely configure how long they are kept.
- Snapshot transfer via SSH: Widespread and secure way of data transfer.
- Simple Command Line Tool: Easy to setup, easy to use.
- Look at your Snapshots any Time: Snapshots are directly visible in the file system.
Traditional backup systems have two inherent pain points which are solved by the use of BTRFS snapshots for backup:
-
No Atomic Snapshots: The files are copied one by one. To take a consistent snapshot, your application has to be stopped before the backup. This makes the backup process more difficult to set up and prohibits frequent backups. Since BTRFS snapshots are atomic, snapshots can be taken at any time. If the snapshot is restored, it looks to the application as if a power loss had occurred.
-
Cumbersome Incremental Snapshots: Managing a combination of full and incremental backups is complicated and the restore time might suffer. With BTRBCK, the snapshots on the backup server are kept within a BTRFS file system. Snapshots just share common data and store the differences at a block level. There its no distinction between full and incremental snapshots. The time penalty of restoring a full backup an applying tens of incremental backups goes away. Any snapshot can be deleted, without affecting any other snapshot.
BTRBCK works with so called snapshot streams. All data of an application (Database, File store, ...) has to be placed in the working directory of a single stream. Then snapshots of the working directory can be taken and transferred to another system.
Streams are organized in stream repositories. This allows to share some configuration (like the destination host to push snapshots to) and to perform bulk operations (take a snapshot for each stream in the repository). There are two types of stream repositories: application stream repositories, which contain a working directory for each stream, and backup stream repositories, which do not contain working directories. Besides this, the two repository types have identical functionality. Backup stream repositories are intended to be used on backup hosts, where no working directories are required, while application stream repositories are required to run applications.
To set up a backup, your server and your backup host need to use a BTRFS file system. Create an application stream repository on the server and a backup stream repository on the backup host. Now you can create streams on the server and sync them to the backup host.
BTRBCK is distributed as debian package or as executable jar. Releases can be found under https://github.com/ruediste1/btrbck/releases. Install the package using
wget https://github.com/ruediste1/btrbck/releases/download/2.0/btrbck-cli_2.0_all.deb && sudo dpkg -i btrbck-cli_2.0_all.deb
Now you can create your first stream repository. First, open a root shell
sudo su
Create an empty directory on a BTRFS file system and run
btrbck create -a
Followed by
btrbck create myStream
to create your first stream. This created a myStream
directory in the repository. Create a file and take a snapshot:
echo "Hello" > myStream/world.txt
btrbck snapshot myStream
Running find .
you can see the working directory and the snapshot which has been taken. Delete the file and take a snapshot again
rm myStream/world.txt
btrbck snapshot myStream
Now, you can restore the file using
btrbck restore myStream 0
To conclude this section, setup BTRBCK on a second system. Create a stream repository and setup ssh with public key authentication for the root user. Then run
btrbck push myStream root@<host> <remote repo path>
This will transfer all snapshots to the remote host. Please note that for this to work, the root of the file system has
to be mounted directly. Subvolume mounts do not work for send/receive due to a bug in btrfs-tools
.
To operate, BTRBCK needs to invoke the btrfs
and chown
commands as super user. This
also includes invocation on remote systems (push
and pull
to/from other hosts), where
sudo has to be set up in password less configuration. This can be accomplished by
- running
btrbck
as root - using
sudo btrbck
- instructing
btrbck
to use sudo commands
Variant 1 is the simplest and needs no further explanation. For variant 2, you have to use
the -sudoRemoteBtrbck
flag and add an /etc/sudoers.d/btrbck
file with the following contents:
%sudo ALL = (ALL) NOPASSWD: /usr/bin/btrbck
For variant 3, add a /etc/sudoers.d/btrbck
file with the following. In addition, the
-sudo
and -sudoRemoteBtrfs
flags have to be specified on the command line.
%sudo ALL = (ALL) NOPASSWD: /sbin/btrfs
%sudo ALL = (ALL) NOPASSWD: /bin/chown
Taking snapshots is all good an fine, but at some time you'll want to thin them out. If snapshots are taken every 10 minutes, a snapshot retention configuration might look as follows:
- keep all snapshots for one hour
- keep one snapshot per hour for one day
- keep 4 snapshots a day for a week
- keep one snapshot a week for a month
- keep one snapshot a month for a year
- keep one snapshot per year for 10 years.
- keep one snapshot per decade for 100 years.
This can be configured on a per stream level. The stream configuration file can be found in .backup/<stream name>/<stream name>.xml
for application stream repositories and in <stream name>/<stream name>.xml
in backup stream repositories. The following is
the configuration for the example above:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<stream initialRetentionPeriod="PT1H" snapshotInterval="PT10M">
<retention period="P1D" timeUnit="HOUR" snapshotsPerTimeUnit="1"/>
<retention period="P1W" timeUnit="DAY" snapshotsPerTimeUnit="4"/>
<retention period="P1M" timeUnit="WEEK" snapshotsPerTimeUnit="1"/>
<retention period="P1Y" timeUnit="MONTH" snapshotsPerTimeUnit="1"/>
<retention period="P10Y" timeUnit="YEAR" snapshotsPerTimeUnit="1"/>
<retention period="P100Y" timeUnit="DECADE" snapshotsPerTimeUnit="1"/>
</stream>
The snapshotIterval
will be discussed below. All Periods are specified according to ISO 8601.
Basically, the format is PnYnMnDTnHnMnS
, where zeros can be omitted and the T
is used to indicate the start of the time part.
For each retention, the beginning of the period
is determined and truncated to the last start of timeUnit
. Then
snapshotsPerTimeUnit
instants are evenly distributed over the time unit. For each instant, the next snapshot after the instant
is marked to be retained. This is repeated for all time units within the period. Afterwards, all snapshots not marked for
retention are deleted. These operations are performed in the UTC time zone.
To perform the pruning, run btrbck prune
to prune all streams, or btrbck prune <stream name>
to prune a single stream.
If neither initialRetentionPeriod nor any retentions are defined, no snapshots are pruned.
Snapshots can be transferred in an incremental manner between snapshot repositories, which are typically on different systems. For the transfer, SSH is used. The BTRBCK tool initiates an SSH connection to the remote host and starts the btrbck tool. The data is then transferred via stdin and stdout.
For this to work, there may be no password prompts. For SSH, this can be accomplished by using public key authentication. To run the BTRBK tool, this can be accomplished as outlined in the getting started section.
Snapshots are trasferred using the btrbck push
or btrbck pull
commands. By using the -c
switch, streams are created in the
target repository if they do not exist. At this point, the stream configuration is copied. After that, the stream configuration
is never touched again. Thus, modifications to the snapshot retentions are not synchronized automatically. This is intentional, to
protect you from loosing snapshots due to a single misconfiguration.
If the same stream is copied to multiple application stream repositories, working on both and transferring the snapshots to a single backup repository would result in a big mess. This is addressed by the stream version history. Each copy of a stream gets its unique ID. Whenever snapshots are taken or restored, this is recorded in the history. Before snapshots are transferred it is checked if target stream is an ancestor of the source stream. If this is not the case, the snapshot transfer is rejected.
For taking snapshots, the history only stores the stream ID and the number of times a snapshot has been taken. Thus, as long as you don't change the stream instance you are working with, the history remains very compact.
BTRBCK uses a per repository lock to control concurrent operations on a repository. This
lock can be held manually using btrbck lock
. When a repository is locked, all operations
wait for the lock before continuing.
BTRBCK has been prepared to operate in a fully automatic manner. This is enabled by adding a cron job running the btrbck process
command.
The following actions are taken for each stream:
-
Take a snapshot as defined in the
snapshotInterval
in the stream.xml
file. A snapshot is taken if the last snapshot is older than the snapshot interval. If you want snapshots for example every hour, make the cron job run process every hour but set the interval to something less, like 9 minutes 30 seconds. This makes sure that a snapshot is taken whenever the cron job is run. Otherwise, some snapshots might be skipped. -
Prune snapshots as defined in the stream
.xml
file. -
Sync the stream as defined in the repository configuration (see below)
The repository configuration is stored in the repository.xml
file or in
.backup/repository.xml
. It contains the synchronization configuration for the repository.
Multiple configurations are allowed.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<applicationStreamRepository>
<syncConfiguration
direction="PUSH"
sshTarget="user@host:port"
remoteRepoLocation="/backup"
createRemoteIfNecessary="true"
streamPatterns="*"/>
</applicationStreamRepository>
The direction can be PULL
or PUSH
.
The streamPatterns
attribute is a comma separated
list. Each element is the name of a stream which may contain * as
wildcard. If an element starts with a - any matching stream will be
excluded from the set of synced streams. For each local stream name, the
list is traversed from left to right. The first match decides if the
stream is in the set of synced streams or not. If no pattern matches, the
stream is not included in the set of synced streams.
Important: The stream names are evaluated locally. This implies that if you PULL, only the streams which
already exist in the pulling repository are taken into account. Initially, the streams have to be transferred
separately with the pull
command.
Copyright (C) 2014 Ruedi Steinmann
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
The unit tests expect /data/tmp
to reside on a btrfs file system. Due to a bug in btrfs-tools
, the root of the file system has
to be mounted directly. Subvolume mounts do not work for send/receive.
To update the version, use versions:set from the versions-maven plugin:
mvn versions:set -DnewVersion=2.50.1-SNAPSHOT
It will adjust all pom versions, parent versions and dependency versions in a multi-module project.
If you made a mistake, do
mvn versions:revert
afterwards, or
mvn versions:commit
if you're happy with the results.
The version will automatically be reflected in btrbck version
.
Manually adjust the version in the download command of the getting started section
and the cli/btrbck
shell script.