Easy backups with duply
Duply is a frontend for the mighty Duplicity, and a really nifty one. Anybody who has used Duplicity for backups will have noticed two things: how powerful and versatile a tool it is, and how tricky it can be to configure a backup scheme.
First of all, let’s talk a little about the backend, that is, Duplicity. As the Wikipedia article nicely points out, Duplicity provides encrypted, versioned, remote backups that require very little of the remote server; in fact, it only needs the server to be accessible via any of the supported protocols (FTP, SSH, rsync, etc.).
Configuring Duply
The first step required to use Duply is the creation of a backup profile. This can be accomplished by running duply <profile> create, where <profile> is whatever name we want for the profile. This creates a configuration file called ~/.duply/<profile>/conf that we will then edit. The configuration file is quite well documented, but I will break down the main points.
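For example, to create a profile named home (the name is just an example):
duply home create # creates the profile directory ~/.duply/home/ with the conf file inside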
There are several settings we should take into account when configuring Duply:
- Encryption: whether we need our backups to be encrypted, either with symmetric encryption or with a key and passphrase;
- Location: where to save our backup to, either a remote server or a local folder;
- Source: the directory we want to back up (we can exclude files to avoid backing up garbage);
- Age: how long our old backups should be kept.
Encryption
There are two types of encryption Duply can use (unless we just disable it altogether), each with its pros and cons.
Encryption with GPG keys is self-explanatory. You use a GPG key to encrypt each volume of the backup, and both the GPG key and the passphrase are needed to decrypt the backup, giving you extra security. This also means that, if you lose the GPG key file, you will not be able to recover your backup, so you have to make sure that the ~/.gnupg/ directory is copied somewhere else and not just inside the backup… Trust me, it’s happened to me:
GPG_KEY='ADD274FA' # Use 'gpg --list-keys' to see your keys
GPG_PW='VeryStrongPass' # Passphrase of the key
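As an extra precaution, you could also export the key itself somewhere safe outside the backup; a minimal sketch, assuming the key ID above and a USB drive mounted at /media/usb:
gpg -a --export ADD274FA > /media/usb/duply-key.pub.asc # public key
gpg -a --export-secret-keys ADD274FA > /media/usb/duply-key.sec.asc # private key, keep it somewhere safe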
Symmetric encryption is simpler in that it only uses a single password for encryption, meaning you can recover your backup as long as you remember this password. Obviously, it is less secure than using a key, since it is subject to brute-force attacks:
#GPG_KEY='ADD274FA' # Comment out this line
GPG_PW='ItBetterBeStr0ng' # Password to use
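If you go the symmetric route, it may be worth generating the password with a tool rather than inventing one; a quick sketch using openssl (assumed to be installed), whose output you would then paste into GPG_PW and store somewhere other than the backup itself:
openssl rand -base64 32 # prints a strong random password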
Location
Now we have to configure where Duply will save our backups. In the conf file there are several examples for all the supported protocols. In my case, I will use FTP:
TARGET="ftp://ftpuser:ftppass@server/$USER@$HOSTNAME"
Notice that, if you use shell environment variables ($USER, $HOSTNAME, etc.), you have to use double quotes instead of the default single quotes; otherwise, the variables won’t be expanded.
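A quick sketch of the difference (the credentials are placeholders):
TARGET='ftp://ftpuser:ftppass@server/$USER@$HOSTNAME' # single quotes: $USER stays literal
TARGET="ftp://ftpuser:ftppass@server/$USER@$HOSTNAME" # double quotes: expands to e.g. alice@mybox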
Source
Usually, as normal users, we would want to back up our home directory and exclude some directories/files with an exclude list. This can be done with Duply by changing the following setting in the conf file:
SOURCE="$HOME"
Again, notice the double quotes for variable substitution.
For system backups, since we can only specify one source, we should use the root folder and rely on exclude lists:
SOURCE='/'
Excluding files
Once we have determined our source for backups, we should filter out files or directories that would make our backups too big. We do this by creating the file ~/.duply/<profile>/exclude and listing the files inside. Thankfully, these lists accept standard Unix globbing. For reference, this is what’s in my exclude file:
**/*[Cc]ache*
**/*[Hh]istory*
**/*[Ss]ocket*
**/*[Tt]humb*
**/*[Tt]rash*
**/*[Bb]ackup
**/*.[Bb]ak
**/*[Dd]ump
**/*.[Ll]ock
**/*.log
**/*.part
**/*.[Tt]mp
**/*.[Tt]emp
**/*.swp
**/*~
**/.adobe
**/.cache
**/.dbus
**/.fonts
**/.gnupg/random_seed
**/.gvfs
**/.kvm
**/.local/share/icons
**/.macromedia
**/.obex
**/.rpmdb
**/.thumbnails
**/.VirtualBox
**/.wine
**/Downloads
As you can see, you can specify both wildcard patterns and specific directories/files.
It’s worth noting that, even though the file is called exclude, it can be used to include files too. For instance, if we used the root directory as source (SOURCE='/') as discussed above, we can exclude all files except certain directories like so:
+ /etc
+ /root
+ /var/lib/mysql
+ /var/mail
+ /var/spool/cron
+ /var/www
**
The last line (**) tells Duply to ignore every file except those listed above it and preceded by a plus sign.
Since version 0.5.14 of Duply, there is another way to exclude directories from the backup. By creating a file called .duplicity-ignore inside a directory, we can force Duply to ignore that directory recursively. To enable this, we have to uncomment these lines in our configuration file ~/.duply/<profile>/conf:
FILENAME='.duplicity-ignore'
DUPL_PARAMS="$DUPL_PARAMS --exclude-if-present '$FILENAME'"
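After uncommenting those lines, marking a directory is just a matter of dropping the file into it; for example (the directory is hypothetical):
touch ~/Videos/.duplicity-ignore # this directory and everything below it will be skipped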
Age
Finally, we can determine the age of the backups we keep when we run the purge commands. There are a couple of settings here depending on the way we make backups.
This setting tells Duply to keep backups up to a certain age (for example, 6 weeks) when we run duply <profile> purge:
MAX_AGE=6W
This other one tells Duply to keep a number of full backups when we run duply <profile> purge-full:
MAX_FULL_BACKUPS=2
However, the most useful one for me is the setting that uses the --full-if-older-than option of Duplicity to automatically make a full backup when the previous full backup is older than a certain age:
MAX_FULLBKP_AGE=1W
DUPL_PARAMS="$DUPL_PARAMS --full-if-older-than $MAX_FULLBKP_AGE "
Scheduling backups
Finally, after everything is configured, we should run a backup to check that everything works, using the command duply <profile> backup. This might take a while since, not having any previous backup, it will execute a full backup. After that, we can check the status of our backups by running duply <profile> status, which will give us something like this:
Found primary backup chain with matching signature chain:
-------------------------
Chain start time: Tue Apr 17 14:48:54 2012
Chain end time: Wed Apr 18 14:01:33 2012
Number of contained backup sets: 1
Total number of contained volumes: 52
Type of backup set: Time: Num volumes:
Full Tue Apr 18 14:48:54 2012 52
-------------------------
No orphaned or incomplete backup sets found.
--- Finished state OK at 15:46:34.122 - Runtime 00:00:03.495 ---
That looks cool and everything, but we cannot rely on our memory to remember when we should make a backup. That’s why we should schedule our backups using cron (or anacron, or fcron) and leave the heavy lifting to it.
We can either specify a time for both a full and an incremental backup, like this:
@daily duply <profile> backup_verify
@weekly duply <profile> full_verify_purge --force
This will run and verify a daily incremental backup and a weekly full backup. Also, it will purge old backups weekly after completing and verifying the full backup.
However, if we configured Duply to use the --full-if-older-than option of Duplicity as discussed above, we can just run a single command:
@daily duply <profile> backup_verify_purge --force
This is extremely useful for laptops and boxes that are not on 24/7.
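If the machine is regularly powered off, the equivalent entry in /etc/anacrontab would look something like this (the 15-minute delay is arbitrary; note that anacron jobs run as root, so the profile would have to belong to root):
1 15 duply-backup duply <profile> backup_verify_purge --force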
Pre and post scripts
Another basic requirement for any backup solution is the option to run certain commands both before and after the backup is executed. Duply, of course, has this too: it will run any command inside the file ~/.duply/<profile>/pre before the backup and any command inside ~/.duply/<profile>/post after the backup.
This is useful to lock and flush databases before the backup and unlock them afterwards, maybe even to make an LVM snapshot for a consistent and quick backup. Or just to gather any other information that needs to be backed up too (e.g. installed packages, Delicious bookmarks, etc.).
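For instance, a pre script could dump the databases and the list of installed packages into a directory that is part of the backup source; a rough sketch, where the paths are assumptions and mysqldump is expected to find its credentials in ~/.my.cnf:
# ~/.duply/<profile>/pre
mysqldump --all-databases > "$HOME/backups/mysql-dump.sql" # dump of all databases to a backed-up path
dpkg --get-selections > "$HOME/backups/package-list.txt" # installed packages (Debian/Ubuntu)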
Live backups
There are some drawbacks to using the system while the backup is running. An obvious one is the impact on performance, since the backup keeps the disks busy.
There is also the fact that, if the backup takes a while, which is very likely, and files are modified in the meantime, the verification will fail. That doesn’t mean the backup has failed, but the verification obviously will.
For this, I would recommend either an LVM snapshot as suggested above (which, let’s face it, is not very likely to be done on anything other than a server), or we can just disable the verification and use ionice like so:
@daily ionice -c3 duply <profile> backup_purge --force
This will execute the backup with low I/O priority, which means we will be able to use the computer without much impact, and cron will still send us an email with the output of the command so we can confirm that the backup was done properly.