All hardware will eventually die, and without care, data will die with it. As
my home file server has grown, I considered a variety of technologies to
ensure my data's safety before settling on ZFS under FreeBSD.
ZFS provides many strong data protection features (e.g., RAID-Z and RAID-Z2
for drive redundancy, end-to-end checksumming, snapshots, journalling, etc.).
ZFS also provides send and receive semantics for moving snapshots between
pools and machines, but it does not tie these primitives together into an
automated backup scheme. I wrote Zebu as a simple, small-scale backup system
that builds on these ZFS primitives.
Zebu is a minimalistic system, intended to be run from cron. The
single command, zebu, will process data in three phases:
snapshot, cleanup, and transmit. Zebu operates over ZFS filesystems (and
can optionally recursively descend through sub-filesystems). Configuration
is driven from a central configuration file (/etc/zebu/zebu.conf,
by default). zebu.conf lists global parameters, as well as a
configuration stanza for every target ZFS filesystem.
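The `[DEFAULT]` section name suggests that zebu.conf is read with Python's configparser. In that format, every option in `[DEFAULT]` is inherited by every other stanza. Assuming that, the sketch below loads a trimmed-down, hypothetical sample and shows how a filesystem stanza picks up global values such as `expiretime`. The fallback values (`recurse` and `doTransmit` off, `doSnapshot` on) are guesses for illustration, not Zebu's documented defaults.

```python
import configparser

# Hypothetical, trimmed-down zebu.conf used only for illustration.
SAMPLE_CONF = """
[DEFAULT]
expiretime=30:0:0:0
lockfile=/tmp/zebu.lock

[pool/homes]
recurse=yes
doTransmit=yes

[pool/backup/time_machine]
recurse=yes
doTransmit=no
"""


def load_stanzas(text):
    """Return {filesystem: options}, with [DEFAULT] values inherited."""
    # interpolation=None keeps any '%' in commands literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    stanzas = {}
    for fs in parser.sections():  # [DEFAULT] is never listed here
        section = parser[fs]
        stanzas[fs] = {
            # Fallbacks below are assumptions, not Zebu's documented defaults.
            "recurse": section.getboolean("recurse", fallback=False),
            "doSnapshot": section.getboolean("doSnapshot", fallback=True),
            "doTransmit": section.getboolean("doTransmit", fallback=False),
            "expiretime": section.get("expiretime"),  # inherited from [DEFAULT]
        }
    return stanzas


if __name__ == "__main__":
    for fs, opts in load_stanzas(SAMPLE_CONF).items():
        print(fs, opts)
```

configparser treats `[DEFAULT]` specially: it never appears in `sections()`, so iterating over `sections()` yields exactly the per-filesystem stanzas.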
During the snapshot phase, zebu will create (optionally
recursively) snapshots in the configured filesystem(s) named
zebu-<timestamp>. Before I developed Zebu, I used a
system called Dirvish to back up remote machines using rsync over ssh. Zebu
can optionally use rsync to update data in the ZFS filesystem before creating
a snapshot, providing similar functionality. Much like Dirvish, Zebu supports
a list of files (regular expressions, really) to exclude from the rsync; both
global and filesystem-specific exclude lists are allowed, and indicated in
zebu.conf. Once the rsync completes, and appropriate logs are
written, zebu will snapshot the ZFS filesystem.
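The snapshot phase for a single stanza can be sketched as a short list of commands: an optional rsync into the filesystem, followed by `zfs snapshot` (with `-r` when recursing). This is a hypothetical illustration, not Zebu's code. Several details are assumptions:

* the timestamp format;
* the rsync flags (`-a --delete`);
* passing each exclude file with `--exclude-from`;
* representing `opts` as a plain dict whose `excludes` entry is a list of files;
* mapping the remote `basepath` onto a local mountpoint of `/<filesystem>`.

The function only builds the argument lists, so they can be inspected before anything is run.

```python
import time


def snapshot_name(now=None):
    # Hypothetical timestamp format; the text only specifies zebu-<timestamp>.
    return "zebu-" + time.strftime("%Y%m%d%H%M%S", time.gmtime(now))


def snapshot_phase_commands(fs, opts, now=None):
    """Build (but do not run) the argv lists for one filesystem's snapshot phase."""
    cmds = []
    server = opts.get("rsync_server")
    if server:
        rsync = [opts.get("rsync_path", "rsync"), "-a", "--delete"]
        # Global and filesystem-specific exclude files, applied in order.
        for path in opts.get("excludes", []):
            rsync.append("--exclude-from=" + path)
        # Assumed layout: remote basepath mirrored into the mountpoint /<fs>.
        rsync += [server + ":" + opts.get("basepath", "/"), "/" + fs + "/"]
        cmds.append(rsync)
    snap = ["zfs", "snapshot"]
    if opts.get("recurse"):
        snap.append("-r")  # also snapshot all descendant filesystems
    snap.append(fs + "@" + snapshot_name(now))
    cmds.append(snap)
    return cmds


if __name__ == "__main__":
    example = {
        "rsync_server": "linode.example.com",
        "rsync_path": "/usr/local/bin/rsync",
        "basepath": "/",
        "excludes": ["/etc/zebu/excludes"],
    }
    for argv in snapshot_phase_commands("pool/backup/linode", example):
        print(" ".join(argv))
```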
As its name suggests, the cleanup phase removes old snapshots. Each
zebu-created snapshot contains a timestamp in its name, so Zebu can simply
compare snapshot names against the configured expiration time (expiretime
in zebu.conf) and destroy any snapshots that are too old. zebu will never remove the
last (most recent) snapshot, just in case something goes awry in backup
processing.
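Below is a minimal sketch of the cleanup decision for one filesystem. It assumes that `expiretime` is `days:hours:minutes:seconds` (so `30:0:0:0` means 30 days). It also uses the same hypothetical timestamp format as the snapshot sketch. Snapshots without the `zebu-` prefix are ignored. The newest zebu snapshot is dropped before any age test, mirroring the rule that the most recent snapshot is never removed.

```python
from datetime import datetime, timedelta

PREFIX = "zebu-"
FMT = "%Y%m%d%H%M%S"  # hypothetical timestamp format


def parse_expiretime(spec):
    # Assumed field order: days:hours:minutes:seconds.
    days, hours, minutes, seconds = (int(x) for x in spec.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def expired_snapshots(snapshots, expiretime, now):
    """Return one filesystem's zebu snapshots older than expiretime, never the newest."""
    limit = parse_expiretime(expiretime)
    dated = []
    for name in snapshots:
        _, _, snap = name.partition("@")
        if not snap.startswith(PREFIX):
            continue  # leave snapshots zebu did not create alone
        try:
            stamp = datetime.strptime(snap[len(PREFIX):], FMT)
        except ValueError:
            continue  # prefix matches but no parseable timestamp
        dated.append((stamp, name))
    dated.sort()
    candidates = dated[:-1]  # the most recent snapshot is always kept
    return [name for stamp, name in candidates if now - stamp > limit]
```

The returned names could then be passed, one at a time, to `zfs destroy`.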
The transmit phase pipes the output of zfs send into the
configured transmission command. zebu will recursively descend
over child filesystems (barring configuration to the contrary), and send
each individually, rather than use a recursive ZFS send. Recursive sends
are not supported by the early ZFS version in FreeBSD 7.x, and where they
are supported, they copy filesystem properties as a side effect. Since Zebu
doesn't copy filesystem properties, the source filesystem can be shared via
NFS and left uncompressed while the destination is not NFS-shared and uses
gzip compression; this is generally a desirable trait. Unfortunately, sending
each child separately can cause problems if the transmit phase is interrupted
(see below).
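The per-filesystem transmit loop might look roughly like the following sketch. It lists the filesystem and its children with `zfs list`. It then pipes a separate `zfs send` for each into the configured `transmit_cmd` through the shell, which is needed because `transmit_cmd` contains a quoted remote command. The function names are hypothetical. Whether Zebu sends incrementally with `-i` against the previous snapshot is also an assumption.

```python
import subprocess


def child_filesystems(fs):
    """List fs and its descendants (parent first) using `zfs list`."""
    out = subprocess.run(
        ["zfs", "list", "-H", "-o", "name", "-t", "filesystem", "-r", fs],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def send_argv(fs, snap, prev=None):
    # Incremental send when a previous snapshot is known (an assumption).
    argv = ["zfs", "send"]
    if prev:
        argv += ["-i", f"{fs}@{prev}"]
    argv.append(f"{fs}@{snap}")
    return argv


def transmit_one(argv, transmit_cmd):
    """Pipe the sender's output into transmit_cmd; True if both succeed."""
    send = subprocess.Popen(argv, stdout=subprocess.PIPE)
    recv = subprocess.Popen(transmit_cmd, shell=True, stdin=send.stdout)
    send.stdout.close()  # lets the sender get SIGPIPE if the receiver exits
    recv_rc = recv.wait()
    send_rc = send.wait()
    return send_rc == 0 and recv_rc == 0


def transmit(fs, snap, prev, transmit_cmd, recurse=True):
    # One send per filesystem rather than a recursive `zfs send -R`.
    targets = child_filesystems(fs) if recurse else [fs]
    # all() stops at the first failed child, failing the whole parent.
    return all(transmit_one(send_argv(t, snap, prev), transmit_cmd) for t in targets)
```

Stopping at the first failed child is a design choice in this sketch. It matches the failure behavior described below, where one bad child transfer blocks the parent filesystem.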
Consider two servers: a primary and a backup. Zebu is designed to run from
cron on both of these, performing all three phases on the primary and only cleanup
on the backup (though zebu can also be used on the backup server,
to snapshot and transmit its system-local files to yet another machine, or
back to the primary).
For example, here's a configuration similar to what I use on my primary file server:
    [DEFAULT]
    basepath=/
    excludes=/etc/zebu/excludes
    expiretime=30:0:0:0
    rsync_path=/usr/local/bin/rsync
    transmit_cmd=/usr/bin/ssh -x -qT -l root backup "/sbin/zfs recv -F -d pool"
    lockfile=/tmp/zebu.lock

    [pool/backup/archive]
    recurse=yes
    doTransmit=yes

    [pool/backup/time_machine]
    recurse=yes
    doTransmit=no

    [pool/backup/linode]
    rsync_server=linode.example.com
    doTransmit=yes

    [pool/homes]
    recurse=yes
    doTransmit=yes
On the backup server, you can use a similar config file (to handle regular cleanups, and local filesystems):
    [DEFAULT]
    basepath=/
    excludes=/etc/zebu/excludes
    expiretime=30:0:0:0
    rsync_path=/usr/local/bin/rsync
    transmit_cmd=/usr/bin/ssh -x -qT -l root primary "/sbin/zfs recv -F -d pool"
    lockfile=/tmp/zebu.lock

    [pool/backup/archive]
    recurse=yes
    doTransmit=no
    doSnapshot=no

    [pool/backup/linode]
    doTransmit=no
    doSnapshot=no

    [pool/homes]
    recurse=yes
    doTransmit=no
    doSnapshot=no

    [pool/local]
    doTransmit=yes
These configs will result in several filesystems
(pool/backup/archive, pool/homes, and
pool/backup/linode) getting snapshotted on primary,
then transmitted to backup; pool/backup/time_machine is
snapshotted but never leaves primary. pool/backup/linode will see an
rsync from linode.example.com (a remote host being
backed up) before snapshots are taken. The corresponding config on the
backup server will ensure old snapshots are cleaned out there as well (lest
the primary server's transmit phase cause them to accumulate). Additionally,
the backup server will transmit pool/local over to the primary
server. Note that without a pool/local stanza in the primary
server's config, it's likely that snapshots from this filesystem will
accumulate indefinitely.
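The accumulation pitfall above can be checked mechanically. The sketch below again assumes configparser-style files. It lists every filesystem that one host transmits and maps each name through `zfs receive -d` semantics: the first element of the sent name (the source pool) is dropped, and the rest is appended to the target, which is `pool` in both example `transmit_cmd` settings. The result is the set of filesystems that would land on the other host without a stanza to clean them up. `unmanaged_on_receiver` is a hypothetical helper, not part of Zebu.

```python
import configparser


def receive_d_name(sent_fs, target):
    # zfs receive -d drops the first element of the sent name (the source
    # pool) and appends the remaining elements to the target filesystem.
    rest = sent_fs.split("/", 1)[1:]  # empty list for a bare pool name
    return "/".join([target] + rest)


def unmanaged_on_receiver(sender_text, receiver_text, target="pool"):
    """List filesystems the sender transmits that have no stanza on the receiver."""
    sender = configparser.ConfigParser(interpolation=None)
    sender.read_string(sender_text)
    receiver = configparser.ConfigParser(interpolation=None)
    receiver.read_string(receiver_text)
    missing = []
    for fs in sender.sections():
        if sender[fs].getboolean("doTransmit", fallback=False):
            dest = receive_d_name(fs, target)
            if not receiver.has_section(dest):
                missing.append(dest)  # its snapshots would never be expired
    return missing
```

Given the stanza lists from the example configurations, this should report nothing missing on the backup and flag pool/local as missing on the primary.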
Since Zebu recursively descends filesystems itself during transmit, a transmit operation (unlike snapshot or cleanup) is not atomic. Unfortunately, Zebu cannot currently recover well from a failure during transmit. If the sender or receiver processes (or machines) die during a recursive transmit, some child filesystems will have been transmitted while others have not. Zebu will simply retry all transfers on the next run, and will probably encounter errors copying some of the child filesystems. Currently, there is no logic to handle an error in a child transfer; it will just appear as a failure of the transmit phase for the parent filesystem, and Zebu will be unable to transfer any data for that filesystem until the situation is manually rectified.
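Until such logic exists, a small diagnostic can at least narrow down which children need manual attention. This hypothetical helper (not part of Zebu) compares the newest `zebu-*` snapshot of each filesystem on the source with the destination. It takes name lists such as those printed by `zfs list -H -o name -t snapshot -r` on each host. It assumes fixed-width timestamps, so that the newest snapshot sorts last as a string. It also assumes identical filesystem names on both sides, which holds when both pools are named `pool` and `zfs recv -d` is used.

```python
def newest_zebu(snapshots):
    """Map each filesystem to its newest zebu-* snapshot from 'fs@snap' names."""
    newest = {}
    for full in snapshots:
        fs, _, snap = full.partition("@")
        # Fixed-width timestamps (assumed) make string order match time order.
        if snap.startswith("zebu-") and snap > newest.get(fs, ""):
            newest[fs] = snap
    return newest


def out_of_sync(source_snaps, dest_snaps):
    """Filesystems whose newest zebu snapshot on the destination differs from the source."""
    src = newest_zebu(source_snaps)
    dst = newest_zebu(dest_snaps)
    return sorted(fs for fs, snap in src.items() if dst.get(fs) != snap)


if __name__ == "__main__":
    source = ["pool/homes@zebu-20240102000000", "pool/homes/bob@zebu-20240102000000"]
    dest = ["pool/homes@zebu-20240102000000", "pool/homes/bob@zebu-20240101000000"]
    print(out_of_sync(source, dest))
```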
No attempt is made to replicate filesystem properties (e.g., those shown
by zfs get all). It's unlikely this will be added, since it's not
always clear what the expected behavior should be on the backup host.
All error reporting is handled via stdout and stderr.
I intended to run zebu via cron, and cron captures any output from my jobs
(by default, mailing it to the job's owner). Your mileage may vary.
All code is copyright Mike Shuey, and licensed under GPL version 2.
| Source tarball | zebu-1.0.0.tar.gz |
|---|---|