BURP - BackUp and Restore Program


Improved Open Source Backup:
Incorporating inline deduplication and sparse indexing solutions

G. P. E. Keeling


Appendix F - Open source competition

Before any design, coding, or testing was done, I researched the open source network backup solutions available. The following are the results of that initial research.

burp-1.3.36 (Keeling, 2011):
Has a simple client/server architecture.
Uses librsync in order to save network traffic and to save on the amount of space that is used by each backup. It also uses VSS (Volume Shadow Copy Service) to make snapshots when backing up Windows computers. Operates at a file-level granularity.
Version 1.3.36 was the latest version at the time of writing.

Advantages:

  • Backup clients can be centrally managed on a Unix-based server.
  • Good cross platform support for clients. Supports Unix-based systems, including Macs. Supports the native Windows Backup API with VSS snapshots. This ensures a consistent file system image and also ensures that all files are available to be read. Backup solutions that do not use the API will not be able to open files that other applications already have open.
  • Backs up file differences via delta differencing with librsync.
  • Supports files, directories, symlinks, hardlinks, fifos, nodes, permissions and timestamps.
  • Supports Linux and FreeBSD acls and xattrs.
  • Supports Windows permissions, file attributes, and so on, via VSS.
  • Supports Windows EFS files.
  • Storage and network compression using zlib.
  • Ability to continue interrupted backups.
  • Network communications encrypted with SSL.
  • Automatic SSL certificate authority and client certificate signing.
  • Client side file encryption - (note: this turns off delta differencing).
  • Scheduling.
  • Email backup success/failure notifications.
  • Pre/post backup/restore client scripts.
  • Storage data deduplication.
  • Automatic client upgrade.
  • Due to the design of the server, most configuration changes do not need a server restart in order to become effective.
  • Simple retention periods (e.g., keep 1 backup per day for 7 days, 1 backup per week for 4 weeks, 1 backup per 4 weeks for a year).
  • MD5 Verification of saved data.

    Disadvantages:

  • Since each file is stored as a separate file system entry on the server, backups containing many files can end up using a lot of file system inodes. This increases the time it takes to complete subsequent backups, because many system calls are required, for example, to hard link unchanged files into place in the latest backup. It also increases the time it takes to delete old backups, because each inode has to be unlinked.
  • As previously described, there are limitations of the librsync mechanism related to large files that change.
  • Identical files across multiple clients will be stored multiple times. This can be dealt with by post-event file deduplication, but this is not optimal and with large numbers of files, takes a significant amount of time.
  • Identical blocks will be stored multiple times across all files and all clients.
  • If configured to use reverse librsync deltas to save storage space, restore times for large files can become long, because each delta has to be applied and the large file regenerated for each change in the sequence.
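The restore cost of reverse deltas described in the last point can be illustrated with a minimal Python sketch. It assumes a toy delta format (byte ranges over equal-length versions) rather than librsync's real format, and the names make_reverse_delta, apply_delta and restore are hypothetical. The point is only that going N versions back means regenerating the whole file N times.

```python
def make_reverse_delta(new, old):
    """Record the byte ranges needed to turn `new` back into `old`.

    Toy format for illustration only: equal-length inputs, list of
    (offset, old_bytes) pairs. librsync's real delta format differs.
    """
    assert len(new) == len(old)
    delta = []
    i = 0
    while i < len(new):
        if new[i] != old[i]:
            j = i
            while j < len(new) and new[j] != old[j]:
                j += 1
            delta.append((i, old[i:j]))
            i = j
        else:
            i += 1
    return delta


def apply_delta(data, delta):
    """Regenerate the previous version of the whole file from one delta."""
    buf = bytearray(data)
    for offset, chunk in delta:
        buf[offset:offset + len(chunk)] = chunk
    return bytes(buf)


def restore(latest, reverse_deltas, steps_back):
    """Walk back from the latest full copy, one delta per older version."""
    data = latest
    applied = 0
    for delta in reverse_deltas[:steps_back]:
        data = apply_delta(data, delta)  # full file rewritten on every step
        applied += 1
    return data, applied


if __name__ == "__main__":
    versions = [b"aaaa", b"abaa", b"abca", b"abcd"]  # oldest to newest
    deltas = [make_reverse_delta(versions[i], versions[i - 1])
              for i in range(len(versions) - 1, 0, -1)]
    print(restore(versions[-1], deltas, 3))
```

Only the newest version is stored in full, so the work to restore grows with the age of the version requested and with the size of the file.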

    amanda (da Silva et al, 1991):
    Server contacts each client to perform a backup at a scheduled time. Has a native Windows client. On Unix-like systems, it uses native tools to make the backup, like tar.

    Advantages over burp-1.3.36:

  • Has a long history, and therefore should be expected to be stable.
  • No hard link farm.

    Disadvantages over burp-1.3.36:

  • Like bacula, it is clearly focused on tape storage and pretends that a file on the disk is actually a tape, which leads to inefficiencies that don't need to exist when your medium has random access.
  • Hard to configure.

    backshift (Stromberg, 2010):
    Uses variable-length blocks to perform inline deduplication. Stores the chunks on disk in a directory structure named after the checksums it creates. It appears to use the file system as its full chunk index.
    Appears to be designed with local backups in mind, rather than networked.
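The idea of using the file system as the chunk index can be sketched as follows. This is a minimal illustration and not backshift's actual code: the choice of SHA-256, the two-level directory fan-out, and the names chunk_path and store_chunk are all assumptions.

```python
import hashlib
import os
import tempfile


def chunk_path(root, digest):
    """Map a checksum to a path; the directory tree itself is the index."""
    return os.path.join(root, digest[:2], digest[2:4], digest)


def store_chunk(root, chunk):
    """Write the chunk unless a file named after its checksum already exists.

    Returns (digest, was_new).
    """
    digest = hashlib.sha256(chunk).hexdigest()
    path = chunk_path(root, digest)
    if os.path.exists(path):  # one file system lookup per chunk
        return digest, False
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(chunk)
    return digest, True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(store_chunk(root, b"some chunk")[1])  # first time: stored
        print(store_chunk(root, b"some chunk")[1])  # second time: deduplicated
```

Every chunk costs at least one lookup on disk, which is where the bottleneck noted in the disadvantages below would come from once the index no longer fits in the file system cache.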

    Advantages over burp-1.3.36:

  • Inline deduplication.

    Disadvantages over burp-1.3.36:

  • No Windows API support. Open files cannot be backed up.
  • No central management, or scheduling.
  • Poor network support. Cannot run over the network with ssh. Instead, it needs to mount a network share (smbfs, sshfs, etc) and read the data from that. I initially suspected that all the data is transferred every time.
    On investigation, http://stromberg.dnsalias.org/~strombrg/backshift/documentation/for-all/backing-up.html says "writing to a remote filesystem is faster than reading from one - if you have the choice", so pushing probably means that not all the data is transferred every time. I will push when I test it.
  • Running on Windows requires installation of cygwin and python, or for the Windows filesystem to be mounted on the server via a network filesystem, which means that the backup has to be pulled.
  • Will suffer from disk deduplication bottleneck issues due to the full index created on the disk.

    backuppc (Barratt, 2001):
    Uses no client side software, instead backs up clients using network shares and tar or rsync on the server.
    Has a 'pool based' deduplication mechanism to deduplicate across multiple clients. This apparently operates at a file level granularity, not block level, and is therefore not suitable for backups of disk images. The 'pool' appears to operate as a hard link farm.

    Advantages over burp-1.3.36:

  • Claims to be 'enterprise grade'.

    Disadvantages over burp-1.3.36:

  • No Windows API support. Open files cannot be backed up.

    bacula (Sibbald, 2010):
    The original burp was based on bacula-5.0.3, so it contains many of the best features of bacula, whilst solving many of its problems.
    Instead of being of a client/server architecture, bacula has four main components - the director, the file daemon, the storage daemon, and the catalog.

    Advantages over burp-1.3.36:

  • Since bacula tries to emulate a tape drive when it saves to disk, it stores the data in a tar-like format containing the data for many files. This means that there are no issues with management of large hard link or mirror farms. However, it does have other issues related to the way it tries to view these storage files as tapes.
  • Claims to be 'enterprise grade'.

    Disadvantages over burp-1.3.36:

  • Complexity to configure - each of the main components has its own set of configuration files, for example.
  • Although it avoids problems with hard link farms, it still works badly with disk storage - Bacula's mentality is very highly geared towards tape usage and therefore it works poorly with disks.
  • Stores the catalog separately to the backups - This causes a massive maintenance headache. For example, you now have to think about backups of your catalog. Additionally, changes to your configuration files might not take effect because some of the previous configuration gets written to the catalog, and then it is not easy to make the changes take effect. Furthermore, you end up needing to be a mysql or postgres database expert.
  • Backs up the whole file even if only a few bytes in it have changed.
  • Relies far too heavily on clock accuracy - Bacula goes very badly wrong if your computer's clock somehow gets skewed. In fact, it relies so heavily on the clock and timestamps that it does not actually track which backup another was based on.
  • Laptop backups are difficult to schedule.
  • Cannot resume an interrupted backup.
  • Retention configuration - my experience taught me that it is just impossible to configure a sensible retention policy for bacula. The reasons why are too long to go into here, but my post to the bacula email user list on the subject can be found at: http://adsm.org//lists/html/Bacula-users/2011-01/msg00308.html
  • No Windows EFS support - EFS files are silently ignored.
  • Has a commercial edition, into which most new features go.

    bup (Pennarun, 2010):
    Uses the version control system 'git' as its backend. Reads data directly on standard input, splits it into chunks using a rolling checksum, and packs it directly into git packfiles. Before writing a chunk, it first deduplicates using previously written packfiles. Can be run securely over the network with ssh.
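Splitting on a rolling checksum and deduplicating the resulting chunks can be sketched in Python as below. This is a simplified illustration under stated assumptions: a plain sum over a 16 byte window stands in for bup's actual rolling checksum, a dictionary stands in for git packfiles, and the names split_chunks, backup and restore are hypothetical.

```python
import hashlib

WINDOW = 16   # bytes covered by the rolling sum (assumed value)
MASK = 0x3F   # boundary when the low 6 bits are all set, roughly 1 in 64


def split_chunks(data):
    """Cut `data` wherever the sum of the last WINDOW bytes matches MASK.

    Boundaries depend only on nearby content, so an insertion early in the
    data does not move the boundaries that follow it.
    """
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # slide the window forward
        if i + 1 >= WINDOW and (rolling & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def backup(data, store):
    """Store only unseen chunks; return (list of digests, new chunk count)."""
    recipe = []
    new = 0
    for chunk in split_chunks(data):
        digest = hashlib.sha1(chunk).hexdigest()
        if digest not in store:  # inline deduplication
            store[digest] = chunk
            new += 1
        recipe.append(digest)
    return recipe, new


def restore(recipe, store):
    """Rebuild the data by fetching each chunk directly; no deltas needed."""
    return b"".join(store[d] for d in recipe)
```

Backing up a large file a second time after a small change should add only the few chunks around the change to the store, which is why this approach suits huge disk images.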

    Advantages over burp-1.3.36:

  • Inline deduplication.
  • Efficient at backing up large files, such as huge disk images.
  • No need to apply deltas when restoring old versions of files, as each chunk is retrieved directly from storage when needed.

    Disadvantages over burp-1.3.36:

  • Immature meta data support.
  • No Windows API support. Open files cannot be backed up.
  • Running on Windows requires installation of cygwin.
  • Cannot prune away old backups.
  • No central management, or scheduling.
  • Performing a backup stores some data on the client.

    obnam (Wirzenius, 2007):
    Uses ssh to transfer data. Seems to be able to do inline deduplication.

    Advantages over burp-1.3.36:

  • Inline deduplication.

    Disadvantages over burp-1.3.36:

  • No Windows API support. Open files cannot be backed up.
  • Running on Windows requires installation of cygwin.
  • No central management.

    rdiff-backup (Escoto et al, 2001):
    Backs up one directory to another, possibly over a network.
    Like burp, it uses librsync. It creates a mirror, and reverse diffs are created so that previous versions of files can be restored.


    Disadvantages over burp-1.3.36:

  • No Windows API support. Open files cannot be backed up.
  • Running on Windows requires installation of cygwin.
  • No central management, or scheduling.

    rsync (Tridgell et al, 1996) --link-dest wrappers:
    The rsync --link-dest functionality is used as the back end of various backup scripts, such as rsnapshot.
    A set of files is backed up as a mirror. In subsequent backups, files that have not changed are hard linked to the entries in the previous backup in order to save disk space. Or, in other words, you have a hard link farm.
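The hard linking behaviour can be sketched in Python as follows. This is not rsync's implementation: the 'unchanged' test here (same size and same whole-second modification time) is a simplifying assumption, and the names unchanged and snapshot are hypothetical.

```python
import os
import shutil


def unchanged(src, old):
    """Assumed quick check: same size and same mtime (whole seconds)."""
    if not os.path.isfile(old):
        return False
    a, b = os.stat(src), os.stat(old)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def snapshot(source, dest, previous=None):
    """Mirror `source` into `dest`, hard linking files unchanged since `previous`.

    Returns (linked, copied).
    """
    linked = copied = 0
    for dirpath, _dirs, files in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.join(dest, rel), exist_ok=True)
        for name in files:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dest, rel, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and unchanged(src, old):
                os.link(old, dst)        # new directory entry, same inode
                linked += 1
            else:
                shutil.copy2(src, dst)   # copies data and timestamps
                copied += 1
    return linked, copied
```

Each backup still creates one directory entry per file, even when no data is copied, which is the source of the hard link farm.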

    Disadvantages over burp-1.3.36:

  • No Windows API support. Open files cannot be backed up.
  • Running on Windows requires installation of cygwin.
  • No central management, or scheduling.

    tar over ssh (GNU, 1999):
    The name 'tar' is derived from 'tape archive'. It is a utility that combines files and meta data into a single stream. Running it over ssh ('secure shell') means that the stream can be sent across the network securely. I am including this as a backup option because it gives me some sort of scientific control. Each backup that this method makes will be self contained, not relying on any other backup, so the network utilisation and storage will be consistent for each run.
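The 'single stream' idea can be illustrated with Python's standard tarfile module. This is only a sketch of how files and meta data end up in one stream, not the tar command itself. The names write_tar_stream and list_tar_stream are hypothetical, and an in-memory buffer stands in for the network connection that ssh would provide.

```python
import io
import os
import tarfile
import tempfile


def write_tar_stream(paths, out):
    """Write the given files, with their meta data, to `out` as one tar stream."""
    with tarfile.open(fileobj=out, mode="w|") as tar:  # "w|" = non-seekable stream
        for path in paths:
            tar.add(path, arcname=os.path.basename(path))


def list_tar_stream(stream):
    """Read a tar stream sequentially, returning (name, size) for each entry."""
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        return [(member.name, member.size) for member in tar]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "a.txt")
        with open(path, "wb") as f:
            f.write(b"hello")
        buf = io.BytesIO()  # stands in for the pipe to ssh
        write_tar_stream([path], buf)
        buf.seek(0)
        print(list_tar_stream(buf))
```

Because the stream carries every file in full each time, each run is self contained and independent of earlier backups.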

    Advantages over burp-1.3.36:

  • These tools are ubiquitous over Unix-like operating systems.

    Disadvantages over burp-1.3.36:

  • No Windows API support. Open files cannot be backed up.
  • Windows does not come with open source programs like tar, or ssh. A linux-like environment needs to be installed with cygwin in which to run them.
  • No central management, or scheduling.
  • All the data is transferred each time.
  • Massive redundancy of stored data. Will take up a lot of disk space. Note: After performing the tests, I discovered that tar has the ability to do incremental backups, and amanda uses this capability. Therefore, amanda's backup results can be seen as analogous to the results of tar's incremental backups had I run tests on tar in that mode.

    urbackup (Raiber, 2011):
    Client/server backup system. File and image backups are made while the system is running without interrupting current processes. Also continuously watches directories that you want backed up, in order to quickly find differences to previous backups. Has a native Windows client.

    Advantages over burp-1.3.36:

  • Image backups of Windows (but not Unix-style systems)
  • Has an interesting method of broadcasting on the LAN in order to find clients to back up.

    Disadvantages over burp-1.3.36:

  • Windows or posix ACLs, alternate data streams or permissions are not backed up during a file backup.
  • Poor linux support - the online manual states "the client software currently runs only on Windows while the server software runs on both Linux and Windows". However, at the time of writing, I did find source for a Linux client.
  • Has an underdeveloped command line interface. It is impossible to restore from the command line, or to trigger a backup. Files can only be restored one at a time from its web interface. Consequently, I am not able to test this software properly.



    Burp, don't suck. Last updated: June 2016
    By Graham Keeling