5. Comparative testing
Once the first iteration was basically working, I designed a procedure with
which to test the various backup solutions.
The testing was done using a Linux server and client, because this greatly
simplifies setup and the measurement of resource utilisation. The machines
were connected to each other using a 100Mb/s switch.
Server
  CPU:    Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz (dual core)
  RAM:    4GB
  OS:     Linux version 3.2.0-3-amd64 (Debian 3.2.21-3)
  Disk 1: ATA WDC WD400BB-00JH 05.0 PQ (40GB - for the OS)
  Disk 2: ATA ST3400620A 3.AA PQ (400GB - for the storage)
Client
  CPU:    Intel(R) Atom(TM) CPU D510 @ 1.66GHz (quad core)
  RAM:    4GB
  OS:     Linux version 3.2.0-4-amd64 (Debian 3.2.46-1+deb7u1)
  Disk 1: ATA SAMSUNG HD501LJ CR10 PQ: 0 ANSI: 5 (500GB)
There were two test sequences for each backup software. In both cases, files
were initially copied into a directory on the client computer for the purposes
of being backed up.
a) Many small files - I downloaded 59 different Linux kernel source packages
from http://kernel.org/ and unpacked them.
This resulted in 1535717 files and directories, and 20GB (20001048kb) of
data, which is an average of about 13kb per file.
b) One large file - I used a 22GB VirtualBox VDI image file of a Windows 7
machine. I took one copy of this, started and stopped the Windows 7 virtual
machine, and then took another copy of the file, which had now changed.
Each sequence had the following steps, each of which targets a potential
weakness of backup software. For example, updating the timestamp of a large
file could cause the whole file to be copied across the network even though
none of the data has changed. A sketch of how the data-changing steps might be
scripted follows the list.
- Perform a backup.
- Perform a backup without changing anything.
- Perform a backup after changing some of the data.
For the small files, I randomly scrambled the files in one of the kernel
directories.
For the large file, I used the rebooted VDI image.
- Perform a backup after updating the timestamp on some of the files.
For the small files, I updated all of the timestamps in one of the kernel
directories without changing the data.
For the large file, I updated its timestamp without changing its data.
- Perform a backup after renaming some of the files.
For the small files, I created a new directory and moved half of the kernel
sources into it.
- Perform a backup after deleting some of the files.
For the small files, I deleted half of them.
For the large file, I truncated it to 11GB.
- Restore all the files from each of the six backups.
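The following is a minimal sketch of how the data changes in steps three to
six might be scripted, for illustration only. The directory and file names are
placeholders; the submitted test scripts remain the authoritative versions.

#!/usr/bin/env python3
# Illustrative sketch only: paths, the choice of kernel tree and the
# interpretation of "scrambled" are assumptions, not the submitted scripts.
import os
import random
import shutil

DATA = "/home/test/smallfiles"            # placeholder: unpacked kernel sources
TREE = os.path.join(DATA, "linux-3.9")    # placeholder: one kernel tree to modify
VDI = "/home/test/largefile/win7.vdi"     # placeholder: the large VDI image

def walk_files(top):
    for root, _, names in os.walk(top):
        for name in names:
            yield os.path.join(root, name)

def scramble(top):
    # Step 3, small files: rewrite each file with its bytes shuffled.
    for path in walk_files(top):
        with open(path, "rb") as f:
            data = bytearray(f.read())
        random.shuffle(data)
        with open(path, "wb") as f:
            f.write(data)

def touch_all(top):
    # Step 4: update every timestamp without changing any data.
    for path in walk_files(top):
        os.utime(path)

def rename_half(top):
    # Step 5, small files: move half of the kernel trees into a new directory.
    trees = sorted(os.listdir(top))
    dest = os.path.join(top, "moved")
    os.mkdir(dest)
    for tree in trees[:len(trees) // 2]:
        shutil.move(os.path.join(top, tree), dest)

def delete_half(top):
    # Step 6, small files: delete half of the remaining kernel trees.
    trees = sorted(t for t in os.listdir(top) if t != "moved")
    for tree in trees[:len(trees) // 2]:
        shutil.rmtree(os.path.join(top, tree))

def truncate_large(path):
    # Step 6, large file: truncate the image to 11GB.
    os.truncate(path, 11 * 1024 ** 3)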
These measurements were taken for each backup or restore:
- The time taken.
- The cumulative disk space used after each backup.
- The cumulative number of file system nodes used by the backup.
- Bytes sent over the network from the server.
- Bytes sent over the network from the client.
- Maximum memory usage on the server.
- Maximum memory usage on the client.
I would also have liked to measure the CPU utilisation, but I was not able
to find a satisfactory practical way to do this for each piece of
software.
To get the time taken and the memory usage statistics, I used the
GNU 'time' program.
To get the disk space statistics, I used the 'du' command.
To get the number of file system nodes, I used the 'find' command, piped to
'wc'.
To get the network statistics, I used the Linux firewall, 'iptables', to count
the bytes going to and from particular TCP ports.
Immediately before each test, I would reset the firewall counters and flush
the disk cache on both server and client.
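The sketch below illustrates how these measurements might be gathered around a
single backup run. The backup command, storage directory and TCP port are
placeholders, iptables accounting rules for the backup port are assumed to
have been added beforehand, and the 'du' and 'find' figures would be gathered
on the server, where the storage lives; the submitted scripts contain the
exact logic.

#!/usr/bin/env python3
# Illustrative sketch only: the backup command, storage path and server
# port are placeholders, and iptables rules matching the port (one INPUT
# and one OUTPUT rule) are assumed to exist already.
import subprocess

BACKUP_CMD = ["some-backup-client", "--backup"]  # placeholder client command
STORAGE = "/storage/backups"                     # placeholder storage directory
PORT = "4971"                                    # placeholder server TCP port

def sh(cmd):
    return subprocess.run(cmd, shell=True, check=True,
                          capture_output=True, text=True).stdout

# Reset the firewall byte counters and flush the disk cache before the test.
sh("iptables -Z")
sh("sync; echo 3 > /proc/sys/vm/drop_caches")

# GNU time's verbose output, written to stderr, includes the elapsed time
# and the maximum resident set size.
result = subprocess.run(["/usr/bin/time", "-v"] + BACKUP_CMD,
                        capture_output=True, text=True)
print(result.stderr)

# Cumulative disk space and number of file system nodes used by the storage.
print(sh("du -sk " + STORAGE))
print(sh("find " + STORAGE + " | wc -l"))

# Byte counts to and from the backup port, taken from the firewall counters.
print(sh("iptables -nvxL | grep " + PORT))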
Since each test sequence could take a long time, I wrote scripts to automate
the testing process so that it could run without intervention. The scripts had
to be customised for each backup software under test, and they are included
with the software as part of the submitted materials for this project.
I ensured that each software was configured with the same features for each
test; there would be no compression on the network or in the storage,
and there would be encryption on the network but not in the storage.
For those software that had no native network support, this meant running the
software over secure shell (ssh).
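For example, a run of rsync --link-dest over ssh with network compression
explicitly disabled, matching the encrypted-but-uncompressed configuration
used throughout, might look roughly like the sketch below. The host name and
paths are placeholders; the submitted scripts contain the exact invocations.

#!/usr/bin/env python3
# Illustrative sketch only: host name and paths are placeholders.
import subprocess

SRC = "/home/test/smallfiles/"            # placeholder source directory
DEST = "server:/storage/backups/current"  # placeholder destination on the server
PREVIOUS = "/storage/backups/previous"    # previous backup, used for hard links

subprocess.run([
    "rsync", "-a",                        # archive mode: preserve attributes
    "--link-dest=" + PREVIOUS,            # hard link files unchanged since the previous backup
    "-e", "ssh -o Compression=no",        # encrypted transport with compression turned off
    SRC, DEST,
], check=True)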
During the initial testing, I discovered that burp's network library was
automatically compressing on the network, leading to incorrect results. I made
a one line patch to the versions of burp under test in order to turn off
the network compression. This will become a configurable option in future
versions. I redid the initial testing of both versions of burp for this reason.
These are the candidate software for the initial tests. For more detailed
information on each of them, see the Bibliography and Appendix F.
burp-1.3.36 was the latest version of the original burp at the time the testing
was started.
burp-2.0.0 is the first iteration of the new software.
Key to feature columns:
  1 - Good cross platform support
  2 - Native network support
  3 - Good attribute support
  4 - Good imaging support
  5 - Notifications and scheduling
  6 - Retention periods
  7 - Resume on interrupt
  8 - Hard link farm
  9 - Inline deduplication

Software                 |  1  |  2  |  3  |  4  |  5   |  6   |  7  |  8  |  9
-------------------------+-----+-----+-----+-----+------+------+-----+-----+-----
amanda 3.3.1             | No  | No  | Yes | No  | Yes* | Yes  | No  | No  | No
backshift 1.20           | No  | No  | Yes | No  | No*  | Yes  | Yes | No  | Yes
backuppc 3.2.1           | No  | No  | Yes | No  | Yes  | Yes  | Yes | Yes | No
bacula 5.2.13            | Yes | Yes | Yes | No  | Yes  | Yes  | No  | No  | No
bup-0.25                 | No  | No  | No  | Yes | No*  | No   | Yes | No  | Yes
obnam 1.1                | No  | No  | Yes | Yes | No*  | Yes  | Yes | No  | Yes
rdiff-backup 1.2.8       | No  | No  | Yes | No  | No*  | No*  | No  | Yes | No
rsync 3.0.9 --link-dest  | No  | Yes | Yes | No  | No*  | No*  | Yes | Yes | No
tar 1.26                 | No  | No  | Yes | No  | No*  | No*  | No  | No  | No
urbackup-1.2.4           | No  | Yes | Yes | Yes | Yes  | Yes  | Yes | No  | Yes
burp 1.3.36              | Yes | Yes | Yes | No  | Yes  | Yes  | Yes | Yes | No
burp 2.0.0/1             | Yes | Yes | Yes | Yes | Yes  | No   | Yes | No  | Yes
* Possible with management via external tools, such as cron.
'Good cross platform support' means that the software is able to back up and
restore a set of files on Unix/Linux/Mac/Windows computers, and is able to use
the Windows backup API.
'Good imaging support' means that the software is able to perform image backups
and restores efficiently.
'Hard link farm' means that the software saves its data in individual files,
one for each file that is backed up, and may hard link unchanged versions
together. This can be beneficial on small backup systems, and the files can be
copied for restore using standard file system tools. However, with large
backup sets, such farms become unwieldy due to file system overhead.
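As a minimal sketch of the idea, assuming placeholder paths and that the
previous and current backup directories live on the same file system:

#!/usr/bin/env python3
# Illustrative sketch of a hard link farm: one file system node per backed
# up file, with unchanged files hard linked to the previous backup.
import filecmp
import os
import shutil

def backup(source, previous, current):
    for root, _, names in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.join(current, rel), exist_ok=True)
        for name in names:
            src = os.path.join(root, name)
            old = os.path.join(previous, rel, name)
            new = os.path.join(current, rel, name)
            if os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, new)       # unchanged: share the old data via a hard link
            else:
                shutil.copy2(src, new)  # changed or new: store another full copy

Every backed up file still needs at least one file system node per backup, so
a backup set like the small-file test data quickly accumulates millions of
nodes. (Tools such as rsync --link-dest normally decide whether a file has
changed from its size and timestamp rather than the full content comparison
used in this sketch.)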
5.1. Conclusions from feature comparison
When comparing the feature lists prior to doing any testing, it seems that the
original burp is already one of the best choices in the field, and some
of its deficiencies are addressed by the newly developed software.
For example, the only other software with good cross platform support is
bacula. Apart from the newly developed software, none of the solutions
offering inline deduplication has good cross platform support.
I expected urbackup (Raiber, 2011) to be a good contender, but it turns out that
it doesn't work well on Linux, as described in the next section.
Many of the technically interesting offerings, such as bup-0.25
(Pennarun, 2010), lack features that help with simplifying the administration
of a central server. A few of them, for example backshift (Stromberg, 2012)
and obnam (Wirzenius, 2007), are not really network based solutions and
require remote filesystems to be mounted externally so that they appear to the
software as local filesystems. These suffer accordingly in the testing that
follows.
You may have noticed that the newly developed software has lost the ability
to delete old backups. This will be addressed in a future iteration beyond
the scope of this project, but the planned concept is explained in a
following chapter about further iterations.