Improved Open Source Backup:
Incorporating inline deduplication and sparse indexing solutions
G. P. E. Keeling
This paper investigates
whether incorporating inline deduplication techniques can improve open source
backup offerings, and whether 'sparse indexing' is an effective solution to the
disk deduplication bottleneck.
Problems with an existing open source cross-platform network backup solution
were identified, along with how it compares to other open source backup solutions,
and a literature search and review of relevant algorithms and techniques was
carried out.
A new backup engine was designed and implemented, using
Agile methodology and ideas garnered from the research.
A test procedure was produced and executed, and the results empirically show
that the new software is either superior or comparable to its 'competition'
in terms of speed, storage space, storage nodes, and network utilisation.
These improvements came at the cost of increased memory utilisation, which was
then partially mitigated by implementing 'sparse indexing'.
Suggestions are made for further improvement, and for the extra work to be done
before an initial release.
The paper ends with some conclusions and an evaluation of the project as a