Improved Open Source Backup:
Incorporating inline deduplication and sparse indexing solutions
G. P. E. Keeling
10. Conclusion
The main aim of the project was to improve upon the existing 'burp' software
to produce the basis of a new open source cross platform network backup
software that is demonstrably superior to others in the same field.
As explained in a previous chapter, research on the available software showed
that the original burp (and hence the new software) was among the strongest
in terms of supported features.
The new software was then shown under test
to be either superior or comparable to its 'competition' in terms of backup
speed, storage space, storage nodes, and network utilisation.
It only performed satisfactorily in terms of restore speed, although there
are ideas for improving this in the future.
Once sparse indexing was implemented, server memory usage was satisfactory,
and comparable to the best of the other solutions.
Client memory usage is a potential issue, with the new software performing
worse than the other software tested. However, experiments to reduce the size
of the client block buffer will be performed for future iterations, with the
expectation that client memory usage can be brought down to a level
consistent with the other software. This expectation was supported by simple
calculations based on average block size and the existing test results.
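The shape of such a calculation can be sketched as follows. Note that every number here is an illustrative assumption, not a measured figure from the experiments: the point is only that buffer memory scales linearly with the number of buffered blocks, so shrinking the buffer shrinks client memory proportionally.

```python
# Illustrative client block buffer estimate. All three figures below
# are hypothetical assumptions for the sake of the arithmetic; the
# real values would come from the test results.
avg_block_size = 8 * 1024      # assumed average block size in bytes
per_block_overhead = 64        # assumed bookkeeping bytes per block
blocks_buffered = 20000        # assumed client block buffer length

buffer_bytes = blocks_buffered * (avg_block_size + per_block_overhead)
print(f"approx client buffer memory: {buffer_bytes / 2**20:.1f} MiB")
```

Halving `blocks_buffered` halves the estimate, which is why reducing the buffer size is the obvious first experiment.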
When taking into consideration both the test results and the feature
comparison, I have no doubt that the new software will be a better choice than
the other options currently available.
It should be noted that the new software is not quite ready for release at
the time of writing.
Appendix H reproduces a list of the remaining work to be
done over the next few months before its first official release.
10.1. Answering the research questions
The first research question asked whether incorporating inline deduplication
techniques can improve open source backup offerings and produce an empirically
superior product.
This paper has demonstrated that it can.
The second research question asked whether sparse indexing is an effective
solution to the disk deduplication bottleneck.
It was noted that another software solution, backshift, performs its lookups
against a full index located on disk, and that it took longer than two days
to complete its first backup.
The design of the new software never referenced the full index on disk while
chunks of data were incoming. The first iteration loaded the full index into
memory at the start, then performed the lookups in RAM. This was reflected in
the high server memory usage measured during backups.
Once sparse indexing was implemented, it was shown that the server memory
usage of the new software during backups was effectively limited without
notably reducing the effectiveness of the deduplication.
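The idea can be sketched in miniature as follows. This is an illustration of the general sparse indexing technique, not burp's actual code: the chunking, the hook-selection rule (smallest hashes per segment), and all names are assumptions. The key property is that only the small hook-to-segment map lives in RAM, and full segment manifests are fetched from disk only when a hook matches.

```python
import hashlib

def chunk_hashes(data, chunk_size=4096):
    # Fixed-size chunking for simplicity; a real implementation
    # would use variable-size (content-defined) chunking.
    return [hashlib.md5(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

def hooks(hashes, n=2):
    # Choose a few representative "hook" hashes per segment --
    # here simply the n smallest, a common sampling rule.
    return sorted(hashes)[:n]

class SparseIndex:
    def __init__(self):
        # Small hook -> segment map; this is all that stays in RAM,
        # instead of an entry for every chunk ever stored.
        self.hook_to_segment = {}

    def add_segment(self, seg_id, hashes):
        for h in hooks(hashes):
            self.hook_to_segment[h] = seg_id

    def candidate_segments(self, hashes):
        # Only segments whose hooks match the incoming chunks need
        # their full manifests loaded from disk; the full index is
        # never scanned.
        return {self.hook_to_segment[h]
                for h in hashes if h in self.hook_to_segment}
```

Because unmatched chunks simply fall through (and are stored as new data), deduplication becomes approximate rather than exact, which is the trade-off that keeps memory bounded.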
Therefore, I am confident in claiming that sparse indexing is an effective
solution to the disk deduplication bottleneck.