Improved Open Source Backup:
Incorporating inline deduplication and sparse indexing solutions
G. P. E. Keeling
10. Conclusion
The main aim of the project was to improve upon the existing 'burp' software
to produce the basis of a new open source cross platform network backup
software that is demonstrably superior to others in the same field.
As explained in a previous chapter, research on the available software showed
that the original burp (and hence the new software) was among the strongest
in terms of supported features.
The new software was then shown under test
to be either superior or comparable to its 'competition' in terms of backup
speed, storage space, storage nodes, and network utilisation.
It only performed satisfactorily in terms of restore speed, although there
are ideas for improving this in the future.
Once sparse indexing was implemented, server memory usage was satisfactory,
and comparable to the best of the other solutions.
Client memory usage is a potential issue, with the new software performing
worse than the other software tested. However, experiments to reduce the size
of the client block buffer will be performed for future iterations, with the
expectation that client memory usage can be brought down to a level
consistent with the other software. This expectation was supported by simple
calculations based on average block size and the existing test results.
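The shape of such a calculation can be sketched as follows. Note that every number here is an illustrative assumption, not a measured figure from the experiments: the point is only that buffer memory scales linearly with the number of buffered blocks, so shrinking the buffer shrinks client memory proportionally.

```python
# Illustrative client block buffer estimate. All three figures below
# are hypothetical assumptions for the sake of the arithmetic; the
# real values would come from the test results.
avg_block_size = 8 * 1024      # assumed average block size in bytes
per_block_overhead = 64        # assumed bookkeeping bytes per block
blocks_buffered = 20000        # assumed client block buffer length

buffer_bytes = blocks_buffered * (avg_block_size + per_block_overhead)
print(f"approx client buffer memory: {buffer_bytes / 2**20:.1f} MiB")
```

Halving `blocks_buffered` halves the estimate, which is why reducing the buffer size is the obvious first experiment.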
When taking into consideration both the test results and the feature
comparison, I have no doubt that the new software will be a better choice than
the other options currently available.
It should be noted that the new software is not quite ready for release at
the time of writing.
Appendix H reproduces a list of the remaining work to be
done over the next few months before its first official release.
10.1. Answering the research questions
The first research question asked whether incorporating inline deduplication
techniques can improve open source backup offerings and produce an empirically
superior product.
This paper has demonstrated that it can.
The second research question asked whether sparse indexing is an effective
solution to the disk deduplication bottleneck.
It was noted that another software solution, backshift, performs its lookups
against a full index located on disk, and that it took longer than two days
to complete its first backup.
The design of the new software never referenced the full index on disk while
chunks of data were incoming. The first iteration loaded the full index into
memory at the start, then performed the lookups in RAM. This was reflected in
the high server memory usage measured during backups.
Once sparse indexing was implemented, it was shown that the server memory
usage of the new software during backups was effectively limited without
notably reducing the effectiveness of the deduplication.
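The idea can be sketched in miniature as follows. This is an illustration of the general sparse indexing technique, not burp's actual code: the chunking, the hook-selection rule (smallest hashes per segment), and all names are assumptions. The key property is that only the small hook-to-segment map lives in RAM, and full segment manifests are fetched from disk only when a hook matches.

```python
import hashlib

def chunk_hashes(data, chunk_size=4096):
    # Fixed-size chunking for simplicity; a real implementation
    # would use variable-size (content-defined) chunking.
    return [hashlib.md5(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

def hooks(hashes, n=2):
    # Choose a few representative "hook" hashes per segment --
    # here simply the n smallest, a common sampling rule.
    return sorted(hashes)[:n]

class SparseIndex:
    def __init__(self):
        # Small hook -> segment map; this is all that stays in RAM,
        # instead of an entry for every chunk ever stored.
        self.hook_to_segment = {}

    def add_segment(self, seg_id, hashes):
        for h in hooks(hashes):
            self.hook_to_segment[h] = seg_id

    def candidate_segments(self, hashes):
        # Only segments whose hooks match the incoming chunks need
        # their full manifests loaded from disk; the full index is
        # never scanned.
        return {self.hook_to_segment[h]
                for h in hashes if h in self.hook_to_segment}
```

Because unmatched chunks simply fall through (and are stored as new data), deduplication becomes approximate rather than exact, which is the trade-off that keeps memory bounded.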
Therefore, I am confident in claiming that sparse indexing is an effective
solution to the disk deduplication bottleneck.