Improved Open Source Backup:
Incorporating inline deduplication and sparse indexing solutions
G. P. E. Keeling
11. Evaluation of the project as a whole
In order to produce the new software, I initially produced a list of
objectives. Within the time allocated to this project, I was able to
achieve all of the objectives, as follows.
Core: Complete a search and review of popular existing open source
network backup solutions, explaining the areas in which current backup
offerings are flawed.
This was done early on in the project, and the results can be read in
'Appendix F - Open source competition'.
Core: Complete a literature search and review of relevant algorithms and
techniques that will be needed to implement a new software engine.
This was also done early on in the project, and the information learnt was
used in the design of the new software engine.
Core: Design and develop the new software.
The design was fairly quick, but the actual development took a significant
amount of time, including proof of concept programs
and iterations of the final software. The iteration used for the
test results was completed with around a month remaining before the deadline.
Advanced: By conducting suitable tests and analysis, prove that the
new software is superior to other offerings.
The new software was tested and shown to be either comparable or superior to
other solutions in most areas, with the exception of client memory usage. This
will be addressed before the initial release of the new software.
Advanced: Demonstrate that sparse indexing is an effective solution to
the disk deduplication bottleneck.
The bottleneck problem was demonstrated by another deduplicating solution
(backshift), which took longer than two days to complete a single backup.
Analysis of the new software's results for server memory usage, disk space and
the time taken for backing up showed that sparse indexing is an effective
solution to the disk deduplication bottleneck.
Core: Complete the final project report.
The final project report was completed before the deadline.
Overall, I was pleased with the way that the project went. I put a lot of
my spare time into this, and with a little more work the result has the
potential to be useful to many people.
The two hardest parts of the project were implementing the multiple streams
of asynchronous I/O, and the testing of the various software.
The former would have been slightly simpler had I kept the file
system scan as a separate phase before backing up the data, which would
have required only two streams in each direction instead of three.
Testing the various backup solutions was hard because of the amount of time
involved. I would very often run a test sequence over a few days,
and find at the end that I would have to run it again for some reason. For
example, network compression may have been turned on, meaning the results could
not be compared fairly against other solutions that were not doing network
compression.
Fortunately, as mentioned in the Intermediate Project Report, I began the
testing ahead of schedule and was therefore able to resolve these issues and
complete the testing on time.
The decision to continue to code in C was the correct one, as I think I would
not have finished the iterations in time had I tried to move to C++. However,
the object-oriented style of coding that I am moving towards seems effective.
In the end, I find it surprising that the ideas utilised during this project
have not already been combined in open source software.
One possible explanation is that the hard work involved in developing
this kind of software leads the authors to want to
sell the results. I do not intend to do that, but I can see a potential route
to remuneration via providing support for the software instead. That is, if
it really does turn out to be useful to other people.
It would be fantastic to work on this software full time.