The Borg are coming. Resistance is futile and, in this case, assimilation is welcome. Sorry, I couldn’t resist such a geeky line, and it’s not a joke either!
I’m forever juggling backups and, as in any Ops department, the continued consumption of disk space is inevitable. For many years traditional tape backup has been the go-to method, which has been ideal for small data sets. However, more storage also means more time to grab it all.
Various methods help, such as differential and incremental backups, but these can complicate the backup and restore processes and become a complete nightmare if you have to resort to tapes. If you still use tapes. The last tapes I used were some years ago: LTOs with a robot tape changer, needed for the capacity, and I would never go back there.
Over the last few months I’ve been aware that the backup methods I’m currently using are becoming increasingly ineffective due to the growing use of virtual machines and the huge disk images they create. Of course, frequent snapshotting of the VMs to reduce the daily backups only goes so far, and it introduces other issues (which I’ve covered in another entry here).
To head off impending doom I’ve been looking around at other solutions, primarily for the immediate issue of the VMs, as the existing solution is adequate for the rest of the operation. The requirements are therefore quite simple:
- High compression of the disk images
- Store only changes
- Do this really quickly
You would think that should be handled well enough by the existing solution. It does a fair job, but with multiple huge files on a regular basis the time to compress and stream them off is not great. I need something that can both speed up this process and take as little space as possible. Quite quickly I found a number of options, most of which revolved around the awesome rsync tool, such as:
- A custom rsync script using the hard links options. Worked well but of course the files are not compressed. Fail.
- The useful rdiff-backup, which again worked well and compressed the differences, but they won’t be that big anyway. It would have been good to back up the base files in compressed form. Fail.
- Burp Backup (http://burp.grke.org/) was fantastic and did an amazing job. Up to a point. It had some strange intermittent file access issues which left me feeling nervous about putting it into production. Eventual fail.
By this point I wasn’t optimistic about resolving my dilemma. Then I found something that blew me away and that was Borg Backup (https://github.com/Borgbackup/Borg). I can’t believe how wonderful this tool is!
In essence it uses deduplication: files are split into blocks and each distinct block is stored only once, however many times it occurs. This method has been ideal for the VM disk images, which often contain repeated structures, and on the first go it knocked a 1TB requirement down (along with compression) to only 500GB of storage. OK, that sounds a lot, and if it was 500GB each time then there would be a problem. The good thing is that it wasn’t. The following backups were just 50-100MB each, because only the small chunks of the disk images that had changed were recorded.
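To make the idea concrete, here is a minimal sketch of block-level deduplication in Python. It is only an illustration and not how Borg is implemented: it assumes fixed 4KB blocks and an in-memory dictionary, whereas Borg uses variable-sized, content-defined chunks plus compression and encryption. The names `ChunkStore`, `backup` and `restore` are made up for this example.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed block size, for illustration only


class ChunkStore:
    """Toy content-addressed store: each distinct chunk is kept once."""

    def __init__(self):
        self.chunks = {}    # chunk id (SHA-256 hex digest) -> chunk bytes
        self.archives = {}  # archive name -> ordered list of chunk ids

    def backup(self, name, data):
        """Store data as an archive; return the number of new bytes written."""
        ids = []
        new_bytes = 0
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            chunk_id = hashlib.sha256(chunk).hexdigest()
            if chunk_id not in self.chunks:
                # Only chunks never seen before cost any storage.
                self.chunks[chunk_id] = chunk
                new_bytes += len(chunk)
            ids.append(chunk_id)
        self.archives[name] = ids
        return new_bytes

    def restore(self, name):
        """Rebuild an archive by joining its chunks in order."""
        return b"".join(self.chunks[i] for i in self.archives[name])


if __name__ == "__main__":
    store = ChunkStore()

    # A fake "disk image": 256 blocks, almost all of them identical zeros.
    image = bytearray(CHUNK_SIZE * 256)
    image[0:5] = b"hello"
    first = store.backup("monday", bytes(image))

    # Change a single block and back up again.
    image[CHUNK_SIZE * 10:CHUNK_SIZE * 10 + 5] = b"world"
    second = store.backup("tuesday", bytes(image))

    print("first backup wrote", first, "bytes")
    print("second backup wrote", second, "bytes")
    assert store.restore("tuesday") == bytes(image)
```

The first backup of the 1MB fake image only writes two blocks (the one containing data and a single copy of the zeroed block), and the second backup only writes the one block that changed. That is the same effect, in miniature, as the 1TB and 50-100MB figures above.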
This has really made a huge difference to the storage requirements as well as the speed of the backup cycle, so much so that it’s possible to do multiple backups per day. Of course I did notice that the documentation warns against using this tool for VM backups because of the integrity of the disk images, but after extensive (and often destructive) testing I’ve not found any problem with recovery from these snapshots.
I may be late to the party regarding deduplication backups, but I’m embracing the idea, even if there are risks: if a deduplicated block becomes damaged, it will affect every backup that references it. This is unlike traditional backups, which are self-contained, so damage to one won’t affect the others. The integrity of this huge pool of data therefore needs to be considered and protected from harm.
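The shared-block risk is easy to show in a few lines of Python. This is a hypothetical sketch, not Borg’s repository format: chunks are assumed to live in a dictionary keyed by their SHA-256 digest, and `find_damage` is a name made up for this example. Borg ships its own `borg check` command for verifying a real repository.

```python
import hashlib


def chunk_id(data):
    """Identify a chunk by the SHA-256 digest of its contents."""
    return hashlib.sha256(data).hexdigest()


def find_damage(chunks, archives):
    """Return (damaged chunk ids, names of archives that reference them)."""
    # A chunk is damaged if its contents no longer hash to its id.
    damaged = {cid for cid, data in chunks.items() if chunk_id(data) != cid}
    affected = sorted(
        name for name, ids in archives.items() if damaged.intersection(ids)
    )
    return damaged, affected


if __name__ == "__main__":
    base = b"A" * 1024  # block shared by both backups
    mon = b"B" * 1024   # block only in Monday's backup
    tue = b"C" * 1024   # block only in Tuesday's backup

    chunks = {chunk_id(c): c for c in (base, mon, tue)}
    archives = {
        "monday": [chunk_id(base), chunk_id(mon)],
        "tuesday": [chunk_id(base), chunk_id(tue)],
    }

    # Simulate bit rot in the one chunk that both archives share.
    chunks[chunk_id(base)] = b"X" + base[1:]

    damaged, affected = find_damage(chunks, archives)
    print("damaged chunks:", len(damaged))
    print("affected archives:", affected)
```

One damaged chunk takes out both backups here, which is exactly why a regular integrity check (and ideally a second copy of the repository) matters more with deduplication than with self-contained backups.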
I love this backup tool now, even though there is a single ‘but’: it’s a push-only solution, so backups are initiated by the client. That is inconvenient enough that I can’t roll it out for other situations yet, though of course there are ways around this which I will look at at some point.
Go take a look and welcome the Borg into your life…
Image Credit: Borg Documentation