rdiff-backup isn't Perfect

Posted by JD 10/02/2010 at 09:58

I like using rdiff-backup to back up HOME directories and virtual machines efficiently. OK, that is a little understated: I LOVE rdiff-backup.

So, every 6 months or so, when it lets me down in some way, I have to recall all the problems it actually does solve. Things like:

  • Efficient backup times; just a few minutes to back up entire virtual machines
  • Efficient backup storage; 30 days of backups take only about 10% more space than a plain rsync mirror.
  • Easy recovery of a single file from the latest backup (really, it is trivial: just a file copy)
  • Easy recovery of a complete backup set from X days ago (I’ve tested this more than I’d like)
  • Easy to get information about backup increments and the size of each.
  • FLOSS, licensed under the GNU GPL (not commercial)
  • Backup failures get rolled back – you always get a complete, clean set.
  • No screwing around with SQL databases or other hard-to-understand crap.
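Several of the points above map directly onto rdiff-backup's classic command-line options. The sketch below is illustrative only: REPO, the restore targets, and the run/DRY_RUN helper are hypothetical names, and newer rdiff-backup releases also offer a subcommand-style CLI, so check the man page for your version. Restoring one file from the latest backup really is just a copy out of the mirror directory.

```shell
#!/bin/sh
# Illustrative wrappers around everyday rdiff-backup operations.
# REPO and all paths are placeholders, not taken from a real setup.
REPO=${REPO:-/backups/dms}

# Hypothetical helper: with DRY_RUN=1, print each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The mirror holds plain copies, so restoring one file from the latest
# backup is an ordinary cp (path is relative to the backed-up root).
restore_latest_file() { run cp -p "$REPO/$1" "$2"; }

# Show the available increments, and the disk space each one uses.
list_increments() { run rdiff-backup --list-increments "$REPO"; }
increment_sizes() { run rdiff-backup --list-increment-sizes "$REPO"; }

# Restore the complete backup set as it was N days ago into a new directory.
restore_days_ago() { run rdiff-backup -r "${1}D" "$REPO" "$2"; }

# Delete increments older than N days; --force is needed when more than
# one increment would be removed in a single run.
prune_older_than() { run rdiff-backup --force --remove-older-than "${1}D" "$REPO"; }
```

With DRY_RUN=1 set, `restore_days_ago 10 /tmp/dms-restore` only prints the rdiff-backup command it would run; without it, the restore actually happens.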

Corruption Issues

So, when I do have an issue like the one over the last few days (not caused by rdiff-backup, but one where rdiff-backup was hit by outside corruption), sometimes the only thing to do with a corrupted backup area is to wipe it and start over. The corruption came from rsync’ing to an external USB disk while the VM was still running; I then left it unattended for a few days while other backup tasks ran. 100% my fault.

The Fix

Each virtual machine has its own backup area under /backups/{VM}. To start over but keep the previous good backups, I simply move /backups/dms to /backups/dms-2010.10.02, run mkdir /backups/dms, and then run my normal backup task for the “dms” VM. After 30 days the date-tagged directory needs to be deleted. Simple: at does that for me.

# echo "rm -rf /backups/dms-2010.10.02" | at now + 31 days

Done.
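The fix above can be wrapped in a small script so it runs the same way every time. This is a minimal sketch, not the author's actual script: BACKUP_ROOT, KEEP_DAYS, retire_backup_area and schedule_removal are hypothetical names, and it assumes the same /backups/{VM} layout, a YYYY.MM.DD date tag, and a running atd for at(1).

```shell
#!/bin/sh
# Hypothetical helper script; names and defaults are assumptions.
BACKUP_ROOT=${BACKUP_ROOT:-/backups}
KEEP_DAYS=${KEEP_DAYS:-31}

# Move BACKUP_ROOT/<vm> aside to <vm>-YYYY.MM.DD, recreate an empty backup
# area for the next full run, and print the path of the retired directory.
retire_backup_area() {
    vm=$1
    area="$BACKUP_ROOT/$vm"
    retired="$area-$(date +%Y.%m.%d)"
    if [ -e "$retired" ]; then
        echo "refusing to reuse existing $retired" >&2
        return 1
    fi
    mv "$area" "$retired" || return 1
    mkdir "$area" || return 1
    printf '%s\n' "$retired"
}

# Queue deletion of the retired directory with at(1) after KEEP_DAYS days.
# Assumes the path contains no single quotes.
schedule_removal() {
    echo "rm -rf '$1'" | at now + "$KEEP_DAYS" days
}
```

Typical use would be `retired=$(retire_backup_area dms) && schedule_removal "$retired"`, followed by the normal backup job for that VM. Refusing to reuse an existing date-tagged directory matters because a second run on the same day would otherwise have mv nest one old backup set inside the other.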

The backup job took a little longer than normal since it was a full copy this time. Tomorrow and every day after, it will take under 2 minutes and the incremental storage needed will be just a few MB. This is the first corruption I’ve seen in any rdiff-backup area in over 2 years of daily use. With that kind of record, I do not plan on switching to another backup method.

If you are in an enterprise, take a look at your backup success rates. I recall reviewing them for a project with 60+ servers and seeing that, about 4 times a week, some part of the very-expensive-commercial-backup failed on multiple servers. Basically, I could never trust that, on any given day, there were good backups for all of the servers involved. Yep, we’ll be staying with rdiff-backup.
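For rdiff-backup areas, a first-pass success check needs no commercial tooling. A minimal sketch, assuming one repository per directory under BACKUP_ROOT (as with /backups/{VM}) and relying on rdiff-backup keeping a current_mirror marker in rdiff-backup-data: exactly one marker means the last session finished, while two mean it was interrupted and will be rolled back on the next run (or with `rdiff-backup --check-destination-dir`). check_backup_areas is a hypothetical name, and this does not detect corruption of the backed-up files themselves; a check like `rdiff-backup --verify` is needed for that.

```shell
#!/bin/sh
# Sketch: report the state of every rdiff-backup area under BACKUP_ROOT.
# Assumption: each subdirectory of BACKUP_ROOT is one repository.
BACKUP_ROOT=${BACKUP_ROOT:-/backups}

check_backup_areas() {
    for repo in "$BACKUP_ROOT"/*/; do
        [ -d "$repo" ] || continue          # glob matched nothing
        repo=${repo%/}
        data="$repo/rdiff-backup-data"
        if [ ! -d "$data" ]; then
            echo "NOT-A-REPO $repo"
            continue
        fi
        # Count current_mirror markers with a glob (no external tools).
        set -- "$data"/current_mirror.*
        if [ -e "$1" ]; then n=$#; else n=0; fi
        case $n in
            0) echo "NO-MIRROR $repo" ;;     # no current_mirror marker at all
            1) echo "OK $repo" ;;            # last session finished
            *) echo "INTERRUPTED $repo" ;;   # last session did not finish
        esac
    done
}
```

Run daily, any line that is not OK points at a backup area worth looking at, which gives a rough per-area success record over time.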
