Rdiff-backup vs Duplicati on Windows

Posted by JD 01/08/2011 at 10:52

I like backups. I like them more since losing many, many GBs of data over a decade ago – before I got backup religion.

Many long-time readers know that I’m always looking for a better backup method.

I’ve been using rdiff-backup for about 3 years on Linux systems and mostly like it, but it isn’t perfect. Yesterday, I decided to check out a new way to back up my Windows7 laptop: Duplicati. I’d seen a few GUI tools for Windows that use the back-end Duplicity tool. I’d always been interested in Duplicity because it does things that many other free tools do not, such as encryption and networked backups to lots of services (Amazon S3) or simply over ssh/sftp.

Keep reading for more on how the experience differed between Duplicati and rdiff-backup.

Windows7 x64

That’s what this machine runs. It has a 500GB internal disk that I’ve partitioned; one of the partitions is a 244GB data drive. That data drive is also encrypted with TrueCrypt. I’d intended to simply dd the entire partition to another drive. Alas, I was unable to mount the resulting partition from the other disk, which makes that a not-so-brilliant idea in my book. Live and learn.

I have a new dual-port eSATA dock for this machine. I like it. It is fast, performing just like an internal disk, and supports hot swapping through my laptop’s eSATA port. Unexpected, but very nice. There’s a 320GB disk docked now with a 244GB partition ready for backup data. Let’s get started.

Duplicati

Duplicati version 1.2b is a GUI for Duplicity, a backup tool with lots and lots of features. I downloaded the native Win-x64 version, which comes as an MSI installer, installed it into Windows7 and started it up. It starts as a wizard where you select which task you’d like to accomplish. There is no main window with a menu; it is just this task-based wizard, and it seems to want to know about backup sets.

I want:

  • to back up my entire D: drive (an encrypted, but mounted, disk).
  • certain files to be ignored, but everything else backed up. Files to be ignored are: *.wtv, *.dvr-ms, *.mpg, *.flv. Simple. These are usually large media files from TV recordings or downloads, not important enough to waste backup storage on.
  • incremental backups. Daily full backups will just carry an infection into the backup area when a virus strikes. Not good. With incremental backups, only the files that change get infected.
  • encrypted backups, either built into the backup tool or by simply putting the backup volume inside TrueCrypt, as I did in this case. I like TrueCrypt.
  • to be able to restore a single file with a copy command. Being able to traverse the entire directory structure in a backup is nice. Very nice. It feels good.
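The ignore rules above come down to a few filename patterns. Here is a minimal Python sketch of how such a list decides what gets skipped. It assumes case-insensitive matching, as on Windows. `IGNORED_PATTERNS` and `is_excluded` are hypothetical names and are not part of Duplicati or rdiff-backup.

```python
from fnmatch import fnmatchcase
from pathlib import PureWindowsPath

# Hypothetical ignore list, taken from the requirements above.
IGNORED_PATTERNS = ["*.wtv", "*.dvr-ms", "*.mpg", "*.flv"]


def is_excluded(path: str) -> bool:
    """Return True if the file name matches any ignored pattern.

    Assumption: matching is case-insensitive, like Windows file names,
    so both the name and the patterns are lower-cased before comparing.
    """
    name = PureWindowsPath(path).name.lower()
    return any(fnmatchcase(name, pattern.lower()) for pattern in IGNORED_PATTERNS)


if __name__ == "__main__":
    for candidate in [r"D:\Recorded TV\News.wtv", r"D:\contacts.csv", r"D:\Movies\clip.FLV"]:
        print(candidate, "->", "skip" if is_excluded(candidate) else "back up")
```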

The source drive has about 100GB of data to be backed up. Some of that data is tiny script files; other files make up a 30GB Windows virtual machine. The dd trial performed earlier took about 45 minutes to mirror the partition on the same hardware.

I had intended to tell you how this all ended here, but 8.5 hours later, Duplicati is still running. There are over 12,500 items in the backup folder for Duplicati. I can’t tell what any of the files are, except that they come in pairs (content and signatures) and the content files are about 10MB each. See below for sample file names. With 12K+ of them, I won’t be listing them all here. ;)

duplicati-full-content.20110108T020109Z.vol1.zip
duplicati-full-content.20110108T020109Z.vol10.zip
duplicati-full-content.20110108T020109Z.vol100.zip
duplicati-full-content.20110108T020109Z.vol1000.zip
duplicati-full-content.20110108T020109Z.vol1001.zip
duplicati-full-content.20110108T020109Z.vol1002.zip
duplicati-full-content.20110108T020109Z.vol1003.zip
duplicati-full-content.20110108T020109Z.vol1004.zip
duplicati-full-content.20110108T020109Z.vol1005.zip
duplicati-full-content.20110108T020109Z.vol1006.zip
duplicati-full-content.20110108T020109Z.vol1007.zip
duplicati-full-content.20110108T020109Z.vol1008.zip
duplicati-full-content.20110108T020109Z.vol1009.zip
duplicati-full-content.20110108T020109Z.vol101.zip
duplicati-full-content.20110108T020109Z.vol1010.zip
duplicati-full-content.20110108T020109Z.vol1011.zip
duplicati-full-content.20110108T020109Z.vol1012.zip
duplicati-full-content.20110108T020109Z.vol1013.zip
duplicati-full-content.20110108T020109Z.vol1014.zip
duplicati-full-content.20110108T020109Z.vol1015.zip
duplicati-full-content.20110108T020109Z.vol1016.zip
duplicati-full-content.20110108T020109Z.vol1017.zip
duplicati-full-content.20110108T020109Z.vol1018.zip
duplicati-full-content.20110108T020109Z.vol1019.zip

Nice. Er… not.
8.5 hours for 100GB is a pretty long backup window. Ouch. Duplicati can’t possibly be a reference to Ducati racing motorcycles, I hope.
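Converting those figures into average throughput makes the gap concrete. Here is a quick Python calculation. It assumes decimal units (1 GB = 1000 MB) and that the dd run copied the whole 244GB partition.

```python
def throughput_mb_per_s(gigabytes: float, minutes: float) -> float:
    """Average throughput in MB/s, assuming 1 GB = 1000 MB."""
    return gigabytes * 1000 / (minutes * 60)


# Figures from this post: about 100GB in 8.5 hours (Duplicati),
# and the whole 244GB partition in about 45 minutes (dd).
duplicati = throughput_mb_per_s(100, 8.5 * 60)
dd_copy = throughput_mb_per_s(244, 45)

print(f"Duplicati: {duplicati:.1f} MB/s")  # about 3.3 MB/s
print(f"dd:        {dd_copy:.1f} MB/s")    # about 90.4 MB/s
print(f"dd was roughly {dd_copy / duplicati:.0f}x faster")
```

dd copies raw blocks without compressing, hashing, or splitting anything into volumes. Its figure is best read as roughly what this hardware can do, not as a like-for-like comparison.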

rdiff-backup, my old friend

Rdiff-backup version 1.3.3 does not have a GUI. That fact will make many Windows users stop reading.
Rdiff-backup does have lots of features too.

Installation
I downloaded rdiff-backup-1.3.3-win32.zip and unzipped it into a directory. No installer was included. In that new directory is a file, Windows-README.txt, that explains how to set up the program on Windows. You are probably OK with just the single executable, but you may need the MS-VC++ 9.x redistributable DLLs. I didn’t want to chance it (I don’t really install software on this machine), so I just grabbed the DLL package and unzipped the DLLs into the same directory: msvcm90.dll, msvcp90.dll, and msvcr90.dll. If you already have them on your system, you are probably fine. Also included are a few HTML files with examples, a FAQ and the manual. Those may be handy later.

On the same machine, I kicked off an rdiff-backup run about an hour after Duplicati was started. Here are the actual commands used:

"C:\data\rdiff-backup\rdiff-backup.exe"  --exclude **/*.wtv --exclude **/*.mpg --exclude **/*.dvr-ms d:/   P:/Dell_1558_D-rdiff
"C:\data\rdiff-backup\rdiff-backup.exe" --remove-older-than 90D --force P:/Dell_1558_D-rdiff

I could have specified files to be included if I wanted, and there are lots of other options that would scare many people away. Notice that I used UNIX-style directory paths for the arguments passed to rdiff-backup; that is important. File permissions and special file types aren’t important with this type of backup. If you need to deal with file permissions and/or retain special file types, then you can, but the options can get long and scary for an average Windows user. Keeping it simple at this stage will work just fine for me, and I suspect for most of you too.
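If a script generates those command lines instead of you typing them by hand, Python's standard `pathlib` can produce the forward-slash form from an ordinary Windows path. Here is a small sketch. `to_rdiff_path` is a hypothetical helper, and the paths come from the commands above.

```python
from pathlib import PureWindowsPath


def to_rdiff_path(windows_path: str) -> str:
    """Convert a Windows path to the forward-slash style used on the
    rdiff-backup command lines above (for example D:\\ becomes D:/)."""
    return PureWindowsPath(windows_path).as_posix()


print(to_rdiff_path("d:\\"))                   # d:/
print(to_rdiff_path(r"P:\Dell_1558_D-rdiff"))  # P:/Dell_1558_D-rdiff
```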

Here’s a sample of the files/directories in the backup folder area:

$RECYCLE.BIN
CDH-2010-12-14-inv.pdf
contacts.csv
Data
ESXi-Backups
hosts
ip-to-country.csv.gz
LPI.ncd
metaname.ps1
Moms Apartment-001.jpg
Movies
PERF
rdiff-backup-data
Recorded TV
Vbox-install-3.2.12.zip
VirtualBox
ZimbraToast-5.0.20_GA_3127-5.0.3250.20.msi

Oops. In the future, I should also exclude $RECYCLE.BIN, along with pagefile.sys if there’s ever one on that drive.

BTW, that list of folders and file names matches the source folder and file names exactly, except for one extra folder, rdiff-backup-data/, which rdiff-backup uses for incremental data and metadata. Actually, it does reverse incremental backups. That’s a nice feature. The latest backup is a mirror of the source, and any changes between the latest and previous backups are stored in compressed files inside that other folder. Rdiff-backup doesn’t have built-in encryption.
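A toy model shows the reverse-incremental idea more clearly. The Python sketch below is hypothetical and much simpler than rdiff-backup's real on-disk format. A "file" is a list of equal-sized blocks, and the mirror always holds the newest version. Each backup stores only the old contents of the blocks that changed, so older versions are rebuilt by walking backwards from the mirror.

```python
from typing import Dict, List

Blocks = List[bytes]
ReverseDelta = Dict[int, bytes]  # block index -> that block's content in the OLDER version


class ReverseIncrementalStore:
    """Toy reverse-incremental store (hypothetical, not rdiff-backup's format)."""

    def __init__(self, initial: Blocks) -> None:
        self.mirror: Blocks = list(initial)   # always the latest version
        self.deltas: List[ReverseDelta] = []  # newest first

    def backup(self, new: Blocks) -> ReverseDelta:
        # Simplifying assumption: the number of blocks never changes.
        if len(new) != len(self.mirror):
            raise ValueError("toy model assumes a fixed number of blocks")
        delta = {i: old for i, (old, cur) in enumerate(zip(self.mirror, new)) if old != cur}
        self.deltas.insert(0, delta)
        self.mirror = list(new)
        return delta

    def restore(self, steps_back: int) -> Blocks:
        """Rebuild the version from `steps_back` backups ago (0 = mirror)."""
        if not 0 <= steps_back <= len(self.deltas):
            raise ValueError("no such version")
        version = list(self.mirror)
        for delta in self.deltas[:steps_back]:  # walk newest -> oldest
            for index, old_block in delta.items():
                version[index] = old_block
        return version


store = ReverseIncrementalStore([b"a", b"b", b"c"])
store.backup([b"a", b"X", b"c"])
store.backup([b"a", b"X", b"Y"])
print(store.restore(2))  # [b'a', b'b', b'c']
```

The newest version is stored whole, so restoring the latest copy of a file is just a copy out of the mirror.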

Oh, I almost forgot to mention this: rdiff-backup finished this full backup less than an hour after it began. I’ll run another now with the same script just to see how quickly an incremental can be performed. Started at 6:05am. Finished at 6:07am. 2 minutes. Sweet! Here are some details.

P:\>c:\data\rdiff-backup\rdiff-backup.exe --list-increment-sizes P:\Dell_1558_D-rdiff

        Time                       Size        Cumulative size
-----------------------------------------------------------------------------
Sat Jan 08 06:06:01 2011         90.3 GB           90.3 GB   (current mirror)
Fri Jan 07 21:29:33 2011          994 KB           90.3 GB

Not much changed that needed backing up. The storage and time to perform the incremental reflected that.
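This output is easy to parse if you want to track increment sizes over time, for example from a scheduled job. The Python sketch below is built only from the table format shown above. It assumes KB/MB/GB are powers of 1024, which has not been checked against rdiff-backup itself. The function names are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Assumption: KB/MB/GB/TB in this output are powers of 1024.
UNITS = {"bytes": 1, "B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}

# One data row: timestamp, increment size, cumulative size, optional "(note)".
ROW = re.compile(
    r"^\s*(?P<time>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s+"
    r"(?P<size>[\d.]+) (?P<size_unit>bytes|[KMGT]?B)\s+"
    r"(?P<cum>[\d.]+) (?P<cum_unit>bytes|[KMGT]?B)"
    r"(?:\s+\((?P<note>[^)]*)\))?\s*$"
)


@dataclass
class IncrementRow:
    time: datetime
    size_bytes: int
    cumulative_bytes: int
    note: Optional[str]


def to_bytes(value: str, unit: str) -> int:
    return int(float(value) * UNITS[unit])


def parse_increment_sizes(text: str) -> List[IncrementRow]:
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header, separator, prompt, or blank line
        rows.append(IncrementRow(
            time=datetime.strptime(m["time"], "%a %b %d %H:%M:%S %Y"),
            size_bytes=to_bytes(m["size"], m["size_unit"]),
            cumulative_bytes=to_bytes(m["cum"], m["cum_unit"]),
            note=m["note"],
        ))
    return rows


SAMPLE = """\
        Time                       Size        Cumulative size
-----------------------------------------------------------------------------
Sat Jan 08 06:06:01 2011         90.3 GB           90.3 GB   (current mirror)
Fri Jan 07 21:29:33 2011          994 KB           90.3 GB
"""

for row in parse_increment_sizes(SAMPLE):
    print(row.time.isoformat(), row.size_bytes, row.note or "")
```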

Restoring

As important as backups are, they aren’t the reason we do this. Restores matter. Restores are the only thing that matters, actually.

I fired up Duplicati and requested a restore from backup. OK, since the backup from last night was still running, I didn’t know how far I’d get. I got to the point of selecting a single file to restore into an empty folder. I pressed Restore and it began reading the catalog, scanning signatures … then it disappeared and wasn’t in the list of running programs. I took a look at the target directory for the restore and damn if the requested file wasn’t there. Success. I thought the program had crashed, but I guess it just finished and closed. The restore GUI was intuitive enough that I was able to use it easily. It warned me that I wasn’t restoring to an empty folder, which was nice. Restoring on top of existing files may be desired, but you definitely want to be warned about it. Navigating down to a non-trivial file for restoration took longer than I’d expected, but I could just as easily have performed a full restore, or anything in between, with recursive restoration at any level of the backup set. Very nice. The only downside that I see is that there’s no way for end-user self-service.

I would fire up rdiff-backup to restore the same file, but since it is just a file on the P: drive in a mirrored folder structure, I just browsed over and copied the file, then pasted it back where I wanted it. It worked just like any other file copy. Any user could have done this themselves. Recursive restores work just as easily.
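The mirror is made of ordinary files, so you can also script a restore of the latest version of a file as a plain copy. Here is a minimal Python sketch. `restore_latest` is hypothetical. By default it refuses to overwrite, imitating the warning Duplicati showed; rdiff-backup does nothing like this. Older versions live as increments in rdiff-backup-data, so restoring them still needs rdiff-backup itself.

```python
import shutil
from pathlib import Path


def restore_latest(mirror_root: str, relative_path: str, destination_dir: str,
                   overwrite: bool = False) -> Path:
    """Copy the newest backed-up version of one file out of the mirror.

    Only the current mirror is plain files; older versions are kept as
    increments under rdiff-backup-data and are not handled here.
    """
    rel = Path(relative_path)
    if rel.parts and rel.parts[0] == "rdiff-backup-data":
        raise ValueError("rdiff-backup-data holds increments and metadata, not mirrored files")
    source = Path(mirror_root) / rel
    target = Path(destination_dir) / rel.name
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists; pass overwrite=True to replace it")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    return target
```

A call such as `restore_latest("P:/Dell_1558_D-rdiff", "contacts.csv", "C:/Users/jdp/Desktop")` would copy one file back. That destination folder is only an example.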

Automation – Automatic Backups

Backups need to be automatic. If they aren’t, then they won’t get performed.

Duplicati seems to be split into a service and a GUI that controls it. The GUI doesn’t need to be running for the program to be working. I like that a lot. In the backup setup, Duplicati strongly encourages setting up automatic backups, which is a good thing. I didn’t look into that, but I expect it hooks into the MS-Windows Task Scheduler, the cron-like tool on MS-Windows.

rdiff-backup is just that single, non-GUI program. There isn’t anything else. If you want automatic backups, then you need to schedule them yourself in the Task Scheduler. You’ll probably want a script file to control the settings and options and to back up different parts of your PC to different sub-directories in your backup area. That is easy, but it does take time. There is a GUI for rdiff-backup, but it seems to be part of a web server. Do they really expect people to install Apache just to get a GUI for backing up files with rdiff-backup? I should … keep my mouth shut.
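A scheduler-friendly script can be as small as the two commands used earlier in this post. The Python sketch below assumes Python is installed on the machine. The rdiff-backup flags are exactly the ones from those commands. The `--run` switch, the names, and the structure are illustrative assumptions.

```python
import subprocess
import sys
from typing import List, Sequence

# Settings copied from the commands used earlier in this post.
RDIFF = r"C:\data\rdiff-backup\rdiff-backup.exe"
SOURCE = "d:/"
TARGET = "P:/Dell_1558_D-rdiff"
EXCLUDES = ["**/*.wtv", "**/*.mpg", "**/*.dvr-ms"]
KEEP = "90D"


def build_commands(exe: str, source: str, target: str,
                   excludes: Sequence[str], keep: str) -> List[List[str]]:
    """Return the backup command and the pruning command as argument lists."""
    backup = [exe]
    for pattern in excludes:
        backup += ["--exclude", pattern]
    backup += [source, target]
    prune = [exe, "--remove-older-than", keep, "--force", target]
    return [backup, prune]


def main(run: bool) -> None:
    for cmd in build_commands(RDIFF, SOURCE, TARGET, EXCLUDES, KEEP):
        if not run:
            # Dry run: only show what would be executed.
            print(" ".join(cmd))
            continue
        # check=True raises on a non-zero exit code, so a failed backup
        # stops the script before the pruning step runs.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical convention: only execute the commands when called with --run.
    main(run="--run" in sys.argv[1:])
```

To run it nightly, a command along the lines of schtasks /Create /SC DAILY /TN "rdiff-backup D" /TR "python C:\data\backup_d.py --run" /ST 02:00 should work. The script path, task name, and start time there are placeholders.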

Other Thoughts

Windows7 has a backup and restore utility built into it. It works more like rdiff-backup than Duplicati, but it does have a GUI. The last backup always appears as a full backup, though it takes just a few minutes to perform. Home Premium has a limitation on backups – they only work to locally attached disks (USB, eSATA), not over the network. That could matter to many people. If you have Win7 Professional or higher, then network backups are included. I can’t see why a home user would choose Duplicati or rdiff-backup over that tool. It works and it works well.

Because rdiff-backup keeps a full mirror with incremental changes going backwards in time, the idea of constantly rolling backups works perfectly. In the backup script above, you can see where any backups over 90 days old are removed. I have not seen any mention that rdiff-backup supports VSS on Windows, and there is a note that large-file support in the Windows version needs a patch. That could be an issue for some people. I checked a few virtual machine files and they were in the backup area with the correct sizes (28GB and a few of just 10GB), so the MS-Windows binary seems to have been patched. rdiff-backup does support remote backups, but you’ll need to manually configure an ssh client on Windows and set up key-based ssh authentication. I didn’t do that this time; it is not trivial for a normal Windows user to accomplish. On *NIX/Linux, ssh is sorta just there and works with rdiff-backup nicely. I suspect that rdiff-backup over a network to a samba/CIFS drive would work just fine.
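The 90-day rolling window comes down to simple date arithmetic. Here is a hypothetical Python sketch of which increments a `--remove-older-than 90D` run would target. It assumes the 90 days are counted back from the moment the command runs. The current mirror is never a candidate.

```python
from datetime import datetime, timedelta
from typing import List


def increments_to_remove(increment_times: List[datetime], now: datetime,
                         keep_days: int = 90) -> List[datetime]:
    """Timestamps of increments older than the retention window, oldest first.

    The current mirror is not passed in; it is always kept.
    """
    cutoff = now - timedelta(days=keep_days)
    return sorted(t for t in increment_times if t < cutoff)


# Increment times like the ones in this post, checked on a hypothetical date.
increments = [
    datetime(2011, 1, 7, 21, 29, 33),
    datetime(2011, 1, 8, 6, 6, 1),
    datetime(2011, 1, 8, 11, 0, 41),
]
for t in increments_to_remove(increments, now=datetime(2011, 4, 8, 8, 0)):
    print("would remove increment from", t)
```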

With Duplicati, backups are more traditional. You perform full backups as often as you like, at the expense of increased storage, then perform incremental backups based off the last full backup. During a restore, you must work back through that chain: the last full backup, then each incremental after it. Weekly or monthly, you will want to perform another full backup to limit the number of restore steps when something goes wrong. Also, the more backup sets involved, the more likely backup corruption could get in the way of a restore. When I was going through the settings for incremental and full backups in Duplicati, I was discouraged by the presets and was unable to set the custom settings to my liking (I didn’t read anything or look for any help either). Duplicati appears to have network backup support built in, even on the MS-Windows version. That could be important if you are running Win7 Home or WinXP, where the built-in backup tool really sucks.
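The full-plus-incrementals restore chain described above fits in a few lines of Python. This is a generic illustration of that backup scheme, not Duplicati's internals. `BackupSet`, `restore_chain`, and the "full"/"inc" labels are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class BackupSet:
    kind: str       # "full" or "inc" (illustrative labels)
    time: datetime


def restore_chain(sets: List[BackupSet], target: datetime) -> List[BackupSet]:
    """Sets needed to restore the state as of `target`: the newest full
    backup at or before `target`, then every incremental after it,
    oldest first."""
    eligible = sorted((s for s in sets if s.time <= target), key=lambda s: s.time)
    fulls = [i for i, s in enumerate(eligible) if s.kind == "full"]
    if not fulls:
        raise ValueError("no full backup at or before the target time")
    return eligible[fulls[-1]:]


# Hypothetical schedule: weekly fulls with daily incrementals.
sets = ([BackupSet("full", datetime(2011, 1, 1))]
        + [BackupSet("inc", datetime(2011, 1, day)) for day in range(2, 8)]
        + [BackupSet("full", datetime(2011, 1, 8)), BackupSet("inc", datetime(2011, 1, 9))])

print(len(restore_chain(sets, datetime(2011, 1, 7, 12))))  # 7: the Jan 1 full + six incrementals
print(len(restore_chain(sets, datetime(2011, 1, 9, 12))))  # 2: the Jan 8 full + one incremental
```

The restore succeeds only if every set in the chain is readable. A fresh full backup resets the chain, which limits both the number of restore steps and the exposure to a single corrupt set.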

There’s something about seeing Duplicati backup files. They have long names that only a catalog could love, not a human. Since the content is all chunked into 10MB volumes, transmission to remote systems would be efficient, nearly ideal. VSS is listed as supported, so backing up open files on Windows should work, but you need to enable it manually; it is not on by default.
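The names are less opaque than they look. The sample names listed earlier combine a kind, a part, a timestamp, and a volume number. The regular expression in this Python sketch is inferred only from those sample names, so other file types, such as the signature files, may not match it. It reads the trailing Z as UTC. It also sorts volumes numerically: a plain text sort is why vol10 and vol100 follow vol1 in the listing. The vol2 name below is a hypothetical example.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Pattern inferred from sample names such as
# duplicati-full-content.20110108T020109Z.vol1000.zip
NAME = re.compile(
    r"^duplicati-(?P<kind>[a-z]+)-(?P<part>[a-z]+)\."
    r"(?P<stamp>\d{8}T\d{6}Z)(?:\.vol(?P<vol>\d+))?\.zip$"
)


@dataclass(frozen=True)
class VolumeName:
    kind: str
    part: str
    time: datetime
    volume: Optional[int]


def parse_volume_name(name: str) -> VolumeName:
    m = NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized file name: {name}")
    stamp = datetime.strptime(m["stamp"], "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    volume = int(m["vol"]) if m["vol"] is not None else None
    return VolumeName(m["kind"], m["part"], stamp, volume)


names = [
    "duplicati-full-content.20110108T020109Z.vol1000.zip",
    "duplicati-full-content.20110108T020109Z.vol10.zip",
    "duplicati-full-content.20110108T020109Z.vol2.zip",
    "duplicati-full-content.20110108T020109Z.vol1.zip",
]
# Sort by volume number instead of text, so vol2 comes before vol10.
for n in sorted(names, key=lambda n: parse_volume_name(n).volume or 0):
    print(n)
```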

Ah – it appears the Duplicati backup from last night finished, 8.5 hours later. Hooray! Time for an incremental backup test. Started at 6:36am. Half an hour later, it just finished after using 100% of a CPU (quad core here). 6 files were created. One of those was 10MB in size. Recall that the rdiff-backup incremental was less than 1MB.

Conclusions

  1. Use the built-in Windows7 Backup Tool if you can. It is fast, efficient and works well.
  2. Use the Duplicati tool if you need remote backups with encryption and can stand very long backup windows. I can only imagine how long a backup over the network would take.
  3. Rdiff-backup probably doesn’t have a place on Windows unless you also use it on Linux/UNIX systems and like it there. I do. rdiff-backup is faster than either of the other 2 choices – much faster.
  4. If you are on Linux and only Linux, check out Back-In-Time for really simple backups and rdiff-backup if you want more features that can work cross platform.
  5. If you are a business with more than a few machines, I don’t think any of these tools will fit you. Feel free to contact me for other options. I’d like to help.

I may have a look at some other FLOSS backup tools for Windows in the future. Just be aware that none of the backup tools is perfect, even commercial ones. I’ve seen the leading enterprise network backup product report some kind of failure about 30% of the time across hundreds of servers, for various reasons. Sure, some of those failures were due to open database files, and the DB was backed up outside this backup method, so those failures weren’t important. Still, seeing that on the daily status overview page when you don’t expect it is always a surprise.

Trackbacks


  1. JD 01/08/2011 at 11:58

    So, with those large virtual machine files, you might wonder whether the reverse differencing actually works. I started a VM and used it for about 3 hours, then ran another rdiff-backup. The results?

    The file that I knew had changed was just under 10GB on the source side. It is a full disk image, preallocated using VirtualBox, not compressed. It performs well as a virtual machine.

    The rdiff-backup took about 22 minutes to complete.
    The difference in incremental backup size was about 80MB.

    C:\Users\jdp>\data\rdiff-backup\rdiff-backup.exe --list-increment-sizes p:\Dell_1558_D-rdiff
            Time                       Size        Cumulative size
    -----------------------------------------------------------------------------
    Sat Jan 08 11:00:41 2011         90.3 GB           90.3 GB   (current mirror)
    Sat Jan 08 06:06:01 2011         76.0 MB           90.4 GB
    Fri Jan 07 21:29:33 2011          994 KB           90.4 GB

    That means that even though the file was 10GB and had changed, rdiff had figured out which sectors had been changed and only created diffs for those. FAN-TAS-TIC!

    Over 20 minutes for a backup of about 80MB is a long time, so I won’t be using this method to back up my virtual servers, but for a desktop that only needs a full backup weekly, I can easily live with that time.

  2. JD 01/15/2011 at 05:05

    Another week, another backup.

    "c:\data\rdiff-backup\rdiff-backup.exe" --list-increment-sizes P:/Dell_1558_D-rdiff
            Time                       Size        Cumulative size
    -----------------------------------------------------------------------------
    Sat Jan 15 03:32:22 2011         91.7 GB           91.7 GB   (current mirror)
    Sat Jan 08 11:00:41 2011          832 MB           92.5 GB
    Sat Jan 08 06:06:01 2011         76.0 MB           92.6 GB
    Fri Jan 07 21:29:33 2011          994 KB           92.6 GB

    Just the changes please. It appears to be working.
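In the first comment, a 10GB disk image produced an increment of roughly 80MB. That happens because rdiff-backup stores only the parts of a file that changed. rdiff-backup uses librsync for this, and its rsync-style rolling checksum can also find data that has moved within a file. The Python sketch below is a deliberately simpler stand-in that compares fixed-offset blocks by hash. That is enough to show the idea for a preallocated disk image modified in place. `BLOCK_SIZE` and the function names are hypothetical.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096  # hypothetical block size for the illustration


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Hash each fixed-size block; the list acts as the file's 'signature'."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Indices of blocks in `new` that differ from the same offset in `old`.

    Simplification: blocks are compared at fixed offsets only. librsync's
    rolling checksum can also match data that moved; this cannot.
    """
    old_sig = block_digests(old, block_size)
    new_sig = block_digests(new, block_size)
    return [i for i, digest in enumerate(new_sig)
            if i >= len(old_sig) or digest != old_sig[i]]


old = bytes(4 * BLOCK_SIZE)   # a tiny zero-filled "disk image"
new = bytearray(old)
new[5000] = 1                 # one byte changed inside block 1
print(changed_blocks(old, bytes(new)))  # [1]
```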