File Copy Performance for Large Files

Posted by JD 10/27/2010 at 18:36

The last few days, I’ve been trying to improve the way I copy large (2+GB) files around, both locally and between systems.

I looked at 4 different programs and captured very simple stats using the time command (a rough sketch of the timing harness follows the list). The programs tested were:

  1. cp
  2. scp
  3. rsync
  4. bigsync
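
A minimal sketch of how I time these, assuming a 2+GB test file; the file name, destination path and the backuphost name here are just placeholders, not anything from the real setup:

    # Time each tool copying the same large file (paths/host are placeholders).
    BIGFILE=/var/tmp/big-test-2GB.img

    sync; time cp    "$BIGFILE" /mnt/backup/                          # plain local copy
    sync; time scp   "$BIGFILE" backuphost:/mnt/backup/               # copy over SSH
    sync; time rsync -a --inplace "$BIGFILE" /mnt/backup/             # local rsync
    sync; time rsync -a --inplace "$BIGFILE" backuphost:/mnt/backup/  # rsync over SSH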

I’d considered trying still other programs, but really wanted something that supported incremental backups to the same file and handled it without too much complexity. I would have liked to use zsync, but my understanding is that it is an HTTP-based protocol and can’t be used for local copies. I wasn’t interested in setting up Apache with SSL between internal servers.

How-To KNOW that you have Good System Backups

Posted by JD 10/23/2010 at 09:26

Here’s a simple one-question test for whether you have good backups or not.

Question: If any of your main hard disks started making a loud clicking sound right now, does that idea freak you out or make you nervous?

If you have any answer besides “No, bring it on,” then your backups aren’t good enough.

Simple. I KNOW that I can wipe my HOME directory from my main system and be completely fine. There is last night’s backup on another machine that I can restore easily, and if I need access right away, the files are available on that other machine too. Further, there are 90 days of incremental backups available, so if I delete something important and don’t miss it for a few weeks, I can still get it back. Honestly, I’m less confident about some other system backups, but my main desktop computer and all the company server machines don’t cause me any worry at all. I’m 100% confident. Sure, a restore could be a hassle, but a few hours later the data would be back. That’s the point of backups, right? Sometimes, about twice a year, one of my system backups fails or gets corrupted in some way. As long as that doesn’t happen on the same night that the source system fails, I’m fine.

For really important data, there are multiple copies on multiple systems, so even if there is some corruption, other copies are available. Worst case, I could lose 2 days of data, but not everything. I’d restore the OS, applications, application settings AND the data. Because we use virtualization, we aren’t tied to specific hardware … we can restore onto pretty much any current machine. There’s no need to search for a specific RAID controller or motherboard or … whatever. Virtualization frees us from that stuff.

Of course, much of my confidence comes from actually performing restores and seeing them work. While we all say to practice the restore, most people don’t have a spare machine to try it out on. I know we don’t, but every once in a while an accident happens and a restore is the quickest answer.
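
A minimal sketch of the kind of spot-check I mean, assuming an rdiff-backup setup like the one I describe below; the paths and the backuphost name are placeholders:

    # Pull yesterday's copy of one file out of the backup and compare it to the live one.
    rdiff-backup --restore-as-of 1D \
        backuphost::/backups/jd-home/projects/notes.txt /tmp/notes.txt
    diff /home/jd/projects/notes.txt /tmp/notes.txt && echo "restore looks good"

    # Confirm the increments really do go back ~90 days.
    rdiff-backup --list-increments backuphost::/backups/jd-home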

Shouldn’t you be that confident about your backups too?

21 of the Best Free Linux Backup Tools – but this list doesn’t include my favorite, rdiff-backup. Sniff, sniff.
Lifehacker Backup – For anyone running Windows 7, just use the built-in backup tool. It is very good and behaves much like rdiff-backup. For Windows Servers, open your wallet and check out NetBackup or EMC NetWorker. For VMware backups (ESX/ESXi), Trilead VMX is fairly inexpensive as far as VM backup tools go, but it doesn’t support incremental backups.

rdiff-backup isn't Perfect

Posted by JD 10/02/2010 at 09:58

I like rdiff-backup for backing up HOME directories and virtual machines efficiently. Ok, that is a little understated: I LOVE rdiff-backup.

So, every 6 months or so, when it lets me down in some way, I have to recall all the good things that it actually does solve (a quick command sketch follows the list). Things like:

  • Efficient backup times; just a few minutes to backup entire virtual machines
  • Efficient backup storage; about 10% more storage for 30 days of backups than a mirror (rsync) uses.
  • Easy recovery for 1 file from the latest backup (really it is trivial, just a file copy)
  • Easy recovery of a complete backup set from X days ago (I’ve tested this more than I like)
  • Easy to get information about backup increments and sizes for each.
  • FLOSS software from GNU (not commercial)
  • Backup failures get rolled back – you always get a complete, clean set.
  • No screwing around with SQL databases or other less than easy to understand crap.
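
For reference, a sketch of the commands behind most of those bullets; the paths and the backuphost name are placeholders, not a recommendation:

    # Nightly backup of HOME to another machine over SSH.
    rdiff-backup /home/jd backuphost::/backups/jd-home

    # Single-file restore from the latest backup is trivial – the current
    # mirror is a plain directory tree, so it really is just a copy.
    cp /backups/jd-home/projects/notes.txt /home/jd/projects/notes.txt   # run on the backup host

    # Full restore of the backup set as it looked 10 days ago.
    rdiff-backup --restore-as-of 10D backuphost::/backups/jd-home /home/jd.restored

    # Information about each increment and its size.
    rdiff-backup --list-increment-sizes backuphost::/backups/jd-home

    # Keep roughly 30 days of increments.
    rdiff-backup --remove-older-than 30D backuphost::/backups/jd-home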

Top 9 _Ooops_ Moments

Posted by JD 07/27/2010 at 10:30

Below are a few incidents that I’m personally aware of which impacted a few different projects. They range from my personal desktop, to production dispatching systems with 20K+ users, to one that impacted space shuttle launch data.

People like Top 10 Lists, but I could think of only 9 near disasters. Perhaps something interesting will happen this week? ;)

Ooops – beep, beep, beep ….

Centralized vs Federated Computer Services

Posted by JD 07/16/2010 at 07:25

I came across a short article on the Free Software Foundation building a federated social network solution and figured a few of my readers would be interested.

Trilead VM Explorer Install Tips

Posted by JD 07/06/2010 at 14:04

As some of you may know, I am a consultant, primarily with UNIX, virtualization and systems architecture. The last few days, I’ve been setting up a fairly low-cost backup solution for a 100% MS-Windows shop running VMware ESX 3.x. They have 15+ VMs, and the old backup system had been shut down and wiped more than a few months ago. There didn’t seem to be anything wrong with the prior backup solution except that the day-to-day system users didn’t know much about the setup. My task was to get that system working again.

The Tools

  1. Trilead VM Explorer – the VMware-compatible VMDK backup software (not the free version).
  2. Fire Daemon Pro – to run the backup task on a schedule.
  3. MS-Batch – .CMD files – to selectively control which VMs are backed up on specific days without point-and-click requirements.
  4. Service Accounts – this is very important in the MS-Windows world.

Ooops 1

Posted by JD 05/10/2010 at 08:21

What’s that saying? “Some days you eat the bear and other days the bear eats you.”
Or perhaps “Stupid is as stupid does” fits.

Ooops is something you never want to hear your system/network admin say. I’ve heard it said elsewhere and then watched as 200+ NASA servers all started rebooting. No time to save your work. No time to do anything before the screen flickers and a BIOS screen is displayed.

Today, I said, “ooops.”

Buying a Laptop - Stuff To Know

Posted by JD 04/23/2010 at 08:54

In a prior article here, I outlined some important things to check when you’re looking for a new laptop. With the release of Windows 7, some of those things aren’t necessarily as important as they were under Vista, and I’ve learned a few new things while shopping for a new laptop myself.

Big Server OS Installs Are a Problem

Posted by JD 12/15/2009 at 08:27

Many companies don’t really consider the bloat of server operating systems a real problem to be addressed. This is wrong, because as soon as you write any data to disk, you’ve just signed your company up to safeguard that data multiple times over (3 to 90 copies, once backups and retention are counted) for the next 3-5 years, if not longer.

How did I come up with this?

Assumptions – hopefully realistic for your situation

  • Windows 2008 Server – 20GB installation for the OS only (MS says 32GB of disk is the min)
  • Data is stored on a SAN, so we will ignore it. The size of data isn’t the issue in this article.
  • Compressed and incremental backups are performed with 30 days retained.
  • At least 1 copy is maintained off-site for disaster recovery (DR)

Breakdown of backup disk use

  • Install image – 20GB of storage
  • OS Backup – 20GB of storage
  • Off site Backup – 20GB of storage
  • 2 extra copies of backup – 40GB of storage

The total is 100GB of storage media for a single Windows 2008 Server install. Not all that bad, really. Then consider that even a small business probably has 5 servers; that becomes 500GB of storage. Still not so bad. Heck, your DR plan is just to copy the last backup to an external drive and take it home every Friday. Good enough.

Now imagine you have 50 or 100 or 1,000 or 20,000 installations. Now it gets tougher to deal with. Those simple backups become 5TB, 10TB, 100TB and 2PB of storage, and you haven’t backed up anything but the OS – no data.
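
The arithmetic is nothing fancy; spelled out as shell math, using the 100GB-per-install figure from the breakdown above:

    # Total backup storage for the OS alone, at 100GB per installed server.
    PER_INSTALL_GB=100
    for n in 5 50 100 1000 20000; do
        echo "$n installs -> $(( n * PER_INSTALL_GB )) GB"
    done
    # 5     ->     500 GB
    # 50    ->    5000 GB  (~5 TB)
    # 100   ->   10000 GB  (~10 TB)
    # 1000  ->  100000 GB  (~100 TB)
    # 20000 -> 2000000 GB  (~2 PB)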

Alternatives?

  1. Data deduplication on archive storage frames
  2. Fixed OS images – if they are all the same, you only need 1 backup
  3. Use a smaller OS image

Data Deduplication

Data deduplication has been an expensive option that small companies with normal data requirements wouldn’t deploy due to cost, complexity and a lack of skills. This is about to change with the newest Sun ZFS release, which should be out in early 2010. It is already available in OpenSolaris, if you want to get started with trials. I’ve seen demonstrations with 90% OS deduplication. That means every added server OS install only adds about 10% more to be backed up. Obviously, that ratio will slip whenever a new OS release or patch set gets rolled out over the following weeks and months, but this solution is compelling and will easily pay for itself in any non-trivial server infrastructure.
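
If you want to try it in OpenSolaris, turning deduplication on and watching the ratio is about this simple – a sketch, with made-up pool and filesystem names:

    # Enable dedup on the filesystem holding the OS images/backups.
    zfs set dedup=on tank/os-backups

    # After writing a few server images, see how well it is working.
    zpool list tank                               # the DEDUP column shows the ratio
    zfs get dedup,compressratio tank/os-backups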

Fixed OS Images

This is always a good idea, but with the way that MS-Windows performs installations, files are written all over the place and registry entries are best applied only by installation tools. Configuration on Windows tends to be point-and-click, which can’t be scripted effectively.

On UNIX-like operating systems, a base image can be installed, application installation scripted and overall configuration settings scripted too. There are a number of tools that make this easy, like Puppet, which is FOSS.
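
Puppet is the nicer way to do it, but even a dumb post-install script run against a stock base image captures the idea; a sketch, with example package names and paths rather than anything specific I run:

    #!/bin/sh
    # Run once against a freshly cloned base image: install the application
    # packages and drop in configuration files kept under version control.
    set -e
    apt-get update
    apt-get install -y postfix dovecot-imapd rdiff-backup

    # Configuration comes from a checked-in tree, not from point-and-click.
    cp /srv/config/postfix/main.cf      /etc/postfix/main.cf
    cp /srv/config/dovecot/dovecot.conf /etc/dovecot/dovecot.conf

    /etc/init.d/postfix restart
    /etc/init.d/dovecot restart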

Use a Smaller OS

A Xen Ubuntu Linux 8.04.x VM running a complete enterprise messaging system with over a year’s worth of data is under 8GB, including 30 days of incremental backups. Other single-purpose server disk requirements are smaller, much smaller. This blog server is 2.6GB with 30 days of incremental backups. That’s almost 10x smaller than an MS-Windows server. Virtualization helps too: JeOS is a smaller Ubuntu OS install meant for virtual servers.
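
Measuring that kind of footprint is trivial on the backup host; the path here is just a placeholder:

    # Total size of this server's backup area, increments included.
    du -sh /backups/blog-vm

    # Break that down per increment.
    rdiff-backup --list-increment-sizes /backups/blog-vm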

No Single Answer

There is no single answer to this problem. I doubt any company can run on Linux systems alone. Data deduplication is becoming more and more practical for backups, but it isn’t ready for transactional, live systems. Using fixed OS images is a best practice, but many software systems demand specialized installation and settings, which makes this solution far more complex in practice.

A hybrid solution will likely be the best for the next few years, but as customers, we need to voice our concerns over this issue with every operating system provider.

Shame on Pidgin – Plain Text Passwords

Posted by JD 12/01/2009 at 18:02

Today I was going through my list of files to back up on my Linux laptop, removing temporary and cache files, when I came across a directory that I didn’t recognize. The files were listed as changed within the last 3 days.

changed .purple
changed .purple/accels
changed .purple/accounts.xml
changed .purple/blist.xml
changed .purple/prefs.xml
changed .purple/status.xml

It turns out they are for pidgin, the extremely popular instant messaging software. Ok, I use that – fine. But my interest got the best of me and I looked at the accounts.xml file. Obviously it is an XML file, but I was shocked to discover something like the following (modified for my protection):

    <account>
        <protocol>prpl-jabber</protocol>
        <name>admin-userid@example.com/Admin</name>
        <password>some-really-complex-password-with-lots-of-special-characters-in-clear-text</password>
        <alias>admin</alias>
    </account>
The password isn’t encrypted. Not at all!

This is unacceptable.

There is an encryption plugin for pidgin, but it is for IMs, not the stupid passwords. This is just crazy. Heck, there are ROT13 methods and trivial 2-way password encrypt/decrypt methods which could be used if necessary.

The pidgin wiki has this to say. I have to admit, they have a point, but I still disagree with it. At least they do set the directory permissions to 700 and the file permissions to 600 (user-only), but that doesn’t help with my backups placed on another system, does it?
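
Until that changes, the workaround on my end is to keep the file out of plain backups, or encrypt it before it leaves the machine; a rough sketch, assuming an rdiff-backup job like mine (paths, host name and passphrase handling are up to you):

    # Option 1: just don't back the clear-text file up.
    rdiff-backup --exclude /home/jd/.purple/accounts.xml \
        /home/jd backuphost::/backups/jd-home

    # Option 2: ship a symmetrically encrypted copy instead of the clear-text one.
    gpg --symmetric --cipher-algo AES256 \
        --output /home/jd/.purple/accounts.xml.gpg /home/jd/.purple/accounts.xml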