Best Practices for Home Desktop Computer Backups
The Checklist
- Stable / Works Every Time
- Automatic
- Different Storage Media
- Fast
- Efficient
- Secure
- Versioned
- Offsite / Remote
- Restore Tested
When you are looking for a total backup solution, those are the things you want from it.
Stable / Works Every Time
I read a few reviews of backup tools for Windows. The test was to perform a backup followed by a full restore to a different disk. Then the two disks, the source and the newly restored disk, were compared. Any difference was considered a failure. With most of the Windows tools, the restored file system did not 100% match the source file system, often in very important ways.
Automatic
If any manual steps are needed, eventually, humans will stop doing them. I’ve been there: I did manual backups every week for almost 6 months. Then every quarter. Then … er … It is human nature to stop. Since I started doing automatic backups, those happen every night. Sure, about 2-5 times a year the backups fail, but the other 360 times, I have a good backup. Automatically.
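As a rough illustration of "automatic", here is a minimal sketch of a wrapper that runs whatever backup command you give it and logs whether it worked. The function name, the log location, and the script path in the crontab comment are all assumptions made for this example, not something from my own setup.

```shell
#!/bin/sh
# Hypothetical wrapper: run any backup command, record the outcome in a log,
# and return non-zero on failure so cron can report it.
LOG="${BACKUP_LOG:-/tmp/backup.log}"   # assumed log location

run_backup() {
    if "$@" >>"$LOG" 2>&1; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') OK: $*" >>"$LOG"
    else
        status=$?   # exit status of the failed backup command
        echo "$(date '+%Y-%m-%d %H:%M:%S') FAILED ($status): $*" >>"$LOG"
        return "$status"
    fi
}

# Example crontab entry (assumed script path), every night at 03:30:
# 30 3 * * * /usr/local/bin/nightly-backup.sh
```

The point of the log is the 2-5 failures a year: you want to find out about them the next morning, not on the day you need a restore.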
Different Storage Media
HDDs tend to fail completely, not just in a single partition. Writing to a completely different storage medium is critical. Laptops come with a reinstallation partition, but if the entire HDD is dead, that partition is completely useless. The data, OS, and recovery partition all need to be backed up to different media. Any USB disk or networked disk share works.
Fast
If backups take 5 hours, you won’t do them. OTOH, if they require 3-5 minutes, that isn’t too much of a commitment, so you will.
Efficient
This is about storage efficiency. If you need 30 backups and that requires 30x the storage, it is not efficient or practical. There are backup methods and tools that are extremely efficient, storing all the data and any data that changes in compressed form. I know this is possible, since I keep 90 days of backups for my personal files from a laptop. The data in my HOME is using 5.06GB of storage and the backup area with 90 days is using 6.55GB of storage. To me, that is such a minor amount of storage that it isn’t worth not having the backups for 3 months.
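To put those numbers in perspective, this small sketch compares 90 naive full copies against the backup area I actually measured. The figures are the ones quoted above; the function name is just for illustration.

```shell
# Figures from the paragraph above.
home_gb=5.06      # live data in HOME
backup_gb=6.55    # backup area holding 90 days of history
days=90

storage_report() {
    awk -v h="$home_gb" -v b="$backup_gb" -v d="$days" 'BEGIN {
        printf "naive full copies: %.1f GB\n", h * d
        printf "actual backup area: %.2f GB (%.2fx the live data)\n", b, b / h
        printf "history overhead: %.2f GB\n", b - h
    }'
}

storage_report
```

In other words, 90 days of history costs about a gigabyte and a half on top of a single copy.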
Secure
If the backups contain sensitive data of any type, encryption is needed. Encryption is really needed when the backups are over an untrusted network, like the internet, or if the backup media can leave the secured location. That means if you use a portable HDD for your backups, then it really needs to have encrypted storage.
Versioned
RAID and mirroring (as with rsync) are not good enough for most backups. This is because corruption of the source files immediately corrupts the 2nd copy or mirror. Corruption can happen for many reasons (hardware, controller, software, or logical errors), so having versioned file backups is critical. If you can restore a file from yesterday AND a different file from last week, then your backups are versioned.
Corruption can happen from computer viruses or rootkits or remote crackers too.
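A toy demonstration of the difference, using nothing but cp. This is only a sketch: it keeps a full copy per label, which is exactly the storage-hungry approach the Efficient section warns against, whereas real versioned tools store only the changes. The snapshot function and the day1/day2 labels are made up for the example.

```shell
# snapshot SRC DEST LABEL: copy SRC into DEST/LABEL, leaving earlier labels intact.
snapshot() {
    mkdir -p "$2" && cp -a "$1" "$2/$3"
}

demo=$(mktemp -d)
mkdir "$demo/src"
echo "good data" > "$demo/src/report.txt"
snapshot "$demo/src" "$demo/backups" day1

echo "corrupted" > "$demo/src/report.txt"     # corruption hits the source
snapshot "$demo/src" "$demo/backups" day2     # a plain mirror would now hold only this

cat "$demo/backups/day1/report.txt"           # prints: good data
```

With a mirror, the day2 copy is all you would have left.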
Offsite / Remote
Fire, tornado, floods, earthquakes, thieves. Enough said?
Restore Tested
Until you test your restore, you have nothing. Backups are like insurance. You use them all the time and hope you never need them, but since hardware fails all the time, each of us probably will need to restore at some point. Hope is not a plan. Having a tested, verified, validated restore process is critical.
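One simple way to check a test restore is to compare it against the source, file by file. The sketch below assumes you have restored into a scratch directory; the function name is hypothetical, and it only compares file names and contents, not ownership, permissions, or timestamps.

```shell
# verify_restore SRC RESTORED: succeed only if both trees contain the same
# files with the same contents. Differences are listed by diff.
verify_restore() {
    if diff -rq "$1" "$2"; then
        echo "restore matches source"
    else
        echo "restore DIFFERS from source"
        return 1
    fi
}

# Typical use (hypothetical paths):
#   verify_restore /home/user /tmp/restore-test/user
```

Files that changed between the backup and the comparison will show up as differences, so run it soon after the backup or against data that is not in use.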
Fine. I’m Convinced. What Software Should I Use?
The software that gets you doing backups is the software that you should use. If rsync is the tool used to create a mirror every month, that is better than no backups at all, right? I really like rsync for specific uses, but there are better tools for backups. To me, rsync is an 80% tool. It provides 80% of what most people need in a backup. I think we can do better.
The easiest tools that meet all the best practices listed are Duplicati and Duplicity. Both are F/LOSS tools with Duplicati being a GUI version of Duplicity. I will admit that I have only played with it and do not use it myself for backups. That does not detract from the capabilities in any way. It is also cross platform for Windows, Mac and Linux, so everyone can use it. Both of these tools store backups inside volume containers, so restores are slightly more complex than with other tools. You must have the software loaded to restore.
My preferred backup tool on Linux is rdiff-backup. It doesn’t meet all the best practices alone, but it doesn’t prevent us from creating a completely best practice compliant backup solution ourselves. The commands are nearly identical to rsync, so if you use rsync today, then you really owe it to yourself to check out rdiff-backup. To restore from the last backup, you do not need the rdiff-backup software; it is just a copy command. I really like that.
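For a feel of what that looks like, here is a minimal sketch using the classic rdiff-backup command-line form. The paths, the 90-day retention, and the helper names are assumptions for illustration; check man rdiff-backup for the version you have installed. With DRY_RUN=1 the commands are only printed.

```shell
# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

SRC=/home/user            # hypothetical source directory
DEST=/mnt/backup/home     # hypothetical backup area on different media

# Create a new increment; the newest version is stored as plain files.
backup() { run rdiff-backup "$SRC" "$DEST"; }

# Drop history older than 90 days (rdiff-backup wants --force if more than
# one increment would be removed at once).
prune() { run rdiff-backup --remove-older-than 90D "$DEST"; }

# Restore FILE as it was 7 days ago into TARGET.
restore_old() { run rdiff-backup -r 7D "$DEST/$1" "$2"; }

# Restore the latest version of FILE: just a copy, no rdiff-backup needed.
restore_latest() { run cp -a "$DEST/$1" "$2"; }
```

For example, DRY_RUN=1 restore_old Documents/notes.txt /tmp/notes.txt shows the command that would be run without touching anything.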
If you are new to Linux or perhaps you just prefer a GUI, then Back-In-Time is a tool worth checking out. To restore from the last backup, you do not need the back-in-time software. I really like that.
Windows Backups
From a completely different perspective, here’s a Windows-centric Best Practices article that also recommends some commercial software.
I’m hardly an expert, but for Windows systems I think there are 2 types of backups needed.
- System Image (OS and installed Apps)
- Data Backups (all user files)
Image-Based Backups
This is because we need to have an exact image of the system at home thanks to the usual hardware-to-OS-install requirements for Microsoft Windows Licenses. Businesses have more flexibility, but home users generally get MS-Windows with the PC they bought and that license is tied to the specific hardware. It will not run on other hardware. If you have a retail or upgrade version of MS-Windows this restriction may not apply. I prefer to use Linux where there aren’t any license restrictions on the hardware. Sure, not all hardware is supported by every version of Linux, but at least you know it isn’t something specifically coded to prevent use.
For System Images, you can use any tool you like that does bit-for-bit copies. Some examples are:
- dd or any of the safe-dd or rescue-dd tools
- Clonezilla
- PartImage – my favorite; simple to use and understand
These are not incremental, so they require lots of storage for each run. If you create a system restore monthly, I think you are doing pretty good. I do it about once a quarter. Further, I try to only load software using Ninite installation packages, so that application maintenance is easier and reloading software is 1 installer, not 20. A few commercial Windows apps may need to be manually loaded, but that’s much less effort than manually loading 20+ apps, right? If the app didn’t change since your last system image, then there’s no need to load a new version anyway.
I also make system images before and after a service pack install. I’ve been burned before.
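For the dd route, a bit-for-bit image plus a verification pass can be sketched as below. The function name, device, and image path are hypothetical, and the bs=4M block size assumes a Linux dd. Be very careful with the argument order on a real disk.

```shell
# image_device SOURCE IMAGE: bit-for-bit copy, then verify the copy with cmp.
image_device() {
    dd if="$1" of="$2" bs=4M && cmp "$1" "$2"
}

# Real use would look something like this (hypothetical device and path,
# run as root from a live CD so the source file system is not mounted):
#   image_device /dev/sda /mnt/usb/sda.img
```

Restoring is the same dd command with the source and destination swapped.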
Incremental Data Backups
For incremental data backups, I try to push all the data on Windows to a Linux file server and use Linux backup tools. However, I’ve also used the win32 version of rdiff-backup under Windows. The last version I loaded showed issues with large files (anything over 2GB), but worked just fine for typical word processing and other similar types of files. According to recent reviews, Windows-specific backup software has issues too. Some tools didn’t actually back up every file or refused to restore some files, so if you are going this route, please do some testing of both the backup AND the restore capabilities. Reading reviews is a really good idea too.
Get Some Backups Going Today
Even if your selected tool isn’t automatic, compressed, storage efficient, or encrypted, having some backups is better than having no backups. Every once in a while I come across someone using ZIP or TAR to create backups. Usually these people like something simple and easy to understand. I do too, but for about the same effort they can use a simple tool that provides so much more and can be automated and really efficient. Why wouldn’t they do that?
Optimized Backups for Physical and Virtual Machines
My old backup method was a little cumbersome. To ensure a good backup set, I’d take down the virtual machine, mount the VM storage on the host (Xen), then perform an rdiff-backup of the entire file system, before bringing the VM back up again. This happened daily, automatically, around 3:30am. It has been working for over 3 years with very few hiccups. I’ve had to restore entire VMs and that has worked too. One day I needed to restore the Zimbra system ASAP. From the time I decided to do the restore until end-users could make use of the system was 20 minutes. That’s pretty sweet in my book.
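The sequence described above can be sketched roughly as follows. This is not my actual script: the domain name, logical volume, mount point, backup destination, and config path are all hypothetical placeholders, and with DRY_RUN=1 the steps are printed rather than executed.

```shell
# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

VM=mailvm                     # hypothetical Xen domain name
DISK=/dev/vg0/mailvm-disk     # hypothetical VM storage on the host
MNT=/mnt/vmbackup             # hypothetical mount point
DEST=/backups/mailvm          # hypothetical backup area on another disk
CFG=/etc/xen/mailvm.cfg       # hypothetical domain config file

backup_vm() {
    run xm shutdown -w "$VM"       &&   # wait for a clean shutdown
    run mount -o ro "$DISK" "$MNT" &&   # mount the VM storage read-only
    run rdiff-backup "$MNT" "$DEST"     # back up the entire file system
    status=$?
    run umount "$MNT"
    run xm create "$CFG"                # bring the VM back up either way
    return "$status"
}
```

The downtime is everything between the shutdown and the create, which is why the size of the nightly change set matters.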
There are some issues with the current setup.
- Backups are performed locally, to a different physical disk before being rsync’ed to the backup server. This is necessary because the backup tool versions are different and incompatible between Ubuntu 8.04 and 10.04 LTS servers.
- Each system is completely shut down for some period of time during the backup process. It is usually 1-4 minutes, but still that is downtime.
- Most of the systems are still using 8.04 paravirtual machines under Xen. A migration of some type to a newer OS is needed. I should use this opportunity to make things better.
- Some of the systems are running old versions of software which are not up to current patch levels. I guess this happens in all IT shops. None of that is available outside the VPN, so the risks are pretty low.
I think I can do better.
New Blog Software and OS
Since this is a technology blog, I figure some of you may be interested in a major change that happened out of necessity here today.
This is the very first blog article on our new physical server, running in a completely different virtual machine. For the next week, everything here is a test.
Due to some sort of outage issue earlier today, I was forced to upgrade everything involved with this blog. I had attempted to perform this upgrade previously and failed. As you can see, this time, there was success. Nobody was shocked more than I.
Easy Technique for Secure, Easy to Type Passwords - Size Matters
The ladies have always known that size matters. We need to apply that knowledge to passwords. Password security experts know that
- a longer password is better
- a password with many different types of characters is better – call it a large alphabet
- a password that cannot be found through a dictionary attack
- a password that hasn’t been cracked before
These rules seem to be conflicting with the most important things from a user’s perspective. A user wants:
- a memorable password
- an easy to type / enter password
So what’s the solution? A long, but easy to type and easy to remember password. Below is how to get all 6 of these requirements, easily.
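Before getting to the technique itself, a bit of arithmetic shows why length matters so much. This sketch is not the method described in the full article; it only computes the size of the search space, in bits, for a password of a given length drawn at random from an alphabet of a given size. The function name is made up, and the result only holds for randomly chosen characters, not for dictionary words.

```shell
# search_space_bits LENGTH ALPHABET_SIZE: log2(ALPHABET_SIZE ^ LENGTH)
search_space_bits() {
    awk -v len="$1" -v n="$2" 'BEGIN { printf "%.1f\n", len * log(n) / log(2) }'
}

search_space_bits 8 94     # 8 random printable-ASCII characters: prints 52.4
search_space_bits 20 26    # 20 random lowercase letters: prints 94.0
```

A long password from a small alphabet can beat a short one from a large alphabet by a wide margin.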
Git DVCS Server Setup and Use in a Team
It seems that all the software developers are using git DVCS these days. I haven’t done serious software development in many years, so I’ve been using RCS all this time for my system admin scripting needs. With my new development work, I need to upgrade my toolset to a DVCS – Distributed Version Control System. There are many reasons to do this even if you don’t want to publish all your code on the internet. Below I’ll show how to set up an internal git server that can be shared inside a company or just between friends on the internet.
I’ll assume:
- Your git server will be on a Linux/Unix system someplace where
- all the developers will have ssh connection access.
- You have git installed on the server and the clients already.
Those server connections may allow full shell access or be limited to support just git. Regardless, setting up ssh-keys – Ssh Config Setup – is a good idea between the client(s) and the server computers.
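Under those assumptions, the core of a shared setup comes down to two commands, sketched here with hypothetical host, account, and path names. With DRY_RUN=1 the commands are only printed.

```shell
# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# On the server: create an empty shared ("bare") repository that members
# of the same Unix group can push to.
create_repo() { run git init --bare --shared=group "$1"; }

# On each client: clone over ssh using the scp-like user@host:path form.
clone_repo() { run git clone "$1:$2"; }

# Hypothetical usage:
#   create_repo /srv/git/project.git                  (on the server)
#   clone_repo git@gitserver /srv/git/project.git     (on a client)
```

After cloning, git push and git pull on the clients talk to the server over the same ssh connection.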
Keep reading to learn about Git setup.
Securing ssh Connections and Blocking Failures
If you have an ssh server running on your network that is accessible to the outside world, on the internet, chances are your systems are being attacked. If you aren’t aware of this, just take a look at your ssh logs in /var/log/auth.
$ egrep -i Failed /var/log/auth.log*
We can do better from a security standpoint. Regardless, ssh definitely still rocks and should be used daily, constantly. Before I moved ssh to a higher, non-standard port and installed Fail2Ban, I was seeing over 1,000 ssh attempts daily in the log files. What’s the saying … ignorance is bliss? Not when it comes to systems security.
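To see who is knocking, the failed attempts can be counted per source address. This is a minimal sketch that assumes the usual OpenSSH "Failed password for ... from ADDRESS port ..." log lines; the function name is made up, and compressed rotated logs would need zgrep or similar instead.

```shell
# failed_by_ip FILE...: count "Failed password" lines per source address,
# busiest address first.
failed_by_ip() {
    grep -h 'Failed password' "$@" |
        sed -n 's/.* from \([^ ]*\) port .*/\1/p' |
        sort | uniq -c | sort -rn
}

# Typical use (needs read access to the logs):
#   failed_by_ip /var/log/auth.log
```

A handful of addresses usually account for most of the noise.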
This article is for Linux/UNIX users, but the ideas should apply to any OS running an ssh daemon.
Ssh Setup For Higher Security
The order below is based on how easy each item is to accomplish or set up. None of these configuration changes are hard. All of them can be accomplished in under 5 minutes if you know what you’re doing or 15 minutes if you need to read up a little.
- Listen on a non-standard port
- Use ssh-key-based connections
- No remote root logins with a password
- Allow only key-based logins from non-LAN IPs (basically any remote ssh connection cannot use a password)
- Lock account after X failed attempts – Fail2Ban
- Automatically block IPs with login failures – Fail2Ban
- Monitor hack attempts – Fail2Ban
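The first four items map onto a handful of sshd_config settings. The sketch below just prints a candidate fragment so you can review it; the port number and the LAN address range are placeholders, and older OpenSSH releases spell prohibit-password as without-password. The Fail2Ban items are configured separately and are not shown.

```shell
# Print a candidate sshd_config fragment. The port and LAN range are
# placeholders; review against `man sshd_config` before using any of it.
sshd_hardening_snippet() {
    cat <<'EOF'
Port 2222
PermitRootLogin prohibit-password
PubkeyAuthentication yes
PasswordAuthentication no

# Passwords remain usable from the LAN only
Match Address 192.168.0.0/16
    PasswordAuthentication yes
EOF
}

sshd_hardening_snippet
```

After editing the real config, sshd -t checks the syntax before you restart the daemon. Keep an existing session open until you have confirmed that a new login still works.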
Googlebot Random HTTP-GET Requests
I was going through the blog server logs today looking for odd, unexpected requests. Attended a Linux Security Meeting last evening that has me thinking … I see all the normal myphpadmin / dbadmin requests and other hack attempts for tools that we don’t use here. All the index.php requests are worthless / harmless. In the 404 list were lots of random strings requested at the top level of the blog. To me, these strings look like passwords from some password management tool. Hummm.
Readers Ask About ... VPNs
Below is the 6th of 6 questions from a reader. I definitely don’t have all the answers, but I’m not short on opinions. ;)
Previous articles:
Part 1 – LVM+JFS+RAID | Part 2 – Service Virtualization |
Part 3 – Virtualizing Media Storage | Part 4 – Hosting Email |
Part 5 – Reverse Proxies
Laurens Duijvesteijn asks:
Q6: It seems desirable to be able to VPN in to my network at any time, if I decide to set up said service, does any device in my internal network need to connect before it is discoverable?
Sorry, but I don’t understand the question entirely. Discoverable? That confuses me. This isn’t a game console. Your VPN client and server will need to know about each other explicitly. Not to worry, that isn’t very difficult to set up. There are just a few details.
Readers Ask About ... Reverse Proxy Servers
Below is the 5th of 6 questions from a reader. I definitely don’t have all the answers, but I’m not short on opinions. ;)
Previous articles:
Part 1 – LVM+JFS+RAID | Part 2 – Service Virtualization |
Part 3 – Virtualizing Media Storage | Part 4 – Hosting Email
Laurens Duijvesteijn asks:
Q5: Do I need a reverse proxy if I ? I’ve read about proxy servers on TheFu’s blog that filter internal traffic (if you read this, in the end I liked the idea a lot more than at first). Is this even the same thing? If this is to happen, is it correct that I’d need two NICs and bridge the connection from the router to the internal network? If so, can I get rid of the router? We do use it for telephone access too.
Readers Ask About ... Virtualization of Services
Below is the 2nd of 6 questions from a reader. I definitely don’t have all the answers, but I’m not short on opinions. ;)
Part 1 – LVM+JFS+RAID | Part 2 – Service Virtualization | Part 3 – Virtualizing Media Storage | Part 4 – Hosting Email
Laurens Duijvesteijn asks:
Q2: I read everywhere about Virtualisation, should I directly install packages to the base system to provide services, or should I virtualise all services? What are the advantages here?
Advantages of Virtualization
The list of advantages is long, but with those advantages come a few disadvantages. I cannot hope to point out all the advantages, so I’ll limit it to just the main ones.