December OpenSolaris Meetup

Posted by JD 12/09/2009 at 07:46

I attended the Atlanta area OpenSolaris Meetup last night even though we were getting major rain in the area, which made the 30-minute drive challenging. Why would I bother? Swag? Scott D presenting? Being around other nerds who like Solaris? No, although those are all valid reasons too.

Even with the nasty weather, the room was packed and we had to bring in some more chairs so everyone could sit. About 20 people attended.

New stuff in ZFS

Yep, the entire meeting was about fairly new features added to ZFS on OpenSolaris, things like data deduplication and how well it works in normal and extreme situations. The main things I took away from the talk were:

  1. ZFS is stable
  2. Data deduplication (dedup for short) should only be used on backup areas, not on live production data, until you become comfortable with it and its performance in your environment
  3. Dedup happens at the block level of a zpool; anything above that level still works as designed
  4. Only use OpenSolaris builds after 129 if you plan to use dedup. Earlier builds had data-loss issues in the dedup code.
  5. Solaris doesn’t have the dedup code yet. It is not currently scheduled for any specific release either.
  6. Dedup is currently only done in-line (in real time); there is no dedup thread that can be scheduled to run later. This could have unknown performance impacts (good or bad).
  7. ZFS supports both read and write cache devices. This means we can dedicate SSDs to either cache and deploy cheaper, larger SATA disks for the actual disk storage (see the command sketch after this list). Some cost/performance comparisons were shown between 10,000rpm SAS drives and SSD cache backed by 4200 SATA drives: the price was about the same, 4x more storage was available, and read performance was 2x better while write performance was about the same. Nice.
  8. ZFS has added a way to pick up disk size changes. Suppose your storage is external to the server and really just a logical allocation: on the storage server you can expand the LUN that the server sees, and ZFS can be configured to refresh disk device sizes manually or automatically.
  9. Device removal: currently there is no direct method to remove a disk from a ZFS pool. There are workarounds, however, and a supported method for removing a disk from a zpool is planned for OpenSolaris ZFS later this year.
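To make a few of these items concrete, here is a minimal sketch of the relevant commands, assuming a pool named tank and hypothetical device names; check the zpool(1M) and zfs(1M) man pages on your build before trusting the exact syntax.

    # Enable deduplication on a dataset (use an OpenSolaris build after 129, per the talk).
    zfs set dedup=on tank/backups

    # Add an SSD as a read cache (L2ARC) device, and another as a separate log
    # device to absorb synchronous writes.
    zpool add tank cache c2t0d0
    zpool add tank log c2t1d0

    # Pick up LUN size changes automatically...
    zpool set autoexpand=on tank

    # ...or refresh a single device by hand after the LUN has been grown.
    zpool online -e tank c1t0d0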

To really appreciate the demo, you need to take the other great things about ZFS as a given, then add the new capabilities on top. One of the demonstrations showed how IT shops can still charge each department for the storage it uses, even when 20 other departments are referencing the same deduplicated data blocks. Basically, dedup gives you more disk storage without buying more disk.
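A quick way to see that effect is the dedup ratio the pool itself reports; a small sketch, again assuming a pool named tank:

    # Logical data stored versus blocks actually allocated after deduplication.
    zpool get dedupratio tank

    # On builds with the feature, zpool list also shows a DEDUP column.
    zpool list tank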

ACLs are managed at the file system level, not the disk block level, so the deduplicated data is still only accessible to the appropriate users.

Why OpenSolaris?

OpenSolaris is an open source version of Sun Microsystems’ Solaris operating system that runs on lots of hardware you may already own. It also runs inside most virtual machines as a client or guest. Since it looks and feels like Solaris, you can become familiar with it on your PC at home for essentially zero cost, just the cost of about 20GB of disk storage. Sun also uses OpenSolaris to trial new features before placing them into the real Solaris releases. I run OpenSolaris in a virtual machine under Windows 7 using the free version of Sun’s VirtualBox hypervisor. I know others who run it directly on hardware, and under Xen and VMware hypervisors too. Just give it enough virtual disk storage and go. I think 10GB is enough to load it, but a little more, say 20GB, will let you play with it and applications more.
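If you want to try a similar setup, here is a rough sketch of creating such a guest from the command line with VBoxManage. The VM name, disk file, ISO name, and sizes are only examples, and the exact flags vary a bit between VirtualBox releases, so treat this as a starting point rather than a recipe.

    # Create and register a new VM using the OpenSolaris guest type.
    VBoxManage createvm --name opensolaris --ostype OpenSolaris --register

    # Give it some memory and a NAT network adapter.
    VBoxManage modifyvm opensolaris --memory 1024 --nic1 nat

    # Create a 20GB virtual disk, add an IDE controller, and attach the disk
    # plus the install ISO.
    VBoxManage createhd --filename opensolaris.vdi --size 20480
    VBoxManage storagectl opensolaris --name IDE --add ide
    VBoxManage storageattach opensolaris --storagectl IDE --port 0 --device 0 \
        --type hdd --medium opensolaris.vdi
    VBoxManage storageattach opensolaris --storagectl IDE --port 1 --device 0 \
        --type dvddrive --medium osol-0906.iso

    # Boot it.
    VBoxManage startvm opensolaris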

If you are in the market for NetApp storage, you really need to take a look at Sun’s storage servers running ZFS. The entry price is significantly less and you get all the flexibility of Solaris without giving up CIFS, iSCSI, NFS, and, in the future, fibre channel storage. Good sales job Sun.

Swag

No meetup is a success without some swag. Water bottles, t-shirts, hats, and books were all available. We were encouraged to take some after the iPod Nano raffle was won (not by me). Pizza and sodas were also provided by the sponsors.

Pondering ZFS

Posted by JD 07/25/2009 at 15:42

As I ponder how to build a redundant file server that serves Linux, Solaris, VMware, Xen, VirtualBox, FreeBSD, FreeNAS, TiVo and Windows systems, a few interesting articles have come to light.

Requirements

Basically, I’d like:

  1. Reasonable amounts of redundancy
  2. Hardware agnostic
  3. FOSS (non-commercial)
  4. Enterprise ready – support for iSCSI, CIFS, Samba, NFSv4, RAID levels, snapshots, and versioning
  5. Remote backup capabilities – rdiff-backup would be ideal
  6. Offsite backup capabilities – any type of external storage “in the cloud”
  7. Encryption of offsite backups
  8. High performance capabilities
  9. Suitable for file system, database, and raw disk device access

More on this as I work through the solution over the next few days and weeks.

BTRFS

Of course, I came across this article on btrfs a few days later explaining that it will likely be the default Linux file system in a few years. It also explains that any btrfs file systems created prior to kernel 2.6.30 are incompatible with later kernels. Today, I’m running 2.6.24-24-generic SMP. No go.

rdiff-backup Woes

Posted by JD 07/10/2009 at 07:58

rdiff-backup rocks, mostly. But there are times when it doesn’t work as expected or doesn’t work at all. Usually, the not working at all part is a cockpit error, but sometimes not.

Key rdiff-backup features

  1. Simple one-line backup command: `rdiff-backup source target` (see the examples after this list)
  2. Reverse Incremental backup sets
  3. Extremely FAST backups. Entire server installations back up in just a few minutes once the initial backup set is created.
  4. Last backup set is available as a complete copy of the files. Need to recover? Just copy the file(s) back.
  5. Control over how old backup sets can be. Deletion of “older than x days” sets is trivial.
  6. Compressed older differential backup sets
  7. The current backup is 1-for-1 sized; older backups are tiny. As an example, a 5GB backup of hundreds of files, with 30 days of incremental backups, only takes 6GB total. Each daily increment is relatively small and based on the changes made that day, usually just 10-40MB. Impressive.
  8. Recovery by date/time
  9. FOSS – we like Free and Open Source Software
  10. Cross platform – Unix, Linux, MacOS and MS-Windows.
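A few of these features expressed as commands, as a minimal sketch; the source and destination paths are only examples:

    # One-line backup: mirror /home/jd into /mnt/backup/jd and keep reverse increments.
    rdiff-backup /home/jd /mnt/backup/jd

    # Delete increments older than 30 days (add --force if more than one
    # increment would be removed).
    rdiff-backup --remove-older-than 30D /mnt/backup/jd

    # Recover a file as it existed three days ago (recovery by date/time).
    rdiff-backup -r 3D /mnt/backup/jd/notes.txt /tmp/notes.txt

    # The newest backup is a plain mirror, so an ordinary copy also restores it.
    cp /mnt/backup/jd/notes.txt /home/jd/notes.txt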

Things that just work

  1. Local rdiff-backup runs on Linux just work. Backing up a directory structure or an entire VM (not as a single huge file) to another mounted disk works very nicely, with all the key features listed above.
  2. Local rdiff-backup runs on Win32 work, provided there isn’t any networking involved and no huge files.
  3. Recovery of an entire VM fileset. I’ve needed a few recoveries in the last 6 months due to user error. They worked flawlessly and only took 20 minutes from problem discovery to full recovery. That was manual recovery; if it were scripted, it could be less than 5 minutes (a sketch of such a script follows this list).
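A minimal sketch of what that scripted recovery could look like, assuming the VM files live under /vms/guest1 and the backup repository is /mnt/backup/vms/guest1 (all hypothetical paths):

    #!/bin/sh
    # Restore the newest backup of one VM's file set into a staging directory,
    # then swap it into place. The VM name and paths are examples only.
    VM=guest1
    rdiff-backup -r now /mnt/backup/vms/$VM /vms/restore-$VM &&
      mv /vms/$VM /vms/$VM.broken &&
      mv /vms/restore-$VM /vms/$VM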

Things that don’t work or work poorly

This is mostly on MS-Windows platforms, but some Linux stuff doesn’t work nicely either. I did find a Windows howto, but it wasn’t really much use.

  1. Remote transmission over ssh on a non-standard port is broken regardless of platform. That doesn’t mean it can’t work, but I’ve never been successful in getting it to work; neither the push nor the pull form of the command has worked for me (the remote-schema sketch after this list shows how it is supposed to look). It shouldn’t be this hard.
  2. Large-file differencing doesn’t seem to work on Linux or Windows. In theory that means files over 4GB, but smaller files also get confused and end up as a complete copy rather than a block-level differential copy.
  3. MS-Windows network backups don’t really work, even over samba connections. Ok, there are many strange things about rdiff-backup on Windows. For example:
    1. You have to `cd` to the drive and directory of the source if you want it to work.
    2. You have to use ‘/’ instead of ‘\’ characters, most of the time. This is a python thing, I guess.
    3. Backups to samba shares may or may not work. I haven’t figured the reason why or why not yet.
    4. Backups over ssh require non-trivial setup. Only push works from Windows unless an ssh server is set up there, and then the complexity becomes considerably worse.
    5. Many people use Cygwin, with all its faults (slow, heavy, bad directory access), to get around the win32/64 API issues.
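For reference, this is the form that a remote backup over a non-standard ssh port is supposed to take, using --remote-schema; the host, port, and paths are placeholders, and this is exactly the variant I have not been able to make work reliably:

    # Push /home/jd to backuphost, whose sshd listens on port 2222.
    rdiff-backup --remote-schema 'ssh -p 2222 %s rdiff-backup --server' \
        /home/jd jd@backuphost::/srv/backups/jd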

So, rdiff-backup is great for local Linux system backups, but for remote backups, you’ll want to use different technology, like rsync. If you’re on Windows and want remote backups, check out some other solutions.

Good writeup on rdiff-backup features, method, and algorithm.

S1/Disk1 -rdiff-backup→ S1/Disk2 -rsync/ZFS send→ R2/Disk1
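As a sketch, that two-stage flow could look like the following on server S1, with hostnames, paths, and the pool name tank as placeholders (the ZFS send variant assumes Disk2 holds a ZFS dataset that gets a snapshot after each backup run):

    # Stage 1: local rdiff-backup from Disk1 to Disk2 on S1.
    rdiff-backup /disk1/data /disk2/backups/data

    # Stage 2a: push the backup repository offsite to R2 with rsync...
    rsync -a --delete /disk2/backups/ r2:/disk1/backups/

    # Stage 2b: ...or, if /disk2/backups is a ZFS dataset, snapshot and send it.
    zfs snapshot tank/backups@today
    zfs send tank/backups@today | ssh r2 zfs receive -F tank/backups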

If you have a lead or solution for my woes, please let me know! I often miss trivial solutions.

Here’s an actual rdiff-backup set to clarify:
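Listings like this come from rdiff-backup’s increment-size report; as a sketch, with the repository path as a placeholder:

    rdiff-backup --list-increment-sizes /mnt/backup/jd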


Time Size Cumulative size
---------------------------------------------------------------------------
Fri Jul 10 01:32:13 2009 4.37 GB 4.37 GB (current mirror)
Thu Jul 9 01:32:13 2009 36.3 MB 4.41 GB
Wed Jul 8 01:32:13 2009 37.5 MB 4.45 GB
Tue Jul 7 01:32:14 2009 31.4 MB 4.48 GB
Mon Jul 6 01:32:13 2009 31.1 MB 4.51 GB
Sun Jul 5 01:32:13 2009 27.0 MB 4.53 GB
Sat Jul 4 01:32:14 2009 41.3 MB 4.57 GB
Fri Jul 3 01:32:12 2009 33.9 MB 4.61 GB
Thu Jul 2 01:32:13 2009 37.9 MB 4.64 GB
Wed Jul 1 01:32:14 2009 35.4 MB 4.68 GB
Tue Jun 30 01:32:13 2009 37.3 MB 4.71 GB
Mon Jun 29 01:32:14 2009 38.9 MB 4.75 GB
Sun Jun 28 01:32:13 2009 38.7 MB 4.79 GB
Sat Jun 27 01:32:15 2009 42.0 MB 4.83 GB
Fri Jun 26 01:32:13 2009 49.3 MB 4.88 GB
Thu Jun 25 01:32:13 2009 37.3 MB 4.92 GB
Wed Jun 24 01:32:14 2009 36.4 MB 4.95 GB
Tue Jun 23 01:32:13 2009 43.0 MB 4.99 GB
Mon Jun 22 01:32:15 2009 33.4 MB 5.03 GB
Sun Jun 21 01:32:15 2009 31.0 MB 5.06 GB
Sat Jun 20 01:32:13 2009 41.6 MB 5.10 GB
Fri Jun 19 01:32:14 2009 31.7 MB 5.13 GB
Thu Jun 18 01:32:14 2009 32.0 MB 5.16 GB
Wed Jun 17 01:32:15 2009 31.0 MB 5.19 GB
Tue Jun 16 01:32:17 2009 31.6 MB 5.22 GB
Mon Jun 15 01:32:16 2009 31.7 MB 5.25 GB
Sun Jun 14 01:32:14 2009 31.3 MB 5.28 GB
Sat Jun 13 01:32:14 2009 30.7 MB 5.31 GB
Fri Jun 12 01:32:14 2009 31.3 MB 5.34 GB
Thu Jun 11 01:32:15 2009 32.3 MB 5.37 GB