rdiff-backup isn't Perfect

Posted by JD 10/02/2010 at 09:58

I like rdiff-backup for backing up HOME directories and virtual machines efficiently. OK, that is a little understated: I LOVE rdiff-backup.

So, every 6 months or so, when it lets me down in some way, I have to recall all the good things that it actually does. Things like:

  • Efficient backup times; just a few minutes to backup entire virtual machines
  • Efficient backup storage; about 10% more storage for 30 days of backups than a mirror (rsync) uses.
  • Easy recovery for 1 file from the latest backup (really it is trivial, just a file copy)
  • Easy recovery of a complete backup set from X days ago (I’ve tested this more than I like)
  • Easy to get information about backup increments and sizes for each.
  • FLOSS, licensed under the GNU GPL (not commercial)
  • Backup failures get rolled back – you always get a complete, clean set.
  • No screwing around with SQL databases or other less than easy to understand crap.
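
For anyone who hasn't used it, the operations in that list map onto a handful of commands. This is a minimal sketch using the classic rdiff-backup command-line syntax; the source and destination paths and the small wrapper functions are made up for illustration, so adjust them to your own layout.

```shell
#!/bin/sh
# Sketch only: SRC and DEST are hypothetical paths, and the wrapper
# functions are illustrative names, not part of rdiff-backup itself.
RDIFF=${RDIFF:-rdiff-backup}
SRC=${SRC:-/home}
DEST=${DEST:-/backup/home}

# Take a backup: DEST holds a current mirror plus reverse increments.
backup() { "$RDIFF" "$SRC" "$DEST"; }

# Latest version of one file: the mirror is plain files, so just copy it.
restore_latest_file() { cp -p "$DEST/$1" "$2"; }

# Restore the whole set as it was some time ago, e.g. restore_as_of 10D /tmp/old
restore_as_of() { "$RDIFF" -r "$1" "$DEST" "$2"; }

# Show each increment and how much space it uses.
list_increments() { "$RDIFF" --list-increment-sizes "$DEST"; }

# Drop increments older than the given age, e.g. prune 30D
prune() { "$RDIFF" --remove-older-than "$1" "$DEST"; }
```

A nightly job would call `backup` followed by `prune 30D` to keep roughly 30 days of history.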

KVM Virtualization on Ubuntu 9.10 Server

Posted by JD 03/10/2010 at 15:43

The last few days, I’ve been playing with Ubuntu Server 9.10. It hasn’t been perfect. There have been problems along the way. So everyone else knows the issues, I’ll list a few here with a little detail.

It all started during the Server 9.10 x64 installation.

Be certain to check out the comments for solutions to issues as I discover them.

Cold Backup for Alfresco

Posted by JD 12/13/2009 at 20:16

The script below was created as part of an Alfresco upgrade process and is meant to be run manually. It is a fairly trivial cold backup script for Alfresco 2.9b, which is a dead release tree from our friends at Alfresco. It hasn’t been tested with any other version and only backs up locally, but it could easily back up to a remote machine with sshfs or NFS mounts, or with rdiff-backup commands swapped in.

For nightly backups of our production servers, we actually run rdiff-backup against shut-down virtual machines, which takes about 3 minutes each. That small amount of downtime is worth it to us to have a differential backup of the entire VM.

#!/bin/sh
# ###############################################################
# This script is meant to be run by hand, not from cron. It reads the
# MySQL root password from a file under ~root/bin/ (see below).
# 
#  Created by JDPFU 10/2009
# 
# ###############################################################
# Alfresco Backup Script - tested with Alfresco v2.9b
#   Gets the following files
#    - alf_data/
#    - Alfresco MySQL DB
#    - Alf - Extensions
#    - Alf - Global Settings
# ###############################################################
export TOP_DIR=/opt/Alfresco2.9b
DB_NAME=alfresco_2010_8392
export EXT_DIR=$TOP_DIR/tomcat/shared/classes/alfresco/extension
export BACK_DIR=/backup/ALFRESCO
export BACKX_DIR=$BACK_DIR/extension

# Shutdown Alfresco
/etc/init.d/alfresco.sh stop

# Backup the DB and important files.
# dir.root setting will change in the next version
/usr/bin/mkdir  -p "$BACK_DIR"
cd  "$BACK_DIR" || exit 1
# -a already implies --recursive
/usr/bin/rsync  -vv -u -a --delete --stats --progress "$TOP_DIR/alf_data" "$BACK_DIR/"

echo "
  Reading root MySQL password from file
"
/usr/bin/mysqldump -u root \
    -p"$(cat ~root/bin/$DB_NAME.passwd.root)" "$DB_NAME" | \
    /bin/gzip > "$BACK_DIR/${DB_NAME}_$(date +%Y%m%d).gz"
# Remove database dumps last modified more than 60 days ago.
/usr/bin/find  "$BACK_DIR" -type f -name "${DB_NAME}_*.gz" -mtime +60 -delete

/usr/bin/cp  "$TOP_DIR"/*sh "$BACK_DIR"
/usr/bin/mkdir  -p "$BACKX_DIR"
# Trailing slash on the source copies its contents and lets --delete prune removed files.
/usr/bin/rsync  -vv -u -a --delete --stats --progress  "$EXT_DIR/" "$BACKX_DIR/"

# Start Alfresco
/etc/init.d/alfresco.sh start

Why a cold backup? Unless you have a really large DB, being down a few minutes isn’t really a big deal. If you can’t afford to be down, you would already be mirroring databases and failing over automatically anyway. Right?

We use a few extensions for Alfresco; that’s why we bother with the extension/ directory.
There are many ways to make this script better. It was meant as a trivial example or starting point to show simple scripting methods while still being useful.
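
As one possible improvement, here is a rough sketch that swaps rdiff-backup in for the alf_data rsync step and makes sure Alfresco gets restarted even when the backup step fails. The remote target `backuphost::/backup/ALFRESCO/alf_data`, the `cold_backup` function, and the `ALF_CTL` and `RDIFF` override variables are assumptions for illustration; they are not part of the script above and this has not been run against a real Alfresco install.

```shell
#!/bin/sh
# Sketch only: the backup host, remote path, and override variables
# below are hypothetical. rdiff-backup reaches host::path over ssh.
TOP_DIR=${TOP_DIR:-/opt/Alfresco2.9b}
REMOTE=${REMOTE:-backuphost::/backup/ALFRESCO/alf_data}
ALF_CTL=${ALF_CTL:-/etc/init.d/alfresco.sh}
RDIFF=${RDIFF:-rdiff-backup}

cold_backup() {
    # Do not back up a running repository.
    "$ALF_CTL" stop || return 1

    # Incremental backup of the content store to the remote host.
    "$RDIFF" "$TOP_DIR/alf_data" "$REMOTE"
    status=$?

    # Always restart Alfresco, whether or not the backup worked.
    "$ALF_CTL" start

    # Report the backup result to the caller.
    return $status
}
```

The MySQL dump and the extension copy would stay as they are; only the alf_data step changes, and its history then lives in the rdiff-backup increments on the remote side.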

Alfresco Atlanta Meetup

Posted by JD 10/28/2009 at 19:45

On Tuesday and Wednesday this week, there were a few Alfresco Meetups in Atlanta that I attended.

Tuesday was just a few hours to begin organizing the informal group. Wednesday was an all-day event with sponsors, presentations, and vendors. Both were well organized for what they were and cut to the core for experienced Alfresco users and developers.

My main takeaways were:

  1. There is no upgrade path from v2.9b -> v3.x. v2.9x was a dead development tree.
  2. If you aren’t a paid, enterprise customer and elect to use the 1 or 2 suggested community edition releases, you are on your own. Sometimes the company chooses to drop community releases. When I asked for suggestions to ensure we weren’t caught again with no upgrade path, there was no answer, just silence.
  3. Alfresco is a Java application running on Tomcat (by default). It is just a normal Tomcat app, so if you want to customize it, you’ll be best served by Java development. Some fairly trivial view modifications may be possible using the template engine that Alfresco uses. However, I’d never heard of this markup; it must be a Java thing.
  4. Alfresco is an impressive OSS product that competes with many commercial applications that charge $50K – $1.5M for deployment licenses. The company makes money by selling enterprise licenses and providing support contracts. Deployments are usually performed (98% of the time) by VARs. This means the company needs to concentrate on supporting paid customers and may trial different techniques on the Community Edition. Sometimes the Community Edition isn’t very stable and sometimes core functions in it are broken.
  5. Most of the attendees were using the enterprise version or were VARs who, by contract, were only allowed to deploy the enterprise version. If you are an Alfresco Partner, I understand you cannot support the community edition for your customers.
  6. If you deploy Alfresco, think of it as a content container back end, not a complete solution unless everything you see out of the box is exactly what you want. Almost every user of the tool creates customizations for their environment.
  7. CMIS is an emerging standard for communicating with ECM, DMS, WCM systems. A number of vendors have signed up. Alfresco is saying it is like SQL for content management systems. Both RESTful and WSDL interfaces are provided with this standard and it should allow customized front ends to communicate using a standard language to CMS back ends regardless of vendor. EMC, IBM, Microsoft, SAP, and Alfresco were listed as backers.
  8. The Alfresco folks were really nice, but couldn’t really help me. This community appears to be made up of folks that do ECM for their primary jobs and not just 1/20th of their responsibilities like me.
  9. Alfresco is an extremely capable platform, mainly suitable for normal DMS requirements. Less so for WCM based on the Best Practices session. The BPM parts appear to be very powerful, but only when you customize with Java.

I plan to stay 1 revision behind the currently recommended Alfresco release. So, right now, v3.2r is recommended. That means I’ll be re-deploying v3.1 when I get around to dropping the current install and re-importing.

I was in way over my head at all levels of the conversation. The terms used were Alfresco- and Java-specific, and neither is in my skill set. What I need is a newcomers’ introduction to Alfresco, Best Practices for the FOSS version, and guidance on how to determine when it is time to pay for the enterprise supported version.

I wrote this summary quickly as a dump when I got home and didn’t proof it. Some of it could be inaccurate to what actually happened. I am prone to selective memory when I’m frustrated.

Backup Clock Times

Posted by JohnP 09/27/2009 at 13:10

I came across an old article that I wrote on backups that had some clock times for the different VMs. Since that article was written, I’ve changed the backup methodology from rsync to rdiff-backup.


dms44 → 1m:52s Alfresco
crm46 → 3m:36s vTiger
xen41 → 3m:10s Typo
pki42 → 1m:17s
mon45 → 1m:8s
zcs43 → 3m:53s Zimbra

Those are real “downtime” numbers to ensure completely safe backups were made with all files closed. Actually, each virtual machine is shut down for its backup period. Email is unavailable for 4 minutes at around 2am daily. We can live with that. Recovery works perfectly too. I’ve recovered the largest VM twice in under 20 minutes after some cockpit errors.

This works because we use Xen virtual machines and rdiff-backup. Most of the VMs are 20GB in disk size, but use less actual storage.
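
The per-VM cycle is roughly shut down, back up, start again. Below is a minimal sketch of that cycle using the classic `xm` Xen command; the directory layout, the `.cfg` naming, and the `backup_vm` function are assumptions for illustration, not our actual script.

```shell
#!/bin/sh
# Sketch only: all paths and the per-VM .cfg naming are hypothetical.
XM=${XM:-xm}
RDIFF=${RDIFF:-rdiff-backup}
VM_DIR=${VM_DIR:-/xen/domains}
CFG_DIR=${CFG_DIR:-/etc/xen}
DEST=${DEST:-/backup/vms}

backup_vm() {
    vm=$1
    start=$(date +%s)

    # -w makes xm wait until the domain has actually shut down.
    "$XM" shutdown -w "$vm" || return 1

    # All files are closed now, so the disk images are consistent.
    "$RDIFF" "$VM_DIR/$vm" "$DEST/$vm"
    status=$?

    # Bring the VM back up even if the backup failed.
    "$XM" create "$CFG_DIR/$vm.cfg"

    echo "$vm downtime: $(( $(date +%s) - start ))s"
    return $status
}
```

Calling `backup_vm zcs43` from a nightly job would produce one downtime line per VM, similar to the clock times listed above.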