Ooops 1

Posted by JD 05/10/2010 at 08:21

What’s that saying? Some days you eat the bear and other days the bear eats you.
Or perhaps “stupid is as stupid does” fits better.

“Ooops” is something you never want to hear your system/network admin say. I’ve heard it said elsewhere and then watched as 200+ NASA servers all started rebooting. No time to save your work. No time to do anything before the screen flickers and a BIOS screen is displayed.

Today, I said, “ooops.”

The Plan

I had planned to upgrade my main Xubuntu server/desktop from 9.10 to 10.04 this morning. I really meant to do it over the weekend but just didn’t get around to it. It isn’t considered a production system, so doing it on a weekday should have been fine.

Of course, I know to make a complete backup before starting something like that. I have internal drives that stay unmounted just for that purpose. I took an old rsync-backup script and modified it to back up everything except runtime directories and specific file systems. It would mount and unmount the backup areas too. There was going to be /backup/slash (for things mounted in root) and /backup/export (for things under the /export file system). Life would be good; I’d checked the targets and sources. Everything seemed fine, so I entered the command:

sudo ./rsync_local_backup.sh

The “Actual” Happened

OK, so the script had been running for about 20 seconds and I saw some expected stuff happen. Then the list of files being removed started flying past on the screen. It kept going, so from a different xterm I cd’d to the target area and started watching for directories to disappear. I’d used the --delete option to be certain the target would mirror the source instead of just accumulating files. The deleted files kept going … on and on. I got scared and hit Ctrl-C about 15 times before it stopped.
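The post doesn’t show what in the script went wrong. Two common ways an rsync --delete mirror turns on the live system are an empty or unset variable (so a destination written as $DEST/ expands to plain /) and swapped source and target arguments. Below is a minimal guard sketch, assuming every mirror target sits below /backup and is its own mount point. The names check_paths, mirror and BACKUP_ROOT are hypothetical, not parts of the original rsync_local_backup.sh, and using "$@" under set -u with no extra options needs bash 4.4 or later.

```bash
#!/usr/bin/env bash
# Sketch of a safety wrapper around an rsync --delete mirror.
# Assumptions (hypothetical, not from the original script): every mirror
# target lives below BACKUP_ROOT, and each target directory is a mount point.
set -u  # abort on unset variables, so "$DEST/" can never silently become "/"

BACKUP_ROOT="/backup"

# check_paths SRC DEST: succeed only if DEST looks like a safe mirror target.
check_paths() {
  local src="$1" dest="$2"
  # Both paths must be absolute; an empty string fails this test too.
  [[ "$src" == /* && "$dest" == /* ]] || return 1
  # The destination must sit strictly below the backup root...
  [[ "$dest" == "$BACKUP_ROOT"/?* ]] || return 1
  # ...and must not climb back out of it with "..".
  [[ "/$dest/" != */../* ]] || return 1
  return 0
}

# mirror SRC DEST [extra rsync options...]: dry run first, real run only on "y".
mirror() {
  local src="$1" dest="$2" answer=""
  shift 2
  check_paths "$src" "$dest" || { echo "refusing: $src -> $dest" >&2; return 1; }
  # If the backup drive failed to mount, stop instead of filling the root fs.
  mountpoint -q "$dest" || { echo "refusing: $dest is not mounted" >&2; return 1; }
  # -a archive, -H hard links, -x stay on one file system, -v list changes.
  rsync -aHxv --delete --dry-run "$@" "${src%/}/" "$dest/"
  read -r -p "Run for real? [y/N] " answer
  [[ "$answer" == [yY] ]] || return 1
  rsync -aHx --delete "$@" "${src%/}/" "$dest/"
}
```

For example, mirror /export /backup/export first prints what rsync would do, including every file it would delete, and only proceeds on “y”. A swapped call such as mirror /backup/slash / is refused before rsync ever runs.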

ls returned “command not found.” Ouch. cd / worked. Good. /bin/ls: “command not found.” Ouch. echo * showed a list of files, and only the files and directories from my exclude list were left. This is bad. I tried to look at the script to see what had really happened and got “command not found,” then “file not found” after switching to a bash builtin (more and less are gone, after all).
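With /bin and /usr gone, only shell builtins still work, which is why echo * was the one thing that answered. Here is a sketch of a slightly friendlier builtin-only stand-in for ls that could be pasted into a surviving bash session; the name bls is made up, and it lists a single directory with a trailing “/” on subdirectories.

```bash
# bls [DIR]: list a directory using only bash builtins, for when /bin/ls is gone.
# Directories get a trailing "/", similar to ls -F.
bls() {
  local dir="${1:-.}" entry saved
  # Remember the caller's glob settings so they can be restored afterwards
  # (shopt -p exits non-zero when an option is off, hence "|| true").
  saved=$(shopt -p nullglob dotglob) || true
  # nullglob: an empty directory expands to nothing instead of a literal "*".
  # dotglob: include hidden entries (bash still never matches . and ..).
  shopt -s nullglob dotglob
  for entry in "$dir"/*; do
    if [[ -d "$entry" ]]; then
      printf '%s/\n' "${entry##*/}"
    else
      printf '%s\n' "${entry##*/}"
    fi
  done
  eval "$saved"
}
```

Typing bls / or bls /usr then shows what is left without needing any external command.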

It appears I’ve deleted everything in /usr, /bin, /boot, /lib … all gone. I don’t really know what is left at this point, and I’m afraid to look. I’m afraid to reboot, since the system likely won’t come back up. On the other hand, it isn’t working now either: ls is gone.

The Expected Fix

The backup drive contains the OS I used to run, Xubuntu 8.04. It also contains my internal website, which I use to monitor all the other servers and manage stock-market stuff, plus an internal photo gallery with 9,000+ photos.

The monitoring stuff is important, but I’d updated it a little on the newer install, so that would need to be recreated. I hadn’t updated the photo gallery much this year, and no new photos had been added that I can recall, but I had spent a bunch of time researching my father’s medals after seeing them in a photo. You see them every day for 18+ years as he heads off to work, but you don’t really understand what they are. I knew he’d earned a few important medals like the Bronze Star and Legion of Merit, but I didn’t realize their significance before doing this research. Lots of them were for just showing up or being around a long time. I hope that data wasn’t deleted. Redoing it would be easier the second time, but I’d rather not.

The fix should be to:

  1. shut down
  2. swap SATA cables
  3. reboot to the old system
  4. look around a little, then mount the newer drives someplace safe and inspect them
  5. figure out what is gone
  6. manually mirror over the missing things from the old system
  7. swap the SATA cables back
  8. reboot into the new/old 8.04
  9. perform an upgrade to 10.04 LTS (might need to do 8.10, 9.04, and 9.10 along the way)
  10. figure out what is conflicting and remove that stuff (DHCP DNS update)
  11. start trying to figure out what is missing – recreate the missing stuff
  12. mirror the new install to a backup area
  13. set up a weekly backup script that works, using rdiff-backup (which I’m a huge believer in)
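Steps 5 and 6 come down to comparing two directory trees. Here is a sketch, assuming GNU find (for -printf) and that, while booted into the old system, the damaged drive is mounted at a hypothetical /mnt/newer; list_missing is a made-up name. It prints each path that exists under the first tree but not under the second, and it stops descending once a whole directory is missing, so a vanished /usr shows up as one line instead of thousands.

```bash
# list_missing REF CHECK: print paths (relative to REF) that exist under REF
# but are absent under CHECK. Missing directories are reported once; their
# contents are skipped.
list_missing() {
  local ref="$1" check="$2"
  # -xdev: stay on REF's file system; -mindepth 1: skip REF itself;
  # %P prints each path relative to REF. find lists a directory before
  # its contents, which the prefix check below relies on.
  find "$ref" -xdev -mindepth 1 -printf '%P\n' | {
    local rel last=""
    while IFS= read -r rel; do
      # Already reported this entry's parent directory as missing.
      if [[ -n "$last" && "$rel" == "$last"/* ]]; then
        continue
      fi
      # -e misses dangling symlinks, so test -L as well.
      if [[ ! -e "$check/$rel" && ! -L "$check/$rel" ]]; then
        printf '%s\n' "$rel"
        last="$rel"
      fi
    done
  }
}
```

For example, list_missing / /mnt/newer > gone.txt, then review gone.txt before copying anything back. Because of -xdev the walk does not descend into /proc, /sys or /mnt/newer itself, though their mount-point directories may still appear in the list.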

Since this system also holds the backups for all the other systems (14+ of them), do I back up the backup areas too? An rsync mirror is definitely needed, but an rdiff-backup of an rdiff-backup isn’t very smart. I’ll need to think about that.
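One way to frame the backup-of-backups question is as a weekly job: rdiff-backup for the versioned copy, then a plain rsync of the repositories to a second disk. The sketch below is only that; the paths, the SOURCES list, the 8-week retention and the DRY_RUN switch are all hypothetical. The rdiff-backup calls use the classic command-line form (SRC DEST, and --remove-older-than, which wants --force when several increments would go). Newer releases also offer subcommand forms, so check the man page for the installed version.

```bash
#!/usr/bin/env bash
# Sketch of a weekly backup job. Hypothetical layout: rdiff-backup
# repositories under /backup/rdiff, and a second internal disk mounted at
# /backup2 that receives a plain rsync copy of those repositories.
set -u

SOURCES=(/etc /export /home)   # assumed list of trees to protect
RDIFF_ROOT="/backup/rdiff"
MIRROR_ROOT="/backup2"
KEEP="8W"                      # keep 8 weeks of increments

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

weekly_backup() {
  local src name
  for src in "${SOURCES[@]}"; do
    name="${src#/}"            # /export  -> export
    name="${name//\//_}"       # home/jd  -> home_jd
    # Versioned backup: current mirror plus reverse increments.
    run rdiff-backup "$src" "$RDIFF_ROOT/$name" || return 1
    # Prune increments older than KEEP.
    run rdiff-backup --force --remove-older-than "$KEEP" "$RDIFF_ROOT/$name" || return 1
  done
  # Second copy of the repositories themselves: plain rsync rather than
  # another rdiff-backup, since the history already lives inside them.
  run rsync -aH --delete "$RDIFF_ROOT/" "$MIRROR_ROOT/rdiff/"
}
```

Running DRY_RUN=1 weekly_backup prints the planned commands without touching anything; the real script would end with a plain weekly_backup call.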

I left out something important. My neighbor has decided to remove his in-ground pool, and the jackhammer and cement saw work started last week. With the jackhammer going, I can’t bear to leave the window open. What’s the big deal? I leave the server room window open to help cool the room. That gets it down to the low 60s nightly, but it still reaches the low 80s during the day even with the house A/C running. The rest of the house is very comfortable, and last year I increased the cooling and return vents in the server room to get better temperatures. That didn’t work so well.

Anyway, it should be a fun day.


  1. JD 05/10/2010 at 15:16

    Well, that’s been fun.

    The idea to mirror an old backup didn’t work. I ended up installing 10.04 x64 from scratch with all that entails.

    It took until just a few minutes ago for me to recall/discover that the machine was still using DHCP and not the static IP it normally has. This is a problem for any externally facing services, like a web server. While this machine isn’t central to the network anymore (most of those services have been moved elsewhere), it is still an internal backup, Apache, Samba, print, and NTP server, and other systems expect it to be available.

    Anyway, I think most of the critical things are back and working. I just have the dual monitor thing to deal with and I need to plug in the external disk array.

    While I was reinstalling, I decided to dump xfce4 and try LXDE, which is supposed to be a lighter-weight desktop environment. So far, it feels good and quick. I can’t wait for the dual-monitor stuff to work. Time to load the proprietary NVIDIA drivers.
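A fresh install quietly falling back to DHCP, as described in the comment above, is easy to repeat, so it may help to keep the static settings written down. Here is a sketch that prints an ifupdown-style stanza for /etc/network/interfaces, the file Ubuntu 10.04 uses for static addresses. The function name and the addresses are hypothetical, and on a desktop install NetworkManager’s settings may need attention too.

```bash
# iface_stanza IFACE ADDRESS NETMASK GATEWAY [DNS]: print a static-address
# stanza in /etc/network/interfaces (ifupdown) format.
iface_stanza() {
  local iface="$1" addr="$2" mask="$3" gw="$4" dns="${5:-}"
  printf 'auto %s\n' "$iface"
  printf 'iface %s inet static\n' "$iface"
  printf '    address %s\n' "$addr"
  printf '    netmask %s\n' "$mask"
  printf '    gateway %s\n' "$gw"
  # dns-nameservers is only acted on when the resolvconf package is installed.
  if [[ -n "$dns" ]]; then
    printf '    dns-nameservers %s\n' "$dns"
  fi
}
```

For example, iface_stanza eth0 192.168.1.10 255.255.255.0 192.168.1.1 prints a block that could replace a dhcp stanza for the same interface.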