Virtualization Disk Performance

Posted by JD 09/06/2008 at 15:19

I’ve been running Xen and VirtualBox VMs here. We’ve been experiencing performance issues on 2 VMs under Xen on a fairly powerful box. Basically, the DomUs get really, really slow.

Here’s what I know.

  1. While connected via ssh to a DomU, sometimes my shell becomes non-responsive for 30+ seconds.
  2. A PHP network monitor also shows the listeners aren’t answering in a timely fashion.

Suspects:

  1. Sparse disk files for the images: this is the issue. CONFIRMED
    1. I used sparse files for the OS images. This is the default, instead of preallocated or LVM-based disks. I need to migrate from sparse to full ASAP! `hdparm -t` shows Dom0 with 60-80 MBps throughput. On DomU it is 55-65 MBps, which should be good enough.
  2. virtual network issues
    1. In older network bridges under Xen, there were real problems. The throughput reported by others in tests was 3-4 Mbps over GigE networks. By using a virtio driver instead, they got 750+ Mbps throughput. That’s what I’m talkin’ bout!
    2. There could be issues with my cheap switch and cheap router too. IP sessions may not expire quickly enough.
    3. I didn’t change any of the default network settings from the xen install. Perhaps this was a bad idea?
  3. poor kernel settings for Dom0 and DomUs
    1. I’ve checked the scheduler for issues. The defaults in Hardy and Xen 3.2 seem to follow the suggestions I’ve found through Google.
  4. Similarly mis-configured DomU OS installations
    1. Well, it is believable that I made some other bad choices related to OS installation. I don’t know what they could be.
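
A quick way to tell whether an image file is sparse is to compare its apparent size with the space actually allocated on disk. Here is a minimal sketch in Python, not something from my original troubleshooting. The function names are made up for illustration, and it assumes a Unix-like system where `st_blocks` is reported in 512-byte units.

```python
import os


def disk_usage(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    # st_size is the length the file claims to have.
    # st_blocks counts 512-byte units actually allocated (Unix only).
    return st.st_size, st.st_blocks * 512


def is_sparse(path):
    """True when fewer bytes are allocated than the file's apparent size."""
    apparent, allocated = disk_usage(path)
    return allocated < apparent
```

Calling `is_sparse()` on a DomU image path should return `True` for a sparse image. On filesystems that compress data, a file can look sparse by this measure even if it was never created that way, so treat the result as a hint.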

I have some work to do. Check back here for the results.

Results

  1. I happened to be watching disk I/O and CPU when another lockup/slowdown happened. loop0 was the top process at the time. That means – disk is the issue. Nice.
  2. Ok, after huge amounts of research, I was finally able to mount the sparse and full disk file images. Then I attempted to dump/restore the old system onto the new files (full/non-sparse). Done. Then I took a huge step and started the VM up – the new VM. Everything worked. FAST and with only 128M of RAM instead of the normal 512M that the old image had. It really is amazing … so far.
  3. So assuming that actually worked perfectly – 10 minutes of use only – now it is time to do the same thing for my Zimbra installation. I am feeling lucky, punk. Get ‘er done. I’ll report back here in an hour or so.
  4. So I swapped the Zimbra installation from a sparse file to a FULL file system. I’ll watch the monitoring software to see whether the 30+ reported outages per day keep happening. My blog installation is definitely showing higher responsiveness. I haven’t seen the slowdowns on a console, but I haven’t sat behind the server for hours yet either. We shall see.
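
For anyone wanting to try a sparse-to-full conversion, here is a rough sketch of one way to do it. It is not the dump/restore procedure I used above. It simply copies an image byte for byte, writing the zero-filled regions explicitly so the filesystem has to back them with real blocks. The function name is hypothetical, and the VM should be shut down before copying its image.

```python
import os

CHUNK = 1024 * 1024  # copy 1 MiB at a time


def copy_fully_allocated(src_path, dst_path, chunk=CHUNK):
    """Copy src to dst, writing every byte (zeros included).

    Reading a hole in a sparse file returns zeros; writing those zeros
    out again makes the destination allocate real blocks for them.
    Returns the number of bytes copied.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            buf = src.read(chunk)
            if not buf:
                break
            dst.write(buf)
            copied += len(buf)
        # Push the data to disk before reporting success.
        dst.flush()
        os.fsync(dst.fileno())
    return copied
```

The destination needs as much free space as the image's full apparent size. A filesystem with compression or deduplication may still store the zero runs compactly, so check the result rather than assuming it is fully allocated.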