HDD Performance Tuning and Results 2

Posted by JD 10/17/2011 at 11:01

While researching the last article on USB storage and KVM clients, I came across a few articles with tips that should improve disk performance.

I also decided to tweak some HDD performance parameters. The commands used and the results are below. If you have a disk array, you probably want to tune these parameters to increase read performance. My RAID setup saw a 41% improvement in disk reads. That’s pretty impressive for 20 minutes of effort. For single disks, the default settings seemed fine and didn’t make much difference on my machines, but others have reported big differences. Performance tests were run on SATA, PATA, laptop, USB and RAID disks.

Below are links to those articles, my performance tests and scripts for different systems, and my final tuning decisions based on those tests.

Let the Tuning Begin

See www.robthomson.me.uk/2011/03/30/tuning-kvm-guests-you-need-to-do-it/ for blockdev settings on the server side and NFS client network settings. The following setting is recommended for disks as a starting point; note that setra takes a double dash:

$ sudo /sbin/blockdev --setra 8196 /dev/sda

Red Hat recommends increasing it in 4 MB increments, up to 32 MB, until you find the fastest result. The value passed to --setra is in 512-byte sectors, so 8192 sectors works out to 4 MB of read-ahead. Disk read throughput is tested with hdparm.
$ sudo hdparm -tT /dev/sda

This needs to happen on every disk on the machine – each may produce different results based on the controller, backplane, physical disk capabilities, connection type, etc. These read-ahead settings also work inside virtual machines, so
$ sudo /sbin/blockdev --setra 8196 /dev/vda
makes sense for virtio HDDs inside a VM too.
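
Before changing anything, you can check what a disk is currently using with the matching --getra option, which prints the current read-ahead in 512-byte sectors (the device name here is just an example; on my machines this reported the 256 default mentioned below):

$ sudo /sbin/blockdev --getra /dev/sda
256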

FYI, the HDDs were set to 256 by default, not 8196. According to Rob, his performance went from 40 MBps to 157 MBps with this change alone. I decided to test on 3 different physical machines here and found these results.

Laptop VM Tests
  • 256 (default)
    • Timing cached reads: 4336 MB in 1.99 seconds = 2176.16 MB/sec
    • Timing buffered disk reads: 230 MB in 3.01 seconds = 76.42 MB/sec
  • 4096
    • Timing cached reads: 4570 MB in 1.99 seconds = 2294.05 MB/sec
    • Timing buffered disk reads: 246 MB in 3.00 seconds = 81.98 MB/sec
  • 8196
    • Timing cached reads: 4598 MB in 1.99 seconds = 2308.17 MB/sec
    • Timing buffered disk reads: 246 MB in 3.02 seconds = 81.52 MB/sec
  • 12292
    • Timing cached reads: 4418 MB in 1.99 seconds = 2217.73 MB/sec
    • Timing buffered disk reads: 248 MB in 3.01 seconds = 82.41 MB/sec

This VM was on a laptop with a 7200 RPM SATA HDD, so I didn’t expect miracles.

On a Xen server, sda
  • 256 (default)
    • Timing cached reads: 12034 MB in 1.99 seconds = 6039.12 MB/sec
    • Timing buffered disk reads: 246 MB in 3.01 seconds = 81.72 MB/sec
  • 8196
    • Timing cached reads: 10456 MB in 1.99 seconds = 5244.44 MB/sec
    • Timing buffered disk reads: 252 MB in 3.02 seconds = 83.48 MB/sec
  • 12292
    • Timing cached reads: 9844 MB in 1.99 seconds = 4936.27 MB/sec
    • Timing buffered disk reads: 212 MB in 3.07 seconds = 69.16 MB/sec
On a Xen server, sdb
  • 256 (default)
    • Timing cached reads: 12780 MB in 1.99 seconds = 6414.97 MB/sec
    • Timing buffered disk reads: 276 MB in 3.12 seconds = 88.37 MB/sec
  • 4096
    • Timing cached reads: 13640 MB in 1.99 seconds = 6847.69 MB/sec
    • Timing buffered disk reads: 222 MB in 3.01 seconds = 73.81 MB/sec
  • 8196
    • Timing cached reads: 13286 MB in 1.99 seconds = 6669.76 MB/sec
    • Timing buffered disk reads: 236 MB in 3.01 seconds = 78.49 MB/sec
  • 12292
    • Timing cached reads: 12194 MB in 1.99 seconds = 6120.66 MB/sec
    • Timing buffered disk reads: 240 MB in 3.01 seconds = 79.82 MB/sec

Notice that the sda and sdb results were different even in the same physical machine? Interesting.

External USB2 HDD. Notice how much slower USB2 is:
  • 256 (default)
    • Timing cached reads: 17282 MB in 2.00 seconds = 8648.87 MB/sec
    • Timing buffered disk reads: 88 MB in 3.04 seconds = 28.91 MB/sec
  • 4096
    • Timing cached reads: 16150 MB in 2.00 seconds = 8082.61 MB/sec
    • Timing buffered disk reads: 100 MB in 3.05 seconds = 32.81 MB/sec
  • 8196
    • Timing cached reads: 16222 MB in 2.00 seconds = 8118.40 MB/sec
    • Timing buffered disk reads: 100 MB in 3.02 seconds = 33.07 MB/sec
  • 12292
    • Timing cached reads: 18124 MB in 2.00 seconds = 9070.54 MB/sec
    • Timing buffered disk reads: 100 MB in 3.05 seconds = 32.83 MB/sec

USB2 is over 50% slower than SATA connections. Hum. We always knew that.

On an external software RAID5 array,
  • 256 ra test for /dev/md0
    • Timing cached reads: 16616 MB in 2.00 seconds = 8317.07 MB/sec
    • Timing buffered disk reads: 432 MB in 3.01 seconds = 143.70 MB/sec
  • 4096
    • Timing cached reads: 17142 MB in 2.00 seconds = 8580.02 MB/sec
    • Timing buffered disk reads: 612 MB in 3.01 seconds = 203.57 MB/sec
  • 8196
    • Timing cached reads: 16934 MB in 2.00 seconds = 8474.61 MB/sec
    • Timing buffered disk reads: 582 MB in 3.02 seconds = 192.98 MB/sec
  • 12292
    • Timing cached reads: 16466 MB in 2.00 seconds = 8240.35 MB/sec
    • Timing buffered disk reads: 600 MB in 3.00 seconds = 199.75 MB/sec

That’s a pretty big difference for reads on the array: going from 143.70 MB/sec at the default to 203.57 MB/sec at a read-ahead of 4096 is roughly the 41% improvement mentioned above. For non-RAID disks, the default didn’t seem too bad in my tests.
For me, it is all about the buffered disk reads. The external array is connected over InfiniBand.
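
A quick way to see the current read-ahead for every block device at once, including the array device and its member disks, is blockdev --report, which prints one line per device with the RA column in 512-byte sectors:

$ sudo /sbin/blockdev --report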

Here’s the simple script that I used:

#!/bin/sh
# Sweep a few read-ahead values on one disk and measure each with hdparm.
# Run as root; edit DEV for the disk under test.
DEV=/dev/sda
echo "256 ra test for $DEV"
/sbin/blockdev --setra 256 $DEV
hdparm -tT $DEV
echo "4096 ra test for $DEV"
/sbin/blockdev --setra 4096 $DEV
hdparm -tT $DEV
echo "8196 ra test for $DEV"
/sbin/blockdev --setra 8196 $DEV
hdparm -tT $DEV
echo "12292 ra test for $DEV"
/sbin/blockdev --setra 12292 $DEV
hdparm -tT $DEV
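
If you would rather sweep several disks without editing DEV each time, a small loop does the same thing. This is just a sketch; the device list is an example, so adjust it for your system:

#!/bin/sh
# Sketch: run the same read-ahead sweep across several disks.
# The device list is only an example - change it for your hardware.
for DEV in /dev/sda /dev/sdb; do
    for RA in 256 4096 8196 12292; do
        echo "$RA ra test for $DEV"
        /sbin/blockdev --setra $RA $DEV
        hdparm -tT $DEV
    done
done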

There may be a better way to ensure these parameters stick between reboots, but I just dropped them into /etc/rc.local as
/sbin/blockdev --setra 4096 /dev/sd[abcdefg]

Then added another, similar, line for the software array device, md0.
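
Another option that should also work is a udev rule, since udev re-applies it whenever the disk appears rather than only at boot. The kernel's read_ahead_kb attribute is in kilobytes, so 4096 sectors corresponds to 2048 KB. An untested sketch; the rule file name and device match are examples to adjust:

# /etc/udev/rules.d/60-readahead.rules (example file name)
# 2048 KB = 4096 x 512-byte sectors, matching the rc.local line above
SUBSYSTEM=="block", KERNEL=="sd[a-g]", ACTION=="add|change", ATTR{queue/read_ahead_kb}="2048"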

If you do run these tests, please post your results in the comments.

  1. JD 10/17/2011 at 12:49

    Tom’s Hardware comparison between eSATA, USB2, FireWire 400 and FireWire 800.

    Hint: Use eSATA when possible.

    The benchmark results clearly show that if you really want a high-performance external hard drive, then you should absolutely go for eSATA and nothing else. FireWire 800 delivers much higher performance than USB 2.0 or FireWire 400, but it still doesn’t get anywhere near the nice results that eSATA offers.

    We all knew that already, right?
    What everyone should know about Portable HDDs.

    USB3 Device Connected via USB2

    I have a USB3 external HDD, but no USB3-capable systems – it had a great price. I connected it to a server here. Because it was just a USB2 connection, the throughput was limited. The best results were for a read-ahead of 8192:

    • Timing cached reads: 13762 MB in 2.00 seconds = 6886.67 MB/sec
    • Timing buffered disk reads: 86 MB in 3.00 seconds = 28.62 MB/sec

    These results are slower than my tests of the other USB2-connected device. That other device has an eSATA port, so its internal circuitry is designed to handle much higher throughput. The USB3 device is a $14 enclosure – cheap. Sometimes I connect this device to a Windows laptop – the performance feels faster than USB2, but the numbers don’t back that up. Interesting. Guess I’m filled with wishful thoughts. ;)
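
    If you want to confirm what speed a USB enclosure actually negotiated, lsusb -t prints the device tree with the link speed per port: 480M means it is running at USB2 speed, 5000M means USB3 SuperSpeed. The line for an external drive usually shows Driver=usb-storage.

    $ lsusb -t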

  2. JD 11/02/2011 at 13:57

    An early 2009 article over at smallnetbuilder.com about build-it-yourself NAS performance has gotten me thinking about doing this myself.

    The last sentence of the article sums it up this way:

    So if you’re looking for a low-cost way to build a dual-drive NAS, you can choose a motherboard using an Intel Atom, VIA C7 or AMD Geode CPU and be pretty certain of getting better than 2X the performance you can get from any (current) off-the-shelf NAS.

    Cheap off-the-shelf NAS boxes will perform worse, sometimes significantly worse (3x), than a cheap, low-power system you build yourself.

    Just to clarify – the stats in the article above do not translate into real-world performance. They are only useful for apples-to-apples comparisons using the same test suite.

    I decided to transfer a 7+GB file between a few systems. Here are the results:

    • Scenario 1 – RAID5 writes over GigE
      • Source is Core2Duo – target is Core i5
      • 7.71MB/sec
    • Scenario 2 – RAID5 reads over GigE
      • Source is Core i5 – target is Core2Duo
      • 47.16MB/sec
    • Scenario 3 – Black-HDD writes over GigE
      • Source is Core2Duo – target is Core i5
      • 41.52MB/sec
    • Scenario 4 – Black-HDD reads over GigE
      • Source is Core i5 – target is Core2Duo
      • 51.48MB/sec

    RAID5 writes are slow, but RAID5 reads are close to single-disk performance.
    Since the test machines are busy doing other work, these can be considered real-world results. For smaller files, about 2 GB or less, I routinely see 65-75 MB/sec throughput over the GigE network, but that is due to the OS disk cache, which hides the true speed of the spinning platters. Eventually, the data must be written to disk.
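
    If you want to take that disk cache out of the picture when timing a transfer, one approach is to sync and drop the page cache before the copy and time a final sync afterwards. A rough sketch, run as root, where the file and destination names are only placeholders:

    # flush dirty data and drop cached pages so reads come from the platters
    sync
    echo 3 > /proc/sys/vm/drop_caches
    # time the copy itself; big.iso and /mnt/nas are placeholder names
    time cp /data/big.iso /mnt/nas/
    # time a final sync so the write time includes flushing to disk
    time sync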