Originally Posted by spyderx
Remember this is in RAID 5. I am sure striping for standard drive speed would be faster. Raid 5 is always slower due to parity calculation.
This isn't really true. In principle there is no reason why RAID-5 cannot be almost as fast as RAID-0. Specifically, on a 4-drive array, RAID-5 can reach 3/4 of the speed of RAID-0 on both reading and writing *IF* you are doing large reads or writes.
The problem is NOT the parity calculation. That's an ultra-cheap CPU operation, just an XOR of a couple of buffers.
The problem is the extra disk I/O often involved. If you seek to a random place on the disk and write a block, the RAID subsystem has to do the following:
Read the original version of the block.
Read the parity block in the same stripe.
XOR the old data block with the new one you're writing.
Take the result of this XOR and XOR it with the old parity block.
Write the result into the parity block.
Write your new data into the data block.
That's two reads and two writes, just to write one block of data.
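The steps above can be sketched in a few lines. This is a simplified illustration, not any real RAID implementation: the blocks are plain byte strings standing in for disk sectors, and the two "reads" are assumed to have happened before the function is called.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))

def rmw_write(old_data: bytes, old_parity: bytes, new_data: bytes):
    """Read-modify-write parity update for a single block.

    Two reads (old_data, old_parity) precede this, and two writes
    (new_data, new_parity) follow it -- four I/Os to change one block.
    The XOR work itself is trivial compared to the disk traffic.
    """
    delta = xor_blocks(old_data, new_data)      # which bits changed
    new_parity = xor_blocks(old_parity, delta)  # fold the change into parity
    return new_data, new_parity
```

Note that the other data blocks in the stripe never need to be touched, which is why this method's cost stays constant no matter how wide the array is.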
Another way to do it is to read all of the data blocks in the stripe other than the one you're updating; with this method you don't need to read the old data block or the old parity block at all. In a 4-drive RAID-5, that's two block reads.
XOR those two blocks with your new data block to compute the new parity.
Write your new data and the new parity.
For a 4-drive RAID-5 that is also two reads and two writes. Obviously on wider arrays this method becomes more expensive, while the cost of the first stays the same. The advantage of the second method is that a stale or incorrect parity block gets silently corrected, while the first method would carry the error forward.
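The second method, sketched under the same simplifying assumptions (byte strings as blocks, reads already done):

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))

def reconstruct_write(other_data_blocks, new_data: bytes):
    """Recompute parity from scratch: XOR the new block with every
    OTHER data block in the stripe. The number of reads grows with
    stripe width, but the old parity is never consulted, so a stale
    parity block is automatically replaced with a correct one.
    """
    parity = new_data
    for blk in other_data_blocks:
        parity = xor_blocks(parity, blk)
    return new_data, parity
```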
But when you're writing large files in large blocks, it needn't be nearly this bad. If the RAID subsystem can recognize that you're writing a lot of contiguous data, it can delay the generation of the new parity block until you've written all of the data blocks in the stripe. Then it computes the new parity block from RAM copies of the new data and queues that for write. Here you are not doing ANY reads from the array, and only one more write than the blocks you're already writing. That's how you get 3/4 of the write speed of a RAID-0 on a 4-drive RAID-5 array. It's just like RAID-0 except that every 4th block is skipped over (because it's used for parity).
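The full-stripe case is the cheapest of all, since parity comes straight from the new data sitting in RAM. A sketch, again with byte strings standing in for blocks:

```python
def full_stripe_write(data_blocks):
    """Full-stripe write: parity is computed purely from the new data
    buffered in memory. For n data blocks this issues n + 1 writes and
    ZERO reads, which is how RAID-5 approaches RAID-0 write speed.
    Returns the list of blocks to queue for write, parity last.
    """
    parity = bytes(len(data_blocks[0]))  # start from all zeros
    for blk in data_blocks:
        parity = bytes(x ^ y for x, y in zip(parity, blk))
    return data_blocks + [parity]
```

For a 4-drive array, three data blocks go out alongside one parity block: four writes carrying three blocks of user data, i.e. the 3/4-of-RAID-0 figure above.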
Of course this only works if the file system allocates a contiguous stretch of disk blocks to your big file writes. Otherwise, if the file system fragments your new file, then the RAID subsystem will see a bunch of smaller writes in different places on the disk and it'll be unable to apply this optimization.
What I think we all really need is Sun's new ZFS. It eliminates the wall between RAID and the file system: the file system itself knows about the physical drives and manages the writing of redundant copies of data blocks across them. When all of that information lives in one place, the file system can do a much better job than one that only sees a virtual "perfect" disk and knows nothing about the drives underneath.