
Here's the follow-up to my post Testing N+1 Components, Part Two - Network Connections, on testing redundant components. Although I had planned to discuss failover mechanisms and processes like Clustering too, I've given it further thought and will make that a separate subject rather than part of this series because of the complexity.
As I've stated previously, IT equipment is electromechanical and will eventually fail. This is especially true of hard drives, which continually spin at thousands of Revolutions Per Minute (RPM). I doubt there is any other component that sees this level of wear, which is why redundancy is so important.
The IT department spends quite a bit of money and effort to implement component redundancy (N+1 Redundancy being the best), but when's the last time someone actually tested these to make sure they work as expected? Because the chance of data loss is very real, this is one component I won't recommend testing. Chances are you've already experienced a drive failure, and if you've implemented RAID with Hot Plug drives you've already seen the swap process in action, so I'll just discuss some of the finer points.
Redundant Array of Independent Disks (RAID) and Hot Plug Drives - Most servers are now equipped with bays to accommodate multiple hard drives, and when a drive fails it can be removed and replaced without taking the server offline (Hot Plug):
- Software-based RAID may be implemented via the operating system, but be prepared to commit some CPU resources to maintain the RAID, and recovery from a drive failure may become a bit more complicated.
- Hardware-based RAID means a storage controller capable of supporting RAID has been installed in an expansion slot or is integrated into the motherboard. Verify that the controller can support the RAID type you intend to implement (RAID1 - Mirror, RAID5 - Striping with Parity, etc.).
- Integrated RAID controllers often don't have any Write Cache memory installed, only Read Cache. If the intended RAID array will be experiencing a great deal of Write operations, consider purchasing a Write Cache module (if available) or a separate expansion slot RAID controller with Write Cache.
- With the exception of RAID0 - Striping, every RAID type gives up some of the array's raw capacity to provide redundancy. For example, two 300GB hard drives in a RAID1 - Mirror array will result in slightly less than 300GB of usable storage.
- It is possible to run multiple RAID types on a single server, so select the type that will best complement how data is being read from or written to the array. RAID5 is not a good selection for write-intensive applications because it must "pause" to calculate parity before writing data to the array, and this pause hurts performance.
- Configure a "Global Hot Spare" hard drive on the server unless you have some readily available in a spare pool. With the exception of RAID0 - Striping, the RAID types above can sustain the failure of a single hard drive with no data loss.
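To put some numbers behind the capacity and single-drive-failure points above, here is a rough Python sketch. The function names (`usable_capacity_gb`, `xor_blocks`) are my own for illustration and don't come from any RAID controller or vendor tool. It assumes equal-size drives and a two-drive mirror for RAID1, and it reports nominal capacity only; real arrays lose a little more to controller metadata and formatting. The parity part shows the XOR idea behind RAID5 on a single stripe, not how any particular controller actually lays out or rotates parity.

```python
from functools import reduce


def usable_capacity_gb(level: str, drive_count: int, drive_size_gb: float) -> float:
    """Nominal usable capacity, assuming all drives are the same size."""
    if level == "RAID0":
        # Striping only: no capacity is given up, but no redundancy either.
        if drive_count < 2:
            raise ValueError("RAID0 needs at least 2 drives")
        return drive_count * drive_size_gb
    if level == "RAID1":
        # Assumption: a simple two-drive mirror, so one drive's worth is usable.
        if drive_count != 2:
            raise ValueError("this sketch only models a two-drive RAID1 mirror")
        return drive_size_gb
    if level == "RAID5":
        # One drive's worth of capacity is consumed by parity.
        if drive_count < 3:
            raise ValueError("RAID5 needs at least 3 drives")
        return (drive_count - 1) * drive_size_gb
    raise ValueError(f"RAID level not covered by this sketch: {level}")


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (the basic RAID5 parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    print(usable_capacity_gb("RAID1", 2, 300))
    print(usable_capacity_gb("RAID5", 4, 300))

    # One stripe spread across three data drives, plus its parity block.
    stripe = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(stripe)

    # Pretend the second drive failed: rebuild its block from the survivors.
    rebuilt = xor_blocks([stripe[0], stripe[2], parity])
    print(rebuilt == stripe[1])
```

The same XOR that rebuilds a lost block is the work RAID5 has to do on every write, which is where the write penalty mentioned above comes from.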
I could probably go into more detail, but it would be overkill. If you want to learn more about RAID, here's a good tutorial.





