
Here's the follow-up to my post Testing N+1 Components, Part One - Power Supplies, on testing redundant components and failover mechanisms or processes. Again, I will reiterate the mantra that IT equipment is electromechanical and will eventually fail, usually at the worst time and in many cases long before the Mean Time Between Failures (MTBF) published by the manufacturer.
The IT department spends quite a bit of money and effort to implement component redundancy (N+1 redundancy being the best), but when's the last time someone actually tested these components to make sure they work as expected? I'm not saying that we should run around willy-nilly in the data center pulling cables and hot-plug devices. This should be performed methodically: I would recommend testing with non-production systems first, and only once a recent, known-good backup is readily available.
Redundant Network Connections - Most servers now come equipped with at least two network adapters (NICs) integrated into the motherboard:
- NIC teaming or Link Aggregation supports the "bonding" of more than one NIC to increase link speed or support redundancy.
- If possible, these connections should run to different network switches so that one switch failure won't impact network connectivity.
- Make sure that the teaming type you select is supported by both the NIC and the switch. It may be necessary to reconfigure the switch port to support the chosen type, such as Network Fault Tolerance (NFT), Switch-assisted Load Balancing with Fault Tolerance (SLB), Transmit Load Balancing with Fault Tolerance (TLB), 802.3ad Dynamic with Fault Tolerance, etc.
- It may be advisable to purchase another NIC for an expansion slot and "team" that with an integrated NIC to distribute the risk of failure.
- Disconnect a NIC and confirm that network connectivity is maintained. If it isn't, find out why, so that a NIC or switch failure or a cut or crimped network cable doesn't cause unnecessary downtime.
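To put a number on "connectivity is maintained" during the disconnect test, it can help to ping the server from another machine while the cable is pulled and record any gaps. The following is only a rough sketch in Python, not a finished tool: the function names (`ping_once`, `monitor`, `find_outages`) and the address used in the usage note are my own placeholders, and the ping flags assume either Windows or Linux.

```python
import platform
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Send one echo request with the system ping; True if it reports success.

    Assumption: Windows ping uses -n (count) and -w (timeout in ms);
    Linux ping uses -c (count) and -W (timeout in seconds).
    The exit code is a coarse signal, so treat the result as a hint only.
    """
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def monitor(host, duration_s=60, interval_s=1.0):
    """Ping host repeatedly and return a list of (timestamp, ok) samples."""
    samples = []
    deadline = time.time() + duration_s
    while time.time() < deadline:
        samples.append((time.time(), ping_once(host)))
        time.sleep(interval_s)
    return samples


def find_outages(samples):
    """Collapse (timestamp, ok) samples into (start, end) outage windows.

    An outage starts at the first failed sample and ends at the next
    successful one (or at the last sample if it never recovers).
    """
    outages = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            outages.append((start, ts))
            start = None
    if start is not None:
        outages.append((start, samples[-1][0]))
    return outages
```

For example, you might start `samples = monitor("192.0.2.10", duration_s=120)` on a workstation (the address is a documentation placeholder, substitute the teamed server's address), pull one NIC's cable while it runs, and then inspect `find_outages(samples)`. An empty list suggests the team failed over without a dropped ping at that sampling interval; a long window suggests the teaming type or switch port configuration deserves a closer look.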
I'll discuss Redundant Array of Independent Disks (RAID) and Hot Plug Drives in Part Three.





