All About Uptime
All this modularity should result in greater uptime and resiliency -- something most network managers prize even above high performance. Accordingly, in our tests we gave the greatest weight to assessments of high availability and resiliency.
We assessed high availability with two software tests and one hardware test. The first software test focused on the Nexus switch's process restart capability. We configured the Spirent TestCenter traffic generator/analyzer to bring up Open Shortest Path First (OSPF) adjacencies on all 256 Nexus 10G Ethernet ports, advertise routes to more than 50,000 networks and offer traffic to all networks.
While traffic was flowing, we deliberately killed the Nexus' OSPF process and then watched as the switch automatically restarted the process. Not a single packet was lost, and no change was visible to the hundreds of other OSPF routers emulated by Spirent TestCenter.
This is a different mechanism from OSPF graceful restart, where routes must be recalculated. Process restart occurs much faster (typically in less than a second), so no change in routing topology is visible to other routers.
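One way to reason about why a sub-second restart can stay invisible to neighbors is to compare the restart time with OSPF's hello and dead timers. The sketch below is illustrative only: it assumes the commonly used defaults of a 10-second hello interval and a 40-second dead interval (the timers used in the test are not stated here), and it assumes the restarted process keeps its adjacency state and sends a hello as soon as it is back. The function name is hypothetical.

```python
def adjacency_survives(restart_seconds: float,
                       hello_interval: float = 10.0,
                       dead_interval: float = 40.0) -> bool:
    """Return True if a neighbor would not time out the adjacency.

    Worst case assumed here: the process dies just before its next hello
    was due, so the neighbor last heard from us almost a full hello
    interval ago. The neighbor's dead timer expires dead_interval seconds
    after the last hello it received.
    """
    worst_case_silence = hello_interval + restart_seconds
    return worst_case_silence < dead_interval


if __name__ == "__main__":
    # A restart of about one second leaves a wide margin under these timers.
    print(adjacency_survives(1.0))
    # A restart slower than (dead - hello) seconds would not.
    print(adjacency_survives(35.0))
```

Under these assumed timers, any restart shorter than about 30 seconds would keep the adjacency up, so a restart measured in fractions of a second leaves a very large margin.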
Our second set of software resiliency tests involved upgrading and then downgrading system software while continuously forwarding traffic, a key capability in situations where no downtime is acceptable. In both upgrade and downgrade tests, we changed the software image on the first management card, watched as it handed over responsibilities to a second management card and then upgraded all line cards. A complete upgrade took nearly 45 minutes, during which the Nexus maintained all routing table entries and forwarded all traffic with no packet loss.
It's just as important to support seamless downgrades as upgrades. Indeed, prior experience with many vendors' routers and switches suggests the downgrade path is a lot bumpier than the upgrade one. That was not a concern with the Nexus switch; as in the previous tests, we saw no changes in routing and no packet loss during a downgrade.
Cisco claims Nexus offers N+1 redundancy with as few as two fabric cards in place for gigabit line cards or as few as three cards in place for 10G Ethernet cards. To validate those claims, our final resiliency test involved pulling four out of Nexus' five fabric cards one by one while continuing to offer traffic to all 256 10G Ethernet ports.
Fabric utilization rose as we removed the cards, but there was no packet loss with just two out of five fabric cards left. With only one fabric card in place, the system dropped about 47% of traffic, but only because our traffic load oversubscribed the fabric. These results validate Cisco's redundancy claims; in addition, the single-fabric result became very significant in our performance tests.
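The arithmetic behind the single-fabric result can be sketched as a simple oversubscription calculation. The numbers below are normalized, hypothetical units chosen only to match the roughly 47% loss observed with one card; they are not Cisco's published fabric capacities, and the function names are illustrative.

```python
def fabric_capacity(cards: int, per_card_capacity: float) -> float:
    """Total fabric capacity, assuming each card adds the same amount."""
    return cards * per_card_capacity


def expected_loss_fraction(offered_load: float, capacity: float) -> float:
    """Fraction of offered traffic dropped once load exceeds capacity."""
    if offered_load <= 0:
        return 0.0
    return max(0.0, 1.0 - capacity / offered_load)


if __name__ == "__main__":
    OFFERED = 100.0   # offered load, normalized to 100 units (assumption)
    PER_CARD = 53.0   # hypothetical per-card capacity in the same units

    for cards in (5, 4, 3, 2, 1):
        loss = expected_loss_fraction(OFFERED, fabric_capacity(cards, PER_CARD))
        print(f"{cards} fabric card(s): expected loss {loss:.0%}")
```

If a single card can carry a little over half of the offered load, two cards together exceed it, which is consistent with seeing no loss at two cards and heavy loss at one.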