- Basics of vPC: Virtual Port Channel (vPC) – Part I
- vPC Inconsistency and Control Plane: vPC – Part II
- vPC Failure Scenarios
- vPC with HSRP: vPC with HSRP – Part IV
- vPC Design Variations
So far, we have learned the concepts of vPC and how vPC works. We also looked at the control plane and some of its key features.
Today, we are going to discuss the various failure scenarios and how each of them impacts vPC.
- vPC Peer Link Failure
- vPC KeepAlive Link Failure
- Peer and KeepAlive Link Failure
- Primary Peer Switch Failure
- Primary Switch and Peer Link Failure
- Failure of both Primary and Secondary Switches
1. vPC peer link failure:
When vPC encounters a peer link failure, the following sequence of events occurs:
- Peer status changes to “Peer Link is down” on both vPC switches.
- As the peer keepalive link is still up, both switches know that their peer is alive.
- So they retain their vPC roles and the secondary does not take over as primary, which keeps us out of a Split Brain/Dual Active situation.
- A peer link failure cuts the East-West path between the peers, so to contain the impact the secondary peer suspends its vPC member ports but leaves its orphan ports up.
- This prevents duplication of frames and loops in the network.
- Unfortunately, this blackholes traffic for the orphan ports on the secondary, because they stay up while their path across the peer link is gone.
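The port handling in the list above can be sketched as a small Python model. This is only an illustration of the behaviour described in this article, not NX-OS code: the `Port` class, the `on_peer_link_down` function and the port names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Port:
    name: str
    is_vpc_member: bool  # False means this is an orphan port
    up: bool = True


def on_peer_link_down(role: str, keepalive_ok: bool, ports: List[Port]) -> List[Port]:
    """Model one switch's reaction to a peer link failure.

    Only the secondary suspends ports, and only while the keepalive
    still proves that the primary is alive. Orphan ports are left up.
    """
    if role == "secondary" and keepalive_ok:
        for port in ports:
            if port.is_vpc_member:
                port.up = False
    return ports


# Hypothetical port layout on the secondary switch.
secondary_ports = [
    Port("Po10", is_vpc_member=True),
    Port("Po20", is_vpc_member=True),
    Port("Eth1/5", is_vpc_member=False),  # orphan port
]
on_peer_link_down("secondary", keepalive_ok=True, ports=secondary_ports)

suspended = [p.name for p in secondary_ports if not p.up]
still_up = [p.name for p in secondary_ports if p.up]
print("suspended:", suspended)
print("still up:", still_up)
```

The orphan port is the only one left up on the secondary, which is exactly why its traffic can be blackholed.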
Let's shut down the peer link and see that, even though the peer link is down, vPC is still operational.
On the secondary peer, notice that the vPC member ports are down with the reason “Peer-link is down”.
So the updated topology is as below:
What if a new port is added when the peer link has already failed? Will it come up?
Recall the vPC order of operations: the consistency check is one of its steps. With the peer link down that check fails, so the new ports stay down as well, and they are brought up only once the peer link is restored.
2. vPC Keepalive Link Failure:
- When the vPC keepalive link fails and the peer link is still up, both switches still hear from each other over the peer link.
- So they retain their vPC roles, and the overall functionality of the vPC is not impacted.
vPC status once the keepalive link is shut down:
3. Peer and Keepalive link failure:
When both links fail, each switch assumes that its peer is dead and takes on the operational primary role. Hence we end up in an Active-Active scenario: both switches now forward traffic and can form Layer 2 loops.
This situation requires manual intervention for recovery.
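The first three scenarios can be compared side by side with a rough Python sketch. It assumes both switches are still running and only the links fail, and it follows the simplified rules given in this article; `operational_role` and `is_dual_active` are hypothetical helper names.

```python
def operational_role(configured_role: str, peer_link_up: bool, keepalive_up: bool) -> str:
    """Return the role a switch operates in, per the article's rules.

    As long as either link still proves the peer is alive, the switch
    keeps its role. With both links down it assumes the peer is dead
    and acts as primary.
    """
    if peer_link_up or keepalive_up:
        return configured_role
    return "primary"


def is_dual_active(peer_link_up: bool, keepalive_up: bool) -> bool:
    """True when both peers end up acting as primary at the same time."""
    roles = [
        operational_role(role, peer_link_up, keepalive_up)
        for role in ("primary", "secondary")
    ]
    return roles == ["primary", "primary"]


scenarios = {
    "peer link down": (False, True),
    "keepalive down": (True, False),
    "both links down": (False, False),
}
for name, (peer_link_up, keepalive_up) in scenarios.items():
    print(name, "-> dual active:", is_dual_active(peer_link_up, keepalive_up))
```

Only the case where both links are down produces two operational primaries, which is the situation that needs manual recovery.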
Primary acting as Primary:
Secondary also acting as operational Primary:
Though it sounds rare, poor network design can cause both the peer link and the keepalive link to fail at the same time. This can happen under the following circumstances:
- Both links share the same module and that module fails on the switch.
- Both links are on different modules but reach the peer switch through a common Layer 2 device, and that common device fails.
As a best practice, the peer link (bundled links) and the keepalive link should not share the same fate. There should be redundancy in place in case a failure occurs.
If the keepalive link runs over an SVI, the “dual-active exclude interface-vlan” command can be used to keep that SVI up when the peer link fails.
4. Primary Peer Switch failure
Suppose we had a power outage and the primary switch is powered off. The secondary switch will consider the peer to be down, as both the peer link and the keepalive link went down. Once three keepalives are missed, the secondary takes over the primary role and starts forwarding the traffic.
When the original primary switch comes back up, it takes the operational secondary role, because the vPC role is non-preemptive. Preemption would cause traffic loss, which is not acceptable.
So if you come across an output where the role is “Secondary, operational primary”, it indicates that this is the result of a past failure.
5. Primary Switch and Peer Link Failure
Think of a situation where we first have a peer link failure, so the secondary shuts down its member ports. The primary is forwarding the traffic, and then it fails as well. The secondary stops receiving heartbeats and suspects that the primary has failed. When three keepalives are missed, the secondary unshuts its ports and assumes the primary role.
As keepalives are sent every second, there will be a traffic disruption of around 4-5 seconds. This can be reduced by lowering the keepalive interval, which is set with the interval option of the peer-keepalive command under the vPC domain.
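As a rough illustration of that timing (in Python, not NX-OS syntax), the detection delay can be estimated from the interval and the number of missed keepalives. The function name and the sample intervals are assumptions made for this sketch, and it follows the article's simplified "three missed keepalives" model. It covers detection only; the 4-5 second figure presumably also includes the time to bring the ports back up, which is not modelled here, and real NX-OS has a separate keepalive timeout setting as well.

```python
def detection_delay_seconds(interval_ms: int, missed_keepalives: int = 3) -> float:
    """Time until the secondary declares the primary dead.

    Simplified model: the peer is declared dead after a fixed number
    of consecutive keepalives go missing.
    """
    return interval_ms * missed_keepalives / 1000


# Compare the 1-second interval from the article with shorter,
# illustrative intervals.
for interval_ms in (1000, 500, 400):
    delay = detection_delay_seconds(interval_ms)
    print(f"interval {interval_ms} ms -> detection after about {delay:.1f} s")
```

Before choosing a value, check the interval range your platform actually accepts.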
6. Power outage on both vPC peer switches
If there is a power outage and both switches go down, vPC will be completely down, causing a complete outage. Once power is restored, if only one of the switches comes up, keepalives will not be heard and hence the peer link will not come up. This also prevents the member ports from coming up.
So even though one of the switches is restored, we still experience complete isolation. “Auto recovery” is an option used to overcome this failure situation.
It allows the switch to assume the primary role and start forwarding traffic in case the peer does not come up.
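The decision that auto recovery makes can be sketched as follows. This is a simplified Python model, not the NX-OS implementation: `state_after_reload` is a hypothetical name, and the 240-second wait is an assumed value for the reload delay, so check the documentation for your platform before relying on it.

```python
def state_after_reload(
    peer_seen: bool,
    auto_recovery: bool,
    seconds_since_boot: float,
    reload_delay: float = 240.0,  # assumed delay, in seconds
) -> str:
    """Decide what a freshly booted vPC switch does with its member ports."""
    if peer_seen:
        # Normal case: the peer answered, so roles are negotiated as usual.
        return "form vPC with peer"
    if auto_recovery and seconds_since_boot >= reload_delay:
        # Peer never showed up: take the primary role and forward traffic.
        return "assume primary and bring up vPC member ports"
    # Without auto recovery, or before the delay expires, stay isolated.
    return "keep vPC member ports down"


print(state_after_reload(peer_seen=False, auto_recovery=False, seconds_since_boot=600))
print(state_after_reload(peer_seen=False, auto_recovery=True, seconds_since_boot=60))
print(state_after_reload(peer_seen=False, auto_recovery=True, seconds_since_boot=300))
```

The waiting period gives the peer a chance to boot first, so that a switch which is merely slower to come up does not lead to two primaries.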
In the coming parts of this vPC series, we will discuss vPC flavors, data plane forwarding and HSRP with vPC.