Friends, in this blog post I’ll share an approach to troubleshooting OSPF, one of the most commonly used interior gateway protocols (IGPs) across all domains.
It routes IP packets based solely on the destination IP address and IP Type of Service found in the IP packet header. IP packets are routed “as is” — they are not encapsulated in any further protocol headers as they transit the Autonomous System.
OSPF is a dynamic routing protocol. It quickly detects topological changes in the AS (such as router interface failures) and calculates new loop-free routes after a period of convergence. This period of convergence is short and involves a minimum of routing traffic.
The way in which OSPF routers address OSPF packets varies with the OSPF network type.
- Broadcast Networks: For broadcast networks, OSPF routers use the following two reserved IP multicast addresses:
- 224.0.0.5 – AllSPFRouters: Used to send OSPF messages to all OSPF routers on the same network. The AllSPFRouters address is used for Hello packets. The DR and BDR use this address to send Link State Update and Link State Acknowledgment packets.
- 224.0.0.6 – AllDRouters: Used to send OSPF messages to all OSPF DRs (the DR and the BDR) on the same network. All OSPF routers except the DR use this address when sending Link State Update and Link State Acknowledgment packets to the DR.
- Point-to-Point Networks: Point-to-Point networks use the AllSPFRouters address (224.0.0.5) for all OSPF messages.
- NBMA Networks: NBMA networks have no multicasting capability. Therefore, the destination IP address of any Hello or Link State packets is the unicast IP address of a specific neighbor. The neighbor IP address is a required part of OSPF configuration for NBMA network links.
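The addressing rules above can be sketched as a small lookup. This is only an illustration: the function name, the network-type strings and the packet-type strings are made up for this example, and only the cases described above are covered.

```python
ALL_SPF_ROUTERS = "224.0.0.5"  # AllSPFRouters
ALL_D_ROUTERS = "224.0.0.6"    # AllDRouters


def ospf_destination(network_type, packet_type,
                     sender_is_dr_or_bdr=False, neighbor_ip=None):
    """Return the destination IP address for an OSPF packet (illustrative only)."""
    if network_type == "point-to-point":
        # Point-to-point links use AllSPFRouters for every OSPF message.
        return ALL_SPF_ROUTERS
    if network_type == "nbma":
        # No multicast on NBMA: the neighbor must be configured explicitly.
        if neighbor_ip is None:
            raise ValueError("NBMA links need a configured neighbor address")
        return neighbor_ip
    if network_type == "broadcast":
        if packet_type == "hello":
            return ALL_SPF_ROUTERS
        if packet_type in ("ls_update", "ls_ack"):
            # DR and BDR flood to everyone; other routers send to the DR/BDR.
            return ALL_SPF_ROUTERS if sender_is_dr_or_bdr else ALL_D_ROUTERS
        raise ValueError("packet type not covered by this sketch")
    raise ValueError("unknown network type")
```

For example, `ospf_destination("broadcast", "ls_update")` gives the AllDRouters address, while the same call with `sender_is_dr_or_bdr=True` gives AllSPFRouters.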
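For example, the following IOS configuration enables OSPF process 1 on every interface and places them all in area 0:
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0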
Let’s discuss a detailed approach to OSPF troubleshooting.
OSPF Neighbor States: Use “show ip ospf neighbor” to check the OSPF neighbor state. The possible states are:
- Down: No information has been received from anybody on the segment.
- Attempt: On non-broadcast multi-access clouds, this state indicates that no recent information has been received from the neighbor. An effort should be made to contact the neighbor by sending Hello packets at the reduced rate PollInterval.
- Init: The interface has detected a Hello packet coming from a neighbor but bi-directional communication has not yet been established.
- Two-way: There is bi-directional communication with a neighbor. The router has seen itself in the Hello packets coming from a neighbor. At the end of this stage the DR and BDR election will have been done. At the end of the 2-way stage, routers decide whether to proceed in building an adjacency or not. The decision is based on whether one of the routers is a DR or BDR, or the link is a point-to-point or a virtual link.
- Exstart: Routers are trying to establish the initial sequence number that is going to be used in the information exchange packets. The sequence number ensures that routers always get the most recent information. One router will become the primary and the other will become secondary. The primary router will poll the secondary for information.
- Exchange: Routers will describe their entire link-state database by sending database description packets. At this state, packets could be flooded to other interfaces on the router.
- Loading: At this state, routers are finalizing the information exchange. Routers have built a link-state request list and a link-state retransmission list. Any information that looks incomplete or outdated will be put on the request list. Any update that is sent will be put on the retransmission list until it gets acknowledged.
- Full: At this state, the adjacency is complete. The neighboring routers are fully adjacent. Adjacent routers will have an identical link-state database.
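As a rough aid, the states above can be checked automatically from the neighbor table. The sketch below assumes the usual six-column layout of “show ip ospf neighbor”; the sample output, the addresses and the function name are invented for illustration, and real output may differ between IOS versions.

```python
SAMPLE_OUTPUT = """\
Neighbor ID     Pri   State           Dead Time   Address         Interface
10.0.0.2          1   FULL/DR         00:00:35    192.168.1.2     GigabitEthernet0/0
10.0.0.3          1   2WAY/DROTHER    00:00:33    192.168.1.3     GigabitEthernet0/0
10.0.0.4          1   EXSTART/BDR     00:00:38    192.168.1.4     GigabitEthernet0/0
"""


def stuck_neighbors(output):
    """Return (neighbor_id, state) for neighbors that are not in a healthy state."""
    stuck = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 6:
            # Skips the header, blank lines, and rows such as "FULL/  -"
            # that do not split into exactly six columns.
            continue
        neighbor_id, _priority, state_role = fields[:3]
        state, _, role = state_role.partition("/")
        # FULL is healthy; 2WAY is healthy only between two DROTHER routers.
        healthy = state == "FULL" or (state == "2WAY" and role == "DROTHER")
        if not healthy:
            stuck.append((neighbor_id, state))
    return stuck
```

With the sample above, only the neighbor in EXSTART is reported.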
If the OSPF adjacency does not come up, or the neighbor stays stuck in a transitional state in the output of “show ip ospf neighbor” on the Cisco IOS CLI, start with the basic requirements.
For OSPF to build a neighbor relationship, a few requirements have to be met:
- Routers must be in the same subnet.
- Hello and dead timers must match.
- Router IDs must be unique.
- Routers must be in the same area.
- Stub flag must be identical.
- IP MTU must be identical.
- Must pass neighbor authentication (if configured).
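The requirements above lend themselves to a simple side-by-side comparison of the two interfaces. The following is a minimal sketch; the dictionary keys and the function name are hypothetical, and the values would have to be collected from each router by hand or by a script.

```python
import ipaddress

# (dictionary key, human-readable label) for settings that must be identical
CHECKS = [
    ("hello", "hello timer"),
    ("dead", "dead timer"),
    ("area", "area ID"),
    ("stub", "stub flag"),
    ("mtu", "IP MTU"),
    ("auth", "authentication"),
]


def adjacency_problems(a, b):
    """Compare two interface descriptions and list what would block an adjacency."""
    problems = []
    net_a = ipaddress.ip_interface(a["address"]).network
    net_b = ipaddress.ip_interface(b["address"]).network
    if net_a != net_b:
        problems.append("subnet mismatch")
    for key, label in CHECKS:
        if a[key] != b[key]:
            problems.append(f"{label} mismatch")
    if a["router_id"] == b["router_id"]:
        problems.append("duplicate router ID")
    return problems
```

An empty list means none of the listed requirements is violated; it does not prove that the adjacency will form.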
Flapping links force every router in the area to recalculate SPF. Minimize the impact with the SPF timers:
- SPF schedule delay – SPF runs 5 seconds after receiving LSUs/LSAs
- Hold time – wait a minimum of 10 seconds before running another SPF
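To see how the two timers dampen a flapping link, here is a deliberately simplified model. It assumes that a topology change arriving while an SPF run is already pending is absorbed by that run; the function name is hypothetical, and real implementations (especially with SPF throttling) behave in a more elaborate way.

```python
def spf_run_times(event_times, delay=5, hold=10):
    """Return the times at which SPF would run for the given topology-change times."""
    runs = []
    for t in sorted(event_times):
        if runs and t <= runs[-1]:
            # A run is already pending at or after this event; it covers the change.
            continue
        start = t + delay
        if runs:
            # Never start sooner than `hold` seconds after the previous run.
            start = max(start, runs[-1] + hold)
        runs.append(start)
    return runs
```

In this model, three changes at 0, 1 and 2 seconds trigger a single SPF run at 5 seconds, and a change at 6 seconds is pushed out to 15 seconds by the hold time.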
Common OSPF adjacency fail issues:
- OSPF Neighbor table does not display adjoining router.
- OSPF Neighbor status is stuck in INIT state.
- OSPF Neighbor status is stuck in 2-way state.
- OSPF Neighbor status is stuck in EXSTART/EXCHANGE state.
1. OSPF neighbor table does not display adjoining router :
This is a major problem in an OSPF network. If the neighbor table does not display the adjoining router, Hello packets are either not being exchanged or are being blocked or dropped between the two routers. There could be various reasons behind this behavior.
It could be a layer 1 / 2 problem or a configuration mistake.
Some reasons why an OSPF neighbor table does not display the adjoining router as its neighbor are:
- OSPF is not enabled on the router
- OSPF is not enabled on the interface
- OSPF interface is down (Layer 1/2 problem)
- Area ID mismatch between the interfaces
- Subnet mask mismatch between the interfaces
- Hello and Dead timer configured on the routers do not match
- OSPF authentication is enabled on one router and disabled on another
- OSPF authentication-mode configured on both routers do not match
- OSPF authentication-key configured on both routers do not match
- OSPF interface is configured as silent-interface
- ACL is blocking OSPF traffic
- Stub/NSSA flag is set on one router and not set on another router
- Same Router ID configured on both routers
- Different network-type configured under interfaces
- Neighbor command is not configured on the remote router on a non-broadcast network
- OSPF neighborship is attempted on a secondary address instead of the primary address
- Corrupted OSPF packets are received
- Interface is configured as passive under “router ospf”
- Virtual link is configured over a stub area
2. OSPF stuck in INIT (one way hello)
- Multicast is broken or layer 2 problem.
- Access-list is blocking ospf multicast address.
- OSPF hello packet getting NAT translated.
- Layer 1/2 is broken.
- Unplugged cable
- Loose cable
- Bad cable
- Bad transceiver
- Bad port
- Bad interface card
- Layer 2 problem at telco in case of a WAN link
- Missing clock statement in case of back-to-back serial connection
3. OSPF stuck in 2-WAY
- Normal on Ethernet broadcast networks between two routers that are neither DR nor BDR (DROTHERs).
- Layer 2 is broken.
- All routers are configured with priority 0, so no DR/BDR election takes place.
4. OSPF stuck in EXSTART/EXCHANGE
- MTU mismatch between neighbors.
- Duplicate router ID between routers.
- Packet loss can also cause the adjacency to get stuck.
- Access-list is blocking unicast communication between the routers.
5. OSPF stuck in LOADING
- Neighbor is sending bad or corrupt packets, for example due to a memory problem.
- LS Request packets are not accepted by the neighbor and are ignored.
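The stuck-state checklists above can be kept in a small lookup table for quick triage. This is only a sketch that condenses the lists from this post; the table name, the function name and the exact wording of the entries are illustrative.

```python
LIKELY_CAUSES = {
    "INIT": [
        "multicast broken or layer 1/2 problem",
        "access-list blocking the OSPF multicast address",
        "hello packets being NAT translated",
    ],
    "2WAY": [
        "normal between non-DR/BDR routers on broadcast networks",
        "layer 2 problem",
        "all routers have priority 0",
    ],
    "EXSTART": [
        "MTU mismatch",
        "duplicate router ID",
        "packet loss",
        "access-list blocking unicast between the routers",
    ],
    "LOADING": [
        "neighbor sending bad or corrupt packets",
        "LS Request packets ignored by the neighbor",
    ],
}
# EXSTART and EXCHANGE share the same checklist in this post.
LIKELY_CAUSES["EXCHANGE"] = LIKELY_CAUSES["EXSTART"]


def likely_causes(state):
    """Map a neighbor state such as '2-WAY' or 'EXSTART/BDR' to its checklist."""
    key = state.upper().replace("-", "").split("/")[0]
    return LIKELY_CAUSES.get(key, [])
```

A state with no entry, such as FULL, returns an empty list.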
OSPF Neighbor Issues:
OSPF calculates the shortest path for data using the link-state information it receives from neighboring routers. This relationship with neighboring devices matters because every route OSPF computes depends on the information exchanged between routers.
If there is a problem with the adjacency between two devices, OSPF cannot identify the shortest path. This can lead to delays in data transfer and reduced network speed.
To troubleshoot this problem, make sure that all of the requirements for establishing an adjacency between two routers are met.
First, make sure that both OSPF routers are on the same subnet. Check this by comparing the IP address and subnet mask on both interfaces.
In addition, both devices must be in the same area to form an adjacency. Most importantly, check that each OSPF device has a unique Router ID. These IDs identify each router in the network. Once all of these conditions have been checked and any problems resolved, OSPF will start functioning normally again.
OSPF Routing Table Issues
OSPF installs the shortest paths it calculates in the routing table. The calculation draws on the link-state database, which describes each router and the cost of the links between routers. Sometimes, however, routes go missing from the routing table. These can be external as well as internal routes. Under such conditions, OSPF is not able to function properly.
To eliminate this issue, first identify which routes are missing from the routing table. If all OSPF routes are missing, the problem is serious, and you will have to carry out a full adjacency check. If only the external routes are missing (routes redistributed from another routing process), carry out an external LSA check. If summary routes (routes that originate from another area) or NSSA routes are missing, check the corresponding summary or NSSA LSAs in the same way. It is crucial that the routing table is repaired in the shortest possible time so that the network gets back to normal and OSPF starts working properly again.
OSPF INIT state issue
INIT state means that hellos are flowing in one direction only: the local router receives hellos from the neighbor, but the neighbor does not list the local router in them, typically because it never receives the local router’s hellos. This prevents the two routers from communicating, which stops OSPF from performing its task. If this problem arises, first check whether OSPF authentication is in use on both devices, using the “show ip ospf interface” command. If both use the same authentication type, check that both use the same authentication key. If authentication is not the cause, check that the physical cabling is correct, and check the switch settings as well. Resolve any issue you find in these steps right away.
The primary purpose of an Access Control List (ACL) is to filter traffic as it passes through a router. However, an ACL can also block OSPF packets and prevent OSPF from working properly. Hence, check whether any of the routers are configured with ACLs, using the “show ip interface” command. If an ACL is configured on any of the routers, temporarily disable it to check whether OSPF starts working again. If it does, rework the ACL so that it permits OSPF traffic and no longer interferes.
OSPF PACKET TYPES
There are five different types of OSPF packet:
- Hello
- Database Description
- Link State Request
- Link State Update
- Link State Acknowledgment
I will start with the OSPF Hello packet, as this is the first packet that is sent as soon as we enable OSPF on an interface.
The Hello protocol serves several purposes:
- It is the means by which neighbors are discovered.
- It advertises several parameters on which two routers must agree before they can become neighbors.
- Hello packets act as keepalives between neighbors.
- It ensures bi-directional communication between neighbors.
- It elects Designated Routers (DRs) and Backup Designated Routers (BDRs) on Broadcast and Nonbroadcast Multiaccess (NBMA) networks.
OSPF-speaking routers periodically send a Hello packet out each OSPF-enabled interface. This period is known as the HelloInterval and is configured on a per-interface basis. The commonly used defaults are a Hello every 10 seconds on broadcast and point-to-point networks and every 30 seconds on nonbroadcast multiaccess (NBMA) networks, with a dead interval of four times the HelloInterval.
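The default timer relationship is easy to express in code. The sketch below assumes the 10/30-second defaults mentioned above; the names are made up for the example, and the values should be checked against the platform in use.

```python
# Default HelloInterval in seconds per network type (assumed defaults)
DEFAULT_HELLO = {"broadcast": 10, "point-to-point": 10, "nbma": 30}


def default_timers(network_type):
    """Return (hello_interval, dead_interval) with dead = 4 x hello."""
    hello = DEFAULT_HELLO[network_type]
    return hello, 4 * hello
```

So a broadcast interface defaults to 10/40 and an NBMA interface to 30/120, which is why mixing network types on one link often shows up as a timer mismatch.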
Packet structure of hello packet
Network Mask: is the address mask of the interface from which the packet was sent. If this mask does not match the mask of the interface on which the packet is received, the packet will be dropped. This technique ensures that routers will become neighbors only if they agree on the exact address of their shared network.
Hello Interval: as discussed earlier, is the period, in seconds, between transmissions of Hello packets on the interface. If the sending and receiving routers don’t have the same value for this parameter, they will not establish a neighbor relationship.
Options: This field is included in the Hello packet to ensure that neighbors have compatible capabilities. A router may reject a neighbor because of a capabilities mismatch.
Router Priority is used in the election of the DR and BDR. If set to zero, the originating router is ineligible to become the DR or BDR.
Dead Interval: is the number of seconds the originating router will wait for a Hello from a neighbor before declaring the neighbor dead. If a Hello is received in which this number does not match the RouterDeadInterval of the receiving interface, the packet will be dropped. This technique ensures that neighbors agree on this parameter.
Designated Router: is the IP address of the interface of the DR on the network (not its Router ID). During the DR election process, this may only be the originating router’s idea of the DR, not the finally elected DR. If there is no DR (because one has not been elected or because the network type does not require DRs), this field will be set to 0.0.0.0.
Backup DR: is the IP address of the interface of the BDR on the network. Again, during the DR election process, this may only be the originating router’s idea of the BDR. If there is no BDR, this field is set to 0.0.0.0.
Neighbor: is a recurring field that lists all neighbors on the network from which the originating router has received a valid Hello in the past RouterDeadInterval.
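The fields above map onto a fixed 20-byte layout followed by one 4-byte entry per neighbor, as laid out in RFC 2328. The sketch below packs and unpacks only the Hello body (the 24-byte common OSPF header is left out); the function names and sample values are illustrative.

```python
import socket
import struct

# Network mask, HelloInterval, Options, Router Priority,
# RouterDeadInterval, Designated Router, Backup Designated Router
HELLO_FIXED = struct.Struct("!4sHBBI4s4s")


def build_hello_body(mask, hello, options, priority, dead, dr, bdr, neighbors):
    """Pack a Hello body; `neighbors` is a list of dotted-quad router IDs."""
    body = HELLO_FIXED.pack(
        socket.inet_aton(mask), hello, options, priority, dead,
        socket.inet_aton(dr), socket.inet_aton(bdr),
    )
    return body + b"".join(socket.inet_aton(n) for n in neighbors)


def parse_hello_body(data):
    """Unpack a Hello body into a dictionary of its fields."""
    mask, hello, options, priority, dead, dr, bdr = HELLO_FIXED.unpack_from(data)
    rest = data[HELLO_FIXED.size:]
    return {
        "network_mask": socket.inet_ntoa(mask),
        "hello_interval": hello,
        "options": options,
        "priority": priority,
        "dead_interval": dead,
        "dr": socket.inet_ntoa(dr),
        "bdr": socket.inet_ntoa(bdr),
        "neighbors": [socket.inet_ntoa(rest[i:i + 4])
                      for i in range(0, len(rest), 4)],
    }
```

A Hello body with one neighbor is 24 bytes long; together with the 24-byte OSPF header that gives the 48-byte packet length often seen in captures.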
Packet capture of HELLO packet
Let’s quickly analyze the packet capture of a Hello packet received from a neighbor. You can see that OSPF hellos are sent to the multicast address 224.0.0.5, which maps to the multicast MAC address 01:00:5e:00:00:05 at layer 2.
Moving into the actual Hello packet header, you will see that it is OSPF version 2, which means it is running over native IPv4. The message type is Hello Packet, the packet length is 48, the router ID is 22.214.171.124 and the area ID is 0, followed by the packet checksum and the authentication details. Here you can actually see the password in cleartext, because I am using cleartext authentication.
If I had used MD5 authentication, we would not see the password, because only a message digest is sent. After that come the network mask, the hello/dead intervals (10 and 40), the DR/BDR and the active neighbor.
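The multicast MAC address in the capture is not arbitrary: an IPv4 multicast address maps to the prefix 01:00:5e followed by the low 23 bits of the IP address. A small sketch of that mapping (the function name is hypothetical):

```python
import ipaddress


def multicast_mac(ip):
    """Map an IPv4 multicast address to its Ethernet multicast MAC address."""
    addr = ipaddress.IPv4Address(ip)
    if not addr.is_multicast:
        raise ValueError(f"{ip} is not a multicast address")
    low23 = int(addr) & 0x7FFFFF           # keep the low 23 bits of the IP
    mac = (0x01005E << 24) | low23         # prepend the 01:00:5e prefix
    return ":".join(f"{(mac >> shift) & 0xFF:02x}"
                    for shift in range(40, -8, -8))
```

For AllSPFRouters (224.0.0.5) this yields 01:00:5e:00:00:05, and for AllDRouters (224.0.0.6) it yields 01:00:5e:00:00:06.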
It makes more sense to show you a capture where the DR/BDR election is in progress.
Here you can see that the BDR field is empty and the DR field is 10.0.0.2. If there is no DR/BDR, an election is held in which the router with the highest priority becomes the BDR. If more than one router has the same priority, the one with the numerically highest Router ID wins. If there is no active DR, the BDR is promoted to DR and a new election is held for the BDR.
It should be noted that the priority can influence an election, but will not override an active DR or BDR. That is, if a router with a higher priority becomes active after a DR and BDR have been elected, the new router will not replace either of them.
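The election behavior described above, including the lack of preemption, can be modeled in a few lines. This is a simplified sketch and not the full election algorithm from RFC 2328: it assumes every router sees the same set of neighbors, and the function name and data layout are invented for the example.

```python
import ipaddress


def elect(routers, dr=None, bdr=None):
    """Return (dr, bdr) given {router_id: priority} for the routers that are up.

    `dr` and `bdr` are the currently elected routers, if any.
    """
    eligible = {rid: pri for rid, pri in routers.items() if pri > 0}

    def best(exclude):
        candidates = [rid for rid in eligible if rid not in exclude]
        # Highest priority wins; the numerically highest Router ID breaks ties.
        return max(candidates,
                   key=lambda rid: (eligible[rid], int(ipaddress.IPv4Address(rid))),
                   default=None)

    # A DR or BDR that is gone (or no longer eligible) loses its role.
    if dr not in eligible:
        dr = None
    if bdr not in eligible:
        bdr = None

    if bdr is None:
        bdr = best({dr})
    if dr is None:
        # No active DR: promote the BDR, then elect a new BDR.
        dr = bdr
        bdr = best({dr})
    return dr, bdr
```

Calling `elect` again with an existing DR and BDR leaves both in place even if a higher-priority router has joined, which mirrors the non-preemptive behavior.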
The Database Description packet is used when an adjacency is being established. The primary purpose of the DD packet is to describe some or all of the LSAs in the originator’s database so that the receiver can determine whether it has a matching LSA in its own database. This is done by listing only the headers of the LSAs. Because multiple DD packets may be exchanged during this process, flags are included for managing the exchange via a master/slave polling relationship.
Interface MTU: Interface MTU is the size, in octets, of the largest IP packet that can be sent out the originator’s interface without fragmentation. This field will be set to 0x0000 when the packet is sent over virtual links.
Options: The field is included in the Database Description packet so that a router may choose not to forward certain LSAs to a neighbor that doesn’t support the necessary capabilities.
I-bit: Initial bit, is set to 1 when the packet is the initial packet in a series of DD packets. Subsequent DD packets will have I-bit = 0.
M-bit: More bit, is set to 1 to indicate that the packet is not the last in a series of DD packets. The last DD packet will have M-bit = 0.
MS-bit: Master/Slave bit, is set to 1 to indicate that the originator is the master (that is, is in control of the polling process) during a database synchronization. The slave will have MS-bit = 0.
DD Sequence Number: It ensures that the full sequence of DD packets are received in the database synchronization process. The sequence number will be set by the master to some unique value in the first DD packet, and the sequence will be incremented in subsequent packets.
LSA Headers: It lists some or all of the headers of the LSAs in the originator’s link state database. See “The Link State Header,” for a full description of the LSA header; the header contains enough information to uniquely identify the LSA and the particular instance of the LSA.
Packet capture of Database Description packet
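When reading a capture, the three DD flags are the low-order bits of a single flags byte (I = 0x04, M = 0x02, MS = 0x01 in the RFC 2328 packet format). A minimal decoding sketch, with a hypothetical function name:

```python
FLAG_I = 0x04   # Initial: first packet in the DD series
FLAG_M = 0x02   # More: further DD packets follow
FLAG_MS = 0x01  # Master/Slave: set when the sender is the master


def decode_dd_flags(flags):
    """Decode the flags byte of a Database Description packet."""
    return {
        "initial": bool(flags & FLAG_I),
        "more": bool(flags & FLAG_M),
        "master": bool(flags & FLAG_MS),
    }
```

A flags value of 0x07 is typical for the first DD packet in ExStart, where each router initially claims to be the master.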
Link State Request
As Database Description packets are received during the database synchronization process, a router will take note of any listed LSAs that are not in its database or are more recent than its own LSA. These LSAs are recorded in the Link State Request list. The router will then send one or more Link State Request packets asking the neighbor for its copy of the LSA.
Link State Type is the LS type number, which identifies the LSA as a router LSA, network LSA, and so on.
Link State ID is a type-dependent field of the LSA header.
Advertising Router is the Router ID of the router which originated the LSA.
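Building the Link State Request list boils down to comparing the neighbor’s LSA headers with the local database. The sketch below is a simplification: it compares sequence numbers only, whereas a real implementation also looks at checksum and age. The function name and the data layout are assumptions made for the example.

```python
def build_request_list(local_db, neighbor_headers):
    """Return the LSA keys to request from the neighbor.

    Both arguments map (ls_type, link_state_id, advertising_router)
    to an LS sequence number.
    """
    requests = []
    for key, seq in neighbor_headers.items():
        # Request LSAs we do not have, or where the neighbor's copy is newer.
        if key not in local_db or seq > local_db[key]:
            requests.append(key)
    return requests
```

Each key in the result carries exactly the three fields that a Link State Request entry needs: Link State Type, Link State ID and Advertising Router.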
Packet capture of Link State Request:
Link State Update
The Link State Update packet is used in the flooding of LSAs and to send LSAs in response to Link State Requests. Recall that OSPF packets do not leave the network on which they were originated. Consequently, a Link State Update packet, carrying one or many LSAs, carries those LSAs only one hop further from their originating router. The receiving neighbor is responsible for re-encapsulating the appropriate LSAs in new LS Update packets for further flooding.
Number of LSAs specifies the number of LSAs included in this packet.
LSAs are the full LSAs as described in OSPF LSA formats. Each update may carry multiple LSAs, up to the maximum packet size allowed on the link.
Packet capture of Link State Update packet
The Link State Update is a very important packet type for troubleshooting. A couple of things to remember: LSUs are normally sent to a multicast address (224.0.0.5 or 224.0.0.6), but because this one answers a Link State Request asking for an LSA, R1 receives a unicast LSU. You will also see the number of LSAs and the associated links in each LSA.
Link State Acknowledgment
Link State Acknowledgment packets are used to make the flooding of LSAs reliable. Each LSA received by a router from a neighbor must be explicitly acknowledged in a Link State Acknowledgment packet.
Packet capture of Link State Acknowledgment packet
- OSPF uses IP protocol number 89 for all its packets.
- If we use debug ip ospf packet we can look at the OSPF packets on our router. Let’s look at the different fields we have:
- V:2 stands for OSPF version 2. If you are running IPv6 you’ll see version 3.
- T:1 stands for OSPF packet type 1, which is a Hello packet. The five packet types are described above.
- L:48 is the packet length in bytes. This Hello packet is 48 bytes.
- RID 188.8.131.52 is the Router ID.
- AID is the area ID in dotted decimal. You can write the area in decimal (area 0) or dotted decimal (area 0.0.0.0).
- CHK 4D40 is the checksum of this OSPF packet so we can check if the packet is corrupt or not.
- AUT:0 is the authentication type. You have 3 options:
- 0 = no authentication
- 1 = clear text
- 2 = MD5
- AUK: If you enable authentication you’ll see some information here.
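When there is a lot of debug output, the fields above can be extracted with a regular expression. The sample line below is an assumption about what the debug output looks like, with an invented router ID and interface; the exact format can vary between IOS versions, so treat the pattern as a starting point.

```python
import re

PACKET_TYPES = {1: "Hello", 2: "Database Description", 3: "Link State Request",
                4: "Link State Update", 5: "Link State Acknowledgment"}
AUTH_TYPES = {0: "none", 1: "clear text", 2: "MD5"}

# Assumed line format, for illustration only
SAMPLE_LINE = ("OSPF: rcv. v:2 t:1 l:48 rid:10.0.0.2 aid:0.0.0.0 "
               "chk:4D40 aut:0 auk: from FastEthernet0/0")

PACKET_RE = re.compile(
    r"v:(?P<version>\d+)\s+t:(?P<type>\d+)\s+l:(?P<length>\d+)\s+"
    r"rid:(?P<rid>\S+)\s+aid:(?P<aid>\S+)\s+chk:(?P<chk>\S+)\s+aut:(?P<aut>\d+)"
)


def parse_debug_packet(line):
    """Parse one debug line into a dictionary, or return None if it does not match."""
    match = PACKET_RE.search(line)
    if match is None:
        return None
    return {
        "version": int(match["version"]),
        "type": PACKET_TYPES.get(int(match["type"]), "unknown"),
        "length": int(match["length"]),
        "router_id": match["rid"],
        "area_id": match["aid"],
        "checksum": match["chk"],
        "auth": AUTH_TYPES.get(int(match["aut"]), "unknown"),
    }
```

Filtering the parsed lines by router ID or area ID makes it quick to spot a neighbor that is sending hellos for the wrong area.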
Cisco Commands to use for troubleshooting:
|Reason for Neighbor Adjacency Problem||Commands for Diagnosing the Problem|
|To view OSPF information, including the process ID||show ip ospf|
|To view interfaces that are running OSPF, including interface status and the IP address assigned to the interface||show ip ospf interface|
|To view information about neighbor OSPF routers, including the Router ID of the neighbor router||show ip ospf neighbor|
|To view OSPF configuration information, such as the OSPF process ID||show ip protocols|
|MTU mismatch between neighboring interfaces.||show interface <int-type><int-num>|
|OSPF area-type is stub on one neighbor, but the adjoining neighbor in the same area is not configured for stub.||show ip ospf interface|
|OSPF neighbors have duplicate Router IDs.||show ip ospf interface|
|OSPF is configured on the secondary network of the neighbor, but not on the primary network. This is an illegal configuration which prevents OSPF from being enabled on the interface.||show ip ospf interface|
|OSPF HELLOs are not processed due to a lack of resources, such as high CPU utilization or not enough memory.||show memory summary, show memory processor|
|An underlying Layer problem prevents OSPF HELLOs from being received.||show interface|
|To view debugging information about hello exchanges, DR selection information, SPF calculation, and errors related to negotiating adjacency. Use debug ip ospf hello to view only hello packet information.||debug ip ospf events|
|Displays information contained in each OSPF packet, such as area ID and router ID.||debug ip ospf packet|
|Shows the routing table entries for Area Border Routers (ABRs).||show ip ospf border-routers|
|Shows the state of adjacency and the neighbor router’s ID||show ip ospf neighbor|
|Displays information on the area to which it is assigned. Can be used to display information on the Area Border Router or Autonomous System Boundary Router.||show ip ospf process-id|
|Shows router link states and network link states as maintained in the router’s database.||show ip ospf database|
Juniper (Junos) commands to use for troubleshooting:
|show ospf neighbor|
|show ospf neighbor extensive|
|clear ospf neighbor all|
|show ospf statistics|
|show ospf interface|
|show ospf interface extensive|
|show route protocol ospf|
|show ospf database|
|show ospf database router advertising-router <x.x.x.x>|
Points to remember:
- The NSSA ABR with the highest Router ID translates Type 7 LSAs into Type 5 LSAs.
- A default route is not generated in an NSSA by default unless “area <area-id> nssa default-information-originate” is configured.
- A totally stubby NSSA generates the default route by default.
- DR/BDR election does not support preemption. If the DR fails, the BDR becomes the DR and a new BDR is elected. A failed DR that comes back does not become DR again, even with a higher priority.
- With “ip ospf priority 0” the router does not participate in the DR/BDR election.
- OSPF behaves like a distance vector protocol between areas when multiple areas are in use.
- The router with the highest priority becomes the DR/BDR, with the highest Router ID as the tie-breaker.
- OSPF hellos are always sent from the primary address of the interface.
Most error messages shown in the debug output adequately describe the nature of the problem.
Shown below are some errors that display with the debug ip ospf events command:
|OSPF: mismatched hello parameters from 10.0.0.1 / OSPF: Dead R 20 C 40, Hello R 5 C 5 / Mask R 255.255.255.0 C 255.255.255.0||Hello timer, dead timer, or subnet mask mismatch detected. In this example, the dead timer intervals do not match: R (received) = 20, C (configured) = 40|
|OSPF: hello packet with mismatched E bit||Area types (not area numbers) configured on each router do not match. The E bit is also called the stub area flag.|
|Neighbor Down: Dead timer expired||An expected hello packet has not been received. When the dead timer reaches 0, it is assumed that the neighbor router has gone down. The dead timer resets itself each time a hello packet is received.|
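The R (received) and C (configured) pairs in the mismatched hello parameters message can be compared automatically. This sketch works on the message text as shown in this post; the function name is hypothetical, and other IOS versions may word the message differently.

```python
import re

SAMPLE_DEBUG = """\
OSPF: mismatched hello parameters from 10.0.0.1
OSPF: Dead R 20 C 40, Hello R 5 C 5
Mask R 255.255.255.0 C 255.255.255.0
"""

# Matches "<name> R <received> C <configured>" for the three hello parameters
PARAM_RE = re.compile(r"(Dead|Hello|Mask) R (\S+) C ([^\s,]+)")


def mismatched_parameters(debug_text):
    """Return {parameter: (received, configured)} for values that differ."""
    return {
        name: (received, configured)
        for name, received, configured in PARAM_RE.findall(debug_text)
        if received != configured
    }
```

For the sample above, only the dead timer is reported, with 20 received and 40 configured.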
Documents to refer to for more details:
RFC 2328 OSPFv2
RFC 2178 OSPF
RFC 1583 OSPF v2
RFC 1587 OSPF NSSA
RFC 1745 OSPF Interactions
RFC 1765 OSPF Database Overflow
RFC 1850 OSPF Traps
RFC 2154 OSPF w/Digital Signatures (Password, MD-5)
RFC 1850 OSPF v2 MIB
RFC 1997 Communities Attributes
RFC 2385 TCP MD5
RFC 2370 OSPF Opaque LSA Option