TCP

TCP Talk Series- V

Today we are going to cover the following topics:

  • TCP Troubleshooting and various cases
  • Traceroute Troubleshooting
  • Salient features of TCP
  • Congestion avoidance

TROUBLESHOOTING TCP ISSUES :

Check the connection at the remote side.
If you are getting a TCP RESET (RST), check the TTL of the IP packet that carries it. The TTL tells you roughly where the RST is coming from, i.e. whether it was sent by the end host itself or by a device somewhere along the routed path.
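As a rough illustration of reading the TTL, here is a minimal sketch that assumes the sender started from one of the common default initial TTLs (64, 128 or 255; any value can be configured, so treat this as a heuristic). If the hop estimate for the RST differs from that of normal data packets from the same server, the RST was probably generated by an intermediate device such as a firewall. The TTL values in the example are hypothetical.

```python
# Common default initial TTLs (assumption: the sender did not use a custom value).
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_hops(observed_ttl: int) -> tuple[int, int]:
    """Return (assumed_initial_ttl, estimated_hop_count) for a received packet."""
    if not 0 < observed_ttl <= 255:
        raise ValueError("TTL must be in the range 1..255")
    # Pick the smallest common initial TTL that is >= the observed value.
    initial = next(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial, initial - observed_ttl


# Compare the RST against normal data packets from the same server (hypothetical TTLs).
_, data_hops = estimate_hops(52)  # TTL seen on the server's data segments
_, rst_hops = estimate_hops(62)   # TTL seen on the RST
if rst_hops != data_hops:
    print(f"RST came from ~{rst_hops} hops away, server is ~{data_hops} hops away")
```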


TOP 10 TCP Issues :

++ No TCB (Transmission Control Block) information and no socket info

  1. Packet Loss
  2. Client, server and wire latency
  3. Window scaling issues
  4. Service response issues and application behaviour
  5. Network design issues
  6. Path issues (Such as QoS)
  7. Itty Bitty Stinking packets (Low MSS Value)
  8. Fragmentation
  9. Timing Problems
  10. Interconnecting devices

ROUND TRIP TIME (RTT) can give another indication of whether the issue is on the network side.
CASE 1 : Connectivity problems between nodes of a TCP/IP network are usually the result of incorrect routing information.

ROGUE DNS SERVER : If the client requests information from an unauthorised DNS server that is incorrectly configured, the information returned is not reliable.

HARDWARE FAILURE : If a DNS server has a hardware failure, the DNS records it maintains may not be updated as expected, potentially causing incorrect host-to-IP mappings to be sent to clients.

BLOCKED TCP/IP PORTS : If the ports used by an application-layer protocol or service are blocked, network communication for that protocol or service is blocked as well.
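A quick way to tell a blocked port from a closed one is to attempt the TCP handshake yourself. The sketch below uses only the Python standard library; the mapping of outcomes is a heuristic: a refusal means a RST came back (host reachable, nothing listening), while a timeout often means a firewall is silently dropping the SYN.

```python
import socket


def probe_tcp_port(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify a TCP port as 'open', 'closed', 'filtered' or 'error'."""
    try:
        # create_connection resolves the name and performs the three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "closed"    # RST received: host reachable, no listener on that port
    except socket.timeout:
        return "filtered"  # no answer at all: often a firewall dropping packets
    except OSError:
        return "error"     # e.g. network unreachable or name resolution failure
```

For example, `probe_tcp_port("www.example.com", 80)` checks whether a web server accepts connections on port 80.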

INCORRECT CLIENT TCP/IP CONFIGURATION :

A ping test verifies the connectivity between two nodes on a TCP/IP network.
To rule out a DNS problem, ping xxx.com and check whether the DNS server resolved the IP address, even if no ping response was returned.
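The name-resolution half of that check can be done separately from ping. This is a minimal sketch using the standard library's `socket.getaddrinfo`; an empty result points at DNS rather than at the network path. It resolves `localhost` here only so the example runs anywhere; substitute the real host name.

```python
import socket


def resolve(name: str) -> list[str]:
    """Return the IP addresses the local resolver gives for name ([] if it fails)."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []  # name did not resolve: suspect DNS before blaming the network
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


addresses = resolve("localhost")
print(addresses or "DNS resolution failed")
```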

If the name resolves to an IP address, there is a good chance that the destination is accessible via another application-layer protocol or service. For example, if ping requests to http://www.xxx.com time out, connectivity to http://xxx.com typically still succeeds, because this destination is a web server that accepts browser requests over port 80 even though it does not answer ICMP Echo Requests.

If the ping request is successful and a reply is returned, verify that the replies arrive within a reasonable time (< 250 ms). If the ping response time exceeds 250 ms, additional troubleshooting steps may be required.
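The 250 ms rule of thumb can be applied mechanically to a set of ping results. In this sketch the decision is based on the average RTT (an assumption; you may prefer to flag on the worst sample), and an empty list means no replies came back, which may also just mean ICMP is blocked.

```python
from statistics import mean

RTT_THRESHOLD_MS = 250.0  # threshold suggested in the text above


def assess_ping(rtts_ms: list[float], threshold_ms: float = RTT_THRESHOLD_MS) -> dict:
    """Summarise ping reply times and flag whether further troubleshooting is needed."""
    if not rtts_ms:
        # No replies: could be a real outage or simply ICMP being filtered.
        return {"replies": 0, "needs_investigation": True}
    avg = mean(rtts_ms)
    return {
        "replies": len(rtts_ms),
        "avg_ms": avg,
        "max_ms": max(rtts_ms),  # individual spikes are reported, not used to decide
        "needs_investigation": avg > threshold_ms,
    }
```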

Note :
When troubleshooting network connectivity using the traceroute utility, it is often beneficial to run the command in both directions: from the client to the server, and then from the server back to the client.
Then check whether there are any significant latency discrepancies in either path.

POLICY CHECK ON EACH ROUTER : Check whether any policy is configured that causes Layer 4 packets to be dropped.

TRACEROUTE TROUBLESHOOTING :

Traceroute probe packets can take many forms. In fact, essentially any IP packet can be used as a Traceroute probe, since the only absolute requirement is that the TTL field is incremented with each successive probe.

Two other practical considerations are that the probe packet should not be blocked by a firewall, and that the final destination should return a reply to the probe packet so that the traceroute implementation knows it has reached the end.

CLASSIC UNIX TRACEROUTE : Uses UDP packets with the destination port starting at 33434 and incrementing by 1 with each probe. By default, it typically sends 3 probes per hop.
The UDP destination port number is used to identify which probe an ICMP response refers to. When the probe reaches the final destination, the destination host returns an ICMP Destination Unreachable (Port Unreachable) message.

Many modern Traceroute implementations allow the user to specify UDP, ICMP or TCP probes. The starting port number 33434 used by traceroute comes from adding 666 to 2**15 (32768 + 666 = 33434).
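ICMP error messages quote the original IP header plus at least the first 8 bytes of the offending datagram, which includes the UDP ports, so the destination port is enough to match a reply to its probe. This sketch assumes the port is incremented by 1 for every probe sent, across all hops, with 3 probes per hop; the helper names are illustrative.

```python
BASE_PORT = 2**15 + 666  # 33434, the classic traceroute starting port
PROBES_PER_HOP = 3


def probe_port(ttl: int, probe: int, probes_per_hop: int = PROBES_PER_HOP) -> int:
    """Destination port for a probe (ttl starts at 1, probe index at 0)."""
    seq = (ttl - 1) * probes_per_hop + probe  # every probe sent gets the next port
    return BASE_PORT + seq


def identify_probe(port: int, probes_per_hop: int = PROBES_PER_HOP) -> tuple[int, int]:
    """Map the port quoted inside an ICMP reply back to (ttl, probe)."""
    hop_index, probe = divmod(port - BASE_PORT, probes_per_hop)
    return hop_index + 1, probe
```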

Windows Traceroute (or more specifically, tracert.exe) is notable for its use of ICMP Echo Request probes, rather than the UDP probes of the classic UNIX implementation. When the probe reaches the final destination, an ICMP Echo Reply is returned, indicating the end of the Traceroute.

NOTE : An important point about Traceroute latency is that it reports the RTT even though it only shows you the forward path. Any delays that occur on the reverse path (the path taken by the ICMP TTL Exceeded message, between the router that generated it and the original Traceroute sender) are included in the round-trip latency calculation, even though these hops are invisible in the Traceroute output.

There are three main types of network-induced latency that are likely to be observed in Traceroute:

  • Serialization Delay
  • Queuing Delay
  • Propagation Delay

Serialization Delay : In most modern router architectures, it is generally not possible to begin transmitting a packet to the egress interface until the entire packet has been received from the ingress interface. The latency caused by this movement of data in packet-sized chunks is called serialization delay.

Calculation : 900 bytes of data on a 10 Mbps link: (900 × 8 bits) / (10 × 10^6 bits/sec) = 0.72 ms of serialisation delay.
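The same calculation as a small helper, so other packet sizes and link speeds can be plugged in:

```python
def serialization_delay_ms(packet_bytes: int, link_bps: float) -> float:
    """Time needed to clock packet_bytes onto a link of link_bps, in milliseconds."""
    return packet_bytes * 8 / link_bps * 1000


print(serialization_delay_ms(900, 10e6))  # the 10 Mbps example above, ~0.72 ms
print(serialization_delay_ms(1500, 1e9))  # a 1500-byte packet at 1 Gbps, ~0.012 ms
```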

Queuing Delay : To understand queuing delay, you must first understand the nature of interface utilisation. For example, a 1 Gbps port may be said to be 50% utilised when carrying 500 Mbps, but what this actually means is 50% utilised over some period of time. At any given instant, an interface can only be either transmitting (100% utilised) or not transmitting (0% utilised). When a packet is routed to an interface that is currently in use, the packet must be queued.
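The "either 100% or 0% busy" view can be made concrete with a tiny FIFO model of one egress interface. This is a simplified sketch: it assumes arrivals are sorted by time, a single queue with no priority classes, and no drops.

```python
def fifo_queuing_delays(arrivals_s: list[float], sizes_bytes: list[int],
                        link_bps: float) -> list[float]:
    """Per-packet queuing delay (seconds) on a single FIFO egress interface."""
    delays = []
    link_free_at = 0.0  # time at which the interface finishes its current transmission
    for t, size in zip(arrivals_s, sizes_bytes):
        start = max(t, link_free_at)  # wait if the interface is busy (100% utilised)
        delays.append(start - t)
        link_free_at = start + size * 8 / link_bps
    return delays


# Three 1250-byte packets on 10 Mbps (1 ms each): back-to-back vs. evenly spaced.
print(fifo_queuing_delays([0.0, 0.0, 0.0], [1250] * 3, 10e6))
print(fifo_queuing_delays([0.0, 0.002, 0.004], [1250] * 3, 10e6))
```

Both cases carry the same data, but only the back-to-back burst queues, which is why an interface that looks lightly utilised on average can still add latency.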

Propagation Delay : is caused by the time that the packet spends “in flight” on the wire, during which the signal is propagating over the physical medium.
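Propagation delay depends only on distance and the signal speed in the medium. The sketch assumes roughly 2 × 10^8 m/s (about two thirds of the speed of light), a common approximation for fibre; real cable routes are usually longer than the straight-line distance.

```python
# Assumption: signal speed in fibre/copper is roughly 2/3 of the speed of light.
PROPAGATION_SPEED_M_PER_S = 2e8


def propagation_delay_ms(distance_km: float,
                         speed_m_per_s: float = PROPAGATION_SPEED_M_PER_S) -> float:
    """One-way propagation delay over distance_km, in milliseconds."""
    return distance_km * 1000 / speed_m_per_s * 1000


print(propagation_delay_ms(1000))  # ~5 ms one way, ~10 ms added to the RTT
```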

PRIORITISATION AND RATE LIMITING :
The latency values reported by Traceroute are based on the following 3 components :

  • The time taken for the probe packet to reach a specific router
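  • The time taken for that router to generate an ICMP TTL Exceeded packet
  • The time taken for the ICMP TTL Exceeded packet to return to the traceroute source.

The second component is where prioritisation and rate limiting matter: many routers generate ICMP TTL Exceeded messages in their control plane at low priority and rate-limit them, so a hop can show high latency or loss in Traceroute even though traffic forwarded through it is unaffected.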

ASYMMETRIC ROUTING :

One of the most basic concepts of routing on the Internet is that there is absolutely no guarantee of symmetrical routing for traffic flowing between the same end-points in opposite directions. Traceroute is only capable of showing the forward path between the source and the destination you are trying to probe.

LOAD BALANCING ACROSS MULTIPLE PATHS :

Within complex IP networks, there is often a need to load balance traffic across multiple physical links between a particular source and destination. There are two primary ways to accomplish this: at Layer 2 (link aggregation, LAG) and at Layer 3 (Equal-Cost Multi-Path, ECMP).
Layer 2 based LAGs are invisible to Traceroute, but Layer 3 based ECMP is often detectable. When multiple paths are included in Traceroute results, it can significantly increase the difficulty of correctly interpreting the results and diagnosing any potential problems.
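ECMP routers typically pick a path by hashing fields of each packet's flow (such as the 5-tuple), so that one flow stays on one path. This sketch uses `zlib.crc32` as a stand-in for the vendor-specific hash real routers use, and the addresses are hypothetical. It shows why classic UNIX traceroute, which changes the destination port on every probe, can scatter its probes over several paths, while keeping the flow fields constant (the approach taken by tools such as Paris traceroute) keeps them on one.

```python
import zlib


def ecmp_next_hop(src_ip: str, dst_ip: str, proto: int,
                  src_port: int, dst_port: int, n_paths: int) -> int:
    """Pick one of n_paths by hashing the 5-tuple (illustrative hash, not a vendor's)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_paths


# Classic traceroute: destination port changes per probe, so the hash input changes...
classic = {ecmp_next_hop("10.0.0.1", "192.0.2.9", 17, 40000, 33434 + i, 4) for i in range(30)}
# ...whereas a fixed 5-tuple always hashes to the same path.
fixed = {ecmp_next_hop("10.0.0.1", "192.0.2.9", 17, 40000, 33434, 4) for _ in range(30)}
print(len(classic), len(fixed))
```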

SALIENT FEATURES OF TCP :

Piggybacking of ACK :

The ACK for the last packet need not be sent as a new packet, but gets a free ride on the next outgoing data frame. This technique of temporarily delaying outgoing ACKs so that they can be hooked onto the next outgoing data frame is known as piggybacking.

However, an ACK can’t be delayed for long if the receiver (of the packet to be ACKed) does not have any data to send.
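The receiver's choice can be sketched as a small decision function. The 200 ms delayed-ACK timer is an assumption (a common value); RFC 1122 only requires the delay to be less than 0.5 seconds. The function and its parameters are illustrative, not an OS API.

```python
# Assumption: a 200 ms delayed-ACK timer; RFC 1122 only requires the delay to be under 0.5 s.
DELAYED_ACK_TIMEOUT_S = 0.2


def ack_action(pending_since, now: float, has_data_to_send: bool,
               timeout: float = DELAYED_ACK_TIMEOUT_S) -> str:
    """Decide what to do with an owed ACK (pending_since=None means nothing is owed)."""
    if pending_since is None:
        return "none"
    if has_data_to_send:
        return "piggyback"  # ACK rides for free on the outgoing data segment
    if now - pending_since >= timeout:
        return "pure_ack"   # timer expired: send a separate ACK-only segment
    return "wait"           # keep waiting for data to piggyback on
```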

Flow and congestion control :
TCP takes care of flow control by ensuring that both ends have enough resources and can handle each other's data transfer speed, so that neither of them gets overloaded with data. The term congestion control is used in almost the same context, except that the resources and speed of each router along the path are also taken care of.

Multiplexing and Demultiplexing

WINDOW SIZE :
A TCP Zero Window occurs when the window size advertised by a machine remains at zero for some amount of time. This means that the receiver is not able to accept further data at the moment, and the TCP transmission is halted until it can process the data already in its receive buffer. The TCP window size is the amount of data that a machine can receive during a TCP session and still be able to process.
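A minimal model of how the advertised window follows the receive buffer, assuming the window is simply free buffer space (real stacks add refinements such as receive-window auto-tuning and silly-window avoidance). The class is hypothetical.

```python
class ReceiveBuffer:
    """Receiver-side buffer that determines the advertised window (sketch)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.unread = 0  # bytes received but not yet read by the application

    def advertised_window(self) -> int:
        return self.capacity - self.unread

    def on_segment(self, nbytes: int) -> int:
        """Accept as much of an arriving segment as fits; return bytes accepted."""
        accepted = min(nbytes, self.advertised_window())
        self.unread += accepted
        return accepted

    def app_read(self, nbytes: int) -> int:
        """The application consumes data, opening the window again."""
        taken = min(nbytes, self.unread)
        self.unread -= taken
        return taken


rb = ReceiveBuffer(4096)
rb.on_segment(4096)            # a slow application leaves the buffer full...
print(rb.advertised_window())  # ...so the receiver advertises a zero window
```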

CONGESTION CONTROL :

DETECTION OF CONGESTION IN TCP :
The primary mechanism TCP has available to combat packet loss is retransmission, triggered either by the expiry of the retransmission timer or by the fast retransmit algorithm. There is no explicit signalling about congestion. Instead, if a typical TCP is to react to congestion, it must first conclude (from packet loss) that some response is required.

SLOWING DOWN A TCP SENDER :
The window size field in the TCP header is used to signal a sender to adjust its window based on the availability of buffer space at the receiver. A sender needs to slow down if either the receiver is too slow or the network is too slow.

This is accomplished by introducing a window control variable at the sender that is based on an estimate of the network capacity, and ensuring that the sender's window size never exceeds the minimum of the two (the receiver's advertised window and this network estimate).

The new value used to hold the estimate of the network's available capacity is called the congestion window (CWND). The sender's actual (usable) window W is then written as the minimum of the receiver's advertised window and the congestion window:

W = min(CWND, AWND)
AWND : Receiver's advertised window
CWND : Congestion Window

With this relationship, the TCP sender is not permitted to have more than W unacknowledged packets (or bytes) outstanding in the network. The total amount of data a sender has introduced into the network for which it has not yet received an ACK is sometimes called the flight size, which is always less than or equal to W.

W can be maintained in units of either packets or bytes.
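The relationship W = min(CWND, AWND), with the flight size bounded by W, translates directly into how much more the sender may transmit right now. A minimal sketch in bytes:

```python
def usable_window(cwnd: int, awnd: int, flight_size: int) -> int:
    """Bytes the sender may still inject: W = min(cwnd, awnd) minus data in flight."""
    w = min(cwnd, awnd)
    return max(w - flight_size, 0)


smss = 1460
print(usable_window(10 * smss, 65535, 4 * smss))  # congestion-limited: 6 more segments
print(usable_window(10 * smss, 8000, 8000))       # receiver-limited: nothing more to send
```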

NOTE : When TCP does not make use of Selective ACK, the restriction on W means that the sender is not permitted to send a segment with a sequence number greater than the sum of the highest ACKed sequence number and the value of W. A SACK TCP sender treats W somewhat differently, using it as an overall limit on the flight size.

Both CWND and AWND change over time. In addition, because of the lack of explicit signals, the “correct” value of CWND is generally not directly available to the sending TCP.

Thus the values of W, CWND and AWND must all be empirically determined and dynamically updated.

The Classic Algorithm :
TCP learns the value for AWND with one packet exchange with the receiver, but without any explicit signalling, the only obvious way it has to learn a good value for CWND is to try sending data at faster and faster rates until it experiences a packet drop. TCP generally uses one algorithm at start-up to avoid starting too fast on its way to steady state, and a different one once it is in steady state.


TCP congestion control operates on a principle of conservation of packets.
Packets (Pb) are stretched out in time as they are sent from sender to receiver over links with constrained capacity.
As they arrive at the receiver spaced apart (Pr), ACKs are generated (Ar), which return to the sender. ACKs travelling from receiver to sender become spaced out (Ab) in relation to the inter-packet spacing of the packets.
When the ACKs reach the sender (As), their arrivals provide a signal, or ACK clock, used to tell the sender it is time to send more. In steady state, the overall system is said to be self-clocked.
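The self-clocking effect can be reproduced numerically. This sketch assumes a burst sent back-to-back at t = 0 over a sender link much faster than the bottleneck, a receiver that ACKs every packet immediately, and an uncongested return path; under those assumptions the ACKs arrive at the sender with exactly the bottleneck spacing.

```python
def ack_clock(n_packets: int, packet_bytes: int, bottleneck_bps: float,
              one_way_delay_s: float) -> list[float]:
    """Return ACK arrival times at the sender for a burst sent at t = 0."""
    spacing = packet_bytes * 8 / bottleneck_bps  # time each packet occupies the bottleneck
    acks = []
    for i in range(n_packets):
        arrive_rx = (i + 1) * spacing + one_way_delay_s  # spaced out by the bottleneck (Pr)
        acks.append(arrive_rx + one_way_delay_s)         # ACK keeps that spacing back (As)
    return acks


times = ack_clock(5, 1500, 12e6, 0.01)  # 1500-byte packets through a 12 Mbps bottleneck
print([round(b - a, 6) for a, b in zip(times, times[1:])])  # 1 ms between ACKs
```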
Two Main Congestion Control Algorithms of TCP :

  • Slow Start
  • Congestion Avoidance

SLOW START :

The slow start algorithm is executed when a new TCP connection is created, or when a loss is detected due to a retransmission timeout.
Slow start helps TCP find a value for CWND before probing for more available bandwidth using congestion avoidance, and establishes the ACK clock. Beginning transmission into a network with unknown conditions requires TCP to probe the network slowly to determine the available capacity, in order to avoid congesting the network with an inappropriately large burst of data.
The slow start algorithm is used for this purpose at the beginning of a transfer, or after repairing loss detected by the retransmission timer.

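TCP begins in slow start by sending a certain number of segments (after the SYN exchange), called the initial window (IW).

The IW was originally set to one SMSS. RFC 5681 allows a larger value, depending on the SMSS:

IW = 2*SMSS (and not more than 2 segments), if SMSS > 2190 bytes
IW = 3*SMSS (and not more than 3 segments), if 1095 < SMSS <= 2190 bytes
IW = 4*SMSS (and not more than 4 segments), if SMSS <= 1095 bytes

This assignment of IW allows several packets in the initial window.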
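The RFC 5681 rule above as a function (in bytes). Note that some modern stacks use a larger initial window than this; only the RFC 5681 rule is modelled here.

```python
def initial_window(smss: int) -> int:
    """Initial congestion window in bytes per the RFC 5681 rule."""
    if smss > 2190:
        return 2 * smss  # at most 2 segments
    if smss > 1095:
        return 3 * smss  # at most 3 segments
    return 4 * smss      # at most 4 segments


print(initial_window(1460))  # typical Ethernet SMSS -> 3 segments
```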

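A TCP just starting out begins its connection with CWND = 1 SMSS, meaning the initial usable window W is also equal to SMSS.
Assuming no loss, the first ACK arrives, allowing another segment to be sent.
Slow start operates by increasing CWND by min(N, SMSS) for each good ACK received, where N is the number of previously unacknowledged bytes ACKed by that ACK. So after one segment is ACKed, CWND is ordinarily increased to 2 segments; 2 increases to 4, 4 to 8, and so on.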

For a TCP connection where the receiver's advertised window is very large, CWND is the primary governor of the sending rate. This value grows exponentially fast with respect to the RTT of the connection (roughly doubling every RTT).

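Since TCP throughput is proportional to W/RTT, the sending rate grows just as quickly, until at some point the capacity of the network path is reached and packets are dropped. When this happens, CWND is reduced substantially (to half of its former value). This is the point at which TCP switches from operating in slow start to operating in congestion avoidance.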
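A per-ACK simulation of slow start, applying CWND += min(N, SMSS) for every good ACK. It assumes no loss, one ACK per full-sized segment, and that every segment in the window is sent and ACKed within one RTT; the result is the number of segments sent in each RTT.

```python
def slow_start_trace(rtts: int, smss: int = 1460, awnd: int = 1 << 20) -> list[int]:
    """Segments sent per RTT during loss-free slow start."""
    cwnd = smss
    trace = []
    for _ in range(rtts):
        window = min(cwnd, awnd)    # usable window W
        segments = window // smss
        trace.append(segments)
        for _ in range(segments):   # one good ACK per full-sized segment
            newly_acked = smss      # N: bytes newly acknowledged by this ACK
            cwnd += min(newly_acked, smss)
    return trace


print(slow_start_trace(5))                        # doubles every RTT
print(slow_start_trace(6, smss=1000, awnd=8000))  # capped by the receiver's window
```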

CONGESTION AVOIDANCE :
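Slow start increases CWND fairly rapidly and helps to establish a value for ssthresh. Once this is achieved, it is still possible that more network capacity may become available for a connection. To find that capacity without using it too aggressively, TCP implements the congestion avoidance algorithm. Once SSTHRESH is established and CWND is at least at this level, congestion avoidance increases CWND by approximately one segment for each window's worth of data successfully delivered, i.e. roughly linearly with respect to time, whereas slow start grows it exponentially. For each non-duplicate ACK, CWND is usually updated as CWND += SMSS*SMSS/CWND. This function is also called additive increase. For example, when a connection sends four packets, four ACKs are returned, allowing CWND to grow slightly (by a little under one segment in total).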
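The per-ACK update CWND += SMSS*SMSS/CWND, using integer bytes and rounding a zero increment up to 1 byte as RFC 5681 recommends. The four-packet example from the paragraph above, with an assumed SMSS of 1000 bytes:

```python
def congestion_avoidance_ack(cwnd: int, smss: int) -> int:
    """Additive increase applied on each non-duplicate ACK (bytes)."""
    return cwnd + max(1, smss * smss // cwnd)


cwnd = 4000                 # four 1000-byte segments in flight
for _ in range(4):          # four packets sent, four ACKs returned
    cwnd = congestion_avoidance_ack(cwnd, 1000)
print(cwnd)                 # a little under 5000: roughly +1 SMSS per window
```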

Selecting between Slow start and Congestion Avoidance :


++++++++++++++++++++++++++++++++++++++++++++++++++++++
Initially:
– cwnd = 1*MSS
– ssthresh = very high

If a new ACK comes:

  • if cwnd < ssthresh update cwnd according to slow start
  • if cwnd > ssthresh update cwnd according to congestion avoidance
  • If cwnd = ssthresh either of the two
  • If timeout (i.e. loss) :
    – ssthresh = flight size/2;
    – cwnd = 1*MSS

+++++++++++++++++++++++++++++++++++++++++++++++++++++++
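The rules in the box above can be written as a small state machine. This is a sketch in bytes; the 2*SMSS floor on ssthresh comes from RFC 5681, and when CWND equals SSTHRESH it chooses congestion avoidance (either is allowed).

```python
class CongestionControl:
    """Slow start / congestion avoidance selection, in bytes (sketch)."""

    def __init__(self, smss: int):
        self.smss = smss
        self.cwnd = 1 * smss            # cwnd = 1*MSS
        self.ssthresh = float("inf")    # ssthresh = very high

    def on_new_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += self.smss      # slow start: +1 SMSS per ACK
        else:
            # congestion avoidance (also chosen when cwnd == ssthresh)
            self.cwnd += max(1, self.smss * self.smss // self.cwnd)

    def on_timeout(self, flight_size: int) -> None:
        # ssthresh = flight size / 2, with the 2*SMSS floor from RFC 5681
        self.ssthresh = max(flight_size // 2, 2 * self.smss)
        self.cwnd = 1 * self.smss
```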


Concept:
After fast retransmit, reduce CWND by half, and continue sending segments at this reduced level.
Problems:
Sender has too many outstanding segments.
How does sender transmit packets on a dupACK? Need to use a “trick” – inflate CWND.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

With two losses in a window, Reno will occasionally timeout.
With three losses in a window, Reno will usually timeout.
With four losses in a window, Reno is guaranteed to timeout!
With three or more losses in a window, Tahoe typically outperforms Reno!
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Both cannot be run simultaneously:

CWND < SSTHRESH || SLOW START
CWND > SSTHRESH || CONGESTION AVOIDANCE

When both are equal, either can be used.

The initial value of SSTHRESH may be set arbitrarily high (AWND or higher), which causes TCP to always start with slow start. When a retransmission occurs, caused by either a retransmission timeout or the execution of fast retransmit, SSTHRESH is updated:
SSTHRESH = MAX(FlightSize/2, 2*SMSS)

TCP assumes that the operating window must have been too large for the network to handle. Moving towards the optimal window size is accomplished by altering SSTHRESH to be about half of the current window size.

This usually lowers SSTHRESH.
Recall that once an entire window's worth of data is successfully exchanged in congestion avoidance, the value of CWND is allowed to increase by approximately 1 SMSS.

TAHOE, RENO and FAST RECOVERY :

TAHOE was implemented by simply reducing CWND to its starting value (1 SMSS at the time) upon any loss, forcing the connection to slow start until CWND grew to the value of SSTHRESH.
This can cause the connection to significantly underutilise the available bandwidth. In RENO, if packet loss is detected by duplicate ACKs (invoking fast retransmit), CWND is instead reset to the last value of SSTHRESH instead of only 1 SMSS.
Fast recovery allows CWND to temporarily grow by 1 SMSS for each ACK received while recovering.
TAHOE refers to the TCP congestion control algorithm suggested by Van Jacobson.
TCP is based on the principle of conservation of packets. If the connection is running at the available bandwidth capacity, a packet is not injected into the network unless a packet is taken out as well.

TCP implements this principle by using ACKs to clock outgoing packets, because an ACK means that a packet was taken off the wire by the receiver. It also maintains a congestion window (CWND) to reflect the network capacity.

However, two issues need to be resolved:

  • Determination of the available bandwidth
  • How to react to congestion

TAHOE suggests that whenever a TCP connection starts, or restarts after a packet loss, it should go through a procedure called slow start. The reason for this procedure is that an initial burst might overwhelm the network and the connection might never get started.

Slow start suggests that the sender set the congestion window to 1 and then, for each ACK received, increase CWND by 1. So in the first RTT we send 1 packet, in the second we send 2 packets, and in the third we send 4 packets. CWND increases exponentially until a packet is lost, which is a sign of congestion.

When we encounter congestion, we decrease our sending rate: we reduce the congestion window to one and start over again.

NOTE : TAHOE detects packet losses by timeouts. In usual implementations, repeated interrupts are expensive, so coarse-grained timeouts are used, which only check for expired timers occasionally. Thus it might be some time before we notice a packet loss and then retransmit that packet. For congestion avoidance, Tahoe uses Additive Increase Multiplicative Decrease (AIMD).
A packet loss is taken as a sign of congestion, and Tahoe saves half of the current window as a threshold value. It then sets CWND to one and runs slow start until it reaches the threshold value. After that, it increments CWND linearly until it encounters a packet loss. Thus it increases its window slowly as it approaches the bandwidth capacity.
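A per-RTT sketch of the Tahoe behaviour just described, counting CWND in segments. Simplifying assumptions: slow start is capped exactly at the threshold, a loss is noticed at the end of the round in which it happens, and the timeout delay itself is not modelled.

```python
def tahoe_trace(rounds: int, loss_rounds: set[int]) -> list[int]:
    """CWND (segments) at the start of each RTT for a Tahoe-style sender."""
    cwnd, ssthresh = 1, float("inf")
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:                # loss detected (by timeout in Tahoe)
            ssthresh = max(cwnd // 2, 2)    # save half the current window
            cwnd = 1                        # and restart slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: doubles each RTT
        else:
            cwnd += 1                       # congestion avoidance: +1 segment per RTT
    return trace


print(tahoe_trace(10, {4}))  # grows to 16, loss, back to 1, slow start to 8, then linear
```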

PROBLEM WITH TAHOE :
The problem with TAHOE is that it takes a complete timeout interval to detect a packet loss, and in fact, in most implementations it takes even longer because of the coarse-grained timeout. Also, since it doesn't send immediate ACKs but cumulative ACKs, it follows a “go-back-N” approach. Thus every time a packet is lost, it waits for a timeout and the pipeline is emptied. This imposes a major cost.

TCP RENO :

RENO retains the basic principles of Tahoe, such as slow start and the coarse-grained retransmit timer. However, it adds some intelligence so that lost packets are detected earlier and the pipeline is not emptied every time a packet is lost.

Reno requires that the receiver send an immediate ACK whenever a segment is received. The logic behind this is that whenever we receive a duplicate ACK, it could have been received either because the next expected segment in sequence was delayed in the network and the segments reached the receiver out of order, or because the packet was lost. If we receive a number of duplicate ACKs, sufficient time has passed that, even if the segment had taken a longer path, it should have reached the receiver by now.

There is a very high probability that it was lost. So Reno suggests an algorithm called FAST RETRANSMIT: whenever we receive 3 DUP ACKs, we do a fast retransmit and do not wait for the RTO.
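The fast retransmit trigger only needs a duplicate-ACK counter. This sketch takes the sequence of cumulative ACK numbers seen by the sender and returns the segments that would be fast-retransmitted, using the usual threshold of 3 duplicates (i.e. 4 identical ACKs in a row).

```python
def fast_retransmit_points(acks: list[int], dupthresh: int = 3) -> list[int]:
    """Return the ACK numbers (missing segments) that trigger a fast retransmit."""
    retransmit = []
    last_ack = None
    dups = 0
    for ack in acks:
        if ack == last_ack:
            dups += 1
            if dups == dupthresh:   # third duplicate: retransmit without waiting for RTO
                retransmit.append(ack)
        else:
            last_ack = ack          # new cumulative ACK resets the counter
            dups = 0
    return retransmit
```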

Another modification is that it does not reduce the congestion window to 1, since this empties the pipe. Instead, it enters an algorithm called FAST RECOVERY:

-> Each time we receive 3 duplicate ACKs, we take that to mean that the segment was lost, retransmit the segment immediately and enter Fast Recovery.

-> Set SSTHRESH to half the current window size and also set CWND to the same value.

ssthresh = max (FlightSize / 2, 2*SMSS)

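-> For each further DUP ACK received, increase CWND by 1. If CWND is greater than the amount of data in the pipe, transmit a new segment; otherwise wait.

-> When a new (non-duplicate) ACK arrives, set CWND back to SSTHRESH and exit Fast Recovery.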

PROBLEM WITH RENO :
-> Reno performs very well when packet losses are few. But when there are multiple packet losses in one window, RENO doesn't perform too well, and under conditions of high packet loss its performance is almost the same as TAHOE. The reason is that it can only detect and repair a single packet loss per fast recovery episode.

NEW RENO :

New RENO is a slight modification of TCP RENO. It is able to detect multiple packet losses and thus is much more efficient than RENO in the event of multiple packet losses.
Like RENO, it enters FAST RETRANSMIT when it receives multiple duplicate ACKs; however, it differs from RENO in that it doesn't exit fast recovery until all the data that was outstanding at the time it entered fast recovery has been ACKed.

Thus it overcomes the problem faced by RENO of reducing CWND multiple times.
The fast retransmit phase is the same as in RENO. The difference is in the fast recovery phase, which allows multiple retransmissions in New RENO.
Whenever New RENO enters fast recovery, it notes the maximum segment that is outstanding. The fast recovery phase proceeds as in Reno; however, when a fresh ACK is received, there are two cases:

1. If it ACKs all the segments that were outstanding when we entered fast recovery, it exits fast recovery, sets CWND to SSTHRESH and continues with congestion avoidance, like TAHOE.

2. If the ACK is a partial ACK, it deduces that the next segment in line was lost, retransmits that segment and sets the number of duplicate ACKs received to zero. It exits fast recovery only when all the data in the window is ACKed.
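The two cases can be sketched as follows, using segment numbers and cumulative ACKs (the ACK carries the next segment the receiver expects). This is a simplified model of the description above: the window inflation/deflation that RFC 6582 applies on partial ACKs is omitted, and the class is hypothetical.

```python
class NewRenoRecovery:
    """Partial-ACK handling during New Reno fast recovery (segment numbers, sketch)."""

    def __init__(self, recover: int, ssthresh: int, cwnd: int):
        self.recover = recover  # highest segment outstanding when recovery began
        self.ssthresh = ssthresh
        self.cwnd = cwnd
        self.dup_acks = 0
        self.in_recovery = True
        self.retransmitted: list[int] = []

    def on_new_ack(self, ack: int) -> None:
        """ack = next segment the receiver expects."""
        if not self.in_recovery:
            return
        if ack > self.recover:
            # Case 1: everything outstanding at entry is ACKed -> leave fast recovery.
            self.cwnd = self.ssthresh
            self.in_recovery = False
        else:
            # Case 2: partial ACK -> the next segment in line was lost as well.
            self.retransmitted.append(ack)
            self.dup_acks = 0
```

Each partial ACK only arrives one RTT after the previous retransmission, which is exactly the limitation described next.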

Problem :
NEW RENO suffers from the fact that it takes one RTT to detect each packet loss. Only when the ACK for the first retransmitted segment is received can we deduce which other segment was lost.

SACK :

TCP with SACK is an extension of TCP Reno and TCP New Reno that addresses their problems, namely the detection of multiple lost packets and the retransmission of more than one lost packet per RTT.

SACK retains the SLOW START and FAST RETRANSMIT parts of RENO.

It also has the coarse-grained timeout of Tahoe to fall back on, in case a packet loss is not detected by the modified algorithm.

When fast retransmit is invoked because of the receipt of a third duplicate ACK:

  1. SSTHRESH is updated to no more than the value given by ssthresh = max(flight size/2, 2*SMSS).
  2. CWND is set to (SSTHRESH + 3*SMSS).
  3. CWND is temporarily increased by SMSS for each further duplicate ACK received.
  4. When a good ACK is received, CWND is reset back to SSTHRESH.

The first two steps are where multiplicative decrease occurs: CWND is ordinarily multiplied by some value (0.5) to form its new value.

Step 3 continues the inflation process, allowing the sender to send additional packets (assuming AWND is not exceeded). In step 4, TCP is assumed to have recovered, so the temporary inflation is removed (this step is called deflation).
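The window arithmetic of the four steps listed above, as a minimal sketch in bytes. The actual retransmission and the sending of new segments are left out; the class and method names are illustrative.

```python
class FastRecovery:
    """Fast retransmit / fast recovery window arithmetic, in bytes (sketch)."""

    def __init__(self, smss: int, cwnd: int):
        self.smss = smss
        self.cwnd = cwnd
        self.ssthresh: int | None = None  # set when recovery starts
        self.in_recovery = False

    def on_third_dup_ack(self, flight_size: int) -> None:
        self.ssthresh = max(flight_size // 2, 2 * self.smss)  # step 1
        self.cwnd = self.ssthresh + 3 * self.smss             # step 2: 3 segments have left
        self.in_recovery = True                               # (lost segment resent here)

    def on_dup_ack(self) -> None:
        if self.in_recovery:
            self.cwnd += self.smss                            # step 3: inflation

    def on_good_ack(self) -> None:
        if self.in_recovery:
            self.cwnd = self.ssthresh                         # step 4: deflation
            self.in_recovery = False
```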

Slow start is also used when a new connection is started and when a retransmission timeout occurs.
When a connection restarts after a long idle period, the initial value of CWND is set to the restart window: RW = min(IW, CWND).

DUPLICATE ACK :

The purpose of this ACK is to inform the sender that a segment was received out of order and which sequence number is expected. From the sender's perspective, duplicate ACKs can be caused by a number of network problems.

First, duplicate ACKs can be caused by dropped segments. In this case, all segments after the dropped segment will trigger duplicate ACKs.

Second, duplicate ACKs can be caused by re-ordering of data segments by the network.

Finally, duplicate ACKs can be caused by replication of data segments by the network.
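All three causes can be seen with a toy receiver that sends an immediate cumulative ACK (the next segment number it expects) for every arriving segment. Segment numbers stand in for byte sequence numbers to keep the sketch short.

```python
def receiver_acks(arrivals: list[int]) -> list[int]:
    """Cumulative ACK (next expected segment) sent after each arriving segment."""
    received = set()
    next_expected = 0
    acks = []
    for seg in arrivals:
        received.add(seg)
        while next_expected in received:  # advance past any contiguous data
            next_expected += 1
        acks.append(next_expected)
    return acks


print(receiver_acks([0, 1, 3, 4, 5]))  # segment 2 dropped: every later segment re-ACKs 2
print(receiver_acks([0, 2, 1, 3]))     # re-ordering: one duplicate ACK, then it catches up
print(receiver_acks([0, 1, 1, 2]))     # replication: the copy triggers a duplicate ACK
```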

SUMMARY OF TCP BEHAVIOUR :


When entering slow start, if connection is new,

ssthresh = arbitrarily large value
cwnd = 1.

else,

ssthresh = max(flight size/2, 2*MSS)
cwnd = 1.

In slow start ++cwnd on new ACK

When entering either fast recovery or modified fast recovery:

ssthresh = max(flight size/2, 2*MSS)
cwnd = ssthresh.

In congestion avoidance

cwnd += 1*MSS per RTT

 

So with this, we have completed the TCP talk series. I have tried to cover all the bits and pieces of TCP. In case you want any topic to be covered in detail, do let me know.

Happy Learning…

TCP Talk Series

  1. TCP Talk Series – I
  2. TCP Talk Series- II
  3. TCP Talk Series – III
  4. TCP Talk Series -IV
  5. TCP Talk Series – V