3. Congestion Control

Network congestion occurs when multiple applications compete for limited network bandwidth. As network utilization approaches 100%, data is delayed in queues and can be dropped by switches and routers, which then requires retransmission. In its most severe form, a network can suffer congestion collapse, a self-reinforcing condition in which more and more bandwidth is consumed by applications trying desperately to recover delayed or lost data. Reliable multicast protocols (like PGM) can sometimes contribute to congestion collapse by generating NAK storms (a.k.a. NAK implosion), in which many receivers detect the same loss and each send negative acknowledgments (NAKs) requesting retransmission. We've seen cases where useful network throughput drops to near zero, with virtually all of the bandwidth consumed by NAKs and retransmitted data.

To avoid congestion collapse, protocols need to be designed to react to congestion in ways that don't make the problem worse. For example, TCP uses several congestion-control algorithms that slow down the sender when congestion is detected. TCP is designed to spread the slowdown fairly evenly across all competing connections, so that each connection gets a roughly equal share of the available network bandwidth.
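
The "equal share" behavior can be illustrated with a toy model of additive-increase/multiplicative-decrease (AIMD), one of the mechanisms TCP uses for congestion avoidance. The sketch below is a simplification, not TCP itself: it assumes every flow sees congestion at the same moment, and the function names, step sizes, and link capacity are made up for illustration.

```python
def aimd_step(rates, capacity, increase=1.0, decrease=0.5):
    """Apply one round of AIMD to every flow sharing a single link."""
    if sum(rates) > capacity:
        # Congestion detected: every flow cuts its rate by the same factor.
        return [r * decrease for r in rates]
    # No congestion: every flow probes for bandwidth by the same fixed amount.
    return [r + increase for r in rates]


def simulate(rates, capacity, rounds):
    """Run AIMD for a number of rounds and return the final per-flow rates."""
    for _ in range(rounds):
        rates = aimd_step(rates, capacity)
    return rates


if __name__ == "__main__":
    # Two flows start with very unequal rates on a link of capacity 100.
    print(simulate([80.0, 10.0], capacity=100.0, rounds=200))
```

Each multiplicative decrease halves the gap between the two flows, while each additive increase leaves the gap unchanged, so the rates drift toward one another over time. This is the fairness property that latency-sensitive designers often want to override.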

Designers of latency-sensitive applications often want to avoid this "equal sharing". They want better control over how network bandwidth is allocated to applications so that highly time-critical data can get through quickly, albeit somewhat at the expense of less time-sensitive data. Many designers choose a UDP-based messaging system (unicast or multicast) to deliver their latency-sensitive data because UDP does not automatically slow down in the face of congestion. Unfortunately, as many designers have discovered, UDP-based messaging can cause instability in a congested network, sometimes leading to congestion collapse. Since network bandwidth is a finite resource, some measures need to be taken to maintain stability when there are more time-sensitive messages to send than the network can carry.

The 29West LBM product addresses this requirement through the use of rate limits. LBM's reliable multicast is immune to NAK storms because of its internal NAK-generation algorithms and its ability to limit the rate of retransmitted data independently of new data. These rate limits are critical to maintaining network stability in the face of congestion, while still giving the designer the ability to allocate bandwidth according to the relative time-sensitivity of the data being sent.
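
The idea of limiting retransmitted data independently of new data can be sketched with two token buckets. This is a generic illustration only, not LBM's API or its actual algorithm; the class names, method names, and rate values are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Generic token-bucket limiter; rates and sizes are in bits."""

    def __init__(self, rate_bps, burst_bits):
        self.rate_bps = rate_bps      # sustained rate allowed
        self.capacity = burst_bits    # largest burst allowed
        self.tokens = burst_bits      # start with a full bucket
        self.last = 0.0               # time of the previous refill, in seconds

    def try_send(self, size_bits, now):
        """Return True and consume tokens if the message fits, else False."""
        elapsed = now - self.last
        self.last = now
        # Refill for the time that has passed, never beyond the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_bps)
        if size_bits <= self.tokens:
            self.tokens -= size_bits
            return True
        return False


class RateLimitedSender:
    """Hypothetical sender with separate limits for new and retransmitted data."""

    def __init__(self, data_rate_bps, data_burst_bits,
                 retransmit_rate_bps, retransmit_burst_bits):
        self.data = TokenBucket(data_rate_bps, data_burst_bits)
        self.retransmit = TokenBucket(retransmit_rate_bps, retransmit_burst_bits)

    def send_new(self, size_bits, now):
        return self.data.try_send(size_bits, now)

    def send_retransmission(self, size_bits, now):
        return self.retransmit.try_send(size_bits, now)


if __name__ == "__main__":
    sender = RateLimitedSender(1_000_000, 10_000, 100_000, 1_000)
    # A burst of ten retransmission requests arrives at the same instant.
    print([sender.send_retransmission(800, now=0.0) for _ in range(10)])
    # New data is unaffected because it draws from its own bucket.
    print(sender.send_new(8_000, now=0.0))
```

Because the two buckets are independent, a flood of retransmission requests can exhaust only the retransmission budget. New, time-critical data keeps its full allocation, which is the stability property the rate limits are meant to provide.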

Copyright 2005 - 2006 29West, Inc. -- 29West Confidential