Our work with messaging systems at 29West as Latency Busters® has given us many opportunities to investigate sources of latency in messaging. This section lists the sources we most frequently encounter. It contains links to other sections where latency sources are discussed in more detail. Sources are listed in approximate order of decreasing contribution to worst-case system latency. Of course, the amount of latency due to each source in each system will be different.
Some sources of latency impact every message sent while others impact only some messages. Some sources have separate fixed and variable components. The fixed component forms a lower bound on latency for all messages while the variable component contributes to the variance in latency between messages. Where possible, latency sources will be characterized by when they occur and their variability.
Many messaging systems contain components that add little value to the overall system. The time taken for communication between these components often adds up to a substantial fraction of the total system latency. Design problems like this are best addressed by removing or combining components that add little value. 29West adopted a "no daemons" design philosophy in its LBM product thereby eliminating latency and design complexity often found in other leading messaging systems.
One of the compelling benefits of managed languages like Java and C# is that programmers need not worry about tracking reference counts and freeing memory occupied by unused objects. This pushes the burden of reference counting to the language run-time code (Java's JVM or C#'s CLR). A common source of intermittent but significant latency for messaging systems involving managed languages is garbage collection in the language run-time code.
The latency happens when the run-time systems stops execution of all user code while counting references. More modern run-time systems allow execution even while garbage collection is happening (e.g. mark and sweep techniques).
Garbage collection latency generally happens infrequently, but depending on the system, it can be significant when it does happen. It impacts all messages that arrive during collection and those that are queued as a result of the latency.
It is common for messaging applications to expect loss-free message delivery even if the network layer drops packets. This implies that either the messaging layer itself or a transport layer protocol under it must repair the loss. The time taken to discover the loss, request retransmission, and the arrival of the retransmission all contribute to retransmission latency. See Section 15 for more information on multicast retransmissions.
Retransmission latency typically only happens when physical- or network-layer loss requires retransmission. This is infrequent in well-managed networks. In the event that retransmission is requested by a receiver detecting loss, the fixed component of retransmission latency is the RTT from source to receiver while the remainder is variable.
Messages may arrive out of order at a receiver due to network loss or simply because the network has multiple paths allowing some messages faster paths than others. When a message arrives ahead of some that were sent before it, it can be held until its predecessors also arrive or may be delivered immediately. The holding process allows applications to receive messages in order even when they arrive out of order, but it adds a reordering latency. LBM offers an arrival-order delivery feature that avoids this delay, but TCP offers no such option. See Section 2 for a deeper discussion of reordering latency in TCP.
Note that in the case where a lost message cannot be recovered, the reordering latency can become very large. Following loss, TCP uses an exponential retransmission algorithm which can lead to latencies measured in minutes. Bounded-reliability protocols like LBT-RM and LBT-RU allow control over the timers that detect unrecoverable loss. These timers can be set to move on much more quickly following unrecoverable loss.
Reordering latency only happens when messages arrive out of order. It is entirely variable with no fixed component. However, in cases where loss is the cause of reordering latency, reordering latency is the lesser of the retransmission latency or unrecoverable loss detection threshold.
Achieving low latency is almost always at odds with efficient use of resources. There is a fixed overhead associated with every network packet generated and every interrupt serviced. CPU time is required to generate and decode the packet. Network bandwidth is required for physical, network, and transport layer protocols. When message payloads are small relative to the network MTU, this overhead can be amortized over many messages by batching them together into a single packet. This provides a significant efficiency improvement over the simple, low-latency alternative of sending one message per packet.
Similarly, in the world of NIC hardware, a NIC can often be configured to interrupt the CPU for each network packet as it is received. This provides the lowest possible latency, but doesn't get very much work done for the fixed cost of servicing the interrupt. Many modern Gigabit Ethernet NICs have a feature that allows the NIC to delay interrupting the CPU until several packets have arrived, thus amortizing the fixed cost of servicing the interrupt over all of the packets serviced. This adds latency in the interest of efficiency and performance under load. See Section 18 for more information.
Batching latency may be small or insignificant if messages are sent very quickly. The most difficult trade offs have to be made when it can't be known in advance when the next message will be sent. Trade offs may be specified as a maximum latency that would be added before sending a network packet. Additionally, a minimum size required to trigger transmission of a network packet could be specified. LBM offers applications control over these batching parameters so that they can optimize the trade off between efficiency and latency.
Batching latency is variable depending on the batching control parameters and the time between messages.
Batching can happen in the transport layer as well as in the messaging layer. Nagle's algorithm is widely used in TCP to improve efficiency by delaying packet transmissions. See Section 2.4 for more information.
One of the critical resources a messaging system needs is CPU time. There is often some latency between when messaging code is ready to run and when it actually gets a CPU scheduled.
There are generally both fixed and variable components to CPU scheduling latency. The fixed component is the latency when a CPU is idle and can be immediately scheduled to messaging code following a device interrupt that makes it ready to run. See Section 20 for more background and measurements we've taken.
The variable component of CPU scheduling latency is often due to contention over CPU resources. If no idle CPUs are available when messaging code becomes ready to run, then the CPU scheduling latency will also include CPU contention latency. See Section 21 for more background and measurements we've taken.
Socket buffers are present in the OS kernel on both the sending and receiving ends. These buffers help to smooth out fluctuations in message generation and consumption rates. However, like any buffering, socket buffers can add latency to a messaging system. The worst-case latency can be computed by dividing the size of the socket buffer by the data rate flowing through the buffer.
Socket buffer latency can vary from near zero to the maximum computed above. The faster messages are produced or the more slowly they are consumed, the more likely they are to be delayed by socket buffering.
Switches and routers buffer partial or complete packets on queues before forwarding them. Such buffering helps to smooth out peaks in packet entrance rates that may briefly exceed the wire speed of an exit interface.
Desktop and consumer-grade switches tend to have little memory for buffering and hence are generally only willing to queue a packet or two for each exit interface. However, routers and infrastructure-grade switches have enough memory to buffer several dozen packets per exit interface. Each queued packet adds latency equal to the time needed to serialize it (see Section 17.10).
Network queuing latency is present whenever switches and routers are used. However, it is highly variable. On an idle network with cut-through switching, it could be just the serialization latency for the portion of a packet needed to determine its destination. On a congested routed network, it could be many dozen times the serialization latency of an MTU-sized packet.
Sometimes also called network admission control, this latency source is a combination of factors that may add latency before a packet is sent. The simplest case is where messages are being sent faster than the wire speed of the network. Latency may also be added before a packet is sent if there is contention for network bandwidth. (It is a common misconception that switched networks eliminate such contention. Switches do help, but the contention point often becomes getting packets out of the switch rather than into it.) See Section 4 for a possible cause of network access control latency.
Transport-layer protocols like LBT-RM may also add latency if an application attempts to send messages faster than the network can safely deliver them. The rate controls that impose this latency promise stable network operation in return. Indeed, most forms of network access control latency end up being beneficial by preventing unfair use or congestive collapse of the network.
Network access control latency should happen infrequently on modern networks where message generation rates are matched to the ability of the network to safely carry them. It can be highly variable and difficult to estimate because it may arise from the actions of others on the network.
Networks employ various techniques for serializing data so that it can be moved conveniently. Typically, a fixed-frequency clock coordinates the action of a sender and receiver. One bit is often transmitted for each beat of the clock. Serialization latency is due to the fact that a receiver cannot use a packet until its last bit has arrived.
The common DS0 communications line operates at 56 Kbps and hence adds 214 ms of serialization latency to a 1500-byte packet. A DS1 line ("T1") operates at 1.5 Mbps, adding 8 ms of latency. Even a 100 Mbps Ethernet adds 120 μs of latency on a 1500-byte packet.
Serialization latency should be constant for a given clock rate. It should be consistent across all messages.
Note: Serialization latency and the speed of light (discussed in the next section) can be easily visualized and contrasted using a Java applet. The linked web page uses the term "transmission delay" where we use the term "serialization latency." Similarly, it uses the term "propagation delay" where we use "speed of light."
186,000 miles per second--it's not just a good idea, it's the law!
Although often insignificant under one roof or even around a campus, latency due to the speed of light can be a significant issue in WAN communication. Signals travel through copper wires and optical fibers at only about 60% of their speed in a vacuum. Hence the 3,000 km trip from Chicago to San Francisco takes about 15 ms while the 6,400 km trip from Chicago to London takes about 33 ms.
As far as we know, the speed of light is a constant of the universe so we'd expect the latency it adds over a fixed path to also be constant.
Copyright 2004 - 2008 29West, Inc.