In our work at 29West with LBM, we have found that successful deployment of our reliable multicast (LBT-RM) and reliable unicast (LBT-RU) protocols often depends on familiarity with UDP buffering in operating systems. Our reliable unicast and reliable multicast protocols use UDP to achieve control of transport latency that would be impossible with TCP.
Much of what we've learned deploying our UDP-based protocols should be applicable to other high-performance work with UDP. We have collected here some of the background information that we've found helpful in understanding issues related to UDP buffering.
Although UDP is buffered on both the send and receive side, we've seldom seen a need to be concerned with send-side UDP buffering. For brevity, we'll use simply "UDP buffering" to refer to receive-side UDP buffering.
UDP packets may arrive in bursts because they were sent rapidly or because they were bunched together by the normal buffering action of network switches and routers.
Similarly, UDP packets may be consumed rapidly when CPU time is available to run the consuming application. Or they may be consumed slowly because CPU time is being used to run other processes.
UDP receive buffering serves to match the arrival rate of UDP packets (or "datagrams") with their consumption rate by an application program. Of course, buffering cannot help in cases where the long-term average arrival rate exceeds the long-term average consumption rate; in that case any fixed-size buffer eventually fills.
UDP receive buffering is done in the operating system kernel. Typically the kernel allocates a fixed-size buffer for each socket receiving UDP. Buffer space is consumed for every UDP packet that has arrived but has not yet been delivered to the consuming application. Unused space is generally unavailable for other purposes because it must be readily available for the possible arrival of more packets. See Section 7.4 for a further explanation.
If a UDP packet arrives for a socket with a full buffer, it is discarded by the kernel and a counter is incremented. See Section 8.9 for information on detecting UDP loss. A common myth is that all UDP loss is bad. (See Myth: All Packet Loss is Bad.) Even if it's not all bad, UDP loss does have its consequences. (See Section 8.3.)
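The interplay of bursty arrival, consumption rate, and a fixed-size buffer can be illustrated with a toy model. The sketch below is not how any real kernel accounts for buffer space (real kernels also charge per-packet overhead); the function name, the "tick" time unit, and the parameters are our own assumptions for illustration only.

```python
from collections import deque


def simulate_receive_buffer(arrivals, capacity_bytes, drain_per_tick):
    """Toy model of a fixed-size UDP receive buffer.

    arrivals       -- one list of datagram sizes (bytes) per time tick
    capacity_bytes -- total buffer space (hypothetical, no per-packet overhead)
    drain_per_tick -- datagrams the application consumes per tick
    Returns (delivered, dropped, still_queued).
    """
    queue = deque()
    used = 0
    delivered = 0
    dropped = 0
    for burst in arrivals:
        # Arrival: a datagram that does not fit is discarded and counted.
        for size in burst:
            if used + size > capacity_bytes:
                dropped += 1
            else:
                queue.append(size)
                used += size
        # Consumption: the application drains what it can this tick.
        for _ in range(min(drain_per_tick, len(queue))):
            used -= queue.popleft()
            delivered += 1
    return delivered, dropped, len(queue)


# A burst of eight 1000-byte datagrams, consumed two per tick.
burst = [[1000] * 8, [], [], []]
print(simulate_receive_buffer(burst, 4000, 2))  # (4, 4, 0)
print(simulate_receive_buffer(burst, 8000, 2))  # (8, 0, 0)
```

With the smaller buffer, half of the burst is lost even though the application keeps up on average; doubling the buffer absorbs the whole burst.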
The memory required for the kernel to do UDP buffering is a scarce resource that the kernel tries to allocate wisely. (See Section 7.4 for more on the rationale.) Applications expecting low-volume UDP traffic or those expecting low CPU scheduling latency (see Section 17.6) need not consume very much of this scarce resource. Applications expecting high-volume UDP traffic or those expecting a high CPU scheduling latency may be justified in consuming more UDP buffer space than others.
Typically, the kernel allocates a modest-size buffer when a UDP socket is created. This is generally adequate for less-demanding applications. Applications requiring a larger UDP receive buffer can request it with the system call setsockopt(...SO_RCVBUF...). Explicit configuration of the application may be required before it will request a larger UDP buffer. (For LBM, this is the context option transport_lbtrm_receiver_socket_buffer.)
The kernel configuration will allow such requests to succeed only up to a set size limit. This limit may often be increased by changing the kernel configuration. See Section 8.8 for information on setting kernel UDP buffer limits.
Hence two steps are often required to get adequate UDP buffer space:
1. Change the kernel configuration to increase the limit on the largest UDP buffer allocation that it will allow.
2. Change the application to request a larger UDP buffer.
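The application-side step can be sketched with the standard sockets API. The example below is a minimal illustration in Python, not LBM code; the helper name request_udp_rcvbuf is hypothetical. It requests a size with SO_RCVBUF and then reads the option back, because a request above the kernel limit may be silently capped on some systems and rejected with an error on others.

```python
import socket


def request_udp_rcvbuf(requested_bytes):
    """Ask for a UDP receive buffer and report what the kernel says it granted.

    Returns (sock, granted_bytes). The caller owns the socket.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
    except OSError:
        # Some kernels reject a request above the configured limit
        # instead of capping it; the default buffer stays in effect.
        pass
    # Always read the value back rather than assuming the request succeeded.
    # Note: Linux reports about double the requested value because it
    # includes its own bookkeeping overhead in the figure.
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted


sock, granted = request_udp_rcvbuf(8 * 1024 * 1024)
print("requested 8388608 bytes, kernel reports", granted)
sock.close()
```

If the reported value is well below what was requested, the kernel limit is probably the constraint and the first step above still needs to be done.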
It may seem to some that the default maximum UDP buffer size on many Unix kernels is a bit stingy. Understanding the rationale behind these limits may help.
UDP is typically used for low-volume query/response work (e.g., DNS and NTP). The kernel default limits assume that UDP will be used in this way.
UDP kernel buffer space is allocated from physical memory for the exclusive use of one process. The kernel tries to make sure that one process can't starve others for physical memory by allocating large UDP buffers that exhaust all the physical memory on a machine.
To some degree, the meager default limits are a legacy from the days when 4 MB was a lot of physical memory. Now that several gigabytes of physical memory are common, such small default limits seem particularly stingy.
For the lowest-possible latency, the operating system would run a process wishing to receive a UDP packet as soon as the packet arrives. In practice, the operating system may allow other processes to finish using their CPU time slice first. It may also seek to improve efficiency by accumulating several UDP packets before running the application. We will call the time that elapses between when a UDP packet arrives and when the consuming application gets to run on a CPU the CPU scheduling latency. See Section 17.6 for more information. UDP buffer space fills during CPU scheduling latency and empties when the consuming process runs on a CPU. CPU scheduling latency therefore plays a key role in optimal UDP buffer sizing. See Section 8.1 for more information.
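Because the buffer fills for as long as the application is waiting to run, a rough lower bound on the space needed is the arrival rate multiplied by the worst expected CPU scheduling latency. The sketch below is only a back-of-the-envelope estimate under that assumption, not the sizing method of Section 8.1; the function name, the headroom factor, and the example figures are ours, and per-packet kernel overhead is ignored.

```python
import math


def estimate_rcvbuf_bytes(datagrams_per_sec, datagram_bytes,
                          sched_latency_sec, headroom=2.0):
    """Rough UDP receive buffer estimate.

    backlog  = bytes that arrive while the application waits for a CPU
    headroom = hypothetical safety factor for bursts and kernel overhead
    """
    backlog = datagrams_per_sec * datagram_bytes * sched_latency_sec
    return math.ceil(backlog * headroom)


# Assumed workload: 50,000 datagrams/s of 1,400 bytes each,
# with up to 10 ms of CPU scheduling latency.
print(estimate_rcvbuf_bytes(50_000, 1_400, 0.010))
```

For these assumed figures the backlog alone is 50,000 x 1,400 x 0.010 = 700,000 bytes, well above the default buffer on many systems.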
The operating system kernel automatically allocates TCP receive buffer space based on policy settings, available memory, and other factors. TCP in the sending kernel continuously monitors available receive buffer space in the receiving kernel. When a TCP receive buffer fills up, the sending kernel blocks the sending application so that it cannot generate any more data for the connection. This behavior is called "flow control." In a nutshell, it prevents the receive buffer from overflowing by adding latency at the sender. We say that the speed of a TCP sender is "receiver-paced" because the sending application is prevented from sending when data cannot be delivered to the receiving application.
The OS kernel also allocates UDP receive buffer space based on policy settings. However, UDP senders do not monitor available UDP receive buffer space in the receiving kernel. UDP receivers simply discard incoming packets once all available buffer space is exhausted. We say that the speed of a UDP sender is "sender-paced" because the sending application can send whenever it wants without regard to available buffer space in the receiving kernel.
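Sender-paced behavior is easy to observe on a single machine. The sketch below is an illustration of our own, with a hypothetical function name and arbitrary sizes: it sends a stream of datagrams over loopback to a socket that has a deliberately small receive buffer and is not being read, then counts how many datagrams actually reached the buffer. The exact counts depend on the operating system and its minimum buffer size, so none are shown here.

```python
import socket
import time


def flood_unread_receiver(count=200, payload_bytes=1024, rcvbuf_bytes=4096):
    """Send `count` datagrams to a local socket that is not reading.

    Returns (sent, received): datagrams handed to the kernel by the
    sender, and datagrams later found in the receive buffer.
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Ask for a small buffer; the kernel may round this up.
        receiver.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
        receiver.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a port
        address = receiver.getsockname()
        payload = b"x" * payload_bytes

        sent = 0
        for _ in range(count):
            # Nothing here consults the receiver's free buffer space.
            sender.sendto(payload, address)
            sent += 1

        time.sleep(0.05)  # allow loopback delivery to complete

        # Drain whatever survived, without blocking once the buffer is empty.
        receiver.setblocking(False)
        received = 0
        while True:
            try:
                receiver.recv(payload_bytes)
            except BlockingIOError:
                break
            received += 1
        return sent, received
    finally:
        sender.close()
        receiver.close()


sent, received = flood_unread_receiver()
print("sent", sent, "received", received)
```

On systems we would expect to behave as described above, every send completes while only the datagrams that fit in the receive buffer are delivered; the rest are discarded by the receiving kernel without any notification to the sender.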
The default TCP buffer settings are generally adequate and usually require adjustment only for unusual network parameters or performance goals.
The appropriate size for an application's UDP receive buffer is influenced by factors that cannot be known when operating system default policies are established. Further, there is nothing the operating system can do to automatically discover the appropriate size. An application may know that it would benefit from a UDP receive buffer larger or smaller than the default used by the operating system. It can request a non-default UDP buffer size from the operating system, but the request may not be granted due to a policy limit. See Section 7.4 for reasons behind such policy limits. Operating system policies may have to be adjusted to successfully run high-performance UDP applications like LBM. See Section 8.8 to change operating system policies.
Copyright 2004 - 2009 29West, Inc.