Most messaging systems use an independent process or thread that is "owned" by the messaging system. That thread handles housekeeping (such as timer management), usually receives each incoming network packet, and often sends each outgoing network packet. The application itself typically runs in a separate set of cooperating threads and/or processes. The switch between the messaging thread/process and the application thread/process is another source of message latency.
To minimize latency, it can be useful to design the messaging system to call an application function directly from the messaging layer when a message is received, without requiring a thread/process switch. If the application function can fully process the message quickly, it does so and returns immediately to the messaging layer, so the message is received and consumed without any thread switching at all. However, in systems with that design, care must be taken to prevent the application function from delaying its return for too long. For example, if the application function blocks, the entire messaging system is brought to a halt for as long as it blocks, introducing latency across all topics and transport sessions. If lengthy processing is required, it is usually better for that processing to be done in an application thread.
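The cost of a slow inline callback can be made concrete with a small model. The sketch below is illustrative only: it simulates a single messaging thread with a made-up clock, and the function name, topic names, and time units are assumptions introduced for the example, not part of any product.

```python
def run_inline(messages, handler_cost):
    """Simulate one messaging thread that calls application handlers inline.

    messages: list of (arrival_time, topic), sorted by arrival time.
    handler_cost: dict mapping topic -> time its callback takes to return.
    Returns a list of (topic, latency), where latency is the time from
    arrival until the callback for that message has returned.
    """
    clock = 0
    latencies = []
    for arrival, topic in messages:
        # The thread cannot pick up a message before it arrives.
        clock = max(clock, arrival)
        # The callback runs on the messaging thread, so nothing else
        # is received until it returns.
        clock += handler_cost[topic]
        latencies.append((topic, clock - arrival))
    return latencies


traffic = [(0, "slow"), (1, "fast"), (2, "fast")]

# The "slow" topic's callback takes 10 time units: every later message waits.
blocked = run_inline(traffic, {"slow": 10, "fast": 1})

# Same traffic, but the "slow" callback only hands the message to an
# application thread (modeled here as a cost of 1) and returns.
handed_off = run_inline(traffic, {"slow": 1, "fast": 1})
```

In this model the two "fast" messages see a latency of 10 units in the first run even though their own callbacks take 1 unit each; in the second run every message sees a latency of 1 unit. The lengthy work still has to be done somewhere, but it no longer delays unrelated topics.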
The 29West LBM product typically requires no thread switching at all to send messages; the application thread calls the send function, which performs the LBM processing and eventually calls the socket write directly. LBM also allows the application callback for received messages to be invoked directly from the LBM thread. For applications that require non-trivial processing of received messages, LBM provides thread-safe event queues so that the application callback for received messages is invoked from an application thread instead. Note that the programming paradigm is the same: in both cases an application callback is the method of message delivery. This makes it easy to try both methods and choose the one that performs better for your specific application.
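The idea that one callback can be delivered either way can be sketched generically. The following is not the LBM API; `EventQueue`, `Receiver`, `on_packet`, and `dispatch_one` are hypothetical names invented for this illustration, and the "messaging" thread is simulated with an ordinary Python thread.

```python
import queue
import threading


class EventQueue:
    """Hypothetical thread-safe queue of pending callback invocations."""

    def __init__(self):
        self._pending = queue.Queue()

    def post(self, callback, msg):
        self._pending.put((callback, msg))

    def dispatch_one(self):
        # Blocks until an event is available, then runs the callback
        # on whichever thread called dispatch_one().
        callback, msg = self._pending.get()
        callback(msg)


class Receiver:
    """Hypothetical receiver: same callback, two delivery modes."""

    def __init__(self, callback, event_queue=None):
        self.callback = callback
        self.event_queue = event_queue

    def on_packet(self, msg):
        # Always invoked on the messaging thread.
        if self.event_queue is None:
            self.callback(msg)                        # inline, no thread switch
        else:
            self.event_queue.post(self.callback, msg)  # deferred to a dispatcher


def demo():
    seen = []

    def on_message(msg):
        # Record which thread actually ran the application callback.
        seen.append((msg, threading.current_thread().name))

    event_queue = EventQueue()
    direct = Receiver(on_message)
    queued = Receiver(on_message, event_queue)

    def messaging_thread():
        direct.on_packet("m1")
        queued.on_packet("m2")

    t = threading.Thread(target=messaging_thread, name="messaging")
    t.start()
    t.join()

    app = threading.Thread(target=event_queue.dispatch_one, name="application")
    app.start()
    app.join()
    return seen
```

Calling `demo()` shows the callback running on the "messaging" thread for the first receiver and on the "application" thread for the second, with no change to the callback itself. Only the receiver's construction differs, which is what makes it cheap to try both approaches.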
Also note that LBM's event queues are fully thread-safe. One method for reducing average message latency (especially during traffic bursts) is to have multiple application "worker" threads dispatching a single event queue. On multi-CPU machines, this allows received messages to be processed with true parallelism, reducing the time required to process a group of messages and therefore the average latency. Especially when a non-trivial amount of processing must be done on received messages, this reduction in average latency can far outweigh the cost of the thread switch from LBM to the application. However, as with arrival-order delivery discussed above, parallel processing of incoming messages can complicate the design of the application, especially with respect to message processing order. Messages are dispatched in the order they are received, but the operating system may schedule threads to run on available processors in an unexpected order, so messages may finish processing in a different order than they arrived. For applications that can tolerate this kind of reordering, the benefits of reduced average latency and increased throughput can be considerable (on multi-CPU hardware).
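A minimal sketch of several workers draining one queue follows. It uses Python's standard `queue` and `threading` modules rather than LBM, and `process_in_parallel`, the sequence-number tagging, and the stop sentinel are assumptions made for the illustration. Note also that in standard CPython the global interpreter lock prevents threads from running Python code on several CPUs at once, so this shows the dispatch structure and the ordering hazard, not the speedup.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to exit


def process_in_parallel(messages, handler, num_workers=4):
    """Dispatch messages from a single queue to several worker threads.

    Returns a list of (sequence_number, result) in COMPLETION order,
    which may differ from the order in which messages were enqueued.
    """
    event_queue = queue.Queue()
    completed = []
    completed_lock = threading.Lock()

    def worker():
        while True:
            item = event_queue.get()
            if item is _STOP:
                return
            seq, msg = item
            result = handler(msg)
            with completed_lock:
                completed.append((seq, result))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Messages enter the queue in arrival order, tagged with a sequence number.
    for seq, msg in enumerate(messages):
        event_queue.put((seq, msg))
    # One stop sentinel per worker, queued after all real messages.
    for _ in threads:
        event_queue.put(_STOP)
    for t in threads:
        t.join()
    return completed


done = process_in_parallel(range(20), lambda x: x * x, num_workers=4)
in_arrival_order = sorted(done)  # restore arrival order when it matters
```

Every message is processed exactly once, but `done` is not guaranteed to be in sequence order. An application that needs arrival order back can carry a sequence number as shown and sort or reassemble afterward, at the cost of waiting for slower messages.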
Copyright 2005 - 2006 29West, Inc. -- 29West Confidential