Another Problem: State Management

This lesson discusses the challenge of state management that arises from the event-based approach to concurrency.


Another issue with the event-based approach is that such code is generally more complicated to write than traditional thread-based code. The reason is that when an event handler issues an asynchronous I/O, it must package up some program state for the next event handler to use when the I/O finally completes. This additional work is not needed in thread-based programs, as the state the program needs is on the stack of the thread. Adya et al. call this work manual stack management, and it is fundamental to event-based programming (“Cooperative Task Management Without Manual Stack Management” by Atul Adya, Jon Howell, Marvin Theimer, William J. Bolosky, and John R. Douceur. USENIX ATC ’02, Monterey, CA, June 2002). This gem of a paper is the first to clearly articulate some of the difficulties of event-based concurrency; it suggests some simple solutions and explores the even crazier idea of combining the two types of concurrency management into a single application.

An example

To make this point more concrete, let’s look at a simple example in which a thread-based server needs to read from a file descriptor (fd) and, once complete, write the data that it read from the file to a network socket descriptor (sd). The code (ignoring error checking) looks like this:

int rc = read(fd, buffer, size);
rc = write(sd, buffer, size);

As you can see, in a multi-threaded program, doing this kind of work is trivial. When the read() finally returns, the code immediately knows which socket to write to because that information is on the stack of the thread (in the variable sd).

In an event-based system, life is not so easy. To perform the same task, we’d first issue the read asynchronously, using the AIO calls described above. Let’s say we then periodically check for completion of the read using the aio_error() call. When that call informs us that the read is complete, how does the event-based server know what to do?

The solution, as described by Adya et al., is to use an old programming language construct known as a continuation (“Programming With Continuations” by Daniel P. Friedman, Christopher T. Haynes, and Eugene E. Kohlbecker. In Program Transformation and Programming Environments, Springer Verlag, 1984, is the classic reference to this old idea from the world of programming languages, now increasingly popular in some modern languages). Though it sounds complicated, the idea is rather simple: basically, record the needed information to finish processing this event in some data structure; when the event happens (i.e., when the disk I/O completes), look up the needed information and process the event. In this specific case, the solution would be to record the socket descriptor (sd) in some kind of data structure (e.g., a hash table), indexed by the file descriptor (fd). When the disk I/O completes, the event handler would use the file descriptor to look up the continuation, which will return the value of the socket descriptor to the caller. At this point (finally), the server can then do the last bit of work to write the data to the socket.
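
As a rough illustration of this idea (a sketch, not code from the paper), the following uses the POSIX AIO calls aio_read(), aio_error(), and aio_return(), and, for simplicity, a plain array indexed by the file descriptor in place of the hash table described above. The names start_read(), check_completions(), MAX_FD, and the structure layout are invented for this example; the continuation is just the saved socket descriptor plus the buffer for the pending read.

#include <aio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define MAX_FD 1024   // assumed upper bound on descriptor numbers

// One entry per outstanding read: the saved socket descriptor (sd) is the
// "continuation" the completion handler will need later.
struct continuation {
    int in_use;          // is a read pending on this fd?
    int sd;              // socket to write to once the read completes
    struct aiocb cb;     // the asynchronous read request
    char buffer[4096];   // where the read data lands
};
static struct continuation table[MAX_FD];

// Issue the read asynchronously and record sd, indexed by fd.
int start_read(int fd, int sd) {
    struct continuation *c = &table[fd];
    memset(&c->cb, 0, sizeof(c->cb));
    c->cb.aio_fildes = fd;
    c->cb.aio_buf    = c->buffer;
    c->cb.aio_nbytes = sizeof(c->buffer);
    c->cb.aio_offset = 0;
    c->sd = sd;
    c->in_use = 1;
    return aio_read(&c->cb);
}

// Called from the event loop: for each completed read, look up the saved
// socket descriptor and finish the work by writing the data to it.
void check_completions(void) {
    for (int fd = 0; fd < MAX_FD; fd++) {
        struct continuation *c = &table[fd];
        if (!c->in_use || aio_error(&c->cb) == EINPROGRESS)
            continue;                      // nothing pending, or still waiting
        ssize_t rc = aio_return(&c->cb);   // bytes read (or -1 on error)
        if (rc > 0)
            write(c->sd, c->buffer, rc);   // the deferred write to sd
        c->in_use = 0;
    }
}

An event loop would call start_read() when a request arrives and check_completions() on each iteration; on some systems the AIO calls require linking with -lrt.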

ASIDE: UNIX SIGNALS

A huge and fascinating infrastructure known as signals is present in all modern UNIX variants. At its simplest, signals provide a way to communicate with a process. Specifically, a signal can be delivered to an application; doing so stops the application from whatever it is doing to run a signal handler, i.e., some code in the application to handle that signal. When finished, the process just resumes its previous behavior.

Each signal has a name, such as HUP (hang up), INT (interrupt), SEGV (segmentation violation), etc.; see the man page for details. Interestingly, sometimes it is the kernel itself that does the signaling. For example, when your program encounters a segmentation violation, the OS sends it a SIGSEGV (prepending SIG to signal names is common); if your program is configured to catch that signal, you can actually run some code in response to this erroneous program behavior (which is helpful for debugging). When a signal is sent to a process not configured to handle that signal, the default behavior is enacted; for SEGV, the process is killed.
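
To make the SIGSEGV case concrete, here is a small hypothetical sketch (not from the original text) that installs a handler for SIGSEGV and then deliberately dereferences a NULL pointer to trigger it; a real debugger or crash reporter would do something more useful in the handler, but the mechanism is the same:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical example: run some code in response to a segmentation violation.
// (printf in a handler is not strictly async-signal-safe, but fine for a demo.)
void segv_handler(int sig) {
    printf("caught SIGSEGV (signal %d), exiting\n", sig);
    exit(1);   // don't return: the faulting instruction would just re-fault
}

int main(int argc, char *argv[]) {
    signal(SIGSEGV, segv_handler);
    int *p = NULL;
    *p = 42;   // deliberate segmentation violation
    return 0;
}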

Here is a simple program that goes into an infinite loop, but has first set up a signal handler to catch SIGHUP:

#include <signal.h>
#include <stdio.h>

void handle(int arg) {
   printf("stop wakin' me up...\n");
}

int main(int argc, char *argv[]) {
   signal(SIGHUP, handle);
   while (1)
       ; // doin' nothin' except catchin' some sigs
   return 0;
}

You can send signals to it with the kill command line tool (yes, this is an odd and aggressive name). Doing so will interrupt the main while loop in the program and run the handler code handle():

prompt> ./main &
[3] 36705
prompt> kill -HUP 36705
stop wakin’ me up...
prompt> kill -HUP 36705
stop wakin’ me up...

There is a lot more to learn about signals, so much that a single chapter, much less a single page, does not nearly suffice. As always, there is one great source: “Advanced Programming in the UNIX Environment” by W. Richard Stevens and Stephen A. Rago (Addison-Wesley, 2005), once again the classic must-have-on-your-bookshelf book of UNIX systems programming; if there is some detail you need to know, it is in here. You can read more if you’re interested.
