Why? Motivating the API

This lesson focuses on what you can achieve through the combination of the system calls you just learned.

We'll cover the following

Shell

Of course, one big question you might have: why would we build such an odd interface to what should be the simple act of creating a new process? Well, as it turns out, the separation of fork() and exec() is essential in building a UNIX shell, because it lets the shell run code after the call to fork() but before the call to exec(); this code can alter the environment of the about-to-be-run program, and thus enables a variety of interesting features to be readily built.

The shell is just a user program.

It shows you a prompt and then waits for you to type something into it. You then type a command (i.e., the name of an executable program, plus any arguments) into it; in most cases, the shell then figures out where in the file system the executable resides, calls fork() to create a new child process to run the command, calls some variant of exec() to run the command, and then waits for the command to complete by calling wait(). When the child completes, the shell returns from wait() and prints out a prompt again, ready for your next command.

The separation of fork() and exec() allows the shell to do a whole bunch of useful things rather easily. For example:

Press + to interact
prompt> wc p3.c > newfile.txt

In the example above, the output of the program wc is redirected into the output file newfile.txt (the greater-than sign is how said redirection is indicated). The way the shell accomplishes this task is quite simple: when the child is created, before calling exec(), the shell closes standard output and opens the file newfile.txt. By doing so, any output from the soon-to-be-running program wc is sent to the file instead of the screen.

Running p4.c

The code below shows a program that does exactly this. The reason this redirection works is due to an assumption about how the operating system manages file descriptors. Specifically, UNIX systems start looking for free file descriptors at zero. In this case, STDOUT_FILENO will be the first available one and thus get assigned when open() is called. Subsequent writes by the child process to the standard output file descriptor, for example by routines such as printf(), will then be routed transparently to the newly-opened file instead of the screen.

Press + to interact
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <assert.h>
#include <sys/wait.h>
int main(int argc, char *argv[])
{
int rc = fork();
if (rc < 0) {
// fork failed; exit
fprintf(stderr, "fork failed\n");
exit(1);
} else if (rc == 0) {
// child: redirect standard output to a file
close(STDOUT_FILENO);
open("./output/p4.output", O_CREAT|O_WRONLY|O_TRUNC, S_IRWXU);
// now exec "wc"...
char *myargs[3];
myargs[0] = strdup("wc"); // program: "wc" (word count)
myargs[1] = strdup("p4.c"); // argument: file to count
myargs[2] = NULL; // marks end of array
execvp(myargs[0], myargs); // runs word count
} else {
// parent goes down this path (original process)
int wc = wait(NULL);
assert(wc >= 0);
}
return 0;
}

If you download the generated file p4.output after running the p4.c program above and view its contents, here is what​ you might see:

Press + to interact
prompt> ./p4
prompt> cat p4.output
32 109 846 p4.c
prompt>

You’ll notice (at least) two interesting tidbits about this output. First, when p4 is run, it looks as if nothing has happened; the shell just prints the command prompt and is immediately ready for your next command. However, that is not the case; the program p4 did indeed call fork() to create a new child, and then run the wc program via a call to execvp(). You don’t see any output printed to the screen because it has been redirected to the file p4.output (present in the output directory). Second, you can see that when we view the output file, all the expected output from running wc is found. Cool, right?

Pipeline

UNIX pipes are implemented in a similar way but with the pipe() system call. In this case, the output of one process is connected to an in-kernel pipe (i.e., queue), and the input of another process is connected to that same pipe; thus, the output of one process seamlessly is used as input to the next, and long and useful chains of commands can be strung together. As a simple example, consider looking for a word in a file, and then counting how many times said word occurs; with pipes and the utilities grep and wc, it is easy; just type grep -o foo file | wc -l into the command prompt and marvel at the result.

Let’s try it out in the terminal below. The first example has been done for you.

Terminal 1
Terminal
Loading...

Finally, while we just have sketched out the process API at a high level, there is a lot more detail about these calls out there to be learned and digested; we’ll learn more, for example, about file descriptors when we talk about file systems in the third part of the course. For now, suffice it to say that the fork()/exec() combination is a powerful way to create and manipulate processes.

Get hands-on with 1400+ tech skills courses.