The fork() System Call
In this lesson, you will have a look at the `fork()` system call in UNIX systems.
We'll cover the following
Running p1.c
fork()
system call is used to create a new process
#include <stdio.h>#include <stdlib.h>#include <unistd.h>int main(int argc, char* argv[]){printf("hello world (pid:%d)\n", (int)getpid());fflush(stdout);int rc = fork();if (rc < 0) {// fork failedfprintf(stderr, "fork failed\n");exit(1);}else if (rc == 0){// child (new process)printf("hello, I am child (pid:%d)\n", (int)getpid());}else{// parent goes down this path (main)printf("hello, I am parent of %d (pid:%d)\n", rc, (int)getpid());}return 0;}
When you run this program (called p1.c
), you’ll see an output similar to the following:
prompt> ./p1hello world (pid:29146)hello, I am parent of 29147 (pid:29146)hello, I am child (pid:29147)prompt>
Let us understand what happened in more detail in p1.c
. When it first starts running, the process prints out a hello world message; included in that message is its process identifier, also known as a PID. For example, for the output shown above, the process has a PID of 29146; in UNIX systems, the PID is used to name the process if one wants to do something with the process, such as (for example) stop it from running. So far, so good.
Now the interesting part begins. The process calls the fork()
system call, which the OS provides as a way to create a new process. The odd part: the process that is created is an (almost) exact copy of the calling process. That means that to the OS, it now looks like there are two copies of the program p1
running, and both are about to return from the fork()
system call. The newly-created process (called the child, in contrast to the creating parent) doesn’t start running at main()
like you might expect (note, the “hello, world” message only got printed out once); rather, it just comes into life as if it had called fork()
itself.
You might have noticed: the child isn’t an exact copy. Specifically, although it now has its own copy of the address space (i.e., its own private memory), its own registers, its own PC, and so forth, the value it returns to the caller of fork() is different. Specifically, while the parent receives the PID of the newly-created child, the child receives a return code of zero. This differentiation is useful because it is simple then to write the code that handles the two different cases (as above).
Non-deterministic output
You might also have noticed: the output (of p1.c
) is not deterministic. When the child process is created, there are now two active processes in the system that we care about: the parent and the child. Assuming we are running on a system with a single CPU (for simplicity), then either the child or the parent might run at that point. In our example (the output above), the parent did and thus printed out its message first. In other cases, the opposite might happen, as we show in this output trace:
prompt> ./p1hello world (pid:29146)hello, I am child (pid:29147)hello, I am parent of 29147 (pid:29146)prompt>
The CPU scheduler, a topic we’ll discuss in great detail soon, determines which process runs at a given moment in time; because the scheduler is complex, we cannot usually make strong assumptions about what it will choose to do, and hence which process will run first. This non-determinism, as it turns out, leads to some interesting problems, particularly in multi-threaded programs; hence, we’ll see a lot more non-determinism when we study concurrency in the second part of the course.
Get hands-on with 1400+ tech skills courses.