Simulator
In this exercise, we’ll use multi.py
to simulate a multi-processor CPU scheduler, and learn about some of its details. Read the related README for more information about the simulator and its options.
Welcome to multi.py, a rudimentary multi-CPU scheduling simulator. This simulator has a number of features to play with, so pay attention! Or don't, because you are lazy that way. But when that exam rolls around... To run the simulator, all you have to do is type: prompt> ./multi.py This will then run the simulator with some random jobs. Before we get into any such details, let's first examine the basics of such a simulation. In the default mode, there are one or more CPUs in the system (as specified with the -n flag). Thus, to run with 4 CPUs in your simulation, type: prompt> ./multi.py -n 4 Each CPU has a cache, which can hold important data from one or more running processes. The size of each CPU cache is set by the -M flag. Thus, to make each cache have a size of '100' on your 4-CPU system, run: prompt> ./multi.py -n 4 -M 100 To run a simulation, you need some jobs to schedule. There are two ways to do this. The first is to let the system create some jobs with random characteristics for you (this is the default, i.e., if you specify nothing, you get this); there are also some controls to control (somewhat) the nature of randomly-generated jobs, described further below. The second is to specify a list of jobs for the system to schedule precisely; this is also described in more detail below. Each job has two characteristics. The first is its 'run time' (how many time units it will run for). The second is its 'working set size' (how much cache space it needs to run efficiently). If you are generating jobs randomly, you can control the range of these values by using the -R (maximum run-time flag) and -W (maximum working-set-size flag); the random generator will then generate values that are not bigger than those. If you are specifying jobs by hand, you set each of these explicitly, using the -L flag. For example, if you want to run two jobs, each with run-time of 100, but with different working-set-sizes of 50 and 150, respectively, you would do something like this: prompt> ./multi.py -n 4 -M 100 -L 1:100:50,2:100:150 Note that you assigned each job a name, '1' for the first job, and '2' for the second. When jobs are auto-generated, names are assigned automatically (just using numbers). How effectively a job runs on a particular CPU depends on whether the cache of that CPU currently holds the working set of the given job. If it doesn't, the job runs slowly, which means that only 1 tick of its runtime is subtracted from its remaining time left per each tick of the clock. This is the mode where the cache is 'cold' for that job (i.e., it does not yet contain the job's working set). However, if the job has run on the CPU previously for 'long enough', that CPU cache is now 'warm', and the job will execute more quickly. How much more quickly, you ask? Well, this depends on the value of the -r flag, which is the 'warmup rate'. Here, it is something like 2x by default, but you can change it as needed. How long does it take for a cache to warm up, you ask? Well, that is also set by a flag, in this case, the -w flag, which sets the 'warmup time'. By default, it is something like 10 time units. Thus, if a job runs for 10 time units, the cache on that CPU becomes warm, and then the job starts running faster. All of this, of course, is a gross approximation of how a real system works, but that's the beauty of simulation, right? So now we have CPUs, each with caches, and a way to specify jobs. What's left? Aha, you say, the scheduling policy! And you are right. Way to go, diligent homework-doing person! The first (default) policy is simple: a centralized scheduling queue, with a round-robin assignment of jobs to idle CPUs. The second is a per-CPU scheduling queue approach (turned on with -p), in which jobs are assigned to one of N scheduling queues (one per CPU); in this approach, an idle CPU will (on occasion) peek into some other CPU's queue and steal a job, to improve load balancing. How often this is done is set by a 'peek' interval (set by the -P flag). With this basic understanding in place, you should now be able to do the homework, and study different approaches to scheduling. To see a full list of options for this simulator, just type: prompt> ./multi.py -h Usage: multi.py [options] Options: Options: -h, --help show this help message and exit -s SEED, --seed=SEED the random seed -j JOB_NUM, --job_num=JOB_NUM number of jobs in the system -R MAX_RUN, --max_run=MAX_RUN max run time of random-gen jobs -W MAX_WSET, --max_wset=MAX_WSET max working set of random-gen jobs -L JOB_LIST, --job_list=JOB_LIST provide a comma-separated list of job_name:run_time:working_set_size (e.g., a:10:100,b:10:50 means 2 jobs with run-times of 10, the first (a) with working set size=100, second (b) with working set size=50) -p, --per_cpu_queues per-CPU scheduling queues (not one) -A AFFINITY, --affinity=AFFINITY a list of jobs and which CPUs they can run on (e.g., a:0.1.2,b:0.1 allows job a to run on CPUs 0,1,2 but b only on CPUs 0 and 1 -n NUM_CPUS, --num_cpus=NUM_CPUS number of CPUs -q TIME_SLICE, --quantum=TIME_SLICE length of time slice -P PEEK_INTERVAL, --peek_interval=PEEK_INTERVAL for per-cpu scheduling, how often to peek at other schedule queue; 0 turns this off -w WARMUP_TIME, --warmup_time=WARMUP_TIME time it takes to warm cache -r WARM_RATE, --warm_rate=WARM_RATE how much faster to run with warm cache -M CACHE_SIZE, --cache_size=CACHE_SIZE cache size -o, --rand_order has CPUs get jobs in random order -t, --trace enable basic tracing (show which jobs got scheduled) -T, --trace_time_left trace time left for each job -C, --trace_cache trace cache status (warm/cold) too -S, --trace_sched trace scheduler state -c, --compute compute answers for me Probably the best to learn about are some of the tracing options (like -t, -T, -C, and -S). Play around with these to see better what the scheduler and system are doing.
Questions
-
To start things off, let’s learn how to use the simulator to study how to build an effective multi-processor scheduler. The first simulation will run just one job, which has a run-time of 30, and a working-set size of 200. Run this job (called job ’a’ here) on one simulated CPU as follows:
./multi.py -n 1 -L a:30:200
. How long will it take to complete? Turn on the-c
flag to see a final answer, and the-t
flag to see a tick-by-tick trace of the job and how it is scheduled. -
Now increase the cache size so as to make the job’s working set (size=200) fit into the cache (which, by default, is size=100); for example, run
./multi.py -n 1 -L a:30:200 -M 300
. Can you predict how fast the job will run once it fits in cache? (hint: remember the key parameter of the warm rate, which is set by the-r
flag) Check your answer by running with the solve flag (-c
) enabled. -
One cool thing about
multi.py
is that you can see more detail about what is going on with different tracing flags. Run the same simulation as above, but this time with time left tracing enabled (-T
). This flag shows both the job that was scheduled on a CPU at each time step, as well as how much run-time that job has left after each tick has run. What do you notice about how that second column decreases? -
Now add one more bit of tracing, to show the status of each CPU cache for each job, with the
-C
flag. For each job, each cache will either show a blank space (if the cache is cold for that job) or a ’w’ (if the cache is warm for that job). At what point does the cache become warm for job ’a’ in this simple example? What happens as you change the warmup time parameter (-w
) to lower or higher values than the default? -
At this point, you should have a good idea of how the simulator works for a single job running on a single CPU. But hey, isn’t this a multiprocessor CPU scheduling chapter? Oh yeah! So let’s start working with multiple jobs. Specifically, let’s run the follow- ing three jobs on a two-CPU system (i.e.,
type ./multi.py -n 2 -L a:100:100,b:100:50,c:100:50
) Can you predict how long this will take, given a round-robin centralized scheduler? Use-c
to see if you were right, and then dive down into details with-t
to see a step-by-step and then-C
to see whether caches got warmed effectively for these jobs. What do you notice? -
Now we’ll apply some explicit controls to study cache affinity, as described in the chapter. To do this, you’ll need the
-A
flag. This flag can be used to limit which CPUs the scheduler can place a particular job upon. In this case, let’s use it to place jobs ’b’ and ’c’ on CPU 1, while restricting ’a’ to CPU 0. This magic is accomplished by typing this./multi.py -n 2 -L a:100:100,b:100:50, c:100:50 -A a:0,b:1,c:1
; don’t forget to turn on various tracing options to see what is really happening! Can you predict how fast this version will run? Why does it do better? Will other combinations of ’a’, ’b’, and ’c’ onto the two processors run faster or slower? -
One interesting aspect of caching multiprocessors is the opportunity for better-than-expected speed up of jobs when using multiple CPUs (and their caches) as compared to running jobs on a single processor. Specifically, when you run on N CPUs, sometimes you can speed up by more than a factor of N, a situation entitled super-linear speedup. To experiment with this, use the job description here (
-L a:100:100,b:100:100,c:100:100
) with a small cache (-M 50
) to create three jobs. Run this on systems with 1, 2, and 3 CPUs (-n 1,-n 2,-n 3
). Now, do the same, but with a larger per-CPU cache of size 100. What do you notice about performance as the number of CPUs scales? Use-c
to confirm your guesses, and other tracing flags to dive even deeper. -
One other aspect of the simulator worth studying is the per-CPU scheduling option, the
-p
flag. Run with two CPUs again, and this three job configuration (-L a:100:100,b:100:50,c:100:50
). How does this option do, as opposed to the hand-controlled affinity limits you put in place above? How does performance change as you alter the ’peek interval’ (-P
) to lower or higher values? How does this per-CPU approach work as the number of CPUs scales? -
Finally, feel free to just generate random workloads and see if you can predict their performance on different numbers of processors, cache sizes, and scheduling options. If you do this, you’ll soon be a multiprocessor scheduling master, which is a pretty awesome thing to be. Good luck!
Get hands-on with 1400+ tech skills courses.