Background: Multiprocessor Architecture
This lesson discusses hardware caches and caching based on locality. You will also learn about the problem of cache coherence and how the hardware solves it.
Hardware caches
To understand the new issues surrounding multiprocessor scheduling, we have to understand a new and fundamental difference between single-CPU hardware and multi-CPU hardware. This difference centers around the use of hardware caches (see figure below), and exactly how data is shared across multiple processors. We now discuss this issue further, at a high level; details are available elsewhere, in particular in upper-level computer architecture courses and texts.
In a system with a single CPU, there is a hierarchy of hardware caches that in general help the processor run programs faster. Caches are small, fast memories that (in general) hold copies of popular data that is found in the main memory of the system. Main memory, in contrast, holds all of the data, but access to this larger memory is slower. By keeping frequently accessed data in a cache, the system can make the large, slow memory appear to be a fast one.
As an example, consider a program that issues an explicit load instruction to fetch a value from memory, and a simple system with only a single CPU; the CPU has a small cache (say 64 KB) and large main memory. The first time a program issues this load, the data resides in main memory, and thus takes a long time to fetch (perhaps in the tens of nanoseconds, or even hundreds). The processor, anticipating that the data may be reused, puts a copy of the loaded data into the CPU cache. If the program later fetches this same data item again, the CPU first checks for it in the cache; if it finds it there, the data is fetched much more quickly (say, just a few nanoseconds), and thus the program runs faster.
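The effect described above is easy to observe. The following is a minimal C sketch, not from the lesson: the working-set size (32 KB), the eviction-buffer size (64 MB), and all names are our own assumptions, and exact timings vary widely by machine. The idea is to sweep a small array once while it is cold (mostly cache misses) and once while it is warm (mostly cache hits), after first streaming through a large buffer to push the array out of the caches.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define N     (8 * 1024)          /* 32 KB working set: fits in a typical cache */
#define EVICT (64 * 1024 * 1024)  /* 64 MB: streaming through this evicts it    */

static double elapsed_us(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) * 1e6 + (b.tv_nsec - a.tv_nsec) / 1e3;
}

int main(void) {
    int  *data = malloc(N * sizeof(int));
    char *junk = malloc(EVICT);
    if (!data || !junk) return 1;

    for (int i = 0; i < N; i++) data[i] = i;
    memset(junk, 1, EVICT);

    /* Stream through the large buffer to push `data` out of the caches. */
    volatile long sink = 0;
    for (size_t i = 0; i < EVICT; i += 64) sink += junk[i];

    struct timespec t0, t1, t2;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++) sink += data[i];  /* cold: loads miss, go to memory */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    for (int i = 0; i < N; i++) sink += data[i];  /* warm: loads hit in the cache   */
    clock_gettime(CLOCK_MONOTONIC, &t2);

    printf("cold pass: %.1f us, warm pass: %.1f us\n",
           elapsed_us(t0, t1), elapsed_us(t1, t2));
    free(data);
    free(junk);
    return 0;
}
```

On most machines the warm pass runs noticeably faster than the cold one, though hardware prefetching narrows the gap for a sequential sweep like this.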
Temporal and spatial locality
Caches are thus based on the notion of locality, of which there are two kinds: temporal locality and spatial locality. The idea behind temporal locality is that when a piece of data is accessed, it is likely to be accessed again in the near future; imagine variables or even instructions themselves being accessed over and over again in a loop. The idea behind spatial locality is that if a program accesses a data item at address x, it is likely to access data items near x as well. Here, think of a program streaming through an array, or instructions being executed one after the other. Because locality of these types exists in many programs, hardware systems can make good guesses about which data to put in a cache and thus work well.
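Both kinds of locality show up in even the simplest code. Here is a small sketch (function names and the matrix width are ours, purely for illustration) contrasting an access pattern that exploits spatial locality with one that defeats it:

```c
#include <stddef.h>

/* Temporal locality: sum, i, and n are re-read on every iteration,
 * so the hardware keeps them in registers or cache. */
long sum_array(const int *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];   /* Spatial locality: a[i+1] sits right after a[i],
                          so each cache-line fill serves several loads. */
    return sum;
}

/* Poor spatial locality: walking down the columns of a row-major
 * matrix touches a new cache line on (almost) every access. */
long sum_by_column(const int a[][1024], size_t rows) {
    long sum = 0;
    for (size_t j = 0; j < 1024; j++)
        for (size_t i = 0; i < rows; i++)
            sum += a[i][j];
    return sum;
}
```

The two summation orders compute the same result; the row-order version is typically much faster precisely because the hardware's guesses about locality pay off.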
Now for the tricky part: what happens when you have multiple processors in a single system, with a single shared main memory, as we see in the figure below?
As it turns out, caching with multiple CPUs is much more complicated. Imagine, for example, that a program running on CPU 1 reads a data item (with value D) at address A; because the data is not in the cache on CPU 1, the system fetches it from main memory, and gets the value D. The program then modifies the value at address A, just updating its cache with the new value D′; writing the data through all the way to main memory is slow, so the system will (usually) do that later. Then assume the OS decides to stop running the program and move it to CPU 2. The program then re-reads the value at address A; there is no such data in CPU 2’s cache, and thus the system fetches the value from main memory, and gets the old value D instead of the correct value D′. Oops!
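To make the sequence concrete, here is the scenario written as ordinary C. Note the hedge: on real hardware this program behaves correctly precisely because of the coherence machinery described next; the comments narrate what would go wrong on a hypothetical machine without it.

```c
/* A sketch of the problem sequence from the text. Variable names are
 * ours; "CPU 1"/"CPU 2" in the comments refer to where the OS happens
 * to be running the program at that moment. */
int main(void) {
    int A = 0;    /* the data item at address A; initially holds D = 0  */

    int d = A;    /* CPU 1: cache miss, fetch D from main memory        */
    A = d + 1;    /* CPU 1: write D' into its own cache only; with a
                     write-back cache, main memory still holds D        */

    /* ... the OS now migrates the program to CPU 2 ... */

    int d2 = A;   /* CPU 2: cache miss. Without coherence, it would
                     fetch the stale D from memory instead of D'.
                     With coherence, it is guaranteed to see D'.        */
    return d2;
}
```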
Cache coherence
This general problem is called the problem of cache coherence, and there is a vast research literature describing the many subtleties involved in solving it.
The basic solution is provided by the hardware: by monitoring memory accesses, hardware can ensure that basically the “right thing” happens and that the view of a single shared memory is preserved. One way to do this on a bus-based system (as described above) is to use an old technique known as bus snooping: each cache pays attention to memory updates by observing the bus that connects it to main memory. When a CPU sees an update for a data item it holds in its cache, it will notice the change and either invalidate its copy (i.e., remove it from its own cache) or update it (i.e., put the new value into its cache as well).
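The following is a deliberately simplified software model of the snooping idea, under our own assumptions (the structures, names, and fully associative lookup are ours, not any real hardware's): every cache watches writes on the shared bus, and when it observes a write to an address it holds, it invalidates or updates its copy.

```c
#include <stdbool.h>
#include <stdint.h>

#define NLINES 256

struct cache_line {
    bool      valid;
    uintptr_t addr;
    uint64_t  data;
};

struct cache {
    struct cache_line lines[NLINES];
};

/* Called for every write this cache observes on the memory bus. */
void snoop_write(struct cache *c, uintptr_t addr, uint64_t new_data) {
    for (int i = 0; i < NLINES; i++) {
        if (c->lines[i].valid && c->lines[i].addr == addr) {
            /* Invalidation protocol: drop the stale copy; the next
             * read of this address will miss and fetch the new value. */
            c->lines[i].valid = false;

            /* An update protocol would instead refresh the copy:
             * c->lines[i].data = new_data;                             */
            (void)new_data;
        }
    }
}
```

Real coherence protocols track more states per line (e.g., modified, shared, invalid) and must arbitrate who may write, but the core mechanism is this one: observe the bus, notice writes to addresses you hold, and never let a stale copy be read.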