Who Handles The TLB Miss?

This lesson discusses the two ways which are used to handle a TLB miss.

One question that we must answer: who handles a TLB miss? Two answers are possible: the hardware, or the software (OS).

Hardware handling TLB miss

In the olden days, the hardware had complex instruction sets (sometimes called CISC, for complex-instruction set computers) and the people who built the hardware didn’t much trust those sneaky OS people. Thus, the hardware would handle the TLB miss entirely. To do this, the hardware has to know exactly where the page tables are located in memory (via a page-table base register, used in Line 11 in the code snippet captioned TLB Control Flow Algorithm), as well as their exact format; on a miss, the hardware would “walk” the page table, find the correct page-table entry and extract the desired translation, update the TLB with the translation, and retry the instruction. An example of an “older” architecture that has hardware-managed TLBs is the Intel x86 architecture, which uses a fixed multi-level page table (see the next chapter for details); the current page table is pointed to by the CR3 register“Intel 64 and IA-32 Architectures Software Developer’s Manuals” by Intel, 2009. Available: http://www.intel.com/products/processor/manuals. In particular, pay attention to “Volume 3A: System Programming Guide” Part 1 and “Volume 3B: System Programming Guide Part 2”..

Software-managed TLB

More modern architectures (e.g., MIPS R10k“MIPS R4000 Microprocessor User’s Manual”. by Joe Heinrich. Prentice-Hall, June 1993. Available: http://cag.csail.mit.edu/raw/ . documents/R4400 Uman book Ed2.pdf A manual, one that is surprisingly readable. Or is it? or Sun’s SPARC v9“The SPARC Architecture Manual: Version 9” by David L. Weaver and Tom Germond. SPARC International, San Jose, California, September 2000. Available: www.sparc.org/ standards/SPARCV9.pdf. Another manual. I bet you were hoping for a more fun citation to end this chapter., both RISC or reduced-instruction set computers) have what is known as a software-managed TLB. On a TLB miss, the hardware simply raises an exception (line 11 in the snippet below), which pauses the current instruction stream, raises the privilege level to kernel mode, and jumps to a trap handler. As you might guess, this trap handler is code within the OS that is written with the express purpose of handling TLB misses. When run, the code will look up the translation in the page table, use special “privileged” instructions to update the TLB, and return from the trap; at this point, the hardware retries the instruction (resulting in a TLB hit).

Press + to interact
VPN = (VirtualAddress & VPN_MASK) >> SHIFT
(Success, TlbEntry) = TLB_Lookup(VPN)
if (Success == True) // TLB Hit
if (CanAccess(TlbEntry.ProtectBits) == True)
Offset = VirtualAddress & OFFSET_MASK
PhysAddr = (TlbEntry.PFN << SHIFT) | Offset
Register = AccessMemory(PhysAddr)
else
RaiseException(PROTECTION_FAULT)
else // TLB Miss
RaiseException(TLB_MISS)

Let’s discuss a couple of important details. First, the return-from-trap instruction needs to be a little different than the return-from-trap we saw before when servicing a system call. In the latter case, the return-from-trap should resume execution at the instruction after the trap into the OS, just as a return from a procedure call returns to the instruction immediately following the call into the procedure. In the former case, when returning from a TLB miss-handling trap, the hardware must resume execution at the instruction that caused the trap. This retry thus lets the instruction run again, this time resulting in a TLB hit. Thus, depending on how a trap or exception was caused, the hardware must save a different PC when trapping into the OS, in order to resume properly when the time to do so arrives.

ASIDE: RISC VS. CISC

In the 1980s, a great battle took place in the computer architecture community. On one side was the CISC camp, which stood for Complex Instruction Set Computing; on the other side was RISC, for Reduced Instruction Set Computing“RISC-I: A Reduced Instruction Set VLSI Computer” by D.A. Patterson and C.H. Sequin. ISCA ’81, Minneapolis, May 1981. The paper that introduced the term RISC, and started the avalanche of research into simplifying computer chips for performance.. The RISC side was spearheaded by David Patterson at Berkeley and John Hennessy at Stanford (who are also co-authors of some famous books“Computer Architecture: A Quantitative Approach” by John Hennessy and David Patterson. Morgan-Kaufmann, 2006. A great book about computer architecture. We have a particular attachment to the classic first edition.), although later John Cocke was recognized with a Turing award for his earliest work on RISC“The evolution of RISC technology at IBM” by John Cocke, V. Markstein. IBM Journal of Research and Development, 44:1/2. A summary of the ideas and work behind the IBM 801, which many consider the first true RISC microprocessor..

CISC instruction sets tend to have a lot of instructions in them, and each instruction is relatively powerful. For example, you might see a string copy, which takes two pointers and a length and copies bytes from source to destination. The idea behind CISC was that instructions should be high-level primitives, to make the assembly language itself easier to use, and to make code more compact.

RISC instruction sets are exactly the opposite. A key observation behind RISC is that instruction sets are really compiler targets, and all compilers really want are a few simple primitives that they can use to generate high-performance code. Thus, RISC proponents argued, let’s rip out as much from the hardware as possible (especially the microcode), and make what’s left simple, uniform, and fast.

In the early days, RISC chips made a huge impact, as they were noticeably faster“Performance from Architecture: Comparing a RISC and a CISC with Similar Hardware Organization” by D. Bhandarkar and Douglas W. Clark. Communications of the ACM, September 1991. A great and fair comparison between RISC and CISC. The bottom line: on similar hardware, RISC was about a factor of three better in performance.; many papers were written; a few companies were formed (e.g., MIPS and Sun). However, as time progressed, CISC manufacturers such as Intel incorporated many RISC techniques into the core of their processors, for example by adding early pipeline stages that transformed complex instructions into micro-instructions which could then be processed in a RISC-like manner. These innovations, plus a growing number of transistors on each chip, allowed CISC to remain competitive. The end result is that the debate died down, and today both types of processors can be made to run fast.

Second, when running the TLB miss-handling code, the OS needs to be extra careful not to cause an infinite chain of TLB misses to occur. Many solutions exist; for example, you could keep TLB miss handlers in physical memory (where they are unmapped and not subject to address translation), or reserve some entries in the TLB for permanently-valid translations and use some of those permanent translation slots for the handler code itself; these wired translations always hit in the TLB.

The primary advantage of the software-managed approach is flexibility: the OS can use any data structure it wants to implement the page table, without necessitating hardware change. Another advantage is simplicity, as seen in the TLB control flow (line 11 in the code snippet provided above, in contrast to lines 11–19 in the code snippet here). The hardware doesn’t do much on a miss: just raise an exception and let the OS TLB miss handler do the rest.

ASIDE: TLB VALID BIT \neq PAGE TABLE VALID BIT

A common mistake is to confuse the valid bits found in a TLB with those found in a page table. In a page table, when a page-table entry (PTE) is marked invalid, it means that the page has not been allocated by the process, and should not be accessed by a correctly-working program. The usual response when an invalid page is accessed is to trap to the OS, which will respond by killing the process.

A TLB valid bit, in contrast, simply refers to whether a TLB entry has a valid translation within it. When a system boots, for example, a common initial state for each TLB entry is to be set to invalid, because no address translations are yet cached there. Once virtual memory is enabled, and once programs start running and accessing their virtual address spaces, the TLB is slowly populated, and thus valid entries soon fill the TLB.

The TLB valid bit is quite useful when performing a context switch too, as we’ll discuss further below. By setting all TLB entries to invalid, the system can ensure that the about-to-be-run process does not accidentally use a virtual-to-physical translation from a previous process.

Get hands-on with 1400+ tech skills courses.