Segmentation: Generalized Base/Bounds

Get introduced to the concept of segmentation which improves the utilization of the memory space.

We'll cover the following

To solve this problem, an idea was born, and it is called segmentation. It is quite an old idea, going at least as far back as the very early 1960’s1.“Program Organization and Record Keeping for Dynamic Storage” by A. W. Holt. Communications of the ACM, Volume 4:10, October 1961. An incredibly early and difficult to read paper about segmentation and some of its uses. 2. “Fact Segmentation” by M. N. Greenfield. Proceedings of the SJCC, Volume 21, May 1962. Another early paper on segmentation; so early that it has no references to other work.. The idea is simple: instead of having just one base and bounds pair in our MMU, why not have a base and bounds pair per logical segment of the address space? A segment is just a contiguous portion of the address space of a particular length, and in our canonical address space, we have three logically-different segments:

  1. Code
  2. Stack
  3. Heap

What segmentation allows the OS to do is to place each one of those segments in different parts of physical memory, and thus avoid filling physical memory with unused virtual address space.

Example

Let’s look at an example. Assume we want to place the address space from the figure here into physical memory. With a base and bounds pair per segment, we can place each segment independently in physical memory. For example, see the figure below; there you see a 64KB physical memory with those three segments in it (and 16KB reserved for the OS).

As you can see in the diagram, only used memory is allocated space in physical memory, and thus large address spaces with large amounts of unused address space (which we sometimes call sparse address spaces) can be accommodated.

The hardware structure in our MMU required to support segmentation is just what you’d expect: in this case, a set of three base and bounds register pairs. The figure below shows the register values for the example above; each bounds register holds the size of a segment.

You can see from the figure that the code segment is placed at physical address 32KB and has a size of 2KB and the heap segment is placed at 34KB and has a size of 3KB. The size segment here is exactly the same as the bounds register introduced previously; it tells the hardware exactly how many bytes are valid in this segment (and thus, enables the hardware to determine when a program has made an illegal access outside of those bounds).

Let’s do an example translation, using the address space in the figure in the previous lesson. Assume a reference is made to virtual address 100 (which is in the code segment, as you can see visually in the figure from the previous lesson). When the reference takes place (say, on an instruction fetch), the hardware will add the base value to the offset into this segment (100 in this case) to arrive at the desired physical address: 100 + 32KB, or 32868. It will then check that the address is within bounds (100 is less than 2KB), find that it is, and issue the reference to physical memory address 32868.

Now let’s look at an address in the heap, virtual address 4200 (again refer to the figure here). If we just add the virtual address 4200 to the base of the heap (34KB), we get a physical address of 39016, which is not the correct physical address. What we need to first do is extract the offset into the heap, i.e., which byte(s) in this segment the address refers to. Because the heap starts at virtual address 4KB (4096), the offset of 4200 is actually 4200 minus 4096, or 104. We then take this offset (104) and add it to the base register physical address (34K) to get the desired result: 34920.

Segmentation fault

What if we tried to refer to an illegal address (i.e., a virtual address of 7KB or greater), which is beyond the end of the heap? You can imagine what will happen: the hardware detects that the address is out of bounds, traps into the OS, likely leading to the termination of the offending process. And now you know the origin of the famous term that all C programmers learn to dread: the segmentation violation or segmentation fault.

ASIDE: THE SEGMENTATION FAULT

The term segmentation fault or violation arises from a memory access on a segmented machine to an illegal address. Humorously, the term persists, even on machines with no support for segmentation at all. Or not so humorously, if you can’t figure out why your code keeps faulting.

Get hands-on with 1400+ tech skills courses.