In Lab1 and Lab2, all processes (user programs as well as the kernel) run in the physical address space.
This is bad because there's no isolation among processes; one buggy program can corrupt the memory of other processes.
This lecture discusses how to virtualize memory to provide true isolation.
Digression: addresses in programs
- When the compiler and assembler have finished processing a module, they produce an object module that is almost runnable.
- A linker is required to stitch together multiple object modules by relocating relative addresses
and resolving external references.
- Compiler/assembler treats each module as if it will be loaded at location zero.
- e.g.
jmp $0x100
means jumping to address 0x100 of the current module
- Linker adds the base address of the module to each relative address. The base address is the address at which this module will be loaded.
- e.g. If module A is to be loaded starting at location 0x2300 and contains the instruction
jmp $0x120, the linker changes this instruction to jmp $0x2420
- Linker processes the modules one at a time. The first module is to be loaded at location zero. After
processing the first module, the linker knows its length and hence the base address of the next module to be loaded.
- An object module usually contains a reference to a function/variable, e.g. f(), defined in some other module.
- When the program is compiled, the compiler and assembler do not know the location of f() so there is
no way they can supply the address. So instead, a dummy address is supplied to be filled with the location of f() later.
- The linker obtains the absolute address of f() and changes all uses of f() to the actual address.
Address spaces
- The address space abstraction:
- Each process occupies a private memory area for code/data/stack
- A process cannot read/write outside its address space
- Allow controlled sharing among processes when needed.
- Why address space abstraction?
- Simplify software: addresses in one process not affected by other processes
- Fault isolation: contain bugs and improve security
- Efficient sharing among processes
- Allow tricks like demand paging, copy-on-write (next lecture)
- The address space implementation is done jointly by OS and h/w (memory management unit, MMU).
- h/w's job:
- Translate virtual addresses to physical addresses
- Detect and prevent attempts to use memory outside the address space.
- Allow cross-space transfers (syscalls, interrupts)
- OS's job:
- Create/manage virtual address space for processes
- allocate physical memory (for creation, growth, deletion)
- Switch between address spaces (to switch processes)
- Set up the h/w
- OS implementation considerations:
- OS has its own address space
- OS should be able to conveniently read/write user program's memory
- user processes must not read/modify OS's memory
- Two main h/w approaches: segmentation and paging.
- Paging has won over segmentation: most OSes use only paging
- x86 provides many features via segmentation (protection, interrupts), so we'll study
segmentation to learn some basic OS implementation techniques on x86.
X86 segments
- PC block diagram w/o virtual memory support
- x86 segmentation+paging mechanism. Logical address (virtual address) ---> linear address ---> physical address
- x86 starts out in real mode. Its address translation works as follows:
- programs use 16-bit virtual address
- seg_reg*16+va = physical address
- physical addresses are 20-bits, so max 1MB RAM addressable by programs
- No protection: any program can load anything into seg registers
- OS switches x86 from real to protected mode which supports 32-bit virtual/physical addresses and allows memory protection
- Let's look at segmentation first
- Protected-mode segmentation works as follows:
- segment register (%cs,%ss,%ds,%es,%fs,%gs) holds segment selector
- selector: 13-bit index, 1-bit table indicator (global vs. local descriptor table), 2-bit RPL (requested privilege level)
- selector indexes into global descriptor table (GDT) which is an array of segment descriptors
- Each 64-bit segment descriptor holds a 32-bit base, a 20-bit limit, protection bits, and a DPL (descriptor privilege level)
- The translation h/w obtains linear address (la) from virtual address (va) as follows: la = va + base; assert(va < limit)
- which segment register is being used is often implicit in the instruction
- Those that modify %esp (e.g. pop/push) use %ss, those that modify %eip use %cs, others (mostly) use %ds
- Some instructions take explicit "far" addresses that name a segment, e.g. ljmp $selector, $offset
- GDT lives in memory, CPU's GDTR register points to base of GDT
- lgdt instruction loads GDTR
- program turns on protected mode by setting PE bit in CR0 register
- How do the protection mechanisms of segmentation work?
- instructions can only r/w/x memory reachable through seg regs (between base, base+limit)
- Can any program change segment registers (e.g. movw %ax, %ss)?
- Yes, but the privilege level of the currently executing code must be sufficient for the DPL specified in the segment descriptor
- Current privilege level (CPL) is in the low 2 bits of %cs (CPL=0 is privileged/kernel, CPL=3 is user)
- Can any program re-load GDTR? No! lgdt is a privileged instruction
- Can user programs modify GDT entries directly? The GDT is in memory..
Case study of Lab3's use of segmentation
- Check the answers for the prepared questions...
Why not segmentation?
- Modern OSes use segmentation minimally
- Linux and MemOS set base 0 and the maximum limit, so virtual address == linear address, making segmentation a no-op
- Using many segments in programs can be complex
- E.g. It's common for C code to create pointers to data on stack or to data on heap. If stack uses
a different segment than heap, pointers must keep track of segment information in addition to addresses.
- Segmentation suffers from external fragmentation, which occurs when the available memory
is broken up into a lot of smaller fragments that cannot be used on their own to create a big segment.
- Possible to resolve external fragmentation by moving allocated segments around (a procedure called
compaction). However, compaction is expensive.
x86 Paging
- break linear address space into fixed size chunks, which are referred to as pages. In x86,
the size of a page is 4KB.
- Independently control mapping for each page of linear address space
- More degrees of freedom than segmentation (single base + single limit)
- 4KB pages imply 2^32 / 2^12 = 2^20 (about 1M) pages in a 32-bit linear address space
- Conceptual model: store in memory an array of 2^20 entries, called a page table, specifying the mapping for each linear page # to physical page #
- Inform CPU of the physical address of the page table.
- h/w performs address translation: table[20-bit linear page #] = 20-bit physical page #
- In addition to translation, each table entry also records protection information (present, read/write, user/supervisor) (see bottom of handout)
Why not a single array for page table?
- x86 uses 2-level mapping structure
- one page directory page with 1024 page directory entries (PDEs)
- up to 1024 page table pages, each with 1024 page table entries (PTEs)
- A 32-bit virtual address consists of
- 10-bit page dir index (for identifying a PDE)
- 10-bit page table index (for identifying a PTE)
- 12-bit page offset
- %cr3 holds physical address of the current page directory
- On each memory access, h/w looks up in the table
- Page table lookups themselves access memory, so each memory access turns into up to 3 physical accesses (PDE, PTE, then the data)
- More memory accesses ---> slow! (memory latency takes hundreds of processing cycles)
- h/w optimization: CPU's TLB (translation lookaside buffer) caches virtual page # => physical page # mappings
- If any part of the page table is changed, TLB must be flushed!
- by re-loading %cr3 (flushes everything)
- by executing
invlpg va
- turn on paging by setting CR0_PG bit of %cr0
- Here's how paging h/w translates linear address to physical address, in pseudo-code:
uint
hw_translate (uint la, bool user, bool write)
{
    uint pde, pte;
    pde = read_mem (%CR3 + 4*(la >> 22));
    hw_access (pde, user, write);
    pte = read_mem ((pde & 0xfffff000) + 4*((la >> 12) & 0x3ff));
    hw_access (pte, user, write);
    return (pte & 0xfffff000) + (la & 0xfff);
}

// check protection. pxe is a pte or pde.
// user is true if CPL==3
void
hw_access (uint pxe, bool user, bool write)
{
    if (!(pxe & PG_P))
        => page fault -- page not present
    if (!(pxe & PG_U) && user)
        => page fault -- no access for user
    if (write && !(pxe & PG_W))
        => page fault -- not writable
}
- How to use paging for memory protection?
- Can user programs modify %cr3?
- Can user programs modify page tables?
Case study: the memory mapping of Lab 3.
physical memory layout
+-----+-----------------------+----------------+------------------------------/
| | Kernel Kernel | : I/O |
| | Code + Data Stack | ... : Memory |
+-----+-----------------------+----------------+------------------------------/
0 0x40000 0x80000 0xA0000 0x100000
Kernel is loaded into physical memory 0x40000-0x80000
First things first, to turn on paging, kernel must set up its own page tables.
- MemOS only uses one page directory page and one page table page. How much virtual memory is addressable?
- Where to allocate these two pages physically?
- How to set up kernel's linear --> physical memory mapping?
/* see memos-pages.c */
void
paged_virtual_memory_init(void)
{
int pgdir_pn, pgtbl_pn;
uint32_t cr0;
uintptr_t pa;
// Create the kernel's page table.
kernel_pgdir = pgdir_new();
// Initialize mappings allowing the kernel to access kernel-only
// physical memory. User processes can also access the console.
for (pa = 0; pa < PROC1_START_ADDR; pa += PAGESIZE)
if (pa >= (uintptr_t) CONSOLE_BEGIN
&& pa < (uintptr_t) CONSOLE_END)
pgdir_set(kernel_pgdir, pa, pa | PTE_P | PTE_W | PTE_U);
else
pgdir_set(kernel_pgdir, pa, pa | PTE_P | PTE_W);
// Add mappings for user pages.
for (pa = PROC1_START_ADDR; pa < PHYSICAL_MEMSIZE; pa += PAGESIZE)
pgdir_set(kernel_pgdir, pa, pa | PTE_P | PTE_W | PTE_U);
// Use special instructions to initialize paged virtual memory.
lcr3((physaddr_t) kernel_pgdir);
cr0 = rcr0();
cr0 |= CR0_PE | CR0_PG | CR0_AM | CR0_WP | CR0_NE | CR0_TS
| CR0_EM | CR0_MP;
cr0 &= ~(CR0_TS | CR0_EM);
lcr0(cr0);
}
- What's the resulting kernel virtual address space?
the kernel's virtual address space
+-----+-----------------------+--------+----+-----+----------------------+
|     |  Kernel       Kernel  |        |Con-|     |  User (app)          |
|     |  Code + Data  Stack   |        |sole|     |  pages               |
+-----+-----------------------+--------+----+-----+----------------------+
0     0x40000         0x80000       0xB8000    0x100000           0x200000
                                    CON_BEGIN  PROC1_START_ADDR  PHYSICAL_MEMSIZE
- Before turning on paging, is kernel_pgdir a virtual or a physical address?
- After turning on paging, what is kernel_pgdir? Why can the kernel
access the kernel page table's data by directly referring to kernel_pgdir?
By the end of Lab 3, the virtual memory layout of app1 is as follows:
+-----+-----------------------+----------------+----------------/ ... /-----+
| | Kernel Kernel | : I/O | App 1 App 1 |
| | Code + Data Stack | ... : Memory | Code + Data Stack |
+-----+-----------------------+----------------+---------------/ ... /------+
0 0x40000 0x80000 0xA0000 0x100000 0x300000
We can see kernel is mapped into the application's virtual address space. What's the
advantage of mapping kernel into app's virtual space?
- Do we need to switch address space when switching to kernel mode execution (e.g. as a result of syscall)?
- Can app modify kernel memory?
- In Linux, the kernel is mapped at virtual address 0xc0000000 and above (the high end of the app's virtual address space).
Lab3 maps the kernel to the low end for simplicity (so that all the kernel's data and page table entries have the same
physical and virtual addresses).