- In the last lecture, we discussed:
- Race conditions
- H/W synchronization primitives (test-and-set, compare-and-swap)
- Spin-lock implementation
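- For reference in what follows, here is a minimal spin-lock sketch built on test-and-set. This is an assumption for illustration (using GCC's __sync builtins), not necessarily the exact implementation from last lecture; the names match the acquire/release calls used below:
typedef struct {
    volatile unsigned locked; /* 0 = free, 1 = held */
} spinlock_t;

void acquire(spinlock_t *l) {
    /* atomically set locked to 1 and get its old value; keep spinning
       until the old value was 0, i.e. until we took a free lock */
    while (__sync_lock_test_and_set(&l->locked, 1) != 0)
        ;
}

void release(spinlock_t *l) {
    __sync_lock_release(&l->locked); /* atomically reset locked to 0 */
}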
- This lecture will cover:
- Building high-level primitives out of low-level spin-locks and atomic operations (mutex, semaphore, condition variable)
- The kernel's synchronization overview
Using spin-locks inside the kernel
- Let's look at the implementation of an example kernel service, the pipe buffer, which allows one process to read what another process writes
- A pipe buffer is a bounded buffer (draw pic, P1, P2, bounded buffer inside kernel).
- When the buffer is full, the writer is blocked until space is available
- When the buffer is empty, the reader is blocked until data is available
- The implementation:
#define N 8
typedef struct {
    char buf[N]; /* N = 8 */
    unsigned head;
    unsigned tail;
    spinlock_t lock;
} pipebuf_t;
void writec(pipebuf_t *p, char c) {
    while (1) {
        acquire(&p->lock);
        if ((p->tail - p->head) < N) {
            p->buf[p->tail % N] = c;
            p->tail++;
            release(&p->lock);
            return; //simplified return from kernel to user space
        }
        release(&p->lock);
    }
}
char readc(pipebuf_t *p) {
    while (1) {
        acquire(&p->lock);
        if (p->tail != p->head) {
            char c = p->buf[p->head % N];
            p->head++;
            release(&p->lock);
            return c; //simplified return from kernel to user space
        }
        release(&p->lock);
    }
}
- Race condition?
- Does overflow in head and tail still allow the code to work correctly? (See the sketch below.)
- The lines shown in red in lecture (the acquire/release calls) were added to get rid of the race condition
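- A minimal standalone sketch (not from the lecture; the values are hypothetical) illustrating the overflow question: unsigned arithmetic wraps modulo 2^32, so tail - head and tail % N stay correct across wraparound, but only because N is a power of two:
#include <stdio.h>
#define N 8
int main(void) {
    /* place head near the top of the unsigned range so tail wraps past 2^32 */
    unsigned head = 0xFFFFFFFDu;
    unsigned tail = head + 5; /* wraps around to 2 */
    printf("%u\n", tail - head); /* still prints 5: the element count is correct */
    /* head % N and tail % N also advance consistently across the wrap,
       because N divides 2^32; this breaks if N is not a power of two */
    return 0;
}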
- Problems with this implementation?
- No bounded wait (i.e. possible starvation)
- With bounded wait, all processes waiting for a lock wake up eventually. In other words, if a given lock is released an infinite number of times, then any acquire operation will eventually succeed.
- Inefficient spinning
- How to ensure bounded waiting and avoid spinning inefficiency?
- Use a wait queue!
- Wait queue contains a queue of processes waiting for a lock
- Processes in the queue are blocked.
- When the lock is released, the next process in queue order is activated and acquires the lock.
- This type of synchronization object is known as a mutex, short for mutual exclusion.
Implementing a Mutex
- Let's define the mutex data structure as a lock and a singly linked list of waiting processes:
typedef struct wait {
    procdescriptor_t *p; /* process to wake */
    struct wait *next; /* points to the next element of the wait queue */
} wait_t;
typedef struct {
    unsigned locked;
    wait_t *wfirst; /* for quick access to the head of the wait queue */
    wait_t *wlast; /* for quick access to the tail of the wait queue */
    spinlock_t l;
} mutex_t;
- Mutex lock operation: if the lock cannot be acquired immediately, the calling process is added to the list
- Mutex unlock operation: When the lock is released, the next process on the list, if any, is woken up so it can acquire the lock.
- A candidate implementation
void
mutex_lock(mutex_t *m)
{
    /* check if the lock is free and no one is waiting; if so, simply lock the lock and return */
    acquire(&m->l);
    if (m->locked == 0 && m->wfirst == NULL) {
        m->locked = 1;
        release(&m->l);
        return;
    }
    /* lock is not free; add the current process to the end of the wait queue */
    wait_t w; //why can we put a stack-allocated variable on the linked list?
    w.p = current;
    w.next = NULL;
    if (m->wlast)
        m->wlast->next = &w;
    else
        m->wfirst = &w;
    m->wlast = &w;
    /* set the status of the current process as blocked */
    current->state = BLOCKED;
    release(&m->l);
    /* schedule a different, runnable process */
    schedule();
    /* when the current process is woken up, it resumes here */
    acquire(&m->l);
    /* the woken-up process grabs the lock and removes itself from the linked list */
    assert(m->locked == 0);
    assert(m->wfirst == &w);
    m->locked = 1;
    if (w.next != NULL)
        m->wfirst = w.next;
    else
        m->wfirst = m->wlast = NULL;
    release(&m->l);
}
void mutex_unlock(mutex_t *m) {
    acquire(&m->l);
    if (m->wfirst)
        m->wfirst->p->state = RUNNABLE; /* the woken process dequeues itself in mutex_lock */
    m->locked = 0;
    release(&m->l);
}
- Notice that the current process releases the spinlock before it blocks, so it never holds the spinlock while blocked; otherwise a deadlock would result. In general, spinlocks cannot be held across calls to schedule().
- How did our mutex implementation solve the problems with spinlocks pointed out earlier? (The FIFO wait queue gives bounded wait, and blocking instead of spinning avoids wasted CPU, except inside the short spinlock-protected critical sections.)
- Historically, this functionality took a long time to design correctly. There are many places for subtle race conditions to creep in. An example is the sleep-wakeup race.
The Sleep-Wakeup Race
- A very subtle race resulting from improper implementation of blocking
... in the mutex_lock function ....
release(&m->l);
current->state = BLOCKED;
schedule();
acquire(&m->l);
- The sleep-wakeup race example:
- Process A reaches the snippet above and releases the spinlock
- Process B, running on another core and owning the mutex, tries to release the mutex. It succeeds because the spinlock is free
- Process B, when releasing the mutex, grabs the first process on the wait queue, which is Process A, and sets its state to RUNNABLE
- Process A executes next, sets its state to BLOCKED (overwriting the RUNNABLE that B just set), and calls schedule(), waiting to be woken up
- The mutex is now free, but Process A is still blocked on the wait queue; its wakeup was lost, so it will remain blocked indefinitely.
- The bug is subtle because the process's state is part of the critical section, even though the process's state is not part of the mutex_t structure.
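- For contrast, the correct ordering (what the full mutex_lock above actually does) marks the process BLOCKED before releasing the spinlock, so B's wakeup can never be overwritten:
... in the mutex_lock function ...
current->state = BLOCKED; /* set while still holding the spinlock */
release(&m->l);           /* B can only run mutex_unlock after this point, */
schedule();               /* so its RUNNABLE always lands after our BLOCKED */
acquire(&m->l);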
- We can replace the spinlocks in the bounded buffer implementation with mutexes. However, the main source of inefficiency in readc and writec still remains: the polling loop to determine whether the pipe buffer is ready for reading or writing.
Condition variables
- Examine the readc() implementation. Ideally, we want the process to block until
- No other process is in the critical section, AND
- The buffer is non-empty, i.e. p->head != p->tail
- We will build another higher-level primitive, the condition variable, that realizes this requirement.
- Condition variable: a variable or object that stands for a condition.
- The interface of a condition variable:
condvar_t cv; //declaration of a condition variable
void cond_wait(condvar_t *cv, mutex_t *m) {
    //1. release mutex
    //2. block until cond_notify(cv) is called
    //THE FIRST TWO OPERATIONS OCCUR ATOMICALLY AS ONE
    //3. acquire mutex
}
void cond_notify(condvar_t *cv) {
    //wake up the first process blocked in cond_wait
}
void cond_broadcast(condvar_t *cv) {
    //wake up all processes blocked in cond_wait
}
- Why is there a need to have a mutex in cond_wait()?
- To prevent the sleep-wakeup race; a sketch of the failure mode follows.
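- Here is what could go wrong otherwise, using a hypothetical cond_wait_no_mutex() that takes no mutex (not a real interface, just for illustration):
/* BROKEN: the reader checks the condition without holding the mutex */
while (p->tail == p->head) {
    /* <-- a writer can run here: it adds a character and calls
       cond_notify(&p->nonempty), but we are not on the wait queue
       yet, so the notification is lost and we may block forever */
    cond_wait_no_mutex(&p->nonempty); /* hypothetical mutex-free wait */
}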
- The conditions for blocking in the bounded buffer?
- nonempty (for readc())
- nonfull (for writec())
typedef struct {
    char buf[N]; /* N = 8 */
    unsigned head;
    unsigned tail;
    mutex_t l;
    condvar_t nonempty; //indicates the buffer is not empty
    condvar_t nonfull; //indicates the buffer is not full
} pipebuf_t;
void writec(pipebuf_t *p, char c) {
    mutex_lock(&p->l); //acquire the lock outside the while loop
    while (1) {
        if (p->tail - p->head < N) {
            p->buf[p->tail % N] = c;
            p->tail++;
            cond_notify(&p->nonempty);
            mutex_unlock(&p->l);
            return;
        }
        cond_wait(&p->nonfull, &p->l); //block because the buffer is full
    }
}
char readc(pipebuf_t *p) {
    mutex_lock(&p->l);
    while (1) {
        if (p->tail != p->head) {
            char c = p->buf[p->head % N];
            p->head++;
            cond_notify(&p->nonfull); //we know that the buffer is not full!
            mutex_unlock(&p->l);
            return c;
        }
        cond_wait(&p->nonempty, &p->l); //block because the buffer is empty
    }
}
- Why is it correct? Two scenarios to think about:
- What if we put cond_notify() after the mutex_unlock() statement?
- Assume that cond_wait() did not release and re-acquire the mutex. If we instead placed a mutex_unlock() before and a mutex_lock() after the cond_wait() call, would it work?
- How to implement a condition variable?
- Similar to how the mutex is implemented. Try to work out the details yourself; one possible sketch follows for reference.
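- One possible sketch (an assumption about one workable design, not the only one), reusing wait_t, the spinlock pattern, and the BLOCKED-before-release rule from the mutex section:
typedef struct {
    wait_t *wfirst; /* head of wait queue */
    wait_t *wlast;  /* tail of wait queue */
    spinlock_t l;
} condvar_t;

void cond_wait(condvar_t *cv, mutex_t *m) {
    acquire(&cv->l);
    /* enqueue the current process, exactly as in mutex_lock */
    wait_t w;
    w.p = current;
    w.next = NULL;
    if (cv->wlast)
        cv->wlast->next = &w;
    else
        cv->wfirst = &w;
    cv->wlast = &w;
    current->state = BLOCKED; /* mark BLOCKED before dropping the spinlock */
    release(&cv->l);
    mutex_unlock(m); /* released only after we are queued: no lost wakeup */
    schedule();
    mutex_lock(m); /* re-acquire the mutex once woken */
}

void cond_notify(condvar_t *cv) {
    acquire(&cv->l);
    if (cv->wfirst) {
        cv->wfirst->p->state = RUNNABLE; /* wake the first waiter... */
        cv->wfirst = cv->wfirst->next;   /* ...and dequeue it here */
        if (cv->wfirst == NULL)
            cv->wlast = NULL;
    }
    release(&cv->l);
}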
Semaphores
- A semaphore is a synchronized integer with two atomic operations:
- P(s): probeer te verlagen ("try to decrease"). Used for acquiring a resource
- This operation blocks while the value of s is negative, then decrements s when it becomes unblocked.
- V(s): verhoog ("increase"). Used for releasing a resource; it increments s.
- In Linux kernel lingo, P and V are called down and up, respectively.
- Semaphores are similar to locks, but are more versatile
- One can use semaphores to grant "resources" to more than one thread, e.g. to prevent a queue from overflowing its bounds.
- One can implement semaphores using spin-locks, as is done with mutexes (a sketch follows the smutex example below)
- One can also implement a mutex with a semaphore:
typedef struct {
    semaphore_t s; /* initialized to 0, i.e. the free state */
} smutex_t;
void smutex_lock(smutex_t *m) {
    P(&m->s);
}
void smutex_unlock(smutex_t *m) {
    V(&m->s);
}
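- And, as claimed above, semaphores themselves can be built from spin-locks. A minimal sketch following the lecture's convention (P blocks while s is negative) and reusing wait_t from the mutex section; note this simple version does not guarantee bounded wait:
typedef struct {
    int value; /* the semaphore's integer */
    wait_t *wfirst, *wlast; /* wait queue, as in mutex_t */
    spinlock_t l;
} semaphore_t;

void P(semaphore_t *s) {
    acquire(&s->l);
    while (s->value < 0) { /* block while the value is negative */
        wait_t w;
        w.p = current;
        w.next = NULL;
        if (s->wlast)
            s->wlast->next = &w;
        else
            s->wfirst = &w;
        s->wlast = &w;
        current->state = BLOCKED; /* BLOCKED before releasing: no lost wakeup */
        release(&s->l);
        schedule();
        acquire(&s->l); /* re-check the value: another P may have raced us */
    }
    s->value--; /* decrement once unblocked */
    release(&s->l);
}

void V(semaphore_t *s) {
    acquire(&s->l);
    s->value++;
    if (s->wfirst) {
        s->wfirst->p->state = RUNNABLE; /* wake and dequeue the first waiter */
        s->wfirst = s->wfirst->next;
        if (s->wfirst == NULL)
            s->wlast = NULL;
    }
    release(&s->l);
}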
- We can initialize a semaphore to different values:
- 0: initializes the semaphore in the free state, as a binary lock
- -1: initializes it in the locked state
- 1: free state; allows 2 threads to obtain the resource
- Using semaphores to implement the bounded buffer:
typedef struct {
    unsigned head;
    unsigned tail;
    char buf[N];
    semaphore_t lock; /* initialized to 0 (free) */
    semaphore_t nonempty; /* initialized to -1: initially "locked" */
    semaphore_t nonfull; /* initialized to N-1: only the first N P()s succeed */
} pipebuf_t;
void
writec(pipebuf_t *p, char c) {
    P(&p->nonfull); //block until there is space in the buffer
    P(&p->lock);
    p->buf[p->tail % N] = c;
    p->tail++;
    V(&p->lock);
    V(&p->nonempty); //increment because there is now an item in the buffer
}
char
readc(pipebuf_t *p) {
    P(&p->nonempty); //block until there is something to read
    P(&p->lock);
    char c = p->buf[p->head % N];
    p->head++;
    V(&p->lock);
    V(&p->nonfull); //increment because we opened space for another character in the buffer
    return c;
}
- This implementation of writec and readc using semaphores rather than mutexes + condition variables is not better in terms of starvation or utilization, but it allows us to combine claiming space and blocking into one variable type.
- Recommended exercise: implement your own read/write mutex.
Lock granularity
- The OS kernel is a large, complicated software system. What should the locks protect?
- Two big questions to ponder: correctness and performance.
- The easiest way to get correct locking is to have just a single lock that every function grabs at the start and releases at the end.
- In the old days of Linux, this is what was used: the "big kernel lock"
- A big lock means that only one CPU at a time can execute anywhere in your code. If OS code is called a lot, this may reduce the performance of an expensive multiprocessor to that of a single CPU.
- Finer lock granularity increases concurrency and gets better performance.
- How best to reduce lock granularity is a bit of an art; a hypothetical example follows.
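- As an illustration of finer granularity (all names here are invented for this sketch): a hash table with one lock per bucket, so CPUs touching different buckets never contend:
#define NBUCKET 64

struct entry {
    unsigned key;
    struct entry *next;
};

struct bucket {
    spinlock_t lock; /* fine-grained: protects only this bucket's list */
    struct entry *head;
};

struct bucket table[NBUCKET];

void table_insert(struct entry *e) {
    struct bucket *b = &table[e->key % NBUCKET];
    acquire(&b->lock); /* CPUs working on other buckets proceed in parallel */
    e->next = b->head;
    b->head = e;
    release(&b->lock);
}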
General strategies in Linux for synchronizing access to data structures
- Below is an excerpt from "Understanding the Linux Kernel", Table 5-8.
Kernel control paths accessing the data structure    Protection primitive
---------------------------------------------------------------------------------------
exceptions only (e.g. syscalls)                      semaphore
interrupts (e.g. disk or network interrupts)         local interrupt disabling + spin lock
exceptions + interrupts                              local interrupt disabling + spin lock
...
- Why not a semaphore for the interrupts or exceptions+interrupts cases? (Hint: P() may block, but an interrupt handler is not a schedulable process and cannot be put to sleep.)