Lecture notes courtesy of Eddie Kohler, Robert Morris and Frans Kaashoek
Overview
- Why synchronization?
Multiple cooperating execution contexts may adversely affect each other
when executing in a single system.
- Common ways to achieve cooperation among processes:
- share-nothing:
Different processes operate in separate address spaces and pass data through files or messages.
- shared-memory:
Different processes directly share a single address space (threads).
- We look at the shared memory model today.
- Synchronization is the coordination of individual actions to ensure that data consistency is maintained.
- Synchronization is our weapon against concurrency.
The danger of concurrency: Race condition
- Several processes access and manipulate the same data concurrently. Outcome becomes
non-deterministic (depends on the particular order in which the access takes place).
- Our goal: ensure that only one process at a time can be manipulating this shared data (in the critical section).
- Where does concurrency come from?:
On a uni-processor:
- User-level: different threads (e.g. Lab1 and Lab2's miniprocesses) access shared data. The OS time-multiplexes
the two threads.
- Kernel: interrupt stops current execution context and starts the interrupt handling context.
(Why does weensyos2's kernel have no synchronization?)
On a multi-core processor:
- Obvious...
- Example code for implementing bank deposits and withdrawals
int balance;

void
deposit()
{
    balance++;
}

void
withdrawal()
{
    balance--;
}
Let's start with initial balance = 0 and two threads (T1 and T2) making deposits of 1. What could happen?
Intuitive outcome: balance = 2
Danger of race condition (what could happen): balance = 1!!
What happened?
A closer look at the banking example with ``objdump -d a.out''
08048344 <deposit>:
...
8048347: a1 6c 95 04 08 mov 0x804956c,%eax
804834c: 83 c0 01 add $0x1,%eax
804834f: a3 6c 95 04 08 mov %eax,0x804956c
...
08048356 <withdrawal>:
...
8048359: a1 6c 95 04 08 mov 0x804956c,%eax
804835e: 83 e8 01 sub $0x1,%eax
8048361: a3 6c 95 04 08 mov %eax,0x804956c
...
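The schedule that loses a deposit can be replayed deterministically by hand-simulating each thread's load/add/store steps from the disassembly above (a sketch; the interleaving order is one bad schedule we picked, not taken from the notes):

```c
#include <assert.h>

static int balance;

/* Simulate the three machine steps of deposit() for two threads,
 * interleaved in the order that loses one update:
 *   T1: mov balance,%eax  (reads 0)
 *   T2: mov balance,%eax  (also reads 0, before T1 stores)
 *   T1: add $1 ; mov %eax,balance  (writes 1)
 *   T2: add $1 ; mov %eax,balance  (overwrites with 1)
 */
int lost_update_demo(void) {
    balance = 0;
    int t1_eax = balance;   /* T1 loads 0 */
    int t2_eax = balance;   /* T2 loads 0 too */
    t1_eax += 1;
    balance = t1_eax;       /* balance = 1 */
    t2_eax += 1;
    balance = t2_eax;       /* balance = 1 again: T1's deposit is lost */
    return balance;
}
```

Running both deposits under this schedule yields 1, not 2.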
Fix withdrawal underflow (don't let the balance go negative):
bool
withdrawal()
{
    if (balance > 0) {
        balance--;
        return true;
    } else {
        return false;
    }
}
balance = 1, two threads making withdrawals at the same time,
Intuitive outcome: T1=false,T2=true, balance=0 (or the opposite)
Possible outcome: T1=true,T2=true, balance=0
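The T1=true, T2=true, balance=0 outcome can be replayed the same way: both threads pass the check and both load the old balance before either stores (a hand-simulated sketch of one bad schedule):

```c
#include <assert.h>
#include <stdbool.h>

/* Replay one bad schedule of two concurrent withdrawal() calls
 * against an initial balance of 1. */
void double_withdrawal_demo(bool *t1_ok, bool *t2_ok, int *final_bal) {
    int balance = 1;
    bool t1_check = (balance > 0);  /* T1 sees 1 > 0 */
    bool t2_check = (balance > 0);  /* T2 also sees 1 > 0 */
    int t1_eax = balance;           /* T1 loads 1 */
    int t2_eax = balance;           /* T2 loads 1 */
    balance = t1_eax - 1;           /* T1 stores 0 */
    balance = t2_eax - 1;           /* T2 also stores 0 */
    *t1_ok = t1_check;
    *t2_ok = t2_check;
    *final_bal = balance;           /* 0, yet both calls report success */
}
```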
balance = 1, T1 making a deposit, T2 making a withdrawal
Intuitive outcome: balance =1
Possible outcome: balance=2?
Reasoning about correctness
Sequential consistency: 2 or more sequences of operations
are sequentially consistent iff the result of any execution
could be obtained by executing the operations one at a time
in some serial order. Furthermore, in the serial order, read ops must reflect the most recent write.
In the banking example, two threads T1 and T2 are executing the following sequence of operations:
T1 T2
--------
D D
W W
Sequential consistency implies that if the initial balance=0, the result will always be 0!
If the execution is sequentially consistent, it is not possible to get balance=1
What does it take to achieve sequential consistency?
Two requirements:
- ensure before-after atomicity for individual operations
- sequentially consistent operations at the h/w level (atomic instructions and sequentially consistent memory access)
Synchronization is the act of building sequential consistency for APPLICATION operations using the sequential consistency provided by lower level operations.
Why is synchronization difficult?
- Correctness is difficult to debug: race condition bugs are usually difficult to find and fix.
- Trade-off between an easy, correct implementation and utilizing the full performance.
Another banking example:
int
readbal()
{
    return balance;
}

void
clearbal()
{
    balance = 0;
}
Two threads execute the following ops concurrently:
T1 T2
-------
R C
R C
Initial balance=10. Can a sequentially consistent execution produce R=10? R=5? R=0?
Is the above code sequentially consistent?
- Yes: guaranteed by h/w atomic ops and sequentially consistent memory access
Change the clear balance function to
void
clearbal()
{
    while (balance > 0)
        balance--;
}
Is it still correct?
The disassembled code for clearbal():
...
804836b: eb 0d jmp 804837a
804836d: a1 8c 95 04 08 mov 0x804958c,%eax
8048372: 83 e8 01 sub $0x1,%eax
8048375: a3 8c 95 04 08 mov %eax,0x804958c
804837a: a1 8c 95 04 08 mov 0x804958c,%eax
804837f: 85 c0 test %eax,%eax
8048381: 7f ea jg 804836d
...
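With the loop, each decrement is a separate memory read and write, so a concurrent readbal() can land between iterations and observe an intermediate value such as 5. One such schedule, hand-simulated (the step ordering is our choice for illustration):

```c
#include <assert.h>

/* Replay one schedule of readbal() against the looping clearbal():
 * T2 has completed five loop iterations when T1's read lands. */
int midway_read_demo(void) {
    int balance = 10;
    for (int i = 0; i < 5; i++)
        balance--;           /* T2: five iterations of the while loop */
    int r = balance;         /* T1: readbal() observes 5 */
    while (balance > 0)      /* T2 finishes clearing afterwards */
        balance--;
    return r;
}
```

So the rewritten clearbal() is still sequentially consistent at the instruction level, but the application-level "clear" is no longer atomic.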
Hardware sequential consistency (x86)
An ATOMIC operation is an operation that cannot be split apart. The operation executes indivisibly, so no other thread can view an intermediate state.
register operations are atomic (because registers are private to each individual thread or execution context)
How about memory operations?
movl $0, %eax
addl $1, %edx
In x86, read operations within a single cache line and write operations within a single cache line are atomic.
cache line: The unit of caching, typically 64 bytes.
Modern machines almost always have two levels of cache, L1 and L2. L1 is on-chip, fast but small (tens of KB). L2 is slower but much bigger (1-10 MB).
Atomic h/w instructions should only use one cache line. Operations spanning multiple cache lines are not guaranteed to be atomic.
Better not to gamble on cache line alignment.
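One way not to gamble on alignment is to force a shared variable onto its own cache-line boundary, so it can never straddle two lines. A sketch using C11's alignment specifier (the 64-byte line size is a common value, not something x86 guarantees):

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Force the counter onto a 64-byte boundary so a single access
 * never spans two cache lines. */
struct counter {
    alignas(64) int value;
};
```

Any `struct counter` object, on the stack or heap, now starts at a 64-byte boundary.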
Hardware Synchronization primitive
h/w synchronizing instructions
atomically perform actions that would otherwise not satisfy sequential consistency.
test-and-set
Conceptually, test and set implies
int
test_and_set(int *a, int v)
{
    int tmp = *a;
    *a = v;
    return tmp;
}
x86 instruction: xchgl (exchange)
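The conceptual C above is NOT atomic if compiled as written. GCC and Clang expose an atomic version as a builtin, which on x86 compiles down to an xchg-style instruction (a sketch; the demo function is ours):

```c
#include <assert.h>

/* __sync_lock_test_and_set atomically stores the new value and
 * returns the previous contents, like the conceptual test_and_set. */
int demo_test_and_set(void) {
    int lock = 0;
    int old = __sync_lock_test_and_set(&lock, 1);  /* old = 0, lock = 1 */
    assert(old == 0 && lock == 1);
    old = __sync_lock_test_and_set(&lock, 1);      /* a second caller sees it taken */
    return old;                                    /* 1 */
}
```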
Compare and swap (cmpxchg)
Conceptually, it implies
int
compare_and_swap(int *a, int old, int new)
{
    if (*a == old) {
        *a = new;
        return 1;
    } else
        return 0;
}
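GCC and Clang also expose compare-and-swap as a builtin, which on x86 compiles to lock cmpxchg. A quick demonstration of its semantics (the demo function is ours):

```c
#include <assert.h>
#include <stdbool.h>

/* __sync_bool_compare_and_swap swaps only if *a still holds the
 * expected old value, and reports whether the swap happened. */
int demo_cas(void) {
    int a = 5;
    bool ok = __sync_bool_compare_and_swap(&a, 5, 7);  /* matches: a becomes 7 */
    assert(ok && a == 7);
    ok = __sync_bool_compare_and_swap(&a, 5, 9);       /* stale expectation: fails */
    assert(!ok && a == 7);
    return a;
}
```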
Critical sections
A critical section is a set of instructions for which sequential consistency requires
before-after atomicity, i.e. at most one thread's instruction pointer is in the set at any moment.
CS goals
- Safety (aka mutual exclusion): no more than one thread in CS at any time
- Liveness (aka progress): a thread trying to access CS will eventually succeed
- Bounded waiting: if a thread has requested access to the CS, it will succeed after at most N other threads finish the CS.
- possibly fairness: if two threads are both trying to access CS, they have equal chances of success
Prefer the smallest critical section possible. This gives the best performance and utilization.
bool
withdrawal()
{
    if (balance > 0) {   //--- CS
        balance--;       //--- CS
        return true;
    } else {
        return false;
    }
}

void
deposit()
{
    balance++;   //--- CS
}
Any smaller CS possible?
Enforcing CS using locks
The lock abstraction
A lock is an object with two operations: acquire(L) and release(L).
The lock has state: it is either locked or not locked.
A call to acquire(L) waits until L is not locked, then changes its state to locked and returns. A call to release(L) marks L as not locked.
bracket every one of your CS with acquire(L) and release(L)
Lock L;

bool
withdrawal()
{
    acquire(&L);
    if (balance > 0) {   //--- CS
        balance--;       //--- CS
        release(&L);
        return true;
    } else {
        release(&L);
        return false;
    }
}

void
deposit()
{
    acquire(&L);
    balance++;   //--- CS
    release(&L);
}
How to implement acquire and release?
struct Lock {
    int locked;
};

void
acquire(Lock *l)
{
    while (1) {
        if (l->locked == 0) {   //C
            l->locked = 1;      //D
            break;
        }
    }
}

void
release(Lock *l)
{
    l->locked = 0;
}
Does it work?
Assume threads execute on different CPUs
Problem: Two acquire()s on the same lock on different CPUs might both execute line C, and then both execute D. Then both will think they have acquired the lock.
The correct implementation uses h/w primitive
void
acquire(Lock *l)
{
    while (1) {
        if (xchg(&l->locked, 1) == 0)
            break;
    }
}

void
release(Lock *l)
{
    l->locked = 0;
}
Why does this work? If two CPUs execute xchg at the same time, the hardware ensures that one xchg does both its steps before the other one starts. So we can say "the first xchg" and "the second xchg". The first xchg will read a 0 and set lock->locked to 1 and the acquire() will return. The second xchg will then see lock->locked as 1, and the acquire will loop.
This style of lock is called a spin-lock, since acquire() stays in a loop while waiting for the lock.
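A runnable sketch of this spin-lock, using the GCC `__sync_lock_test_and_set` builtin in place of hand-written xchg assembly (function names and iteration counts are ours). Two threads each make 100000 locked deposits, and no update is lost:

```c
#include <pthread.h>

struct spinlock { volatile int locked; };

static void spin_acquire(struct spinlock *l) {
    /* Atomically exchange 1 into locked; loop until we read a 0,
     * which means we were the one to take the lock. */
    while (__sync_lock_test_and_set(&l->locked, 1) != 0)
        ;
}

static void spin_release(struct spinlock *l) {
    __sync_lock_release(&l->locked);   /* store 0, releasing the lock */
}

static struct spinlock lk;
static int balance;

static void *depositor(void *arg) {
    for (int i = 0; i < 100000; i++) {
        spin_acquire(&lk);
        balance++;                     /* critical section */
        spin_release(&lk);
    }
    return 0;
}

int run_depositors(void) {
    pthread_t t1, t2;
    balance = 0;
    pthread_create(&t1, 0, depositor, 0);
    pthread_create(&t2, 0, depositor, 0);
    pthread_join(t1, 0);
    pthread_join(t2, 0);
    return balance;   /* 200000: no deposits lost */
}
```

Compile with -pthread. Without the lock, the same run typically loses thousands of deposits.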
Shouldn't need to use spin-lock on a single-core
On a single core machine, you can cheat!
void
acquire()
{
    cli();   // disable interrupts
}

void
release()
{
    sti();   // enable interrupts
}
The Linux kernel heavily used this trick in its single-core days (it does not work on multiprocessors)!
Spin-locks are good when protecting short operations: increment a counter,
search in i-node cache, allocate a free buffer. In these cases acquire()
won't waste much CPU time spinning even if another CPU holds the lock.
Better alternatives exist for threads that need to wait for a long CS.
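One such alternative is a blocking lock: a pthread mutex puts a contending thread to sleep instead of spinning, so a long critical section does not burn CPU. A sketch using the standard pthread API (function names and counts are ours):

```c
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static int balance;

static void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&m);    /* sleeps, rather than spins, if contended */
        balance++;                 /* critical section */
        pthread_mutex_unlock(&m);
    }
    return 0;
}

int run_workers(void) {
    pthread_t t1, t2;
    balance = 0;
    pthread_create(&t1, 0, worker, 0);
    pthread_create(&t2, 0, worker, 0);
    pthread_join(t1, 0);
    pthread_join(t2, 0);
    return balance;   /* 200000 */
}
```

The interface is the same as our spin-lock's acquire/release; only the waiting strategy changes.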