It is important to pick the right lock granularity. If the lock is too "big", there won't be enough concurrency. If the lock is too "small", the locking might be incorrect (and might also introduce deadlocks). Explain the choice of lock granularity in Lab 1 for lock_server and argue why it is correct. Give an example of a fine-grained locking scheme that would be incorrect for lock_server.
Q2: [RPC]
The RPC library for our labs provides at-most-once semantics and thus it does not guarantee delivery. Is it worthwhile to augment it to provide exactly-once semantics? Explain.
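At-most-once semantics is usually obtained with server-side duplicate detection: every request carries a client identifier and a transaction id (xid), and the server remembers the reply for each pair it has executed. The Python sketch below shows only that idea; the class, method, and field names are hypothetical and are not the lab library's API.

```python
from typing import Callable, Dict, Tuple


class AtMostOnceServer:
    """Toy server-side duplicate filter (hypothetical names, not the lab API).

    Each request carries a (client_id, xid) pair. The handler runs at most
    once per pair; a retransmitted request gets the remembered reply instead
    of being executed again.
    """

    def __init__(self, handler: Callable[[str], str]) -> None:
        self.handler = handler
        # In-memory table of replies; it is lost if the server crashes.
        self.replies: Dict[Tuple[int, int], str] = {}
        self.executions = 0  # how many times the handler actually ran

    def dispatch(self, client_id: int, xid: int, arg: str) -> str:
        key = (client_id, xid)
        if key in self.replies:
            # Duplicate (e.g. the client retried after a lost reply):
            # return the saved reply without re-executing the handler.
            return self.replies[key]
        reply = self.handler(arg)
        self.executions += 1
        self.replies[key] = reply
        return reply
```

Note that this sketch keeps its table only in memory and never discards old entries; both points matter when reasoning about what stronger guarantees would cost.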
Q1: [MapReduce]
In MapReduce, each Mapper saves intermediate key/value pairs in R partitions on its local disk. Contrast the pros and cons of this approach with the alternative of having Mappers send intermediate results directly to the R reducers, which shuffle and save them on the reducers' local disks before feeding them to the user-defined reduce function.
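The question assumes MapReduce's default partitioning, where an intermediate pair is assigned to partition hash(key) mod R. A minimal Python sketch of the mapper-side split follows, assuming CRC-32 as the hash and in-memory lists standing in for the R local files; the function names are hypothetical.

```python
import zlib
from typing import Dict, Iterable, List, Tuple


def partition(key: str, R: int) -> int:
    # Deterministic hash(key) mod R. Python's built-in hash() of str is
    # salted per process, so mappers would disagree; CRC-32 is stable.
    return zlib.crc32(key.encode("utf-8")) % R


def split_into_partitions(
    pairs: Iterable[Tuple[str, str]], R: int
) -> Dict[int, List[Tuple[str, str]]]:
    # One bucket per reducer; in MapReduce each bucket would be a separate
    # file on the mapper's local disk, later fetched by reducer r.
    buckets: Dict[int, List[Tuple[str, str]]] = {r: [] for r in range(R)}
    for key, value in pairs:
        buckets[partition(key, R)].append((key, value))
    return buckets
```

Because every mapper uses the same function, all pairs with a given key land in the same partition number and therefore reach the same reducer.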
Q2: [MapReduce] Give a back-of-the-envelope calculation for how long it takes to sort 1TB of data (10^10 100-byte records) on one commodity computer. Give a back-of-the-envelope calculation for how long it takes to sort the same amount of data on 1000 machines using MapReduce (R=1000). Write down any assumptions you make about disk or network performance.
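As a starting point for either estimate, the data volumes follow directly from the numbers in the question, assuming keys are spread evenly across the R = 1000 partitions; the time estimates still depend on the disk and network rates you assume:

$$10^{10}\ \text{records} \times 100\ \text{B/record} = 10^{12}\ \text{B} = 1\ \text{TB}$$

$$\frac{10^{12}\ \text{B}}{1000\ \text{machines}} = 10^{9}\ \text{B} = 1\ \text{GB of input per machine, and} \approx 1\ \text{GB per reduce partition}$$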
Q3: [Dryad] Describe a MapReduce program (or a series of MapReduce programs) that solves the example SQL query in Section 2.1 of [Dryad]. If you assume input file formats different from ugriz.bin and neighbors.bin in the paper, explain them.
Q1: [Bayou]
Say you want to implement a Calendar application on top of Bayou that
supports three simple operations: "ADD_EVENT", "DELETE_EVENT", and "READ_EVENT".
Each operation takes a timeslot as its argument and adds, deletes, or reads
an event. For fault tolerance reasons, a user can access the Calendar
application from any Bayou server. Suppose each user is only allowed to access
his own private Calendar: what kind of "non-intuitive" or "anomalous" results
might your Calendar user experience? Suppose two users share a calendar: what
kind of "non-intuitive" results might they experience?
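In Bayou, each write carries an update, a dependency check, and a merge procedure that runs if the check fails. One possible encoding of the calendar's write operations in that style is sketched below in Python; the representation (a dict from timeslot to event), the alternate-slot policy, and all names are hypothetical choices for illustration, not part of the question or of Bayou's interface. READ_EVENT is omitted because it only reads the local replica.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Calendar = Dict[int, str]  # timeslot -> event description (hypothetical)


@dataclass
class BayouWrite:
    """Hypothetical shape of a Bayou-style write for the calendar."""
    update: Callable[[Calendar], None]
    dep_check: Callable[[Calendar], bool]
    merge_proc: Optional[Callable[[Calendar], None]] = None


def apply_write(cal: Calendar, w: BayouWrite) -> None:
    # The dependency check runs against the executing replica's state at
    # the moment the write is applied; if it fails, the merge procedure
    # (if any) decides what to do instead.
    if w.dep_check(cal):
        w.update(cal)
    elif w.merge_proc is not None:
        w.merge_proc(cal)


def add_event(slot: int, event: str, alternate: Optional[int] = None) -> BayouWrite:
    def update(cal: Calendar) -> None:
        cal[slot] = event

    def dep_check(cal: Calendar) -> bool:
        return slot not in cal  # only add if the slot is still free

    def merge(cal: Calendar) -> None:
        # Hypothetical policy: fall back to an alternate slot if it is free.
        if alternate is not None and alternate not in cal:
            cal[alternate] = event

    return BayouWrite(update, dep_check, merge)


def delete_event(slot: int) -> BayouWrite:
    def update(cal: Calendar) -> None:
        cal.pop(slot, None)

    return BayouWrite(update, dep_check=lambda cal: True)
```

For example, applying `add_event(9, "standup", alternate=10)` and then `add_event(9, "review", alternate=11)` to an empty replica leaves "standup" in slot 9 and moves "review" to slot 11; applying the same two writes in the opposite order would place them the other way around.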
Q1: [SysR] What property must the system checkpoint operation guarantee? Atomicity? Consistency? How does System R guarantee it?
Q2: [Cedar] Why doesn't FSD log data writes?
Q1: [Two phase commit]
Beyond what point in the execution of the distributed 2-phase commit protocol is the transaction considered committed (no matter what happens later)?
Q2: Why must the coordinator in two phase commit log the outcome of its decision before sending out commit messages?
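The ordering the question refers to can be sketched as follows. This Python toy covers only the decision phase (the votes are assumed to have been collected already), and treats an in-memory list as if it were a forced write to stable storage; the class and method names are hypothetical.

```python
from typing import Callable, Dict, List


class Coordinator:
    """Toy 2PC coordinator, decision phase only (hypothetical names).

    `log` stands in for a durable log on stable storage and `send` for
    the network.
    """

    def __init__(self, participants: List[str],
                 send: Callable[[str, str], None]) -> None:
        self.participants = participants
        self.send = send
        self.log: List[str] = []  # pretend this list is durable on disk

    def force_log(self, record: str) -> None:
        # A real implementation would append the record and fsync before
        # returning; here we only append to the list.
        self.log.append(record)

    def finish(self, votes: Dict[str, bool]) -> str:
        # Commit only if every participant voted yes; a missing vote counts as no.
        ok = all(votes.get(p, False) for p in self.participants)
        decision = "COMMIT" if ok else "ABORT"
        # Record the decision durably *before* telling anyone about it.
        self.force_log(decision)
        for p in self.participants:
            self.send(p, decision)
        return decision
```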
Q2: [Paxos]
Is Paxos guaranteed to terminate with a consensus if a majority of
nodes are live? Why or why not?