Distributed Systems Schedule

Overview

Please print out your answers and hand them in during the class. Thanks!

Lecture 2

Q1: [C#threads]

It is important to pick the right lock granularity. If the lock is too "big", there won't be enough concurrency. If the lock is too "small", it might not be correct (and might also bring deadlock problems). Explain the choice of lock granularity in Lab 1 for lock_server and argue why it is correct. Give an example of a fine grained lock that is incorrect for lock_server.

Q2: [RPC]

The RPC library for our labs provides at-most-once semantics and thus it does not guarantee delivery. Is it worthwhile to augment it to provide exactly-once semantics? Explain.

Lecture 3

Q1: [MapReduce]

In MapReduce, each Mapper saves intermediate key/value pairs in R partitions on its local disk. Contrast the pros and cons of this approach to the alternative of having Mappers directly send intermediate results in R reducers that shuffle and save intermediate results on reducers' local disk before feeding them to the user-defined reduce function.

Q2: [MapReduce] Give a back-of-the-envelop calcuation for how long it takes to sort 1TB data (in 10^10 100-byte records) on one commodity computer. Give a back-of-the-envelop calculation for how long it takes to sort the same amount of data on 1000 machines using MapReduce (R=1000). Write down your assumptions about disk or network performances if there are any.

Q3: [Dryad] Describe a MapReduce program (or a series of MR programs) that solves the example SQL query in Section 2.1 of [Dryad]. If you assume different input file formats than ugriz.bin and and neighbors.bin than in the paper, explain them.

Lecture 5

Q1: [Bayou]
Say you want to implement a Calendar application on top of Bayou that supports three simple operations "ADD_EVENT", "DELETE_EVENT", "READ_EVENT". Each of the operations takes a timeslot as its argument and adds/deletes/reads/ an event. For fault tolerance reasons, a user can access the Calendar application from any Bayou server. Suppose each user is only allowed to access his own private Calendar, what kind of "non-intuitive" or "anomalous" results your Calendar user might experience? Suppose two users share a calendar, what kind of "non-intuitive" results they might experience?

Lecture 6

Q1: [SysR] What property must the system checkpoint operation guarantee? Atomicity? Consistency? How does System R guarantee it?

Q2: [Cedar] Why doesn't FSD log data writes?

Lecture 7

Q1: [Two phase commit]
Beyond what point in the execution of the distributed 2-phase commit protocol is the transaction considered committed (no matter what happens later)?

Q2: Why must the coordinator in two phase commit log the outcome of its decision before sending out commit messages?

Lecture 9

Q1: [Hypervisor]
Suppose the primary fails due to a power failure, the backup takes over, and then power to the primary is restored. You would like the primary to resume replicated execution so that the system could tolerate a subsequent failure of the backup. Is this likely to be easy with hypervisor-based fault-tolerance? Why or why not?

Q2: [Paxos]
Is Paxos guranteed to terminate with a consensus if there are a majority of live nodes available? Why and why not?

Lecture 10

Q1: [PBFT]
If bad nodes are simply faulty (i.e. they happen to compute incorrect results) as opposed to being actively malicious, what is the number of replicas needed to handle f bad nodes? Why?

Lecture 11

Q1: [SUNDR]
In the simple straw-man, both fetch and modify operations are placed in the log and signed. Suppose an alternate design that only signs and logs modify operations. Does this allow a malicious server to break fetch-modify consistency or fork consistency? Why or why not? (Courtesy of Frans Kaashoek).

Lecture 12

Q1: [LBFS]
LBFS decides breakpoints based on the fingerprint of a 48-byte region. Why not just determine the breakpoints based on the actual content of the 48-byte region?