Overview

In this sequence of labs, you'll build a multi-server file system called Yet-Another File System (yfs) in the spirit of Frangipani. At the end of all the labs, your file server architecture will look like this:

You'll write a file server process, labeled yfs above, using the FUSE toolkit. Each client host will run a copy of yfs. yfs will appear to local applications on the same machine by registering via FUSE to receive file system events from the operating system. Each yfs server will store all the file system data on an extent server on the network, instead of on a local disk. yfs servers on multiple client hosts can share the file system by sharing a single extent server.

This architecture is appealing because (in principle) it shouldn't slow down very much as you add client hosts. Most of the complexity is in the per-client yfs program, so new clients make use of their own CPUs rather than competing with existing clients for the server's CPU. The extent server is shared, but hopefully it's simple and fast enough to handle a large number of clients. In contrast, a conventional NFS server is pretty complex (it has a complete file system implementation) so it's more likely to be a bottleneck when shared by many NFS clients.

Lab assignments

Lab 1 - Lock Server
Lab 2 - Basic File Server
Lab 3 - File Server: Reading, Writing and Sharing Files
Lab 4 - MKDIR, REMOVE, and Locking
Lab 5 - Paxos
Lab 6 - Replicated lock server
Final Project

Collaboration Policy

You must write all the code you hand in for the programming assignments, except for code that we give you as part of the assignment. You are not allowed to look at anyone else's solution (and you're not allowed to look at code from previous years). You may discuss the assignments with other students, but you may not look at or copy each other's code.

Acknowledgements

This lab series has been adapted from the labs originally developed for the MIT class 6.824 by Robert Morris, Frans Kaashoek and the 6.824 staff.

Programming Environment

You need to do all labs on a Linux machine with pthread and FUSE support. In principle, any UNIX-style machine such as FreeBSD or MacOS would work. However, there are minor but annoying differences between FUSE on Linux and FUSE on other operating systems that may cause your code to fail our tests even though it seems to pass for you. Thus, if you choose to do your labs on a non-Linux machine, please ensure that your assignment passes the tests on the class machines we have provided.

To install FUSE on your own Ubuntu/Debian Linux box, run apt-get install libfuse2 libfuse-dev fuse-utils. Alternatively, you can compile and install fuse-2.6-3 from source.

Lab Programming Tips

pthreads

This series of labs is built around the POSIX threads (pthreads) programming model. The pthreads package allows you to run different tasks concurrently within a process, lock access to shared data during critical sections, and communicate between threads using shared data. A comprehensive guide to programming with pthreads can be found here: http://www.llnl.gov/computing/tutorials/pthreads/.
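As a minimal sketch of these three ideas (running tasks concurrently, locking shared data, and communicating through that data), the following uses only standard pthread calls. The names Counter, WorkerArg, worker and run_workers are invented for this illustration and are not part of the lab code.

```cpp
#include <pthread.h>
#include <cassert>
#include <vector>

// Shared state: a counter protected by a mutex (illustrative names).
struct Counter {
    pthread_mutex_t mu;
    long value;
};

struct WorkerArg {
    Counter *counter;
    int increments;
};

// Thread entry point; pthreads requires the signature void *(*)(void *).
static void *worker(void *p) {
    WorkerArg *arg = static_cast<WorkerArg *>(p);
    for (int i = 0; i < arg->increments; i++) {
        pthread_mutex_lock(&arg->counter->mu);    // enter critical section
        arg->counter->value++;
        pthread_mutex_unlock(&arg->counter->mu);  // leave critical section
    }
    return NULL;
}

// Starts nthreads workers, waits for all of them, and returns the final count.
long run_workers(int nthreads, int increments) {
    Counter c;
    int rc = pthread_mutex_init(&c.mu, NULL);
    assert(rc == 0);
    c.value = 0;

    WorkerArg arg = { &c, increments };  // read-only, so all threads can share it
    std::vector<pthread_t> tids(nthreads);
    for (int i = 0; i < nthreads; i++) {
        rc = pthread_create(&tids[i], NULL, worker, &arg);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++) {
        rc = pthread_join(tids[i], NULL);
        assert(rc == 0);
    }
    pthread_mutex_destroy(&c.mu);
    (void)rc;  // silence unused-variable warnings when NDEBUG is defined
    return c.value;
}
```

Calling run_workers(4, 1000) should return 4000, because every increment happens while the mutex is held. With g++, compile and link with the -pthread flag.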

Pthread interfaces are somewhat cumbersome to use. You can (optionally) use our custom wrapper objects in pwrapper.h. For example, pMutex and pCond take care of pthread initialization for mutexes and condition variables automatically. Their usage is straightforward, and an example use of pMutex can be found in lock_tester.cc of Lab 1. The class pScopedMutex is particularly handy: its constructor locks the pMutex passed as its argument, and its destructor unlocks that same pMutex. Therefore, an easy way to perform subsystem/protocol locking in a function is to declare a pScopedMutex with the subsystem pMutex in the function. When the function returns, the compiler ensures that the pScopedMutex's destructor is called, thus automatically unlocking the subsystem mutex. (This frees you from the burden of explicitly unlocking the mutex in multiple places when a function has multiple return points.)
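The scoped-locking idea can be sketched without pwrapper.h. The ScopedLock class below is a hypothetical stand-in written directly on top of pthread_mutex_t; the real pScopedMutex may differ in its constructor argument and details. The table, lookup and store names are likewise invented for the example.

```cpp
#include <pthread.h>
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for pScopedMutex: locks in the constructor,
// unlocks in the destructor.
class ScopedLock {
public:
    explicit ScopedLock(pthread_mutex_t *m) : m_(m) {
        int rc = pthread_mutex_lock(m_);
        assert(rc == 0);
        (void)rc;
    }
    ~ScopedLock() { pthread_mutex_unlock(m_); }
private:
    pthread_mutex_t *m_;
    ScopedLock(const ScopedLock &);             // not copyable
    ScopedLock &operator=(const ScopedLock &);  // not assignable
};

static pthread_mutex_t table_mu = PTHREAD_MUTEX_INITIALIZER;
static std::map<std::string, int> table;

// Returns true and stores the value in *out if key is present.
bool lookup(const std::string &key, int *out) {
    ScopedLock guard(&table_mu);
    std::map<std::string, int>::iterator it = table.find(key);
    if (it == table.end())
        return false;  // guard's destructor unlocks here
    *out = it->second;
    return true;       // ... and here, with no explicit unlock call
}

void store(const std::string &key, int value) {
    ScopedLock guard(&table_mu);
    table[key] = value;
}
```

Note that lookup has two return points but no unlock call: both paths release table_mu when guard goes out of scope.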

In this and later labs, we try to adhere to a simple (coarse-grained) locking convention: we acquire the subsystem/protocol lock at the beginning of a function and release it before returning. This convention works because we don't require atomicity across functions, and we don't share data structures between different subsystems/protocols. You will have an easier life by sticking to this convention.
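One way to picture this convention is a subsystem class with a single mutex that every public function takes on entry and releases just before returning. The CounterTable class below is a hypothetical example, not part of the lab skeleton. It also shows one consequence worth keeping in mind: a public function should not call another public function of the same subsystem while holding the lock, since re-locking a default (non-recursive) pthread mutex from the same thread is not safe. Shared logic goes into a private helper that assumes the lock is already held.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical subsystem: named counters guarded by one subsystem mutex.
class CounterTable {
public:
    CounterTable() {
        int rc = pthread_mutex_init(&mu_, NULL);
        assert(rc == 0);
        (void)rc;
    }
    ~CounterTable() { pthread_mutex_destroy(&mu_); }

    // Public function: acquire the subsystem lock first, release it last.
    int get(const std::string &name) {
        pthread_mutex_lock(&mu_);
        int v = get_nolock(name);
        pthread_mutex_unlock(&mu_);
        return v;
    }

    // Adds delta and returns the new value. Uses get_nolock() rather than
    // get(), because get() would try to lock mu_ a second time.
    int add(const std::string &name, int delta) {
        pthread_mutex_lock(&mu_);
        int v = get_nolock(name) + delta;
        counts_[name] = v;
        pthread_mutex_unlock(&mu_);
        return v;
    }

private:
    // Caller must already hold mu_. Missing names read as 0.
    int get_nolock(const std::string &name) const {
        std::map<std::string, int>::const_iterator it = counts_.find(name);
        return it == counts_.end() ? 0 : it->second;
    }

    pthread_mutex_t mu_;
    std::map<std::string, int> counts_;
    CounterTable(const CounterTable &);             // not copyable
    CounterTable &operator=(const CounterTable &);  // not assignable
};
```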

RPC

The labs use our own customized RPC system (instead of the standardized SUN RPC system). Our RPC system launches a new thread on the server for each RPC received; the thread is destroyed upon return from the RPC call. Our RPC system is quite simple. In particular, you need to provide your own RPC procedure numbers and marshalling routines for user-defined arguments and return types. You will find examples for doing this in the skeleton code we have provided you. More importantly, there is no type checking on RPC procedures across the rpc client and server. If the arguments specified in your RPC invocation at the client do not match the actual procedure's implementation at the server, you will see an assertion failure somewhere in the rpc.{cc,h} files during runtime.
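To make "procedure numbers and marshalling routines" concrete, here is a self-contained sketch. It does not use the lab's rpc.{cc,h}; the Marshall and Unmarshall classes, the LockRequest type, the procedure numbers and the wire format (4-byte big-endian integers, length-prefixed strings) are all assumptions made for illustration, and the classes in the skeleton code may look different. The sketch assumes unsigned int is 32 bits wide.

```cpp
#include <cassert>
#include <string>

// Hypothetical procedure numbers; client and server must agree on them.
enum { LOCK_ACQUIRE = 0x7001, LOCK_RELEASE = 0x7002 };

// Appends values to a byte buffer.
class Marshall {
public:
    Marshall &operator<<(unsigned int x) {
        for (int shift = 24; shift >= 0; shift -= 8)
            buf_.push_back(static_cast<char>((x >> shift) & 0xff));
        return *this;
    }
    Marshall &operator<<(const std::string &s) {
        *this << static_cast<unsigned int>(s.size());  // length prefix
        buf_.append(s);
        return *this;
    }
    const std::string &bytes() const { return buf_; }
private:
    std::string buf_;
};

// Reads values back in the same order; ok() turns false on a short buffer.
class Unmarshall {
public:
    explicit Unmarshall(const std::string &b) : buf_(b), pos_(0), ok_(true) {}
    Unmarshall &operator>>(unsigned int &x) {
        x = 0;
        if (!ok_ || buf_.size() - pos_ < 4) { ok_ = false; return *this; }
        for (int i = 0; i < 4; i++)
            x = (x << 8) | static_cast<unsigned char>(buf_[pos_++]);
        return *this;
    }
    Unmarshall &operator>>(std::string &s) {
        unsigned int n = 0;
        *this >> n;
        s.clear();
        if (!ok_ || buf_.size() - pos_ < n) { ok_ = false; return *this; }
        s.assign(buf_, pos_, n);
        pos_ += n;
        return *this;
    }
    bool ok() const { return ok_; }
private:
    std::string buf_;
    std::string::size_type pos_;
    bool ok_;
};

// A user-defined argument type and its marshalling routines.
struct LockRequest {
    unsigned int client_id;
    std::string lock_name;
};

Marshall &operator<<(Marshall &m, const LockRequest &r) {
    return m << r.client_id << r.lock_name;
}

Unmarshall &operator>>(Unmarshall &u, LockRequest &r) {
    return u >> r.client_id >> r.lock_name;
}
```

Because nothing in the byte stream records types, the reader must extract fields in exactly the order the writer inserted them. In this sketch a mismatch that runs past the end of the buffer shows up as ok() returning false; in the lab's RPC library the analogous situation surfaces as the assertion failure described above.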

C++ Standard Template Library (STL)

We recommend you use the Standard Template Library, a collection of C++ classes implementing many common data structures and algorithms. You will find the classes std::string, std::map and std::vector particularly useful during these labs.
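As a small illustration of how these three classes fit together, the sketch below models a directory as a map from file names to numeric identifiers. The Directory typedef and the add_entry and list_names functions are hypothetical names chosen for this example; the labs do not prescribe this representation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical directory representation: file name -> numeric identifier.
typedef std::map<std::string, unsigned long> Directory;

// Adds an entry; returns false (and changes nothing) if the name exists.
bool add_entry(Directory &dir, const std::string &name, unsigned long id) {
    return dir.insert(std::make_pair(name, id)).second;
}

// Returns all names; std::map iterates in sorted key order.
std::vector<std::string> list_names(const Directory &dir) {
    std::vector<std::string> names;
    for (Directory::const_iterator it = dir.begin(); it != dir.end(); ++it)
        names.push_back(it->first);
    return names;
}
```

std::map::insert leaves an existing key untouched and reports this through the bool in its return value, which makes "create only if absent" a one-liner.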

Debugging

printf statements are always your friend when debugging any kind of problem in your programs. However, when programming in C/C++, you should also be familiar with gdb, the GNU debugger. You may find this gdb reference useful. Below are a few gdb tips for newcomers:

If your program is crashing (segmentation fault), type gdb program core to examine the core file, where program is the name of the binary executable. If you can't find the core file anywhere, type ulimit -c unlimited before starting your program again. Once inside gdb, type bt to examine the stack trace at the point where the segmentation fault happened.

While your program is running, you can attach gdb to it by typing gdb program 1234. Again, program is the name of the binary executable, and 1234 is the process ID of your running program. Of course, you can also choose to run your program under gdb from the beginning. If so, simply type gdb program, then type run at the gdb prompt.

While in gdb, you can set breakpoints (use gdb command b) to stop the execution at specific points, examine variable contents (use gdb command p), etc.

To apply a given gdb command to all threads in your program, prepend thread apply all to your command. For example, thread apply all bt dumps the backtrace for all threads.