Introduction
Form a team of 1-2 students to complete the final project.
The project should be related to the class material and consist of original work
done for the class (you cannot recycle past projects from other classes). You
are welcome to make up your own project; we also
give example project ideas below for those who have no other ideas.
You should email the staff about your project idea so we can give you guidance
on the scope and provide more background material.
Upon completion of the project, your team is required to:
- submit a short writeup on your project (no more than 5 pages)
- present a demo to the staff (and optionally attend the NYU CS Fall project showcase)
The amount of effort your team puts into the project should be equivalent to about 3 labs, adjusted to
reflect the number of students working on the project. The project should be in
the tradition of the labs: find a real problem with an existing system (such as YFS),
code up your solution, and evaluate how well you have solved the problem.
Example project ideas
Below are some example extensions to YFS that would make suitable projects.
Feel free to use any one of these, if you have no other ideas.
- Total fault-tolerance
Extend YFS to be tolerant of arbitrary client failures and able to recover from
total (lock and extent) server failures.
This will involve designs for lock expiration using leases, persistent logging of client
operations at lock servers, storing persistent state at the extent server and
enabling correct crash recovery of both types of servers.
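As a starting point for thinking about lock expiration, here is a minimal sketch of a lease-based lock table. It is written in Python for brevity rather than in the C++ of the labs, and the class name LeaseLockTable, its methods, and the injectable clock are all hypothetical; they are not part of the YFS code. It only shows the expiry rule and omits logging, persistence, and recovery.

```python
import time


class LeaseLockTable:
    """Toy lock table in which every grant expires after `lease_seconds`."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock            # injectable so expiry can be tested
        self.holders = {}             # lock id -> (client id, expiry time)

    def acquire(self, lock_id, client_id):
        """Grant the lock if it is free, expired, or already held by the caller."""
        now = self.clock()
        holder = self.holders.get(lock_id)
        if holder is not None:
            owner, expiry = holder
            if owner != client_id and now < expiry:
                return False          # another client still holds a live lease
        # Grant (or renew) the lease for the caller.
        self.holders[lock_id] = (client_id, now + self.lease_seconds)
        return True

    def release(self, lock_id, client_id):
        """Release only if the caller is the current holder."""
        holder = self.holders.get(lock_id)
        if holder is not None and holder[0] == client_id:
            del self.holders[lock_id]
            return True
        return False
```

Note that expiry at the server is only half of the design: a client must also stop using a lock before its own view of the lease runs out, and a real implementation has to decide what happens to cached, unflushed data when that occurs.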
- Bandwidth- and disk-space efficient extent service
The current extent service implementation stores whole files in
memory, and returns the entire file on each get. The goal in this project would
be to reduce the network traffic between the extent server and its clients, as
well as reducing the memory and disk usage of the extent server. First, you should
come up with an efficient scheme for laying out extents on disk (for example,
use a single file on the native file system to store all of the extents,
and do your best to reduce fragmentation of the space within that file);
take both large and small files into consideration. You may
want to consider storing files in blocks, rather than one huge extent.
(You may also want to use compression to limit the size of the data on disk.) Second, you should
enable caching of extents at yfs_clients so a client does not need to fetch
unmodified extents again from the extent server. Furthermore, you can explore known
techniques for reducing bandwidth between the server and clients by
sending only diffs of data, rather than the full extent (see the LBFS paper,
for example); you should implement a similar scheme for YFS.
Demonstrate the performance and space benefits of your solution.
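To make the "send only diffs" idea concrete, here is a minimal sketch that compares per-block hashes and ships only the blocks that changed. It is a deliberately simplified stand-in, not the LBFS scheme: LBFS picks chunk boundaries from the content, whereas this sketch uses fixed-size blocks, so an insertion near the start of a file shifts every later block. The function names and the 4096-byte block size are assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the project text does not fix one


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return the SHA-1 digest of each fixed-size block of `data`."""
    return [hashlib.sha1(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def make_diff(old_hashes, new_data, block_size=BLOCK_SIZE):
    """Sender side: collect the blocks of new_data the receiver lacks."""
    changed = {}
    for index, digest in enumerate(block_hashes(new_data, block_size)):
        if index >= len(old_hashes) or old_hashes[index] != digest:
            start = index * block_size
            changed[index] = new_data[start:start + block_size]
    return len(new_data), changed


def apply_diff(old_data, new_length, changed, block_size=BLOCK_SIZE):
    """Receiver side: rebuild the new extent from the cached copy and the diff."""
    blocks = []
    for index, start in enumerate(range(0, new_length, block_size)):
        if index in changed:
            blocks.append(changed[index])
        else:
            blocks.append(old_data[start:start + block_size])
    return b"".join(blocks)[:new_length]
```

In this sketch the receiver first sends block_hashes() of its cached copy, and the sender answers with the result of make_diff(); only the hashes and the changed blocks cross the network.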
- A coarse-grained lock service
Implement Google's Chubby
service. You can reuse much of the RSM and Paxos code for your
implementation of Chubby. Unlike YFS's lock service, Chubby is coarse-grained,
meaning that it expects to handle far fewer operations. A coarse-grained lock
service is usually used to elect a primary, i.e. only potential primaries
attempt to acquire locks from Chubby, and they hold acquired locks for a long
time. For example, you can use Chubby to elect a primary lock server (and a
corresponding set of backup servers) to handle fine-grained locking without
involving Chubby nodes except when failures happen.
- Disconnected YFS
Allow a user to perform file system operations while their yfs_client is disconnected
from the network. You need to cache (or even prefetch) extents on disk at yfs_clients in order to
allow a disconnected yfs_client to read and modify cached extents. You also
need to come up with some plan for merging divergent extents once the client
comes back online. Check out the Coda paper for a
starting point. A cool demo is to mount YFS on your laptop, disconnect the
laptop, and continue working on the files. Later, reconnect the laptop and
show that other clients can see your changes.
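One simple way to detect divergent extents on reconnection is to remember the version of each extent at the time it was cached and compare it with the server's current version. The sketch below illustrates that check only. The function name reintegrate, the per-extent integer version, and the dict-based server state are all assumptions for illustration; they are neither Coda's mechanism nor part of YFS, and the sketch leaves the actual merge policy for conflicting extents open.

```python
def reintegrate(server, base_versions, local_writes):
    """Replay disconnected writes; return the ids whose updates conflict.

    server:        dict extent id -> (version, data), the server's state
    base_versions: dict extent id -> version the client had cached
    local_writes:  dict extent id -> data written while disconnected
    """
    conflicts = []
    for eid, data in local_writes.items():
        current_version, _ = server.get(eid, (0, b""))
        if current_version == base_versions.get(eid, 0):
            # Nobody else wrote this extent in the meantime: apply our write.
            server[eid] = (current_version + 1, data)
        else:
            # The server copy moved on while we were offline: needs a merge.
            conflicts.append(eid)
    return conflicts
```

For example, if the client cached extent 2 at version 4 and the server now holds version 5, reintegrate() leaves the server copy untouched and reports extent 2 as a conflict.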
- Build a versioning file system
Instead of overwriting existing file system data, build a versioning file system that
keeps a history of past file system images. You can draw inspiration from
past work on versioning file systems.
- Build a shared p2p memory cache
If all nodes cache objects (e.g. file system objects, or even database query results), it is
beneficial to have them share their caches with each other to take read load off the server.
It could be something like memcached, but with a
guarantee that the shared cache is consistent under concurrency and failures.
- Scalable extent service
At the moment, your YFS implementation just uses a single extent server to store
its data. It would be much more useful, both for scaling storage capacity and for spreading read/write load, if YFS
could spread its data across many extent servers. You can implement the Petal
design or consistent hashing as in Chord,
or come up with a scheme of your own. Show some benefit of this
system under a real workload, either in terms of performance or fault
tolerance.
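For the consistent hashing option, the following is a minimal sketch of how extent ids could be mapped to extent servers. It shows only the ring and the successor lookup; it assumes every client knows the full server list, so it does not include Chord's distributed routing. The class name HashRing, the server names, and the choice of 64 virtual nodes per server are illustrative assumptions.

```python
import bisect
import hashlib


def _point(key):
    """Map a string to a position on the ring (a 160-bit integer)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Assigns extent ids to servers by successor lookup on a hash ring."""

    def __init__(self, servers=(), replicas=64):
        self.replicas = replicas      # virtual nodes per server (assumed value)
        self.points = []              # sorted ring positions
        self.owner = {}               # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        """Place `replicas` virtual nodes for the server on the ring."""
        for i in range(self.replicas):
            p = _point(f"{server}#{i}")
            bisect.insort(self.points, p)
            self.owner[p] = server

    def remove(self, server):
        """Drop all of the server's virtual nodes from the ring."""
        self.points = [p for p in self.points if self.owner[p] != server]
        self.owner = {p: s for p, s in self.owner.items() if s != server}

    def lookup(self, extent_id):
        """Return the server owning the first point at or after the key."""
        if not self.points:
            raise LookupError("ring has no servers")
        i = bisect.bisect_left(self.points, _point(str(extent_id)))
        return self.owner[self.points[i % len(self.points)]]
```

The property that matters for a scalable extent service is that adding or removing one server only moves the extents adjacent to that server's points on the ring; all other extents keep their current server.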