Introduction
Form a team of 1-2 students to complete the final project (teams of 2 are recommended).
The project should be related to the class material and consist of original work
done for the class. You are welcome to make up your own project and we also
give example project ideas. If you decide to work on your own idea, please
email the staff so we can give you guidance on the scope and provide more
background material.
Upon completion of the project, your team is required to:
- submit a 1-page writeup on your project.
- present a demo in class (and optionally attend the NYU CS Fall project showcase)
The amount of effort required for the project is about that of one 2-week lab (such as lab 5 or lab 7), adjusted to
reflect the number of students working on the project. The project should be in
the tradition of the labs: find a real problem with an existing system (such as yfs),
code up your solution, and evaluate how well you have solved the problem.
Example project ideas
Below are some example extensions to YFS that would make suitable projects.
Feel free to use any one of these, if you have no other ideas.
- Total fault-tolerance
Extend YFS to be tolerant of arbitrary client failures and be able to recover from
total (lock and extent) server failures.
This will involve designs for lock expiration using leases, persistent logging of client
operations at lock servers, storing persistent state at the extent server and
enabling correct crash recovery of both types of servers.
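As a starting point for the lease part of this design, here is a minimal sketch in Python (the labs themselves are in a different language, so treat it as pseudocode you can run). The class name LeaseLockTable, its methods, and the injectable clock are all hypothetical and not part of the existing lock server. It only shows the expiry rule: a lock held by a crashed client becomes available again once its lease runs out.

```python
import time


class LeaseLockTable:
    """Hypothetical lock table in which every grant carries an expiry time."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock          # injectable so tests can control time
        self.holders = {}           # lock id -> (client id, expiry time)

    def acquire(self, lock_id, client_id):
        """Grant or renew the lock unless another client holds a live lease."""
        now = self.clock()
        holder = self.holders.get(lock_id)
        if holder is not None and holder[0] != client_id and holder[1] > now:
            return False
        self.holders[lock_id] = (client_id, now + self.lease_seconds)
        return True

    def release(self, lock_id, client_id):
        """Drop the lock, but only if the caller is the current holder."""
        holder = self.holders.get(lock_id)
        if holder is not None and holder[0] == client_id:
            del self.holders[lock_id]
```

The sketch ignores clock skew and in-flight operations: in a real design the client has to stop using a lock before the server considers the lease expired, and the table itself would need to be logged to survive a lock server crash.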
- Bandwidth- and disk-space efficient extent service
The current extent service implementation stores whole files in
memory, and returns the entire file on each get. The goal in this project would
be to reduce the network traffic between the extent server and its clients, as
well as reducing the memory and disk usage of the extent server. First, you should
come up with an efficient scheme for laying out extents on disk (for example,
use a single file on the native file system to store all of the extents,
and do your best to reduce fragmentation of the space within that file);
take both large and small files into consideration. You may
want to consider storing files in blocks, rather than one huge extent.
(You may also want to use compression to limit the size of the data on disk.) Second, you should
enable caching of extents at yfs_clients so a client does not need to fetch
unmodified extents again from the extent server. Furthermore, you can explore known
techniques for reducing bandwidth between the server and clients by
sending only diffs of data, rather than the full extent (see the LBFS paper,
for example); you should implement some similar scheme for YFS.
Demonstrate the performance and space benefits of your solution.
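To make the "send only diffs" idea concrete, here is a minimal Python sketch that compares per-block hashes and returns only the blocks the receiver does not already have. It uses fixed-size blocks, which is a simplification and not the LBFS scheme: LBFS picks chunk boundaries from the content so that an insertion does not shift every later block. The function names and the 4096-byte block size are assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; tune it for your workload


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return one SHA-1 digest per fixed-size block of data."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_send(new_data, cached_hashes, block_size=BLOCK_SIZE):
    """Return {block index: bytes} for blocks that differ from the cached copy."""
    changed = {}
    for index, digest in enumerate(block_hashes(new_data, block_size)):
        if index >= len(cached_hashes) or cached_hashes[index] != digest:
            start = index * block_size
            changed[index] = new_data[start:start + block_size]
    return changed
```

The receiver would also need the new total length so that it can truncate an extent that has shrunk.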
- Mini-transactions
Implement Sinfonia's mini-transactions for the extent
service and modify yfs_client to use them instead of the lock service.
(As a starter, you do not need to recover from crashes.)
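A mini-transaction bundles compare, read, and write items into one atomic request: the writes are applied only if every compare matches. The Python sketch below shows that shape for a single in-memory extent store. The class ExtentStore and the whole-extent granularity are assumptions made for illustration; Sinfonia itself operates on address ranges spread over several memory nodes and needs a commit protocol across them, which this sketch leaves out.

```python
class ExtentStore:
    """Hypothetical single-server store with a mini-transaction entry point."""

    def __init__(self):
        self.extents = {}  # extent id -> bytes

    def minitransaction(self, compare, read, write):
        """Apply writes only if every compare item matches.

        compare: {extent id: expected value (None means "must not exist")}
        read:    list of extent ids to return
        write:   {extent id: new value}
        Returns (committed, {extent id: value read}).
        """
        for extent_id, expected in compare.items():
            if self.extents.get(extent_id) != expected:
                return False, {}
        results = {extent_id: self.extents.get(extent_id) for extent_id in read}
        self.extents.update(write)
        return True, results
```

With this interface, a yfs_client create could compare the directory extent against the copy it read earlier and write the updated directory plus the new file in one request, retrying if the compare fails.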
- Access control
Implement a notion of access control in yfs. You need to extend your FUSE
interface to know about users (and groups!), and also make the appropriate
extensions to your yfs data structures. You'll also need to add support for any
related file system calls (e.g., chmod, chown, chgrp). You may choose the
easy design of simply implementing access control in the yfs_client, i.e.,
yfs_client grants permission based on the requesting user and file permissions.
Note that this design is not secure, in the sense that you are trusting the clients;
in general, clients cannot be trusted (client machines may be administered by untrustworthy
individuals who could run arbitrary code instead of the correct yfs_client).
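For the easy client-side design, the core check could look like the following Python sketch of the usual Unix owner/group/other permission bits. The function name may_access and its parameters are hypothetical; how the requesting uid and gids reach yfs_client through FUSE is up to your design, and the sketch gives no special treatment to a superuser.

```python
READ, WRITE, EXECUTE = 4, 2, 1


def may_access(mode, file_uid, file_gid, uid, gids, want):
    """Return True if the requester holds every permission bit in want.

    mode is the file's permission bits (for example 0o640); want is a mask
    built from READ, WRITE, and EXECUTE.
    """
    if uid == file_uid:
        granted = (mode >> 6) & 7   # owner bits
    elif file_gid in gids:
        granted = (mode >> 3) & 7   # group bits
    else:
        granted = mode & 7          # other bits
    return (granted & want) == want
```

Note that the owner class takes precedence: an owner with mode 0o077 is denied even though group and other are allowed, which matches the conventional Unix rule.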
- Disconnected YFS
Allow a user to perform file system operations while their yfs_client is disconnected
from the network. You need to cache (or even prefetch) extents on disk at yfs_clients in order to
allow a disconnected yfs_client to read and modify cached extents. You also
need to come up with some plan for merging divergent extents once the client
comes back online. Check out the Coda paper for a
starting point. A cool demo is to mount yfs on your laptop, disconnect the
laptop and continue working on the files. Later, reconnect the laptop and
show that other clients can see your changes.
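One simple way to detect divergent extents on reconnect is to remember the server version each cached extent was based on. The Python sketch below is an assumption-laden illustration of that idea and is not a description of how Coda does it: the function reintegrate, the integer version numbers, and the dictionary layouts are all hypothetical. Updates whose base version still matches are applied; the rest are reported as conflicts for your merge plan to handle.

```python
def reintegrate(server, local_changes):
    """Replay offline writes onto the server and report conflicts.

    server:        {extent id: (version, data)}, modified in place
    local_changes: {extent id: (base version, data written while offline)}
    Returns the list of extent ids that changed on the server in the meantime.
    """
    conflicts = []
    for extent_id, (base_version, data) in local_changes.items():
        current_version = server.get(extent_id, (0, b""))[0]
        if current_version == base_version:
            server[extent_id] = (current_version + 1, data)
        else:
            conflicts.append(extent_id)
    return conflicts
```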
- Build a versioning file system
Instead of overwriting existing file system state, build a versioning file system that
keeps a history of past file system images. You can draw inspiration from
past work on versioning file systems.
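At its simplest, versioning means that a put appends instead of overwriting, and a get can ask for the state as of an earlier point. The Python sketch below shows that idea with a global version counter. The class VersionedStore and its method signatures are hypothetical, and a real design would also have to decide when a new version is cut (every write, every close, or on explicit snapshots) and how old versions are reclaimed.

```python
class VersionedStore:
    """Hypothetical extent store that keeps every version of every extent."""

    def __init__(self):
        self.history = {}  # extent id -> list of (version, data), oldest first
        self.clock = 0     # global version counter shared by all extents

    def put(self, extent_id, data):
        """Append a new version and return its version number."""
        self.clock += 1
        self.history.setdefault(extent_id, []).append((self.clock, data))
        return self.clock

    def get(self, extent_id, as_of=None):
        """Return the newest data written at or before as_of (default: latest)."""
        for version, data in reversed(self.history.get(extent_id, [])):
            if as_of is None or version <= as_of:
                return data
        return None
```

Because the counter is global, reading every extent with the same as_of value yields a consistent image of the whole file system at that point.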
- Build a scalable extent service
At the moment, your YFS implementation just uses a single extent server to store
its data. It would be much more useful, both for scaling storage and for spreading read/write load, if YFS
could spread its data across many extent servers. You may choose the
consistent hashing design as in Dynamo, or come
up with a scheme of your own. Make sure your extent service is consistent
across failures and concurrent accesses. (As a starter, you may replicate
extents only in the memory of multiple extent servers and not bother storing them
persistently.) Show some benefit of this system under a real workload,
either in terms of performance or fault tolerance.
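If you go with consistent hashing, the placement logic can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the class HashRing, the use of MD5 to place points on the ring, and the default of 64 virtual nodes per server. An extent is stored on the first distinct servers found walking clockwise from its hash.

```python
import bisect
import hashlib


def _point(key):
    """Map a string to a position on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hashing ring with virtual nodes."""

    def __init__(self, servers, vnodes=64):
        self.ring = sorted(
            (_point(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    def servers_for(self, extent_id, replicas=1):
        """Walk clockwise from the extent's point, collecting distinct servers."""
        start = bisect.bisect_right(self.points, _point(extent_id))
        chosen = []
        for offset in range(len(self.ring)):
            server = self.ring[(start + offset) % len(self.ring)][1]
            if server not in chosen:
                chosen.append(server)
                if len(chosen) == replicas:
                    break
        return chosen
```

The useful property is that removing a server only moves the extents that server was responsible for; every other extent keeps its primary. Keeping replicas consistent under failures and concurrent writes is the harder part and is not addressed by this sketch.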