In this lab you'll add persistence to your key/value server. The overall goal is to be able to recover after the crash and restart of one or more key/value servers. It's this capability that makes fault-tolerance really valuable! The specific properties you'll need to ensure are:
You do not need to design a high-performance format for the on-disk data. It is sufficient for a server to store each key/value pair in a separate file, and to use a few more files to store its other state.
You do not need to add persistence to the shardmaster. The tester uses your existing shardmaster package.
This lab requires more thought than you might expect.
You may find Paxos Made Live useful, particularly Section 5.1.
Do a git pull to get the latest lab software. We supply you with new skeleton code and new tests in src/diskv.
$ cd ~/golabs
$ git pull
...
$ cd src/diskv
$
First, merge your Lab 4 code (that is, the changes you made to the Lab 4 skeleton) into diskv/server.go, common.go, and client.go. Be careful when merging StartServer(), since it's a bit different from Lab 4. Don't copy test_test.go; Lab 5 has a new set of tests.
There are a few differences between the Lab 4 and Lab 5 frameworks. First, StartServer() takes an extra dir argument that names the directory in which a key/value server should store its state (key/value pairs, Paxos state, etc.). A server should use only files under that directory and no other files. The tests give each server a different directory, and they give a server the same directory name each time they restart it. StartServer() can tell whether it has been restarted (as opposed to started for the first time) by looking at its restart argument.
The second big framework difference is that the Lab 5 tests run each key/value server as a separate UNIX process, rather than as a set of threads in a single process. main/diskvd.go is the main() routine for the key/value server process. The tester runs diskvd.go as a separate program, and diskvd.go calls StartServer().
After merging your Lab 4 code into diskv, you should be able to pass the tests that print (lab4). These are copies of Lab 4 tests.
If a server crashes, loses its disk, and restarts, one potential problem is that it could participate in Paxos instances in which it had already participated before crashing. Since the server has lost its Paxos state, it cannot participate correctly in those instances; for example, it might accept a proposal that violates a promise it made before the crash. So you must find a way to ensure that servers that rejoin after disk loss participate only in new instances.
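One way to enforce "only new instances" is sketched below, under a loud assumption: the rejoining server has somehow obtained a safe floor, an instance number above every instance it might have voted in before losing its disk, and it refuses to act as an acceptor below that floor. The acceptor type and its floor field are hypothetical; choosing the floor safely is the real design problem of this lab and is not shown.

```go
package main

import "fmt"

// acceptorState is the per-instance Paxos acceptor state.
type acceptorState struct {
	np int         // highest prepare number promised
	na int         // number of the highest proposal accepted
	va interface{} // value of the highest proposal accepted
}

// acceptor is a hypothetical in-memory acceptor with a participation floor.
type acceptor struct {
	floor     int // refuse to vote in instances below this (assumed safe)
	instances map[int]*acceptorState
}

func newAcceptor(floor int) *acceptor {
	return &acceptor{floor: floor, instances: make(map[int]*acceptorState)}
}

func (a *acceptor) get(seq int) *acceptorState {
	s, ok := a.instances[seq]
	if !ok {
		s = &acceptorState{}
		a.instances[seq] = s
	}
	return s
}

// prepare refuses instances below the floor, where this server may have
// made promises it no longer remembers; otherwise it is ordinary Paxos.
func (a *acceptor) prepare(seq, n int) (ok bool, na int, va interface{}) {
	if seq < a.floor {
		return false, 0, nil
	}
	s := a.get(seq)
	if n > s.np {
		s.np = n
		return true, s.na, s.va
	}
	return false, 0, nil
}

// accept likewise refuses old instances before applying the usual rule.
func (a *acceptor) accept(seq, n int, v interface{}) bool {
	if seq < a.floor {
		return false
	}
	s := a.get(seq)
	if n >= s.np {
		s.np, s.na, s.va = n, n, v
		return true
	}
	return false
}

func main() {
	a := newAcceptor(10)
	okOld, _, _ := a.prepare(5, 1)
	okNew, _, _ := a.prepare(10, 1)
	fmt.Println(okOld, okNew, a.accept(10, 1, "x"))
}
```

Refusing to vote is always safe in Paxos; it can only slow progress. The rejoining server still needs the decided values of old instances (or a copy of the key/value state) from its peers.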
diskv/server.go includes some functions that may be helpful to you when reading and writing files containing key/value data.
You may want to use Go's gob package to format and parse saved state. Here's an example. As with RPC, if you want to encode structs with gob, you must capitalize the field names; gob silently skips unexported (lowercase) fields.
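A minimal sketch of encoding and decoding saved state with encoding/gob. The ServerState type and its fields are hypothetical. The unexported field shows the capitalization rule in action: gob skips it, so it comes back as its zero value.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// ServerState is a hypothetical snapshot of a key/value server.
// Only exported (capitalized) fields are encoded by gob.
type ServerState struct {
	KV       map[string]string
	LastSeq  int            // e.g. last Paxos instance applied
	Seen     map[string]int // e.g. client ID -> latest request number
	internal int            // unexported: gob silently skips this field
}

// encodeState serializes s into a byte slice suitable for writing to a file.
func encodeState(s ServerState) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(s); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeState parses bytes produced by encodeState.
func decodeState(data []byte) (ServerState, error) {
	var s ServerState
	err := gob.NewDecoder(bytes.NewReader(data)).Decode(&s)
	return s, err
}

func main() {
	in := ServerState{KV: map[string]string{"a": "1"}, LastSeq: 7, internal: 42}
	data, err := encodeState(in)
	if err != nil {
		panic(err)
	}
	out, err := decodeState(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(out.KV["a"], out.LastSeq, out.internal) // internal is not restored
}
```

If saved structs contain interface{} fields holding your own types (such as a Paxos value of an Op type), you must also call gob.Register for each concrete type, as with RPC.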
The Lab 5 tester kills key/value servers so that they stop executing at a random point, which was not the case in previous labs. One consequence is that if your server is writing a file, the tester might kill it midway through the write (much as a real crash might occur while writing a file). A good way to make replacement of a whole file atomic nevertheless is to write the new contents to a temporary file in the same directory and then call os.Rename(tempname, realname). On UNIX, a rename within one file system atomically replaces the destination, so after a crash the file holds either its old contents or its new contents, never a mix of the two.
You'll probably have to modify your Paxos implementation, at least so that it's possible to save and restore a key/value server's Paxos state.
Don't run multiple instances of the Lab 5 tests at the same time on the same machine. They will remove each other's files.
$ cd ~/golabs
$ make lab5
Submit lab5-handin.tar.gz at this link.
You will receive full credit if your software passes the test_test.go tests when we run your software on our machines. We will use the timestamp of your last submission for the purpose of calculating late days.