In this lab you'll implement Raft, a replicated state machine protocol. If you choose the extra-lab option (instead of the project option), in Lab-4 you'll build a key/value service on top of your Raft implementation, and in Lab-5 you'll “shard” your service over multiple replicated state machines for higher performance.
A replicated service (e.g., key/value database) achieves fault tolerance by storing copies of its data on multiple replica servers. Replication allows the service to continue operating even if some of its servers experience failures (crashes or a broken or flaky network). The challenge is that failures may cause the replicas to hold differing copies of the data.
Raft manages a service's state replicas, and in particular it helps the service sort out what the correct state is after failures. Raft implements a replicated state machine. Like Viewstamped Replication, Raft organizes client requests into a sequence, called the log, and ensures that all the replicas agree on the contents of the log. Each replica executes the client requests in the log in the order they appear in the log, applying those requests to the replica's local copy of the service's state. Since all the live replicas see the same log contents, they all execute the same requests in the same order, and thus continue to have identical service state. If a server fails but later recovers, Raft takes care of bringing its log up to date. Raft will continue to operate as long as at least a majority of the servers are alive and can talk to each other. If there is no such majority, Raft will make no progress, but will pick up where it left off as soon as a majority can communicate again.
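To make the replicated-state-machine idea concrete, here is a tiny, self-contained sketch (not part of the lab code) showing why replicas that apply the same log in the same order end up with identical state; the Entry type and key/value commands are invented purely for illustration.

package main

import "fmt"

// Entry is a hypothetical log entry holding a key/value command.
type Entry struct {
	Key, Value string
}

// applyLog applies the entries in log order; any replica holding the
// same log computes exactly the same state.
func applyLog(log []Entry) map[string]string {
	state := make(map[string]string)
	for _, e := range log {
		state[e.Key] = e.Value
	}
	return state
}

func main() {
	log := []Entry{{"x", "1"}, {"y", "2"}, {"x", "3"}}
	fmt.Println(applyLog(log)) // every replica with this log prints the same result
}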
In this lab you'll implement Raft as a Go object type with associated methods, meant to be used as a module in a larger service. A set of Raft instances talk to each other with RPC to maintain replicated logs. Your Raft interface will support an indefinite sequence of numbered commands, also called log entries; the entries are numbered with index numbers. The log entry with a given index will eventually be committed. At that point, unlike in Lab-1, which did nothing with committed entries, your Raft should send the log entry to the larger service for it to execute.
Only RPC may be used for interaction between different Raft instances. For example, different instances of your Raft implementation are not allowed to share Go variables. Your implementation should not use files at all.
In this lab you'll implement most of the Raft design described in the extended paper, including saving persistent state and reading it after a node fails and then restarts. You will not implement cluster membership changes (Section 6) or log compaction / snapshotting (Section 7).
You should consult the extended Raft paper and the Raft lecture notes. You may find it useful to look at this advice written for MIT's 6.824 students in 2016, and this illustrated guide to Raft.
This lab is due in three parts. You must submit each part on the corresponding due date. This lab does not involve a lot of code, but concurrency makes it potentially challenging to debug; start each part early.
Please do not publish your code or make it available to current or future students. The github repositories we made for you are private; DO NOT clone your solution into any other public repository.
Do a git pull to get the latest lab software. We supply you with skeleton code and tests in src/raft, and a simple RPC-like system in src/labrpc.
To get up and running, execute the following commands:
$ cd ~/golabs
$ git pull
...
$ cd src/raft
$ export GOPATH=$HOME/golabs
$ go test
Test (2A): initial election ...
--- FAIL: TestInitialElection (5.03s)
        config.go:270: expected one leader, got 0
Test (2A): election after network failure ...
--- FAIL: TestReElection (5.03s)
        config.go:270: expected one leader, got 0
...
$

When you've finished all three parts of the lab, your implementation should pass all the tests in the src/raft directory:
$ go test
--- PASS: TestReElection2A (14.34s)
--- PASS: TestBasicAgree2B (9.00s)
--- PASS: TestFailAgree2B (11.34s)
--- PASS: TestFailNoAgree2B (11.78s)
--- PASS: TestConcurrentStarts2B (2.22s)
--- PASS: TestRejoin2B (14.72s)
--- PASS: TestBackup2B (35.22s)
--- PASS: TestCount2B (7.57s)
--- PASS: TestPersist12C (41.26s)
--- PASS: TestPersist22C (140.74s)
--- PASS: TestPersist32C (17.64s)
--- PASS: TestFigure82C (137.05s)
--- PASS: TestUnreliableAgree2C (12.68s)
--- PASS: TestFigure8Unreliable2C (43.14s)
--- PASS: TestReliableChurn2C (26.89s)
--- PASS: TestUnreliableChurn2C (43.84s)
PASS
Your implementation must support the following interface, which the tester and (eventually) your key/value server will use. You'll find more details in comments in raft.go.
// create a new Raft server instance:
rf := Make(peers, me, persister, applyCh)

// start agreement on a new log entry:
rf.Start(command interface{}) (index, term, isleader)

// ask a Raft for its current term, and whether it thinks it is leader
rf.GetState() (term, isLeader)

// each time a new entry is committed to the log, each Raft peer
// should send an ApplyMsg to the service (or tester).
type ApplyMsg
A service calls Make(peers,me,…) to create a Raft peer. The peers argument is an array of established RPC connections, one to each Raft peer (including this one). The me argument is the index of this peer in the peers array. Start(command) asks Raft to start the processing to append the command to the replicated log. Start() should return immediately, without waiting for this process to complete. The service expects your implementation to send an ApplyMsg for each new committed log entry to the applyCh argument to Make().
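As a rough illustration of the service side of this interface, the sketch below drains an apply channel and "executes" each committed command in log order, which is what your key/value service will eventually do. The ApplyMsg field names used here (Index, Command) are assumptions made for the example; check the definition in raft.go for the real struct.

package main

import "fmt"

// ApplyMsg here is a stand-in with assumed field names; the real type
// lives in raft.go.
type ApplyMsg struct {
	Index   int         // log index of the committed entry (assumed name)
	Command interface{} // the committed command itself (assumed name)
}

// consumeCommitted applies each committed command as it arrives on applyCh.
func consumeCommitted(applyCh chan ApplyMsg) {
	for msg := range applyCh {
		fmt.Printf("apply index %d: %v\n", msg.Index, msg.Command)
	}
}

func main() {
	ch := make(chan ApplyMsg, 2)
	ch <- ApplyMsg{Index: 1, Command: "put x 1"}
	ch <- ApplyMsg{Index: 2, Command: "get x"}
	close(ch)
	consumeCommitted(ch)
}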
Implement leader election and heartbeats (AppendEntries RPCs with no log entries). The goal for Part 2A is for a single leader to be elected, for the leader to remain the leader if there are no failures, and for a new leader to take over if the old leader fails or if packets to/from the old leader are lost. Run go test -run 2A to test your 2A code.
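As a starting point, the RequestVote message structs might look like the sketch below, which simply transcribes the arguments and results listed in Figure 2 of the extended paper; the exact field names are one reasonable choice, and the stubs in raft.go are where they actually belong. Note that RPC encoding typically only sends exported (capitalized) struct fields.

// RequestVoteArgs follows the RequestVote arguments in Figure 2 of the paper.
type RequestVoteArgs struct {
	Term         int // candidate's term
	CandidateId  int // candidate requesting the vote
	LastLogIndex int // index of candidate's last log entry
	LastLogTerm  int // term of candidate's last log entry
}

// RequestVoteReply follows the RequestVote results in Figure 2.
type RequestVoteReply struct {
	Term        int  // currentTerm, for the candidate to update itself
	VoteGranted bool // true means this peer granted its vote to the candidate
}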
Be sure you pass the 2A tests before submitting Part 2A. Note that the 2A tests exercise only the basic operation of leader election. Parts B and C will test leader election in more challenging settings and may expose bugs in your leader election code that the 2A tests miss.
$ git commit -am "Submit lab-2A" $ git push origin
Implement the leader and follower code to append new log entries. This will involve implementing Start(), completing the AppendEntries RPC structs, sending them, fleshing out the AppendEntries RPC handler, and advancing the commitIndex at the leader. Your first goal should be to pass the TestBasicAgree() test (in test_test.go). Once you have that working, you should get all the 2B tests to pass (go test -run 2B).
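For reference, here is a sketch of the AppendEntries messages, again following Figure 2 of the paper. The LogEntry type is an assumption about how you might represent log entries in your implementation; empty Entries makes the RPC a heartbeat.

// LogEntry is one possible representation of a log entry (assumed for this sketch).
type LogEntry struct {
	Term    int
	Command interface{}
}

// AppendEntriesArgs follows the AppendEntries arguments in Figure 2 of the paper.
type AppendEntriesArgs struct {
	Term         int        // leader's term
	LeaderId     int        // so followers can redirect clients
	PrevLogIndex int        // index of the log entry immediately preceding the new ones
	PrevLogTerm  int        // term of the PrevLogIndex entry
	Entries      []LogEntry // entries to store (empty for heartbeat)
	LeaderCommit int        // leader's commitIndex
}

// AppendEntriesReply follows the AppendEntries results in Figure 2.
type AppendEntriesReply struct {
	Term    int  // currentTerm, for the leader to update itself
	Success bool // true if the follower matched PrevLogIndex and PrevLogTerm
}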
Be sure you pass the 2A and 2B tests before submitting Part 2B.
Before submitting, please run the 2A and 2B tests one final time. Some bugs may not appear on every run, so run the tests multiple times.
To hand in your files, simply commit and push them to github.com:
$ git commit -am "Submit lab-2B" $ git push origin
If a Raft-based server reboots it should resume service where it left off. This requires that Raft keep persistent state that survives a reboot. The paper's Figure 2 mentions which state should be persistent, and raft.go contains examples of how to save and restore persistent state.
A “real” implementation would do this by writing Raft's persistent state to disk each time it changes, and reading the latest saved state from disk when restarting after a reboot. Your implementation won't use the disk; instead, it will save and restore persistent state from a Persister object (see persister.go). Whoever calls Raft.Make() supplies a Persister that initially holds Raft's most recently persisted state (if any). Raft should initialize its state from that Persister, and should use it to save its persistent state each time the state changes. Use the Persister's ReadRaftState() and SaveRaftState() methods.
Implement persistence by first adding code that saves and restores persistent state to persist() and readPersist() in raft.go. You will need to encode (or "serialize") the state as an array of bytes in order to pass it to the Persister. Use Go's gob encoder to do this; see the comments in persist() and readPersist().
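Here is a minimal sketch of what persist() and readPersist() might look like with gob, assuming your Raft struct keeps Figure 2's persistent state in fields named currentTerm, votedFor, and log, and holds its Persister in rf.persister; adjust the names to match your own code. The sketch needs the bytes and encoding/gob imports in raft.go.

// persist saves Raft's persistent state (assumed to be currentTerm,
// votedFor, and log) to stable storage via the Persister.
func (rf *Raft) persist() {
	w := new(bytes.Buffer)
	e := gob.NewEncoder(w)
	e.Encode(rf.currentTerm)
	e.Encode(rf.votedFor)
	e.Encode(rf.log)
	rf.persister.SaveRaftState(w.Bytes())
}

// readPersist restores previously persisted state, if any.
func (rf *Raft) readPersist(data []byte) {
	if data == nil || len(data) < 1 { // bootstrap without any state
		return
	}
	r := bytes.NewBuffer(data)
	d := gob.NewDecoder(r)
	d.Decode(&rf.currentTerm)
	d.Decode(&rf.votedFor)
	d.Decode(&rf.log)
}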
You now need to determine at what points in the Raft protocol your servers are required to persist their state, and insert calls to persist() in those places. You must also load persisted state in Raft.Make(). Once you've done this, you should pass the remaining tests. You may want to first try to pass the "basic persistence" test (go test -run 'TestPersist12C'), and then tackle the remaining ones (go test -run 2C).
In order to avoid running out of memory, Raft must periodically discard old log entries, but you do not have to worry about this until the next lab.
$ git commit -am "Submit lab-2C" $ git push originPlease post questions on Piazza.