Lab 6

Introduction

In this lab you will replicate your lock server using the replicated state machine (RSM) approach (see Schneider's RSM paper for a good, but non-required, reference. We will also discuss an example replicated state machine in Lecture.) In the replicated state machine approach, one machine is the master and the others are slaves. The master is in charge of receiving requests from clients and executing them on all replicas. To ensure that all replicas have identical state, the replicas must execute all requests in the same order and all requests must produce the same result on all replicas (i.e., the handlers must be deterministic). The RSM uses the Paxos protocol implemented in the previous lab to agree on the current master and node membership to cope with failed and re-joined replicas.

To ensure all requests are executed in a unique total order, the master assigns each request a viewstamp number which dictates the total order. The viewstamp consists of two fields, the view number (obtained from Paxos) and a monotonically increasing sequence number. The viewstamps assigned to all RSM requests dictate a total order among them. In particular, viewstamps with a lower view number are ordered before those with a higher view number and within the same view number, viewstamps with lower seqnos are ordered before those with higher seqnos. How do we guarantee all viewstamps form a unique total order? This is because Paxos guarantees all view numbers form a total order. Additionally, within each view, all nodes agree on the current view's membership and thus each RSM node can use the agreed upon membership to agree on a unique master who is the only one that can assign each request an increasingly seqno to properly order requests within a view.

The primary task in the lab is building a RSM library on top of our existing RPC library so that you can plug in any RPC program you have written so far and replicate it. To ensure the appropriate behavior, however, there are certain constraints on the RPC handlers. Most importantly, the RPC handlers must run to completion without blocking, and be deterministic. These constraints will ensure that all replicas execute all requests in the same order and with the same result. Once you have built the RSM library we will ask you to replicate the lock server you built in previous labs using RSM. In this lab it should become clear why we asked you to write the lock server in a way so that there are no blocking RPC handlers.

Getting Started

Begin by initializing your Lab 6 directory with your implementation from Lab 5 and then updating it with some new infrastructure code in http://www.news.cs.nyu.edu/~jinyang/fa08/labs/yfs-lab6.tgz. Since you are building on the past labs, make sure your code in this new directory passes all tests for previous labs before starting in on this lab.

The new code will overwrite the Makefile file from your previous labs, so you may have to redo any changes in that file.

% wget -nc http://www.news.cs.nyu.edu/~jinyang/fa08/labs/yfs-lab6.tgz
% rsync -av l5/* l6
% tar xzvf yfs-lab6.tgz

We provide you with some skeleton code of the RSM library. The library has the client and server class for RSM in filesrsm_client.{cc,h} and rsm.{cc,h}.

The RSM client class (rsm_client) is used by a client program to request service from the master replica in the RSM. The RSM client takes in its constructor the address of a known replica, and immediately contacts the node to learn the addresses of all replicas as well as the current master. The client program (e.g. the lock_client class) can invoke a call method on the RSM client object (just as if it were an RPC client). The call method on RSM client will marshall RSM request and send it via the rsm_client_protocol::invoke RPC to the master replica. (The RPC protocol between the RSM client and RSM server (replica) is defined in the rsm_client_protocol class in file rsm_protocol.h).

To turn any server program into a replica in the RSM service, your application (e.g. lock_server class) creates an RSM server object (rsm) and use it in place of the normal RPC server rpcs object. The RSM server constructor creates a paxos object with arguments consisting the id of the first replica ever created and the id of this server. The RSM server registers a number of RPC handlers and spawns off a recovery thread to synchronize state with the master replica when Paxos has agreed on a stable view.

Once the master is in a stable state, it can process invoke RPCs from RSM clients. For each request, the master assigns it the next viewstamp number with an increasing seqno. The master then issues an invoke RPC on all replicas in the current view. The replicas unmarshall the request, and execute the registered handler. Note that the replicas must execute requests in the same total order as dictated by the requests' viewstamps without any gaps in seqno. If the master has succeeded in executing a request on all replicas (including itself), it will reply to the client. If the master has encountered replica failures during this process, it should instruct its paxos object to inititate view changes. Occasionally, a RSM client might send its request to a non-master node in which case the node should reject the client's request.

When a failed replica re-joins a running RSM, it has potentially missed many requests and must do a state transfer to bring its state in sync with the other replicas before it can process any requests. Additionally, when the master has encountered a failure during the process of invoking the client request at various replicas, some replicas might have executed the request while others not. Thus, the RSM servers must be able to synchronize its state properly from the agreed upon master node before processing any client requests. We provide some skeleton code to do this (the interface is defined in rsm_state_transfer.h).

Your Job

Your job is to turn the lock_server into a RSM service. Our measure of success is surviving failed master and slaves and incorporating re-joined replicas back into a running RSM. For this lab, you'll need to pass tests 9-12 of rsm_tester.pl (as well as making sure all the file system tests from previous labs work).

The tester picks random ports for the lock server replicas, and starts them up. It redirects output to log files, named as lock_server-[master_port]-[my_port].log. The log for the tester is lock_tester-[master_port].log. Here is the output of a successful run of rsm_tester.pl:

% ./rsm_tester.pl 8 9 10 11 12
test8: start 3-process lock service
...
test9: start 3-process rsm, kill first slave while tester is running
...
test10: start 3-process rsm, kill second slave while tester is running
...
test11: start 3-process rsm, kill primary while tester is running
...
test12: start 3-process rsm, kill first slave at break1, continue with 2, add first slave
...
./lock_tester: passed all tests successfully
tests done
%

When debugging, you might want to run the tests individually by just specifying a single test number. You can also specify the same random seed values acrosss run to make rsm_tester.pl choose the same set of random ports. (e.g. ./rsm_tester.pl -s 89362 8) Once your lab works, make sure it is able to pass all (including test 0-7) tests of ./rsm_tester.pl many times in a row as well as the file system tests from previous labs.

Important: That if rsm_tester.pl fails during the middle of a test, the remaining lock_server processes are not killed and the log files are not cleaned up (so you can debug the causes.). Make sure you do 'killall lock_server; rm -f *.log' to clean up the lingering processes before running rsm_tester.pl again.

Detailed Guidance

Step One: RSM without failures

As the first step, just get the RSM working, assuming that none of the replicas will fail. The basic protocol is:

The RSM client sends its request the master.
The master assigns the client request the next viewstamp in sequence, sends the invoke RPC to each slave.
If the request has the expected viewstamp, the slave executes the request locally and replies OK to the master.
The master executes the request locally, and replies back to the client.

This will involve filling in the various functions in rsm.cc mentioned above. In particular:

rsm::client_invoke. This RPC handler is called by a client to send a new RSM request to the master. If this RSM replica is undergoing Paxos view changes (i.e. paxos::is_stable() is false), it should reply with rsm_client_protocol::BUSY to tell the client to try again later. If this RSM replica is not the master, it should reply with the rsm_client_protocol::NOTMASTER status. If it is the master, it first assigns the RPC the next viewstamp number in seqquence, and send an invoke RPC to all slaves in the current view. You should supply a timeout to invoke RPC in case any of the slaves have died (see the paxos::heartbeater method for an example use of a timeout). To execute a RSM request, you need to use the provided method execute() which unmarshalls the RSM representation of a client's RPC and executes it using the registered handler.
The master must ensure that all client requests are executed in order at all slaves. One way to achieve this is for the master to process each request serially in lockstep. Under such a design, the master must hold the rsm_mutex lock across RPC calls to slaves. Once all slaves in the current membership list reply successfully, the master can execute the invoked RPC locally and reply success to the client. If a slave times out, the master instructs its Paxos object to initiate a view change.
rsm::invoke: This RPC handler is invoked on slaves by the master. A slave must ensure the request has the expected sequence number before executing it locally and reply back to the master. It should also ensure that it is indeed a slave under the current stable view; if not, it should reply with an error.
The rsm::stable variable keeps track of whether the current replica has successfully synchronized its state with the current master upon the latest view change. If a node has not finished state synchronization, it should not process any RSM requests. For this first step, we do not yet worry about replica failures nor state synchronization so you should temporarily set this variable to be true in recovery() method.

To change your existing lock server/client to use the RSM objects:

Eliminate any randomness in the lock server if there is any, or at the very least make sure the random number generator in each replica is seeded with the same value.
Modify lock_server to take in a rsm object in its constructor (e.g as a pointer) and save it, so that each server can inquire about its master status using the rsm::amimaster() method. Only the master lock_server should communicate directly with the client. Therefore, you should modify lock_server to check if the current server is the master before communicating with the lock_client(s).
Modify lock_smain.cc to take in two port numbers as command line arguments and create an rsm object. The rsm constructor takes two arguments with the first one being the port number of the first replica ever created and the second one being the port number of the current replica. You must then pass the rsm object to the lock_server object. You should also register all RPC handlers of lock_server with the rsm, instead of with the rpcs object (which is no longer needed).
Modify lock_client to create rsm_client object in its constructor. The lock_client should use the rsm_client object to perform its RPCs, in place of the old rpcc object (no longer needed). THe lock_client sends RPCs as usual with the call method of the rsm_client object. The method will further call rsm_client::invoke with marshalled request.

Upon completion of step one, you should be able to pass './rsm_tester.pl 8'. This test starts three lock_servers one after another, waits for Paxos to reach an agreement before performing tests on the lock service using 'lock_tester'.

Step Two: Cope with Failures and Implement state transfer

In this step, you will handle node failures as well as joins in a running RSM. Upon detecting failure or a new node joining, the underlying Paxos protocol is kicked into action. When Paxos has reached an agreement on the next new view, it calls the rsm object's view_change() to indicate that a new view is formed. When a new view is first formed, the rsm::stable variable is set to false, indicating that this node needs to recover its RSM state before processing any RSM requests again. Recovery is done in a separate recovery_thread in the rsm::recovery() method.

To recover after a view change, each replica must first find the master in the current view. To do so, a replica sends a viewstampreq RPC (defined in the rsm_protocol.h file) to ask each node in the membership list about the latest viewstamp it has executed. The node with the biggest viewstamp is taken as the master (with ties broken by node ids). To complete the recovery task, each replica must also transfer state from the master so that nodes have identical state before processing any RSM requests again. Once recovery is finished, the replica should set its stable variable to be true to allow the processing of RSM requests.

To implement state transfer, first make lock_server into a subclass of rsm_state_transfer interface. Second, implement the marshal_state and unmarshal_state methods for lock_server. Use the yfs RPC marshalling code to turn various internal state into strings and vice versa. For example, if state of your lock server consists of a std::map called locks that mapped lock name (std::string) to a list of clients waiting to grab the lock (std::vector), the code might look roughly as follows:


std::string 
lock_server::marshal_state() {

  // lock any needed mutexes
  marshall rep;
  rep << locks.size();
  std::map< std::string, std::vector >::iterator iter_lock;
  for (iter_lock = locks.begin(); iter_lock != locks.end(); iter_lock++) {
    std::string name = iter_lock->first;
    std::vector vec = locks[name];
    rep << name;
    rep << vec;
  }
  // unlock any mutexes
  return rep.str();

}

void 
lock_server::unmarshal_state(std::string state) {

  // lock any needed mutexes
  unmarshall rep(state);
  unsigned int locks_size;
  rep >> locks_size;
  for (unsigned int i = 0; i < locks_size; i++) {
    std::string name;
    rep >> name;
    std::vector vec;
    rep >> vec;
    locks[name] = vec;
  }
  // unlock any mutexes
}

In the lock_server constructor, call the rsm's set_state_transfer method with this as the argument so that rsm can call lock_server's marshal_state and unmarshal_state function later.

Now you should be able to pass './rsm_tester.pl 9 10 11'. These tests starts three lock_servers and kills or restarts the master or slaves while running the lock_test simultaneously.

Step Three: Complicated failures

In rsm::client_invoke, place the function breakpoint1() after the master has finished invoking RSM request on one slave and before it moves on to issue RSM request to other slaves. In the three server test scenario (test 12), this causes the master to fail after one slave has finished the latest request and the other slave has not seen the latest request yet. If you have implemented recovery correctly, the set of RSM servers in the new view resolve this case correctly and all master/slaves will start executing requests from identical state. Note that since the rsm_client has not heard back from the master in the previous view, it will retry its request in the new view (in rsm_client::invoke()). This might cause your lock_server to execute duplicate requests, but that is OK as long as these requests are idempotent, meaning they can be executed multiple times in a row without affecting correctness.

Next, place the function breakpoint1() in rsm::invoke just after the slave has finished executing a request. In the three server test scenario (test 13), this causes the second slave to fail after it has finished the latest request. Again, if you have implemented recovery correctly, the set of RSM servers in the new view resolve this case correctly and all master/slaves will start executing requests from identical state.

If your RSM works correctly, you should be able to pass './rsm_tester.pl 12 13'.

Challenges

Here are a few things you can do if you finish the lab early and feel like improving your code. These are not required, and there are no associated bonus points, but some of you may find them interesting.

There's no reason that the slaves replicas must receive the RSM requests and execute in series (in lockstep). You may be able to improve performance by executing them in parallel.
Turn the RSM library to log its received requests on disk. This way, even if all lock_servers fail simultaneously, we can recover them later.
Use the RSM library to replicate the extent server.

Handin procedure

Tar your l6/ files with the following commands:

cd ~/yfs/l6
make clean
cd ..
tar czvf yfs-lab6.tgz l6/

Go to submit site to upload yfs-lab6.tgz

For questions or comments, email dss-staff.

-- Lab 6: Replicated State Machine --