In this lab you
will replicate your lock server using the replicated state machine (RSM)
approach (see Schneider's RSM paper for a good, though not required, reference;
we also discussed an example replicated state machine in lecture).
In the replicated state machine approach, one machine is the
master and the others are slaves. The master is in charge of receiving requests
from clients and executing them on all replicas. To ensure that all replicas
have identical state, the replicas must execute all requests in the same order
and all requests must produce the same result on all replicas (i.e., the
handlers must be deterministic). The RSM uses the Paxos
protocol implemented in the previous lab to agree on the current master and
node membership to cope with failed and re-joined replicas.
To ensure all
requests are executed in a unique total order, the master assigns each request
a viewstamp, which dictates that order. A viewstamp consists of two fields: the view
number (obtained from Paxos) and a monotonically increasing sequence number (seqno).
Viewstamps with a lower view number are ordered before those with a higher view number;
within the same view, viewstamps with lower seqnos are ordered before those with higher
seqnos. Why do all viewstamps form a unique total order? Paxos guarantees that all view
numbers form a total order, and within each view all nodes agree on that view's
membership. Each RSM node can therefore use the agreed-upon membership to agree on a
unique master, which is the only node that assigns seqnos, and which thus properly
orders requests within a view.
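For concreteness, a viewstamp and its ordering might be represented as in the sketch below; this is only an illustration, and the actual struct and field names in rsm_protocol.h may differ.
// Sketch of a viewstamp: ordered first by view number (from Paxos), then by
// the master-assigned seqno within a view.  Field names are illustrative.
struct viewstamp {
  unsigned int vid;    // view number agreed on by Paxos
  unsigned int seqno;  // monotonically increasing within a view
};

inline bool
operator<(const viewstamp &a, const viewstamp &b)
{
  return a.vid < b.vid || (a.vid == b.vid && a.seqno < b.seqno);
}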
The primary task in
the lab is building an RSM library on top of our existing RPC library so that
you can plug in any RPC program you have written so far and replicate it. To
ensure the appropriate behavior, however, there are certain constraints on the
RPC handlers. Most importantly, the RPC handlers must run to completion without
blocking, and be deterministic and idempotent. These constraints will ensure
that all replicas execute all requests in the same order and with the same
result. Once you have built the RSM library we will ask you to replicate the
lock server you built in previous labs using RSM. In this lab it should become
clear why we asked you to write the lock server so that it has no
blocking RPC handlers, and why we asked you to include sequence numbers.
Begin by updating
your lab directory with the new infrastructure code for lab 8. Since you
are building on the past labs, make sure your code passes all tests for
previous labs before starting in on this lab.
% cd ~/lab
% git commit -am 'my solution to lab7'
Created commit ...
% git pull
remote: Generating pack...
...
% git checkout -b lab8 origin/lab8
Branch lab8 set up to track remote branch refs/remotes/origin/lab8.
Switched to a new branch "lab8"
% git merge lab7
We provide you with some
skeleton code of the RSM library. The library has the client and server class
for RSM in files rsm_client.{cc,h} and rsm.{cc,h}.
The RSM client class (rsm_client) is used by a client
program to request service from the master replica in the RSM. The RSM client
takes in its constructor the address of a known replica, and immediately contacts
the node to learn the addresses of all replicas as well as the current master.
The client program (e.g., the lock_client class) can use the call method on the RSM client object just as if
it were an RPC client. The call method on the RSM client marshalls the
RSM request and sends it via the rsm_client_protocol::invoke RPC to the master replica.
(The RPC protocol between the RSM client and RSM server (replica) is defined in
the rsm_client_protocol class in file rsm_protocol.h).
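A minimal sketch of replacing an rpcc with an rsm_client in a client program follows. It assumes the rsm_client constructor takes the address of one known replica and that rsm_client::call() has the same templated interface as rpcc::call(); the protocol number and arguments below are only illustrative.
// Sketch: using rsm_client in place of rpcc; check rsm_client.h for the
// real constructor and call() signatures.
#include "rsm_client.h"
#include "lock_protocol.h"

int
example_call(std::string replica_addr)
{
  rsm_client rc(replica_addr);   // learns the full member list and the current master
  int r;
  // call() marshalls the arguments and forwards the request to the master
  // via the rsm_client_protocol::invoke RPC.
  rc.call(lock_protocol::stat, (lock_protocol::lockid_t)1, r);
  return r;
}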
To turn any server program
into a replica in the RSM service, your application (e.g., the lock_server_cache class) creates an RSM
server object (rsm) and uses it in place of the normal RPC server (rpcs) object. The RSM server
constructor creates a config object, passing it the id of the first replica ever
created and the id of this server. The RSM server registers a number of RPC
handlers and spawns off a recovery thread to synchronize state with the master
replica when Paxos has agreed on a stable view.
Once the master is in a
stable state, it can process invoke RPCs from RSM clients. The master assigns each request
the next viewstamp, i.e., the current view number with the next seqno. The
master then issues an invoke RPC on all replicas in the current view. The
replicas unmarshall the request, and execute the
registered handler. Note that the replicas must execute requests in the same
total order as dictated by the requests' viewstamps without
any gaps in seqno. If the master has succeeded in
executing a request on all replicas (including itself), it will reply to
the client. If the master has encountered replica failures during this process,
it should instruct its config object to initiate a view change.
Occasionally, an RSM client might send its request to a non-master node, in
which case the node should reject the client's request by replying with rsm_client_protocol::NOTPRIMARY. The client will then call
the members RPC to get an updated list
of replicas.
When a failed replica
re-joins a running RSM, it has potentially missed many requests and must do a
state transfer to bring its state in sync with the other replicas before it can
process any requests. Additionally, if the master encounters a failure
while invoking the client request at the various replicas, some
replicas may have executed the request while others have not. Thus, an RSM
server must be able to synchronize its state properly with the agreed-upon
master node before processing any client requests. We provide some skeleton code
to do this; the interface is defined in rsm_state_transfer.h.
Your job is to turn the lock_server_cache into an RSM service. Our
measure of success is surviving failed master and slaves and incorporating
re-joined replicas back into a running RSM. For this lab, you'll need to pass
tests 8-13 of rsm_tester.pl (as well as making sure all
the file system tests from previous labs work).
The tester picks random
ports for the lock server replicas, and starts them up. It redirects output to
log files, named as lock_server-[master_port]-[my_port].log. The log for the tester is lock_tester-[master_port].log. Here is the output of a
successful run of rsm_tester.pl:
% ./rsm_tester.pl 8 9 10 11 12 13
test8: start 3-process lock service
...
test9: start 3-process rsm, kill first slave while tester is running
...
test10: start 3-process rsm, kill second slave while tester is running
...
test11: start 3-process rsm, kill primary while tester is running
...
test12: start 3-process rsm, kill first slave at break1, continue with 2, add first slave
...
test13: start 3-process rsm, kill slave at break1 and restart it while lock_tester is running
...
./lock_tester: passed all tests successfully
tests done
%
When debugging, you might want to run the tests individually by
specifying a single test number. You can also specify the same random seed
across runs to make rsm_tester.pl choose the same set of random ports (e.g., ./rsm_tester.pl -s 89362 8).
Once your lab works, make sure it passes all tests of ./rsm_tester.pl (including
tests 0-7) many times in a row, as well as the file system tests from
previous labs.
Important: As in the previous lab, if
rsm_tester.pl fails in the middle of
a test, the remaining lock_server processes are not
killed and the log files are not cleaned up (so you can debug the cause).
Make sure you kill any leftover lock_server processes and remove the old log files before running the tester again.
As the first step, just get the RSM working, assuming that none of
the replicas will fail. This involves filling in the various functions in rsm.cc
mentioned above, in particular:
· rsm::client_invoke(): This RPC handler is called by a client to send a new RSM request
to the master. If this RSM replica is undergoing Paxos
view changes (i.e., changing is true), it should reply
with rsm_client_protocol::BUSY to tell the client to try again later. If this
RSM replica is not the master, it should reply with the rsm_client_protocol::NOTPRIMARY status. If it is the
master, it first assigns the RPC the next viewstamp
number in sequence, and then sends an rsm_protocol::invoke RPC to all slaves in the
current view. As in the previous lab, you should supply a timeout to the invoke RPC in case any of the slaves have died. To
execute an RSM request, use the provided method execute(), which unmarshalls
the RSM representation of a client's RPC and executes it using the registered
handler. (A rough skeleton of this handler appears after this list.)
The master must ensure that
all client requests are executed in order at all slaves. One way to achieve this
is for the master to process each request serially in lockstep. An easy way to
ensure that requests are processed one at a time is to hold a mutex in client_invoke() while a request is being processed. However,
it would be a bad idea to hold rsm_mutex while calling the invoke RPC, since the rsm_mutex protects the internal data
structures of the RSM as well, and nothing else can happen in the RSM while
that mutex is held. Instead, you can hold a separate mutex called invoke_mutex, which is used only to
serialize calls to client_invoke(). You need to be careful of two things if you
serialize requests this way. First, you shouldn't hold rsm_mutex across RPCs. Second, you
shouldn't try to acquire invoke_mutex while you are holding rsm_mutex, since that would
effectively cause you to hold rsm_mutex for the duration of an RPC.
Once all slaves in the
current membership list reply successfully, the master can execute the invoked
RPC locally and reply success to the client. If a slave times out, the master
instructs its Paxos object to initiate a view change.
· rsm::invoke(): This RPC handler is
invoked on slaves by the master. A slave must ensure the request has the
expected sequence number before executing it locally and replying to the
master. It should also ensure that it is indeed a slave under the current
stable view; if not, it should reply with an error.
· The rsm::inviewchange variable keeps track of
whether the current replica has successfully synchronized its state with the
current master upon the latest view change. If a node has not finished state
synchronization, it should not process any RSM requests. For this first step,
we do not yet worry about replica failures or state synchronization, so you
should temporarily set this variable to true in the recovery() method.
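To make the structure concrete, here is a rough, non-authoritative skeleton of the master path in rsm::client_invoke() under the locking scheme described above. Only the names taken from this handout (client_invoke, execute, rsm_mutex, invoke_mutex, inviewchange, amiprimary, viewstamp, and the BUSY/NOTPRIMARY status codes) are real; cfg_members(), send_invoke(), initiate_view_change(), last_vs_, my_id_, ScopedLock, and the OK status constants are placeholders for whatever your rsm.{cc,h} and config class actually provide, and execute() is assumed to return the marshalled reply as a std::string.
// Sketch only, not the required implementation.
rsm_client_protocol::status
rsm::client_invoke(int procno, std::string req, std::string &r)
{
  // Serialize client requests, but never hold rsm_mutex across an RPC.
  ScopedLock il(&invoke_mutex);

  viewstamp vs;
  std::vector<std::string> members;
  {
    ScopedLock rl(&rsm_mutex);
    if (inviewchange)
      return rsm_client_protocol::BUSY;        // still recovering from a view change
    if (!amiprimary())
      return rsm_client_protocol::NOTPRIMARY;  // client should refresh its member list
    last_vs_.seqno++;                          // next viewstamp in the current view
    vs = last_vs_;
    members = cfg_members();                   // members of the current stable view
  }

  // Invoke the request on every slave (with a timeout) before executing it locally.
  for (unsigned i = 0; i < members.size(); i++) {
    if (members[i] == my_id_)
      continue;
    if (send_invoke(members[i], procno, vs, req) != rsm_protocol::OK) {
      initiate_view_change();                  // a slave died: ask Paxos for a new view
      return rsm_client_protocol::BUSY;
    }
  }

  r = execute(procno, req);                    // run the registered handler locally
  return rsm_client_protocol::OK;
}
The slave side, rsm::invoke(), mirrors this: check that the node is a slave in a stable view, check that the viewstamp carries the expected next seqno, call execute(), and return the result to the master.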
To change your existing lock server/client to use the RSM objects:
· Eliminate any randomness in the lock server if there is any, or at
the very least make sure the random number generator in each replica is seeded
with the same value.
· Modify lock_server_cache to take an rsm object in its constructor
(e.g., as a pointer) and save it, so that each server can inquire about its
master status using the rsm::amiprimary() method. Only the master lock_server_cache should communicate directly
with the clients. Therefore, you should modify lock_server_cache to check whether it is the
master before communicating with the lock client(s).
· Modify lock_smain.cc to instantiate the lock
server and pass the rsm object to the server's constructor. (You'll need to remove the ifdef we added in lab 7.) You
should also register all RPC handlers of lock_server_cache with the rsm object, instead of with the rpcs object (which is no longer
needed). A rough sketch of these changes appears after this list.
· Modify lock_client_cache to create an rsm_client object in its constructor. The lock_client_cache
should use the rsm_client object to perform its RPCs, in place of the old rpcc object (which is no longer needed).
The lock_client_cache sends RPCs as usual with the call method of the rsm_client object; that method will
in turn call rsm_client::invoke with the marshalled request.
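As a concrete illustration of the lock_smain.cc change, here is a hedged sketch. It assumes the rsm constructor takes the id of the first replica and the id of this server, and that rsm::reg() mirrors rpcs::reg(); the handler names and argument lists are placeholders carried over from the earlier labs, so check rsm.h and your own lock_server_cache for the real signatures.
// Sketch of lock_smain.cc using an rsm object in place of rpcs.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "rsm.h"
#include "lock_protocol.h"
#include "lock_server_cache.h"

int
main(int argc, char *argv[])
{
  if (argc != 3) {
    fprintf(stderr, "Usage: %s [first_replica_id] [my_id]\n", argv[0]);
    exit(1);
  }

  rsm rsm_obj(argv[1], argv[2]);        // id of first replica ever created, my id
  lock_server_cache ls(&rsm_obj);       // server can ask rsm_obj.amiprimary() later

  // Register lock handlers with the rsm object instead of an rpcs object.
  rsm_obj.reg(lock_protocol::acquire, &ls, &lock_server_cache::acquire);
  rsm_obj.reg(lock_protocol::release, &ls, &lock_server_cache::release);

  while (1)
    sleep(1000);
}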
Upon completion of step one,
you should be able to pass './rsm_tester.pl 8'. This test starts three lock_servers one after another, waits
for Paxos to reach an agreement, then performs tests
on the lock service using lock_tester.
In this step, you will
handle node failures as well as joins in a running RSM. Upon detecting failure
or a new node joining, the underlying Paxos protocol
is kicked into action. When Paxos has reached an
agreement on the next new view, it calls the rsm
object's commit_change() to indicate that a new view is formed. When a
new view is first formed, the rsm::inviewchange variable is set to true,
indicating that this node needs to recover its RSM state before processing any
RSM requests again. Recovery is done in a separate recovery thread in the rsm::recovery() method.
After a view change, each
replica should recover by transferring state from the master. Its state must be
identical to the master's before processing any RSM requests in the new view.
Once recovery is finished, the replica should set its rsm::inviewchange variable to false to allow
the processing of RSM requests. The master should not send any requests to the
backups until all the backups have recovered.
To implement state transfer,
first make lock_server_cache a subclass of the rsm_state_transfer interface. Second,
implement the marshal_state and unmarshal_state methods for lock_server_cache. Use the YFS RPC marshalling code to turn
the various internal state into strings and vice versa. For example, if the state of
your lock server consists of a std::map called locks that maps each lock name (std::string) to the list of clients
waiting to grab the lock (std::vector<std::string>), the code might look roughly as follows:
std::string
lock_server_cache::marshal_state() {
  // lock any needed mutexes
  marshall rep;
  rep << locks.size();
  std::map<std::string, std::vector<std::string> >::iterator iter_lock;
  for (iter_lock = locks.begin(); iter_lock != locks.end(); iter_lock++) {
    std::string name = iter_lock->first;
    std::vector<std::string> vec = locks[name];
    rep << name;
    rep << vec;
  }
  // unlock any mutexes
  return rep.str();
}
void
lock_server_cache::unmarshal_state(std::string state) {
  // lock any needed mutexes
  unmarshall rep(state);
  unsigned int locks_size;
  rep >> locks_size;
  for (unsigned int i = 0; i < locks_size; i++) {
    std::string name;
    rep >> name;
    std::vector<std::string> vec;
    rep >> vec;
    locks[name] = vec;
  }
  // unlock any mutexes
}
In the lock_server_cache constructor, call the rsm's set_state_transfer method with this as the argument, so that rsm can call lock_server_cache's marshal_state and unmarshal_state functions later.
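A minimal sketch of that wiring, assuming the lock_server_cache constructor receives the rsm pointer as described above (the member name rsm_ is a placeholder):
// Sketch: register this object for state transfer; rsm will call
// marshal_state()/unmarshal_state() on it during recovery.
lock_server_cache::lock_server_cache(class rsm *_rsm)
  : rsm_(_rsm)
{
  rsm_->set_state_transfer(this);
}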
Now you should be able to
pass the rsm_tester.pl tests for this step.
The rsm_client::invoke() method handles two special cases. First, if the replica that the
client sends the invoke RPC to is no longer the primary, that replica returns rsm_client_protocol::NOTPRIMARY. In this case, the client calls init_members(),
which sends a members RPC to the old primary to update its list of replicas. Then the
client retries its request.
The second case is where the
replica isn't responding at all (so the invoke RPC fails). Here init_members() won't work, because the members RPC will also fail; instead, the client
calls rsm_client::primary_failure(), which should make it
forget about the failed replica and ask a different replica for an updated list
of members. Then invoke() should retry as before.
We have given you most of
the code you need. Your job is simply to write rsm_client::primary_failure() to handle the second case.
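One plausible way to write it, assuming rsm_client keeps the current primary plus a list of other known replicas (the member names primary and known_mems below are placeholders for whatever rsm_client.h actually declares):
// Sketch of rsm_client::primary_failure(): drop the dead primary and pick
// another known replica; invoke() will then refresh the member list and retry.
void
rsm_client::primary_failure()
{
  if (!known_mems.empty()) {
    primary = known_mems.back();
    known_mems.pop_back();
  }
}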
If you did not use sequence
numbers when implementing your caching lock server, you will need to add them
now. In lab 5, the sequence numbers in the RPC layer were sufficient to avoid
duplicate requests. However, in this lab, consider what happens if a replica
crashes while some replicas have executed the request and others have not. A
view change will occur, and the client will re-send the request in the new
view. If the new primary has already executed the request, the RPC handler in lock_server_cache will be invoked twice. Note
that if the primary crashes while (or shortly after) executing an acquire or
release RPC, after recovery it will
be ambiguous as to whether the appropriate retry or revoke RPCs were sent in the
previous view.
A simple way to address this
is to have clients that are waiting to acquire locks retry automatically every
3 seconds, even in the absence of a retry RPC. The servers can use sequence numbers (see Lab 5 for details) to identify duplicate acquire requests; however, when a server gets a
duplicate acquire and another client holds
the lock, it should send another revoke anyway, in case the first revoke got lost due to a crash. If you designed your protocol as we
suggested in Lab 5, it should already behave correctly
if there are duplicate retry and revoke RPCs.
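For the client-side retry, a sketch along the following lines may help. It assumes a condition variable that is signalled when a retry RPC arrives; retry_cv_, client_mutex_, lock_is_free(), and resend_acquire() are placeholders for your own lock_client_cache members, and the re-sent acquire should carry the same sequence number as the original so the server can detect the duplicate.
#include <pthread.h>
#include <time.h>
#include <errno.h>

// Sketch: wait for the lock, re-sending acquire at least every 3 seconds
// even if no retry RPC arrives.
void
lock_client_cache::wait_and_retry(lock_protocol::lockid_t lid)
{
  pthread_mutex_lock(&client_mutex_);
  while (!lock_is_free(lid)) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 3;

    int r = pthread_cond_timedwait(&retry_cv_, &client_mutex_, &deadline);
    if (r == ETIMEDOUT)
      resend_acquire(lid);   // duplicate acquire: same seqno as the original
  }
  pthread_mutex_unlock(&client_mutex_);
}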
Now you should be able to
pass the rsm_tester.pl tests for this step as well.
In rsm::client_invoke, place the call to breakpoint1() after the master has
finished invoking the RSM request on one slave and before it moves on to issue the
request to the other slaves. In the three-server test scenario (test 12), this
causes the master to fail after one slave has finished the latest request but
the other slave has not yet seen it. If you have implemented recovery
correctly, the set of RSM servers in the new view resolves this case correctly
and the master and slaves will start executing requests from identical state. Note
that since the rsm_client has not heard back from the master in the previous view, it will
retry its request in the new view (in rsm_client::invoke()). This might cause your
lock server to execute duplicate requests, but that is OK as long as these
requests are idempotent, meaning they can be executed multiple times in a row
without affecting correctness.
Next, place the call to breakpoint1() in rsm::invoke just after the slave has
finished executing a request. In the three-server test scenario (test 13), this
causes the second slave to fail after it has finished the latest request.
Again, if you have implemented recovery correctly, the set of RSM servers in
the new view resolves this case correctly and the master and slaves will start
executing requests from identical state.
If your RSM works correctly,
you should be able to pass './rsm_tester.pl 12 13'.
Here are a few things you can do if you finish the lab early and
feel like improving your code. These are not required, and there are no
associated bonus points, but some of you may find them interesting.
You will need to email your completed code (without binaries) as a
gzipped tar file to ds-assignment@mpi-sws.org by the
deadline stated at the top of the page. To do this, switch to the source
directory and execute these commands:
% tar czvf MATR1-MATR2-lab8.tgz lab/
That should produce a file called MATR1-MATR2-lab8.tgz in that
directory, where MATR1 and MATR2 are the matriculation numbers of the team
members. Attach that file to an email and send it to the address above with the
subject "Assignment8 - LastName1 LastName2".
You will receive full credit
if your software passes the same tests we gave you when we run your software on
our machines.