Lecture 2: RPC

Remote Procedure Calls (RPC)

  a stylized version of client/server communication that attempts to
  make remote procedure calls look like ordinary procedure calls.
 
  RPC ideally makes net communication look just like a function call:
    Client:
      z = fn(x, y)
    Server:
      fn(x, y) {
        compute
        return z
      }
    RPC aims for this level of transparency
    Hope: even novice programmers can use function call!

  PC message diagram:
    Client             Server
      request--->
         <---response

  RPC software structure:

  Client  Client     RPC       Net      RPC             Server      Handler
   App     Stub      lib                lib              Stub

  call    marshall   send     request   recv  dispatch  unmarshall  fn(args)
                     wait               wait                        work...
  return  unmarshall recv     response  send            marshall    return

  Also:
    access control
    argument validation
    if network is not secure: authentication, confidentiality, integrity

  key properties:
    easy to write programs with
    model programmers are familiar with
    good match for many distributed applications (client/server)
    hides details (e.g., marshaling/unmarshaling, automatically generated stubs)
    also works for object-oriented languages (remote object invocation)
    BUT: Failures can't be masked!

  alternatives?
    directly programming with sockets
    MPI (Message Passing Interface)
    distributed shared memory
    map/reduce
    dryad
    ...

  RPC is widely used
    SOAP
    ONC
    .NET
    Java RMI
    support found in modern programming languages (Go, Rust, ...)

A few details:
  Marshalling is tricky for arrays, pointers, objects, etc.
  Client needs to find server's network address ("binding")
  Client may have multiple threads sending to server
    match response to waiting thread
  Server may create thread per request

Hard challenge: failures
  network may drop, delay, duplicate, re-order messages
  network might break altogether, and recover later
  server might crash, and maybe re-start
       did server fail just before the processing the request or just after?
   often impossible to tell the difference between communication failures and
     machine failures    
  can we hide this from client applications and server handlers?

Birrell RPC paper
  paper's main concerns:
    naming
    minimize # of packets (slow CPUs -> slow pkt handling) (we use TCP)
    failures

Naming RPC servers
  Uses Grapevine, a name service (a little like DNS)
  Export(service name, server host)
  Import(service name) -> server host
  level of indirection:
    clients need not hard-code server names
    multiple servers (use closest)
    replacement of servers

Let's talk about how a client can handle failure
  client sends a request
  suppose network discards the request packet
  what will client observe?
  what should the client do?
  how long should client wait before retransmission?

Now suppose the network delivered request, but discarded response
  what will client observe?
  what should the client do?

Would it be enough to layer RPC on top of TCP?
  After all, TCP is a reliable transport protocol
  No, TCP connection can still break

Possible RPC semantics:

  a) At-least-once  semantics
    Implementation: client re-transmits request until response arrives,
    		    raises exception if too many unsuccesful retransmissions
		    server keeps no state about past invocations
    RPC completes OK -> executed once or several times
    RPC raises exception -> executed zero, one, or multiple times, unknown which

    Q: is "at least once" easy for applications to cope with?
    Example 1: remote procedure does "deduct $10 from bank account"

    Example 2: 
      Put("k", 10) -- an RPC to set key's value in a DB server
      Put("k", 20) -- client then does a 2nd Put to same key
      [diagram, timeout, re-send, original arrives very late]

     Q: is at-least-once ever OK?
     yes: if it's OK to repeat operations, e.g. read-only op
     yes: if application has its own plan for coping with duplicates


  b) At-most once semantics (RPC paper)
    RPC completes OK -> executed once
    RPC raises exception -> zero or one times, unknown which
   
    How does the paper achieve at-most-once?

    1) client assigns a unique id for each RPC call
      how? e.g., epoch id (e.g., mtime of last reboot) + monotonic sequence #

    2) if server has already executed the call, return exception
       this is easy if the server doesn't crash, here's how:
         server maintains s[xid] (state), r[xid] (return value)
	 (this code assumes just one server thread)
	 if s[xid] == DONE:
	    return r[xid]
	 x = handler(args)
	 s[xid] = DONE
	 r[xid] = x
	 return x

	 What if the server crashes?
  	   Just before or just after call to handler()
  	   After server restart:
    	     Can't tell whether handler() was called
    	     Client will re-send request (no response due to crash)
    	     Server *must* reply with exception!

	 How to decide if a request is a re-sent copy of one sent before crash?

	 Answer:
	   server has a number that uniquely identifies epochs (periods between successive reboots)
             Birrell calls it the ID, we call it the server nonce
	   client obtains server's ID when it first connects during "bind"
	   client sends server ID in every RPC request
	   server checks whether ID in request == current ID
	       if equal, then any previous transmission will be in server's replies[] table
	       if not equal, return exception

	  What should client do if server raises exception?
	    might have been executed already, might not have been
	    thus you know:
	        maybe you transferred the money, maybe you didn't
		but you know it didn't transfer twice
		and you know you have to track down what actually happened
		      e.g. ask bank for transaction record
	     this situation is pretty rare
	     more on this later in the course

	  How to ensure server never reuses an ID?
 	    server could store ID on disk (if it has a disk)
	    or use boot time (if it has access to a clock)
	    or use a big random number (if it has a source of randomness)

	  When can server discard old saved return values?
	    after e.g. five seconds? no!
	    server can discard if client will never retransmit
	        == if client has received response
	    have client tell server which replies it has received
	    or: client tells server it has received all replies up through some XID
	        lab RPC's xid_rep, in every request message


  c) Exactly once semantics (rarely implemented)
    Like at-most-once, plus fault-tolerant remote procedure


In all cases:
   Client can't wait for the response forever; it if times out, it raises an exception.
   In this case, client app has no idea whether the remote procedure was executed.


At-least-once versus at-most-once?

  let's take an example: acquiring a lock

  at-least-once semantics:
    retransmitted acquire request may result in failure ("lock is already taken")
    worse, delayed request message may re-take the lock after it was released (-> deadlock)
    lock application needs its own way to deal with duplicates, no help from RPC semantics
    
  with at-most-once semantics:

  if client and server stay up and no message loss, client receives lock
    if client fails, it may know it has the lock or not 
    if server fails, client may have lock or not
    what does a client do in the case of an exception?
      need to implement some application-specific protocol
        ask server, do i have the lock?
	server needs to have a plan for remembering state across reboots 
        e.g., store locks and their holders on disk.

    what if a client fails while holding a lock?
      -> leases (later in this course)
      nicely handles client and server failures


=> Need to handle failures makes RPC very different from procedure calls


YFS RPC versus RPC in paper
  Both at-most-once
    Using the same technique (bind and exchange a nonce)
  Protocols differ
    YFS runs on reliable transport


Lab RPC request message format
  (TCP/IP headers...)
  xid
  proc#
  client nonce
  server nonce
  xid_rep
  arguments...

Lab RPC response message format
  (TCP/IP headers...)
  xid
  error code
  return value

Example 1: ordinary calls
  bind req: xid=1 sn=? proc=1
  bind reply: xid=1 sn=22
  ...
  req: xid=6 sn=22 xid_rep=5 proc=2 args...
    r[6] = r1
    server deletes xid<=5 from s[]/r[]
  reply: xid=6 r1
  req: xid=7 sn=22 xid_rep=6 proc=2 args...
  reply: xid=7

Why not let xid_rep be implicitly xid - 1 ?
  to allow multiple outstanding RPCs from one client
  example: xid=7 sent before reply for xid=6 arrives

Client has to compute xid_rep
  it is highest xid through which all replies have been received
  our code keeps set of replied-to xids, looks for contiguous prefix

Example 2: duplicate requests (client resends too soon, or response lost)
  req: xid=8 ...
  req: xid=8 ... (again)
  if handler has finished when 2nd req arrives:
    reply with r[8]
    thus two replies with xid=8
    client will ignore 2nd reply, no thread waiting for that xid
  if handler has not finished:
    ignore request (thus need more than just DONE -- INPROGRESS)

Example 3: server reboot
  req: xid=9 sn=22 ...
  server crash, reboot, new sn=23
  req: xid=9 sn=22 ... (retransmission)
  server sees 22 != 23, replies FORGOTTEN


Code roadmap:

lock_demo.cc
  creates a lock_client, tells it where to connect

lock_client::lock_client
  creates rpcc, tells it where to connect
  calls bind() to get server_nonce

lock_client::stat
  calls call(), proc #, arguments, &return

rpcc::call1
  msg: xid proc cn sn xid_rep args...
  xid_rep_window_.front()?
  update_xid_rep(xid)?

rpcc::got_pdu

rpcs::rpcs

rpcs::dispatch
  why the nonce check? what prob does this solve?
  when could INPROGRESS occur?
  is FORGOTTEN possible? how?
    long delayed request
    (rebooted server is dealt w/ separately)
  what is the client nonce for?

checkduplicate_and_update(cn, xid, xid_rep, &b, &sz)
  must keep, for each cn, state about each xid.
    done? b+sz if done.
  if s[cn/xid] == INPROGRESS
    INPROGRESS
  if s[cn/xid] == DONE
    DONE, return b and sz
  if xid <= previous xid_rep
    FORGOTTEN
  else
    s[cn/xid] = INPROGRESS
    NEW

  must also trim state
    discard any cn/xid if xid <= xid_rep
    must also free buf

what must add_reply(cn, xid, b, sz) do?
  checkduplicate_and_update already set s[cn/xid] = INPROGRESS
  s[cn/xid] = DONE
  remember b and sz.