Lecture 6: Vector Timestamps
Topic: Consistency with Timestamps / Version Vectors

Consistency so far
  Consistency problems arise when there is caching and there are concurrent writers
  Different consistency models
    sequential --- IVY (owner)
    release consistency --- TreadMarks
  All these schemes are based on pessimistic concurrency control:
    before you make a change, make sure you are allowed to (ask for permission)
    IVY: DSM scheme gets ownership
    TreadMarks: put locks+release in the right places
  Nice semantics, but sometimes awkward
    Requires lots of chatter between all participating machines
    What if a client is disconnected?

Let's take a step back: CAP Theorem
  - cannot have all three of (C)onsistency, (A)vailability, and tolerance
    to network (P)artitions
  - BUT: can have any two of C, A, P
    - C+A: all nodes must be online all the time (e.g. IVY): can't have P
    - C+P: have to stop serving requests when someone is off-line: give up A
    - A+P: give up consistency when some host is off-line (see below)

Can you sensibly operate out of your cache without seeing other people's updates?

Scenario 1: disconnected operation with laptops
  source-code repository, cooperating programmers
  each programmer's laptop has a complete copy of the files
  want to be able to modify files while disconnected
  programmers synchronize files pair-wise: "file synchronizer"
  how to decide which files to copy, and in which direction?
    non-conflicting changes: take the latest version
    conflicting changes: report an error, ask humans to resolve
  "optimistic concurrency" --- don't ask for permission; apologize later
  SVN, Git (and many of the other alternatives) solve this problem

Scenario 2: single user has many machines, each with a copy of the user's file system
  user works on one machine for a while (perhaps disconnected)
  user works on a second machine for a while
    perhaps forgot to sync machine 1 with machine 2 before starting to work
  user then works on a third machine, first syncing machines 2 and 3
  then goes back to the first machine
  want to ensure that the first machine gets all the changes from 2 and 3

Example 1: just focus on modifications to a single file "f"
  (time flows left to right)
  H1: w(f)1  ->H2         ->H3
  H2:              w(f)2
  H3:                           ->H2
  What is the right thing to do for H2?
  Is it enough to simply take the file with the latest modification time?
    Yes in this case, as long as you carry the times along correctly:
    - H3 remembers the mtime assigned by H1, not the mtime of the sync.
    - H2 makes sure its local time is at least equal to H1's local time
      when they sync

Example 2:
  H1: w(f)1  ->H2   w(f)2
  H2:                       w(f)0  ->H1
  H2's mtime will be bigger. Should the file synchronizer use "0" and
    discard "2"?
  No! These were conflicting changes. We need to detect this case.
  Modification times are not enough by themselves.

What is the principle here?
  We're not *enforcing* a sequential update history.
    No locks, no IVY-style owners.
  But *if* the updates actually were sequential, we want to detect that
    and carry along the most recent copy.
    This is where the "optimism" comes in.
  And if the updates were not sequential, we want to detect the problem
    (after the fact).

So what is the problem with using wall-clock time?
  Certainly if T1 < T2, T1 does not supersede T2.
  *But* if T2 > T1, that doesn't imply that T2 supersedes T1.
    Being ordered in time doesn't mean everyone agrees on the order.
  The underlying problem is that wall-clock time doesn't reflect causality:
    we care whether the write at T2 was made on a host that had already
    seen the write at T1.
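To make this concrete, here's a minimal sketch (Python; the representation
is hypothetical, not from any real synchronizer) of naive last-writer-wins
sync by mtime, applied to Example 2: it silently keeps "0" and discards "2"
even though the writes conflict.

  # Hypothetical sketch: a file version is a (value, mtime) pair, and the
  # synchronizer keeps whichever version has the later modification time.

  def naive_sync(local, remote):
      """Last-writer-wins by wall-clock mtime."""
      return remote if remote[1] > local[1] else local

  # Example 2 from above:
  h1 = ("2", 100)  # H1 wrote "2" at wall-clock time 100; H2 never saw it
  h2 = ("0", 150)  # H2 wrote "0" at time 150, having seen only w(f)1
  print(naive_sync(h1, h2))  # ('0', 150): "2" is silently discarded,
                             # even though these writes conflict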
How can we decide whether we're looking at a sequential history?
  We could record each file's entire modification history:
    a list of hostname/localtime pairs,
    and carry the history along when synchronizing between hosts.
  For example 1:
    H1: H1/T1
    H2: H1/T1,H2/T2
    H3: H1/T1
  For example 2:
    H1: H1/T1,H1/T2
    H2: H1/T1,H2/T3
  Then it's easy to decide whether version X supersedes version Y:
    it does if Y's history is a prefix of X's history.
  This exactly captures our desire for an *ordered* sequence of updates.
  Note we're not comparing times, so this works even with out-of-sync
    host clocks.

Why is complete history not such a great solution?
  The histories grow without bound if we modify a file a lot.

Can we compress the histories? Proposal: Vector Timestamps.
  Summarize a history with each host's highest time,
    i.e. throw away all but the last history record for each host.
  The last TS from each host preserves the information essential to
    detecting conflicting updates:
    had I seen your latest change when I made my change?
    We only care about the latest change.

A vector timestamp is a vector of times, one entry per host.
  The times are (as in the histories) local wall-clock times,
  though they could be any monotonic count (e.g. a local version number).
If a and b are vector timestamps:
  a = b  if they agree in every element
  a < b  if a[i] <= b[i] for every i, and !(a = b)
  a || b ("a and b conflict") if a[i] < b[i] and a[j] > b[j] for some i,j
  a < b => !(b < a)            (asymmetry)
  a < b AND b < c => a < c     (transitivity)
If one history was a prefix of the other,
  then one vector timestamp will be less than the other.
If one history is not a prefix of the other,
  then (at least by example) the VTs will not be comparable.

So now we can perform file synchronization:
  laptops carry along a version vector with each file,
    rather than just the single timestamp of the last change.
  When a pair of laptops syncs a file, they accept the file with the
    higher vector timestamp, and signal an error if the VTs conflict.
Illustrate with the two examples (vector entries are <H1,H2,H3>):
  Ex1.
    H1: <1,0,0>
    H3: <1,0,0>
    H2: <1,1,0>
    H3: take the update from H2
  Ex2.
    H1: <1,0,0>
    H2: <1,0,0>
    H1: <2,0,0>
    H2: <1,1,0>  conflicts w. H1

What if there *are* conflicting updates?
  VTs can detect them, but then what? It depends on the application.
  Easy: a mailbox file with distinct messages --- just merge.
  Medium: changes to different lines of a C source file
    (merge and hope for the best).
  Hard: changes to the same line of C source.
  Reconciliation must be done manually for the hard cases.
  SVN keeps extra information, equivalent to the complete history,
    so it can generate diffs from a common ancestor.
    That's much better than diffing the two final versions against each
    other: which is newer? or did they both change?

What about file deletion?
  Deletes have to have VTs so you can detect delete/write conflicts.
  Could treat a delete as a special kind of write.
  But then you have to remember the deletion timestamp indefinitely,
    in case some host out there hasn't heard about it.
  Can we garbage collect deletes?

Let's step back. What are vector timestamps really doing?
  They capture inter-system causality:
    what information had I heard from each other system?
  They are helpful when we care about "causal consistency":
    "Writes that are causally related must be seen by all processes
     in the same order."
How is causal consistency related to our file sync example?
  In general, each version of a file depends on the previous version,
    i.e. it won't make sense to an observer who didn't agree on the
    previous version.
  Example: suppose each update increments a number in the file.
    Each version helps cause the next version.
    If the updates aren't ordered, the causal relationships won't make sense.
So we've used VTs to *detect* violations of causal consistency.
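A minimal sketch of the comparison rules and the sync decision above, in
Python (the representation --- one tuple entry per host --- and the names
are mine):

  # A VT is a tuple with one entry per host, e.g. (1, 0, 0) for <1,0,0>.

  def vt_leq(a, b):
      """a <= b: every element of a is <= the corresponding element of b."""
      return all(x <= y for x, y in zip(a, b))

  def sync_decision(local_vt, remote_vt):
      """What to do with an incoming copy of a file."""
      if vt_leq(local_vt, remote_vt):
          return "take remote"   # remote supersedes (or equals) our copy
      if vt_leq(remote_vt, local_vt):
          return "keep local"    # our copy supersedes the remote one
      return "conflict"          # a || b: concurrent updates, ask a human

  # Ex1: H3 holds <1,0,0>, H2 offers <1,1,0>
  print(sync_decision((1, 0, 0), (1, 1, 0)))  # take remote
  # Ex2: H1 holds <2,0,0>, H2 offers <1,1,0>
  print(sync_decision((2, 0, 0), (1, 1, 0)))  # conflict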
We can also use VTs to help *ensure* causal consistency.
  Suppose we're all working on a s/w project, with our file sync.
  We have two files: math.c and app.c
  Alice adds sqrt() to math.c
  Bob syncs with Alice, gets the new math.c
  Bob looks at math.c, changes app.c to call sqrt()
  Bob syncs with Charles
    Bob thinks "I've only changed app.c"
    Charles won't be able to compile without the new math.c
    i.e. app.c depends on math.c
We could fix this with a second application of vector timestamps:
  tag each file with a second "dependency" VT.
  The D-VT summarizes the information available at the time of the last
    update to the file, i.e. a summary of the history of inputs to the file.
  Each entry in the vector is the MAX known time from each host, as of
    when the file was updated.
  Then sync should xfer *all* files in D-VT order.
  Thus Bob will send math.c to Charles before app.c
TreadMarks uses a similar technique...

-------

Optimistic Replication Using Vector Time Pairs
Russ Cox and William Josephson

what's the overall goal?
  flexible file sync
what are the implied problems w/ existing file synchronizers?
  they need to garbage collect file deletion notices
    which can be held up by dead replicas
  they must look at every file in order to sync
  a VT per file requires a lot of storage
  they cannot record conflict resolutions
  unison is limited to pair-wise synchronization
what are the claimed novel properties?
  simplicity, flexibility
  sync in any order
  easier to compress (section 3.2)
    in particular, singleton mod times!
  no explicit adding or removing of hosts
  can quickly find the data that needs to be transferred (section 3.1)
how did previous systems work?
  one vector timestamp per file, reflecting modifications
  to synchronize a disk: compare every file's VT
    need to transfer every VT, and store them
  need to keep the VT for a deleted file, treating deletion as a
    modification --- otherwise the file will re-appear on the next sync
  GC a deleted file's VT after everyone has seen it
    so maintain, for every other node, the highest VT it is known to know
      really the min VT such that it knows every VT before that
    gossip: one slow node can prevent GC
what's the key technique?
  two vector times: mod time and sync time
  what you have vs what you know
what does this mean? intuition?
  the sync time implies times at which modifications *didn't* occur
in what sense does a single VT combine mod and sync times? (3.5.1)
  most files have the same sync times
any advantages for a single file?
  maybe not, and we can optimize the sync times away for files
advantages for syncing a whole directory tree? (see the sketch below)
  dir sync time = elementwise min of child sync times
  dir mod time = elementwise max of child mod times
  can skip a whole dir if the src dir's mod time <= the dst dir's sync time
  why does this make sense?
    a high sync time proves you had (maybe indirectly) synced w/ the
    relevant node, and it was required to tell you if it had changed
    anything
does dir sync/mod sound like an important optimization?
why couldn't we do this w/ a single VT?
  skip if the src dir's mod time <= min(dst tree's mod times)?
  skip if the src dir's mod time <= max(dst tree's mod times)?
what if I sync just a sub-tree --- will that mess up optimized syncs
  of the parent?
  it won't mess up the sync time, since the min only changes the parent's
    sync time if this subdir was the laggard
  hmm, it may increase the mod time, but that's conservative: it will
    just cause the syncer to consider this directory
why can they get away with singleton file mod times?
  i.e. suppose we have A:1, B sees it, then B:2
  how does A know B:2 supersedes A:1?
  B's sync time tells it!
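A minimal sketch, in Python, of the mod/sync time pair and the
directory-skip test described above (the data layout and names are my own,
not the paper's code):

  # Each file carries a mod time (what you have) and a sync time (what
  # you know); both are tuples with one entry per replica.

  def vmax(a, b):
      return tuple(max(x, y) for x, y in zip(a, b))  # elementwise max

  def vmin(a, b):
      return tuple(min(x, y) for x, y in zip(a, b))  # elementwise min

  def vt_leq(a, b):
      return all(x <= y for x, y in zip(a, b))

  def dir_times(children):
      """children: list of (mod_time, sync_time) pairs.
      Dir mod time = max of child mod times (newest change anywhere below).
      Dir sync time = min of child sync times (what every child is
      guaranteed to have heard about)."""
      mods, syncs = zip(*children)
      mod, sync = mods[0], syncs[0]
      for m in mods[1:]:
          mod = vmax(mod, m)
      for s in syncs[1:]:
          sync = vmin(sync, s)
      return mod, sync

  def can_skip(src_dir_mod, dst_dir_sync):
      """Skip the whole dir: dst already knows about every modification
      anywhere under the src dir."""
      return vt_leq(src_dir_mod, dst_dir_sync)

  # src dir's newest changes are <3,1>; dst has synced up through <5,2>
  print(can_skip((3, 1), (5, 2)))  # True  -> no need to descend
  print(can_skip((6, 1), (5, 2)))  # False -> must look inside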
so haven't we just moved the complexity to the sync time?
  no: you need all the elements of the sync VT,
    but all files usually have the same sync time
how do they garbage collect deletion notices? (see the sketch below)
  the dir sync time proves that no file existed as of that time!!!
  so if, during a sync, the other guy sends an earlier create time,
    he should delete;
    and if he has a later create time, you must create.
  if his file mod time > your dir sync time: conflict
    (you can tell because create < dirsync and you don't have the file...)
it's a disk file system: how do they detect the user's changes?
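Returning to the deletion-notice rules above, here is one possible
rendering in Python (my own reading of the logic, not code from the
paper): the destination lacks a file that the source has, and compares the
source's create/mod times against its own directory sync time.

  def vt_leq(a, b):
      return all(x <= y for x, y in zip(a, b))

  def missing_file_decision(src_create, src_mod, dst_dir_sync):
      """Decide what to do when dst doesn't have a file that src has."""
      if vt_leq(src_mod, dst_dir_sync):
          # dst had heard about every write to this file, yet it's gone:
          # dst deleted it, so src should delete too.
          return "source deletes"
      if vt_leq(src_create, dst_dir_sync):
          # dst knew the file was created, deleted it, but never saw
          # src's latest write: a delete/write conflict.
          return "conflict"
      # dst never heard of the file at all: it's simply new here.
      return "destination creates"

  # file created at <1,0>, last modified at <3,0>; dst dir sync time <2,5>:
  # dst saw the create but not the latest write -> conflict
  print(missing_file_decision((1, 0), (3, 0), (2, 5)))  # conflict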