Lecture 5: Release consistency

Today:
  Lazy release consistency
  TreadMarks
  Contrast to IVY and sequential consistency

Review: what makes a good consistency model?
  Model is a contract between memory system and programmer
    Programmer follows some rules about reads and writes
    Model provides guarantees
  Model embodies a tradeoff
    Intuitive for the programmer vs. can be implemented efficiently

TreadMarks high-level goals?
  Better DSM performance.
  Run existing parallel code.

What specific problems with previous DSM are they trying to fix?
  False sharing: two machines use different vars on the same page, at least one writes
    IVY => the page bounces back and forth
    but it doesn't need to, since they touch different vars
  Send only written bytes -- not whole pages

Big idea: write diffs
  Goal: don't send the whole page, just the written bytes
  Example: a page is read-shared among M1 and M2
    On M1's write fault:
      tell M2 to invalidate but keep a hidden copy
      M1 makes a hidden copy of the current page as well
    On M2's fault:
      M2 asks M1 for recent modifications
      M1 "diffs" its current page against its hidden copy
      M1 sends the diffs to M2
      M2 applies the diffs to its hidden copy

Next goal: allow multiple readers+writers
  to cope with false sharing
  => no invalidation when a machine writes
  => no r/w -> r/o demotion when a machine reads
  => multiple *different* copies of a page!
     which should a reader look at?
  diffs help: can merge writes to the same page
  but when to send the diffs?
    no invalidations -> no page faults -> what triggers sending diffs?

Big idea: release consistency
  Insight: in a properly synchronized program,
    no one should read data without holding a lock!
    (to support locks, we need a lock server, much like in the lab)
  Assumption: the program is synchronized using locks
  Now, send out write diffs on release, to all copies of the pages written
    we do this for every page on every lock release,
    because we don't know which lock protects which page(s)
  This is a new consistency model!
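The twin-and-diff mechanism above can be sketched in a few lines. This is a minimal illustration, assuming fixed-size byte pages; the function names (make_twin, compute_diff, apply_diff) are illustrative, not TreadMarks's actual interface.

```python
PAGE_SIZE = 4096

def make_twin(page: bytearray) -> bytes:
    """Hidden copy ("twin") saved on the first write fault to a page."""
    return bytes(page)

def compute_diff(twin: bytes, page: bytearray) -> dict[int, int]:
    """Offsets (and new values) of bytes changed since the twin was made."""
    return {i: page[i] for i in range(len(page)) if page[i] != twin[i]}

def apply_diff(page: bytearray, diff: dict[int, int]) -> None:
    """Merge another machine's writes into our copy of the page."""
    for offset, value in diff.items():
        page[offset] = value

# M1 writes x (offset 0) while M2 writes y (offset 100) on the same page.
m1_page = bytearray(PAGE_SIZE)
m2_page = bytearray(PAGE_SIZE)
m1_twin = make_twin(m1_page)
m2_twin = make_twin(m2_page)
m1_page[0] = 7      # M1: x = 7
m2_page[100] = 9    # M2: y = 9

# Each sends only its changed bytes; diffs to different offsets merge cleanly,
# which is why concurrent writers to one page can be reconciled.
apply_diff(m2_page, compute_diff(m1_twin, m1_page))
apply_diff(m1_page, compute_diff(m2_twin, m2_page))
assert m1_page[0] == m2_page[0] == 7 and m1_page[100] == m2_page[100] == 9
```

Note that this merging is only safe when the concurrent writers touch disjoint bytes, which is exactly the false-sharing case.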
  M0 won't see M1's writes until M1 releases a lock
  so machines can temporarily disagree on memory contents
  if you always lock:
    locks force an order -> no stale reads -> like sequential consistency
  if you don't lock:
    reads can return stale data
    concurrent writes to the same var -> trouble
  Benefit?
    multiple machines can have copies of a page, even when one or more write
    => no bouncing of pages due to false sharing
    => read copies can co-exist with writers
    relies on write diffs
      otherwise can't reconcile concurrent writes to the same page

Big idea: lazy release consistency
  instead of sending diffs to all copies upon lock release:
    fetch write diffs upon acquire of a lock
    and only fetch from the previous holder of that lock
    i.e., nothing happens at the time of write or release
  LRC is yet another new consistency model!
    LRC hides some writes that RC reveals (example below)
  Benefit?
    if you don't acquire a lock, you don't have to fetch/receive updates
    => if you use just some vars on a page, no need to fetch/receive writes to the others
    => less network traffic

Example 1 (false sharing)
  x and y are on the same page
  M0: a1 for(...) x++ r1
  M1: a2 for(...) y++ a1 print x, y r1 r2
  What does IVY do?
  What does TreadMarks do?
    M0 and M1 both get cached writeable copies of the page
    when they release, each computes diffs against the original page
    M1's a1 causes it to pull write diffs from the last holder of lock 1
      so M1 updates x in its page

Example 2 (LRC)
  x and y on the same page (otherwise IVY avoids the copy too)
  M0: a1 x=1 r1
  M1: a2 y=1 r2
  M2: a1 print x r1
  What does IVY do?
  What does TreadMarks do?
    M2 only asks the previous holder of lock 1 for write diffs
    M2 does not see M1's modification to y, even though it's on the same page

Q: is LRC a win over IVY if each variable is on a separate page? (no)
Q: why is LRC a reasonably intuitive model for programmers?
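Example 2's behavior can be illustrated with a toy model of LRC in which a lock object carries the diffs left by its previous releaser, and acquiring applies only those. This is a deliberately simplified sketch, assuming one page with x at offset 0 and y at offset 1; the class and method names are illustrative, not from the TreadMarks implementation.

```python
class Lock:
    def __init__(self):
        self.diff = {}  # write diffs accumulated by previous releasers

class Machine:
    def __init__(self, page_size=2):
        self.page = bytearray(page_size)  # local (possibly stale) page copy
        self.twin = None

    def acquire(self, lock):
        # Lazy: only now do we fetch diffs, and only those associated
        # with this lock -- not writes made under other locks.
        for offset, value in lock.diff.items():
            self.page[offset] = value
        self.twin = bytes(self.page)

    def release(self, lock):
        # Leave our writes with the lock for the next acquirer.
        lock.diff.update({i: self.page[i]
                          for i in range(len(self.page))
                          if self.page[i] != self.twin[i]})

lock1, lock2 = Lock(), Lock()
m0, m1, m2 = Machine(), Machine(), Machine()

m0.acquire(lock1); m0.page[0] = 1; m0.release(lock1)  # M0: a1 x=1 r1
m1.acquire(lock2); m1.page[1] = 1; m1.release(lock2)  # M1: a2 y=1 r2
m2.acquire(lock1)                                     # M2: a1 print x r1
assert m2.page[0] == 1  # M2 sees M0's x=1 via lock 1
assert m2.page[1] == 0  # M2 does NOT see M1's y=1, even though same page
```

The sketch ignores the "writes the previous holder saw" obligation from Example 3 below; a real implementation needs vector timestamps to get that right.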
  same as sequential consistency if you always lock
  but non-locking code, like "v = f(); done = 1;", does not work

Example 3 (motivate vector timestamps)
  x and y are on different pages
  M0: a1 x=1 r1
  M1: a1 a2 y=x r2 r1
  M2: a1 a2 print x, y r2 r1
  Should M1 forward to M2 the value of y only? Or the values of x and y?
  We need to define what LRC guarantees!
  Answer: when you acquire a lock, you see
    all writes by the previous holder,
    and all writes the previous holder saw
  What does TreadMarks do?
    M2 and M1 need to decide what M2 needs and doesn't already have
    uses "vector timestamps"
    each machine numbers its releases (i.e. write diffs)
    M1 tells M2: at the release of lock 2, I had seen M0's writes through #20, &c:
      0: 20
      1: 25
      2: 19
      3: 36
      ...
    this is a "vector timestamp"
    M2 remembers a vector timestamp of the writes it has seen
    M2 compares it with M1's VT to see what writes it needs from other machines
  More on VTs next lecture...

Q: could TreadMarks work without using VM page protection?
  it uses VM to:
    detect writes, to avoid making hidden copies (for diffs) when not needed
    detect reads to pages => know whether to fetch a diff
  neither is really crucial
  so TM doesn't depend on VM as much as IVY does
    IVY used VM faults to decide what data has to be moved, and when
    TM uses acquire()/release() and diffs for that purpose

Performance?
  Figure 3 shows mostly good scaling
    is that the same as "good"?
    though apparently Water does lots of locking / sharing
  How close are they to the best possible performance?
    maybe Figure 5 implies there is only about 20% fat to be cut
  Does LRC beat previous DSM schemes?
    they only compare against their own straw-man ERC,
    not against the best known prior work
    Figure 9 suggests not much win, even for Water

Has DSM been successful?
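The VT comparison at acquire time can be sketched as follows: the acquirer learns, per machine, which release intervals the previous holder had seen beyond its own, fetches those diffs, and then takes the elementwise max. A minimal sketch, assuming each machine numbers its releases from 1; the function names are illustrative, not TreadMarks's.

```python
def missing_intervals(mine: dict, theirs: dict) -> dict:
    """Per machine, the range of release numbers the acquirer still needs:
    everything the releaser had seen beyond what the acquirer has seen."""
    need = {}
    for machine, upto in theirs.items():
        have = mine.get(machine, 0)
        if upto > have:
            need[machine] = (have + 1, upto)  # fetch diffs have+1 .. upto
    return need

def merge(mine: dict, theirs: dict) -> dict:
    """After applying the fetched diffs, the acquirer's VT is the
    elementwise max of the two vector timestamps."""
    return {m: max(mine.get(m, 0), theirs.get(m, 0))
            for m in mine.keys() | theirs.keys()}

# M1's VT at its release of lock 2, from the notes above:
m1_vt = {0: 20, 1: 25, 2: 19, 3: 36}
# Suppose (hypothetically) M2 lags on M0's and M1's releases:
m2_vt = {0: 18, 1: 20, 2: 19, 3: 36}

assert missing_intervals(m2_vt, m1_vt) == {0: (19, 20), 1: (21, 25)}
assert merge(m2_vt, m1_vt) == {0: 20, 1: 25, 2: 19, 3: 36}
```

This is why M2 ends up with both x and y in Example 3: M1's VT covers M0's release of lock 1, so M2 fetches M0's diff too.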
  clusters of cooperating machines are hugely successful
  DSM not so much
  the main justification is transparency for existing threaded code
    that's not interesting for new apps
    and transparency makes it hard to get high performance
  MapReduce, message passing, and shared storage are more common than DSM

Picture to explain operation and sequential consistency of IVY:

  |-----------|     |-----------|     |-----------|
  |           |     |           |     |           |
  |    M0     |     |    M1     |     |    MGR    |
  |-----------|     |-----------|     |-----------|
        |                 |                 |
  ------------------------------------------------

  M0          MGR          M1
  LDx