- The introduction says that it is more difficult to ensure deterministic execution on physical servers than on VMs. Why is this the case?
- What is a hypervisor?
- Both GFS and VMware FT provide fault tolerance. How should we think about when one or the other is better?
- How do Section 3.4's bounce buffers help avoid races?
- What is "an atomic test-and-set operation on the shared storage"?
- How much performance is lost by following the Output Rule?
- What if the application calls a random number generator? Won't that yield different results on primary and backup and cause the executions to diverge?
- How were the creators certain that they captured all possible forms of non-determinism?
- What happens if the primary fails just after it sends output to the external world?
- Section 3.4 talks about disk I/Os that are outstanding on the primary when a failure happens; it says "Instead, we re-issue the pending I/Os during the go-live process of the backup VM." Where are the pending I/Os located/stored, and how far back does the re-issuing need to go?
- How secure is this system?
- Is it reasonable to address only fail-stop failures? What other types of failures are there?
- How does VMware FT handle network partitions? That is, if the primary and the backup end up in different network partitions, could the backup also become a primary, leaving the system running with two primaries?