Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
- Is Spark currently in use in any major applications?
- How common is it for PhD students to create something on the scale of Spark?
- Should we view Spark as being similar to MapReduce?
- Why are RDDs called immutable if they allow for transformations?
- Do distributed systems designers worry about energy efficiency?
- How do applications figure out the location of an RDD?
- How does Spark achieve fault tolerance?
- Why is Spark developed using Scala? What's special about the language?
- Does anybody still use MapReduce rather than Spark, given that Spark seems strictly superior? If so, why do people still use MapReduce?
- Is the RDD concept implemented in any systems other than Spark?
- What applications can Spark support well that MapReduce/Hadoop cannot support?
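Several of the questions above (immutability despite transformations, and fault tolerance) hinge on the same idea from the paper: a transformation never modifies an existing RDD; it creates a new RDD that records its lineage, and lost partitions are recomputed from that lineage. The toy sketch below (plain Python, not Spark's real API; all names here are illustrative) shows that idea in miniature:

```python
# Toy sketch (NOT Spark's actual implementation or API) of why RDDs are
# "immutable" even though they support transformations: map/filter never
# modify the parent dataset; each returns a NEW object whose closure
# records the lineage, so the data can always be re-derived on demand,
# e.g. after a partition is lost.

class ToyRDD:
    def __init__(self, compute):
        self._compute = compute      # closure that re-derives the data

    def collect(self):
        return self._compute()       # recompute from lineage on demand

    def map(self, f):
        # Returns a new ToyRDD; `self` is left untouched.
        return ToyRDD(lambda: [f(x) for x in self.collect()])

    def filter(self, p):
        return ToyRDD(lambda: [x for x in self.collect() if p(x)])


base = ToyRDD(lambda: [1, 2, 3, 4])
doubled = base.map(lambda x: x * 2)      # new RDD; `base` is unchanged
evens = doubled.filter(lambda x: x > 4)  # another new RDD

print(base.collect())    # parent is untouched: [1, 2, 3, 4]
print(evens.collect())   # re-derived through the lineage chain: [6, 8]
```

In real Spark the lineage also carries partitioning and preferred-location information, and recomputation happens per lost partition rather than for the whole dataset, but the core answer to both questions is this one mechanism: transformations build new immutable RDDs, and fault tolerance falls out of replaying their lineage.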
[Lab 4 Instructions](6.824 Lab 4_ Sharded Key_Value Service.html)