Automatic checkpointing with logical signatures.
Entirely lazy pipeline definitions.
Same friendly API as Spark.
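
As a rough illustration of how lazy definitions and signature-based checkpointing can fit together, here is a minimal plain-Scala sketch with no Spark dependency. `Node`, `source`, `checkpoint`, the string labels, and the in-memory `store` are all hypothetical names chosen for this example and are not the library's API. The assumption is that a "logical signature" is a hash of a step's lineage, so a result can be reused whenever the same logical pipeline is defined again.

```scala
import java.security.MessageDigest
import scala.collection.mutable

// Hypothetical stand-in for a lazily defined pipeline step (not the library's API).
final class Node[T](val signature: String, compute: () => T) {

  // Deriving a node only records the transformation and extends the signature;
  // nothing is computed here.
  def map[U](label: String)(f: T => U): Node[U] =
    new Node[U](Node.hash(signature + "|map:" + label), () => f(get()))

  // Returns a node whose result is cached in `store` under its signature.
  // The by-name default of getOrElseUpdate runs the computation only on a miss.
  def checkpoint(store: mutable.Map[String, Any]): Node[T] =
    new Node[T](signature, () => store.getOrElseUpdate(signature, get()).asInstanceOf[T])

  // Forces evaluation of this node and everything it depends on.
  def get(): T = compute()
}

object Node {
  // Hex-encoded SHA-1 of the lineage description.
  def hash(s: String): String =
    MessageDigest.getInstance("SHA-1")
      .digest(s.getBytes("UTF-8"))
      .map("%02x".format(_))
      .mkString

  // A source node; `data` is by-name, so it is not read until get() is called.
  def source[T](label: String)(data: => T): Node[T] =
    new Node[T](hash("source:" + label), () => data)
}

object LazyPipelineDemo {
  // Returns (runs after definition, runs after two get() calls, result).
  def run(): (Int, Int, List[Int]) = {
    var runs = 0
    val store = mutable.Map.empty[String, Any]
    val squares = Node.source("numbers")(List(1, 2, 3))
      .map("square") { xs => runs += 1; xs.map(x => x * x) }
      .checkpoint(store)
    val runsAfterDefinition = runs // still 0: defining the pipeline ran nothing
    squares.get()                  // computes and stores the result
    val result = squares.get()     // served from the store
    (runsAfterDefinition, runs, result)
  }
}
```

In this sketch the signature depends only on the labels in the lineage, so redefining the same pipeline against the same store would hit the cached result, while changing a label would produce a new signature and trigger recomputation.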
- more complete RDD API coverage
- load from DataFrame reader
- more complete DataFrame API coverage
- SQL transforms integrated with dependency graph
- use Dataset-backed DCs (wait until Spark 2.0)
- DAG viewer frontend attached to running process
- component / pipeline abstractions
- debug run mode with functions automatically wrapped in `Try` and failures trapped
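
The debug run mode in the last item is not specified further here, so the following is only a guess at the general idea, written in plain Scala over an ordinary `Seq`. `DebugRun` and `trapped` are hypothetical names. The sketch wraps each call to a user function in `scala.util.Try`, keeps the successful results, and collects each failing input together with its exception instead of aborting the run.

```scala
import scala.util.{Failure, Success, Try}

object DebugRun {
  // Hypothetical helper: applies f to every record inside Try and returns
  // the successful outputs alongside the (input, exception) pairs that failed.
  def trapped[A, B](records: Seq[A])(f: A => B): (Seq[B], Seq[(A, Throwable)]) = {
    val attempts = records.map(r => r -> Try(f(r)))
    val results  = attempts.collect { case (_, Success(v)) => v }
    val failures = attempts.collect { case (r, Failure(e)) => r -> e }
    (results, failures)
  }
}
```

For example, `DebugRun.trapped(Seq("1", "2", "x"))(_.toInt)` yields the parsed values for `"1"` and `"2"` and a single trapped failure for `"x"` carrying a `NumberFormatException`.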