This uses the new Source V2 Interface to connect to Apache Arrow Flight endpoints. It is a prototype of what is possible with Arrow Flight. The prototype has achieved a 50x speed-up compared to a serial JDBC driver and scales with the number of Flight endpoints/Spark executors running in parallel.
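A minimal read sketch (in Scala) is shown below. The `flight` format name and the `host`, `port`, and `path` options are placeholders assumed for this example, not the connector's actual registered names.

```scala
import org.apache.spark.sql.SparkSession

object FlightReadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("flight-source-example")
      .getOrCreate()

    // Each Flight endpoint returned by the server becomes one Spark partition,
    // which is what lets the read scale with the number of endpoints/executors.
    val df = spark.read
      .format("flight")                       // placeholder short name
      .option("host", "flight-server.local")  // placeholder option key
      .option("port", "47470")                // placeholder option key
      .option("path", "my_dataset")           // placeholder option key
      .load()

    df.show()
  }
}
```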
It currently supports:
- Columnar Batch reading
- Reading in parallel many flight endpoints as Spark partitions
- Filter and projection pushdown (see the sketch below)
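As a rough illustration of the pushdown behavior, the sketch below continues from the read example above (it reuses `df`); the column names are made up, and what actually gets pushed down depends on the data types and filters the prototype supports.

```scala
import org.apache.spark.sql.functions.col

// With projection and filter pushdown, only the selected columns and matching
// rows need to be requested from the Flight endpoint, not the whole table.
val trimmed = df
  .select("id", "amount")       // projection pushed to the source
  .filter(col("amount") > 100)  // filter pushed to the source

// The physical plan reports what was pushed down (PushedFilters, ReadSchema).
trimmed.explain()
```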
It currently lacks:
- support for all Spark/Arrow data types and filters
- a write interface that uses `DoPut` to write Spark DataFrames back to an Arrow Flight endpoint (see the sketch after this list)
- use of the transactional capabilities of the Spark Source V2 interface
- published benchmark tests
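For context on the missing write path, the sketch below shows the raw Arrow Flight Java `DoPut` call (`FlightClient.startPut`) that such an interface would build on. This is not part of the connector; the host, port, descriptor path, and sample data are made up.

```scala
import org.apache.arrow.flight.{AsyncPutListener, FlightClient, FlightDescriptor, Location}
import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.{IntVector, VectorSchemaRoot}

object DoPutSketch {
  def main(args: Array[String]): Unit = {
    val allocator = new RootAllocator(Long.MaxValue)
    val client = FlightClient
      .builder(allocator, Location.forGrpcInsecure("flight-server.local", 47470))
      .build()

    // Build a single-column Arrow batch to upload.
    val ids = new IntVector("id", allocator)
    ids.allocateNew(3)
    (0 until 3).foreach(i => ids.setSafe(i, i))
    ids.setValueCount(3)
    val root = VectorSchemaRoot.of(ids)

    // DoPut: open an upload stream against a descriptor and send the batch.
    val stream = client.startPut(FlightDescriptor.path("my_dataset"), root, new AsyncPutListener())
    stream.putNext()    // send the current contents of `root`
    stream.completed()  // signal end of the stream
    stream.getResult()  // block until the server acknowledges the upload

    root.close()
    client.close()
    allocator.close()
  }
}
```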