declark1/flight-spark-source

Spark source for Flight enabled endpoints

This project uses Spark's new Data Source V2 interface to connect to Apache Arrow Flight endpoints. It is a prototype of what is possible with Arrow Flight. The prototype has achieved a 50x speedup compared to a serial JDBC driver, and it scales with the number of Flight endpoints and Spark executors running in parallel.
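
The scaling claim rests on reading each endpoint independently. The following is a minimal sketch of that idea using only the Java standard library. It is not this project's code: `ParallelEndpointSketch`, `Endpoint`, and `readAll` are hypothetical names, and each "endpoint" is an in-memory list standing in for a Flight stream. Each worker thread plays the role of one Spark task reading one partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelEndpointSketch {

    /** Hypothetical stand-in for one Flight endpoint: it only holds rows in memory. */
    static final class Endpoint {
        final String name;
        final List<Integer> rows;

        Endpoint(String name, List<Integer> rows) {
            this.name = name;
            this.rows = rows;
        }
    }

    /** Reads every endpoint on its own worker thread and concatenates the results in endpoint order. */
    static List<Integer> readAll(List<Endpoint> endpoints) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, endpoints.size()));
        try {
            // One task per endpoint, analogous to one Spark partition per endpoint.
            List<Callable<List<Integer>>> tasks = new ArrayList<>();
            for (Endpoint e : endpoints) {
                tasks.add(() -> new ArrayList<Integer>(e.rows));
            }
            List<Integer> result = new ArrayList<>();
            // invokeAll returns the futures in the same order as the submitted tasks.
            for (Future<List<Integer>> f : pool.invokeAll(tasks)) {
                result.addAll(f.get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Endpoint> endpoints = Arrays.asList(
            new Endpoint("endpoint-0", Arrays.asList(1, 2, 3)),
            new Endpoint("endpoint-1", Arrays.asList(4, 5)),
            new Endpoint("endpoint-2", Arrays.asList(6)));
        System.out.println(readAll(endpoints)); // prints [1, 2, 3, 4, 5, 6]
    }
}
```

In a real Spark job the scheduler, not a local thread pool, assigns partitions to executors, so throughput grows with both the number of endpoints and the number of executors.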

It currently supports:

  • Columnar batch reading
  • Reading many Flight endpoints in parallel as Spark partitions
  • Filter and projection pushdown
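
To illustrate what filter and projection pushdown buys, here is a small sketch in plain Java. It assumes the endpoint accepts a SQL-like query string, which this README does not state. `PushdownSketch` and `buildQuery` are hypothetical names, not part of this project. The point is that the selected columns and the predicates travel to the server, so only the needed data comes back over the wire.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PushdownSketch {

    /**
     * Folds a projection (column list) and filters (predicates) into a single query string.
     * Assumption: the remote endpoint understands SQL-like text; predicates are pre-rendered strings.
     */
    static String buildQuery(String table, List<String> columns, List<String> predicates) {
        // An empty projection means "all columns".
        String select = columns.isEmpty() ? "*" : String.join(", ", columns);
        StringBuilder sql = new StringBuilder("SELECT ")
            .append(select)
            .append(" FROM ")
            .append(table);
        // Pushed-down filters are combined with AND.
        if (!predicates.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", predicates));
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        String pushed = buildQuery(
            "trips",
            Arrays.asList("id", "fare"),
            Arrays.asList("fare > 10", "id IS NOT NULL"));
        System.out.println(pushed);
        // prints SELECT id, fare FROM trips WHERE fare > 10 AND id IS NOT NULL

        String full = buildQuery("trips", Collections.<String>emptyList(), Collections.<String>emptyList());
        System.out.println(full); // prints SELECT * FROM trips
    }
}
```

Without pushdown, the second form would be sent and Spark would discard unwanted rows and columns after transfer.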

It currently lacks:

  • Support for all Spark/Arrow data types and filters
  • A write interface that uses DoPut to write Spark DataFrames back to an Arrow Flight endpoint
  • Use of the transactional capabilities of the Spark Source V2 interface
  • A published benchmark test
