Presentation materials and code for the "Missing DataFrame" talk
There is a missing tool in your Java data structure toolkit – DataFrames!
Java developers are well versed in the Java Collections Framework (JCF), but the JCF requires predefined types and operates at a low level of abstraction.
DataFrames offer Java developers a competitive alternative to Python/Pandas and Scala/Spark. They let you transform and organize data directly in code, while providing efficiency, flexibility, code readability, and developer productivity. DataFrames are used in real-world scenarios such as data transformation, data enrichment, data validation, and reconciliation.
Join this talk and discover how DataFrames can elevate your Java programming experience!
This project contains code for two Java data frame frameworks (dataframe-ec and Tablesaw) and for Kotlin DataFrame. For each of the three frameworks, the project contains:
- The One Billion Row Challenge implementation
- The One Billion Row Challenge implementation with logging that reports the approximate time spent and memory used
- The Donut Shop use cases (see the presentation) implemented as unit tests
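For orientation, here is a minimal sketch of the kind of aggregation the One Billion Row Challenge asks for, written with only the JDK and the Java Collections Framework rather than any of the three data frame frameworks. It is not the code from this project. It assumes the publicly described challenge input format of one `station;temperature` reading per line, and the names `StationSummarySketch`, `Measurement`, and `summarize` are hypothetical. Note that the row shape has to be declared up front as a `record`, which is the "predefined types" point made in the talk description.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: plain JDK only, not the project's implementation.
public class StationSummarySketch {

    // With the JCF, every row shape needs its own predefined type.
    record Measurement(String station, double temperature) {
        // Assumed input format: "station;temperature", one reading per line.
        static Measurement parse(String line) {
            int separator = line.indexOf(';');
            return new Measurement(
                    line.substring(0, separator),
                    Double.parseDouble(line.substring(separator + 1)));
        }
    }

    // Groups readings by station and collects min, average, and max for each.
    // A TreeMap keeps the stations sorted by name.
    static Map<String, DoubleSummaryStatistics> summarize(List<String> lines) {
        return lines.stream()
                .map(Measurement::parse)
                .collect(Collectors.groupingBy(
                        Measurement::station,
                        TreeMap::new,
                        Collectors.summarizingDouble(Measurement::temperature)));
    }

    public static void main(String[] args) {
        // Tiny in-memory sample standing in for the real input file.
        List<String> lines = List.of("Oslo;1.0", "Lima;20.5", "Oslo;3.0");
        summarize(lines).forEach((station, stats) ->
                System.out.printf("%s=%.1f/%.1f/%.1f%n",
                        station, stats.getMin(), stats.getAverage(), stats.getMax()));
    }
}
```

A data frame version would typically load the same rows into named columns and express the grouping as a single aggregation call, without a dedicated `Measurement` type. The exact calls differ per framework, so they are left to the implementations in this project.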
For the Python code implementing these examples, see this repository.