GitHub - sgcowell/tpcds-iceberg-utils

Prereqs:

TPC-DS dsdgen, which you can build from source: https://github.com/gregrahn/tpcds-kit.
Spark 3.2
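
As a rough sketch of the first prerequisite, building dsdgen from the tpcds-kit sources might look like the following. The `make OS=LINUX` / `make OS=MACOS` invocations follow the tpcds-kit README as I understand it; the helper names `tpcds_make_os` and `build_dsdgen` and the default clone location are illustrative assumptions, and the kit's own README remains the authority on required build packages.

```shell
# Map the output of `uname -s` to the OS value passed to tpcds-kit's Makefile.
# tpcds_make_os is a hypothetical helper name, not part of either repository.
tpcds_make_os() {
  case "${1:-$(uname -s)}" in
    Darwin) echo MACOS ;;
    Linux)  echo LINUX ;;
    *)      echo "unsupported OS: ${1:-$(uname -s)}" >&2; return 1 ;;
  esac
}

# Clone tpcds-kit and build the tools (including dsdgen).
# Assumes git, make, and a C toolchain are already installed.
build_dsdgen() {
  dest="${1:-tpcds-kit}"
  os="$(tpcds_make_os)" || return 1
  git clone https://github.com/gregrahn/tpcds-kit.git "$dest" || return 1
  (cd "$dest/tools" && make OS="$os")
}

# Usage (not run automatically):
#   build_dsdgen "$HOME/src/tpcds-kit"
```

If the build succeeds, the dsdgen binary should end up in the `tools` directory of the clone.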

To create Iceberg tables on top of a TPC-DS dataset:

Generate the base TPC-DS data files using dsdgen; see run-dsdgen.sh for an example command line. Source for dsdgen that compiles on macOS is available at https://github.com/gregrahn/tpcds-kit.
Run gen-create-tables-script.sh to generate a SQL script that Spark SQL can run to create Iceberg tables from the generated text data files.

For example, to create the Iceberg tables under ~/warehouse/tpcds, with that location registered in Spark as a catalog named 'tpcds', pass the catalog name and the directory containing the generated data files:

./gen-create-tables-script.sh tpcds /path/to/tpcds/files > tpcds.sql
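
For the data generation step that precedes this command, a minimal sketch of a dsdgen wrapper is shown below. It is not the repository's run-dsdgen.sh, which remains the reference example. The function name `generate_tpcds_data` and its argument order are assumptions made for illustration; `-SCALE` and `-DIR` are standard dsdgen options. The wrapper runs dsdgen from the tools directory because, as far as I know, dsdgen looks for its `tpcds.idx` distribution file relative to the working directory.

```shell
# Hypothetical wrapper around dsdgen; not part of this repository.
#   $1 = tpcds-kit tools directory containing the dsdgen binary
#   $2 = scale factor (approximate data volume in GB)
#   $3 = output directory for the generated data files
generate_tpcds_data() {
  tools_dir="$1"; scale="$2"; out_dir="$3"
  mkdir -p "$out_dir" || return 1
  # Resolve to an absolute path, since dsdgen is started from another directory.
  out_dir="$(cd "$out_dir" && pwd)" || return 1
  # Run from the tools directory so dsdgen can locate tpcds.idx.
  (cd "$tools_dir" && ./dsdgen -SCALE "$scale" -DIR "$out_dir")
}

# Usage (not run automatically):
#   generate_tpcds_data "$HOME/src/tpcds-kit/tools" 1 "$HOME/tpcds-data"
#   ./gen-create-tables-script.sh tpcds "$HOME/tpcds-data" > tpcds.sql
```

A scale factor of 1 keeps the dataset small enough for a quick local trial before generating larger volumes.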

You can then run the generated script (tpcds.sql) with Spark SQL using something like:

$HOME/spark/spark-3.2.0-bin-hadoop3.2/bin/spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hive \
  --conf spark.sql.catalog.tpcds=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.tpcds.type=hadoop \
  --conf spark.sql.catalog.tpcds.warehouse=$HOME/warehouse/tpcds \
  --conf spark.sql.iceberg.handle-timestamp-without-timezone=true \
  --conf spark.executor.memory=4g \
  --conf spark.driver.memory=4g \
  -f tpcds.sql
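
Because the same catalog options are needed every time you open the warehouse, it may help to wrap them in a small function. The sketch below reuses exactly the options from the command above; the function name `spark_sql_iceberg` and the `SPARK_SQL` override variable are hypothetical conveniences, not part of this repository or of Spark.

```shell
# Hypothetical wrapper; set SPARK_SQL to point at a specific spark-sql binary.
#   $1 = catalog name, $2 = warehouse directory,
#   remaining arguments are passed through to spark-sql unchanged.
spark_sql_iceberg() {
  catalog="$1"; warehouse="$2"; shift 2
  "${SPARK_SQL:-spark-sql}" \
    --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0 \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
    --conf spark.sql.catalog.spark_catalog.type=hive \
    --conf "spark.sql.catalog.${catalog}=org.apache.iceberg.spark.SparkCatalog" \
    --conf "spark.sql.catalog.${catalog}.type=hadoop" \
    --conf "spark.sql.catalog.${catalog}.warehouse=${warehouse}" \
    --conf spark.sql.iceberg.handle-timestamp-without-timezone=true \
    --conf spark.executor.memory=4g \
    --conf spark.driver.memory=4g \
    "$@"
}

# Usage (not run automatically):
#   spark_sql_iceberg tpcds "$HOME/warehouse/tpcds" -f tpcds.sql
# Dry run that only prints the assembled arguments:
#   SPARK_SQL=echo spark_sql_iceberg tpcds "$HOME/warehouse/tpcds" -f tpcds.sql
```

Calling the function with no trailing arguments starts an interactive spark-sql session with the same catalog registered, which is a convenient way to inspect the tables after the script has run.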

