deib-polimi/Spark-Experiment-Runner

Spark Experiment Runner

All the code in this repo is licensed under the Apache License, version 2.0.

This repo assumes that Hadoop, Spark, Hive, HDFS, and YARN are correctly installed and configured. Most of the shell scripts in this repo are Bourne shell compliant.

  1. Edit the config.sh file to set parameters for PySpark and the TPC-DS benchmark data generation;
  2. Generate the TPC-DS benchmark data using setup.sh in the gen_data folder;
  3. Run experiments with run_pyspark_queries.sh.
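
The steps above can be sketched as a small wrapper run from the repository root. This is only an illustration: `run_step`, `run_workflow`, and the `DRY_RUN` variable are hypothetical names that are not part of the repo. It also assumes that both scripts are executable, take no arguments, and expect to be started from their own folder. Step 1, editing config.sh, remains manual.

```shell
#!/bin/sh
# Hypothetical wrapper around steps 2 and 3; not part of the repo.
# Assumes config.sh has already been edited by hand (step 1).

run_step() {
    # Usage: run_step DIR COMMAND [ARGS...]
    # Prints the command, then runs it inside DIR unless DRY_RUN=yes.
    dir=$1
    shift
    echo "+ (cd $dir && $*)"
    if [ "${DRY_RUN:-no}" != "yes" ]; then
        (cd "$dir" && "$@")
    fi
}

run_workflow() {
    # Step 2: generate the TPC-DS benchmark data; stop if generation fails.
    run_step gen_data ./setup.sh || return 1
    # Step 3: run the PySpark experiments.
    run_step . ./run_pyspark_queries.sh
}
```

Calling `run_workflow` with `DRY_RUN=yes` set only prints the two commands, which is a cheap way to check the order before touching the cluster.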

Configuration

The configuration file, config.sh, is thoroughly commented.

Note that Spark versions before 1.5.0 do not provide a REST endpoint for retrieving all the logs related to an application. If your installation is version 1.5.0 or later, set REST_API=yes and write the HTTP address of the Spark History Server in the HISTORY_SERVER variable. Otherwise, disable REST_API and provide the HDFS path where the History Server stores its logs via the SPARK_LOGS variable.
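
As a rough sketch of the two alternatives, the relevant fragment of config.sh might look like the following. Only the variable names REST_API, HISTORY_SERVER, and SPARK_LOGS come from this README; the values are placeholders, and the `log_source` helper is a hypothetical function added here to show how a script could branch on the setting. It is not necessarily how the repo's scripts do it.

```shell
# Placeholder values; adapt them to your cluster.
REST_API=yes
# Assumed address; 18080 is the default port of the Spark History Server.
HISTORY_SERVER="http://localhost:18080"
# Assumed HDFS path; only relevant when REST_API is not "yes".
SPARK_LOGS="/spark-logs"

# Hypothetical helper: reports where application logs would be fetched from.
log_source() {
    if [ "$REST_API" = "yes" ]; then
        echo "rest:$HISTORY_SERVER"
    else
        echo "hdfs:$SPARK_LOGS"
    fi
}
```

With the values above, `log_source` reports the REST endpoint. Setting REST_API to anything other than `yes` (the exact spelling for "disabled" is documented in the comments of config.sh) makes it fall back to the HDFS path.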
