Umbra implementation of the LDBC Social Network Benchmark's BI workload.
The Umbra container is currently available upon request.
The Umbra implementation expects the data to be in the `composite-merged-fk` CSV layout, with headers and without quoted fields. To generate data that conforms to this requirement, run Datagen without any layout or formatting arguments (`--explode-*` or `--format-options`).
In Datagen's directory (`ldbc_snb_datagen_spark`), issue the following commands. We assume that the Datagen project is built and that the `${PLATFORM_VERSION}` and `${DATAGEN_VERSION}` environment variables are set correctly.
```bash
export SF=desired_scale_factor
export LDBC_SNB_DATAGEN_MAX_MEM=available_memory
export LDBC_SNB_DATAGEN_JAR=$(sbt -batch -error 'print assembly / assemblyOutputPath')
rm -rf out-sf${SF}/
tools/run.py \
    --cores $(nproc) \
    --memory ${LDBC_SNB_DATAGEN_MAX_MEM} \
    -- \
    --format csv \
    --scale-factor ${SF} \
    --mode bi \
    --output-dir out-sf${SF}
```
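The layout requirement above (headers, unquoted fields) can be sanity-checked before loading. The sketch below is illustrative and not part of the benchmark scripts; it assumes pipe-delimited CSVs and uses a mock file so that it is self-contained — in practice you would point it at a real Datagen output directory.

```bash
# Illustrative sanity check for the expected CSV layout (header line present,
# no quoted fields). DIR and the file contents are mock stand-ins for a real
# Datagen output directory.
DIR=$(mktemp -d)
printf 'creationDate|id|firstName\n2010-01-03|933|Mahinda\n' > "$DIR/part-0.csv"

for f in "$DIR"/*.csv; do
  # the first line should be a pipe-delimited header
  head -n 1 "$f" | grep -q '|' || echo "missing header delimiter: $f"
  # quoted fields would violate the expected layout
  if grep -q '"' "$f"; then echo "quoted fields found: $f"; fi
done
echo "layout check finished"
```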
Note that unlike Postgres, Umbra does not support directly loading from compressed (`.csv.gz`) files.
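If you do end up with compressed files, the decompression step can be sketched as follows (the repository ships `scripts/decompress-data-set.sh` for this purpose; the mock directory below is only for illustration, and in practice you would operate on the directory `${UMBRA_CSV_DIR}` points to). Note that `gunzip` deletes the original `.csv.gz` archives.

```bash
# Sketch of decompressing a data set in place. DATA_DIR is a mock stand-in
# for the real data directory.
DATA_DIR=$(mktemp -d)
printf 'id|name\n1|Ada\n' > "$DATA_DIR/part-0.csv"
gzip "$DATA_DIR/part-0.csv"   # now only part-0.csv.gz exists

# gunzip restores part-0.csv and removes the .gz archive
find "$DATA_DIR" -name '*.csv.gz' -exec gunzip {} \;
ls "$DATA_DIR"
```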
- Set the `${UMBRA_CSV_DIR}` environment variable to point to the data set.

  - To use a locally generated data set, set the `${LDBC_SNB_DATAGEN_DIR}` and `${SF}` environment variables and run:

    ```bash
    export UMBRA_CSV_DIR=${LDBC_SNB_DATAGEN_DIR}/out-sf${SF}/graphs/csv/bi/composite-merged-fk/
    ```

    Or, simply run:

    ```bash
    . scripts/use-datagen-data-set.sh
    ```
  - To download and use the sample data set, run:

    ```bash
    wget -q https://ldbcouncil.org/ldbc_snb_datagen_spark/social-network-sf0.003-bi-composite-merged-fk.zip
    unzip -q social-network-sf0.003-bi-composite-merged-fk.zip
    export UMBRA_CSV_DIR=`pwd`/social-network-sf0.003-bi-composite-merged-fk/graphs/csv/bi/composite-merged-fk/
    ```

    Or, simply run:

    ```bash
    scripts/get-sample-data-set.sh
    . scripts/use-sample-data-set.sh
    ```
- The data set should consist of uncompressed CSVs. If you retrieved a compressed data set (`.csv.gz` files), set the `${UMBRA_CSV_DIR}` environment variable and uncompress the files (note that doing so deletes the original compressed files):

  ```bash
  scripts/decompress-data-set.sh
  ```
- To start the DBMS, create a database, and load the data, run:

  ```bash
  scripts/load-in-one-step.sh
  ```
- The substitution parameters should be generated using `paramgen`.
To run the queries, issue:

```bash
scripts/queries.sh
```

For a test run, use:

```bash
scripts/queries.sh ${SF} --test
```
To run the queries and the batches alternately, as specified by the benchmark, run:

```bash
scripts/benchmark.sh
```

To connect to the database through the SQL console, use:

```bash
scripts/connect.sh
```
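For longer benchmark sessions, it can be useful to capture a run's output in a timestamped log file. The sketch below shows this generic pattern; the directory and message are illustrative stand-ins, and in practice you would pipe e.g. `scripts/benchmark.sh` into `tee`.

```bash
# Generic output-capture pattern (illustrative; not part of the benchmark scripts).
LOG_DIR=$(mktemp -d)                                # stand-in for a real log directory
LOG_FILE="$LOG_DIR/run-$(date +%Y%m%d-%H%M%S).log"

# In practice: scripts/benchmark.sh | tee "$LOG_FILE"
echo "benchmark output would stream here" | tee "$LOG_FILE"
```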