mwc360 / AzureSynapseUtilities Public

A collection of utilities for Azure Synapse Analytics

SynapseUtilities

A collection of utilities for Azure Synapse Analytics

Synapse_TPC-DS_DataGen

Code for generating TPC-DS or TPC-H datasets as Parquet files and loading them into Synapse Dedicated Pools.

Kudos to @npoggi (https://github.com/databricks/spark-sql-perf/blob/master/src/main/notebooks/TPC-multi_datagen.scala#L14); I mostly just updated the Scala script to work on Synapse Spark.

Steps to Run

1. Add the package /Spark/spark-sql-perf_2.12-0.5.1-SNAPSHOT.jar to your Spark Pool
2. Import /Spark/SynapseSpark_TPC_DataGen.ipynb
3. Run the SynapseSpark_TPC_DataGen notebook attached to the Spark Pool from step 1
4. (Optional) Run
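
As a rough illustration of the kind of work the data-generation notebook in step 3 performs, the cell below sketches TPC-DS generation with the `TPCDSTables` class from spark-sql-perf. It is not the contents of SynapseSpark_TPC_DataGen.ipynb. The storage path, scale factor, partition count, and dsdgen location are placeholder assumptions, and the sketch presumes the dsdgen binaries from tpcds-kit already exist at the same path on every executor.

```scala
// Sketch only: assumes the spark-sql-perf jar from step 1 is attached to the pool
// and that `spark` is the notebook's SparkSession.
import com.databricks.spark.sql.perf.tpcds.TPCDSTables

// Hypothetical values; replace with your own storage account and sizing.
val rootDir     = "abfss://<container>@<account>.dfs.core.windows.net/tpcds/sf1"
val scaleFactor = "1"                     // dataset size in GB
val dsdgenDir   = "/tmp/tpcds-kit/tools"  // assumed location of dsdgen on each executor

// Describes the TPC-DS tables and how to invoke dsdgen for them.
val tables = new TPCDSTables(
  spark.sqlContext,
  dsdgenDir = dsdgenDir,
  scaleFactor = scaleFactor,
  useDoubleForDecimal = false,  // keep DecimalType columns
  useStringForDate = false)     // keep DateType columns

// Runs dsdgen in parallel tasks and writes each table as Parquet under rootDir.
tables.genData(
  location = rootDir,
  format = "parquet",
  overwrite = true,                      // replace any data already at rootDir
  partitionTables = false,               // write unpartitioned tables
  clusterByPartitionColumns = false,
  filterOutNullPartitionValues = false,
  tableFilter = "",                      // empty string means all tables
  numPartitions = 16)                    // number of dsdgen tasks to run
```

The same library exposes a parallel `TPCHTables` class for TPC-H, which takes a dbgen directory in place of the dsdgen one. The values above are sized for a quick smoke test, not a benchmark run.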
