This repo contains a synthetically generated dataset, the config used to produce it, and a working dbt project that models the data. The data is intended to power a new project, "Thyme to Shine Market", as part of the Lightdash demo. The project uses:
- Synth for time-series synthetic data generation.
- Python for some data transformation, using pandas and numpy.
- google-cloud-sdk for loading the data into BigQuery.
- dbt for defining the data transformations as config.
To generate the datasets and load them into BigQuery, run:
sh seed-bigquery.sh
This will:
- use synth to generate all the base datasets from the synth configuration defined in synth/, with help from the lookups defined in lookup/. Note that the final two commands are non-deterministic.
- run some bespoke transformations (using Python with pandas and numpy) on the data to make it more suitable for a demo. Note that this script is non-deterministic because it includes some randomness (e.g. to adjust product popularity).
- load the datasets into BigQuery under the dataset name lightdash-analytics.lightdash_demo_gardening.
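The product-popularity adjustment described above can be sketched roughly as follows. This is a hypothetical illustration, not the repo's actual script: the function name skew_product_popularity, the product_id column, and the lognormal weighting are all assumptions. It also shows where the non-determinism comes from: with the default seed=None, numpy's random generator draws fresh entropy on every run, whereas passing an integer seed would make the step reproducible.

```python
from typing import Optional

import numpy as np
import pandas as pd


def skew_product_popularity(
    order_lines: pd.DataFrame, seed: Optional[int] = None
) -> pd.DataFrame:
    """Resample order lines so that some products become more popular.

    Hypothetical sketch: the `product_id` column name and the lognormal
    weighting are assumptions, not taken from this repo's script.
    """
    # seed=None draws fresh OS entropy on every run (non-deterministic);
    # an integer seed makes the output reproducible.
    rng = np.random.default_rng(seed)

    # Draw one random popularity weight per distinct product. A lognormal
    # distribution yields a few "hit" products and a long tail.
    products = order_lines["product_id"].unique()
    weights = pd.Series(
        rng.lognormal(mean=0.0, sigma=1.0, size=len(products)), index=products
    )

    # Each row inherits its product's weight; normalise into probabilities.
    row_weights = order_lines["product_id"].map(weights).to_numpy()
    probs = row_weights / row_weights.sum()

    # Resample the same number of rows with replacement, biased toward
    # popular products, keeping all original columns.
    idx = rng.choice(len(order_lines), size=len(order_lines), replace=True, p=probs)
    return order_lines.iloc[idx].reset_index(drop=True)


if __name__ == "__main__":
    # Hypothetical toy data for illustration only.
    orders = pd.DataFrame(
        {
            "order_id": range(6),
            "product_id": ["basil", "basil", "mint", "mint", "thyme", "thyme"],
        }
    )
    print(skew_product_popularity(orders, seed=42))
```

Calling the sketch twice with the same seed returns identical DataFrames, which may be useful if regenerating the demo data ever needs to be repeatable.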
To run the dbt models and build the tables in lightdash-analytics.lightdash_demo_gardening, run:
cd dbt-bigquery
dbt run -m dbt_baskets dbt_orders dbt_support_requests dbt_users