Skip to content

This is a dbt, DuckDB and Python project which uses the [ATP Tennis Rankings, Results, and Stats](https://github.com/JeffSackmann/tennis_atp) dataset that is publicly available on Github

License

Notifications You must be signed in to change notification settings

achilala/dbt-atp-tour

Repository files navigation

What this project is about

This is a dbt, DuckDB and Python project which uses the ATP Tennis Rankings, Results, and Stats dataset that is publicly available on Github, to demonstrate how to structure a dimensional model suitable for analytics. Another helpful resource is this Youtube video with regard to how to get the data into a DuckDB database.

Pre-requisites

For this project make sure you have following installed:

Getting started with this project

Clone the ATP Tour project to somewhere on your local directory

git clone https://github.com/achilala/dbt-atp-tour.git
cd dbt-atp-tour

Create a virtual environment for your python dependencies and activate it. Your python dependencies will be installed here.

python3 -m venv .venv/dbt-atp-tour
source .venv/dbt-atp-tour/bin/activate

Install the python dependencies

pip install -r requirements.txt

Some of the installed modules require reactivating your virtual environment to take effect

deactivate
source .venv/dbt-atp-tour/bin/activate

Running this python script will read the ATP tour data into a duckdb database called atp_tour.duckdb

python3 download_atp_tour_data.py

A new file called atp_tour.duckdb should appear in the folder, and this is the DuckDB database file. The reason why DuckDB is my-go-to database is because it's very light, simple to setup and built for analytical processing.

Before running dbt makes sure the profiles.yml file is setup corrently. The path in the file should point to your duckdb database and on mine it looks something like this atp_tour.duckdb.

Test your connection and adjust your profiles.yml settings accordingly until you get a successful test.

dbt debug

Run the dbt project to build your models for analysis

dbt clean && dbt deps && dbt build

To generate and view the project documentation run the following.

dbt docs generate && dbt docs serve

A browser should open with the docs site or click here. To cancel press the following on the keyboard

^c

Use the DuckDB CLI to query the DuckDB database. If you don't already have DuckDB v0.8.1 or higher installed then proceed to do the following:

Download and unzip the CLI

curl -OL https://github.com/duckdb/duckdb/releases/download/v0.9.1/duckdb_cli-osx-universal.zip
unzip duckdb_cli-osx-universal.zip

Open the database using the downloaded DuckDB CLI like this

./duckdb --readonly atp_tour.duckdb

And if you already have DuckDB install then open the database like this

duckdb --readonly atp_tour.duckdb

To sample the players data try the following

summarize mart.dim_player;

select *
  from mart.dim_player
 order by dim_player_key;

Now that the data is modeled, we can now run the streamlit app

streamlit run atp_tour_app.py

For BI and analysing the data we'll use Metabase, run Docker to setup it up

docker-compose up --build

To tear down the Metabase setup and start again

docker-compose down

Configure Metabase as follows:

Add your data: DuckDB
Display name: ATP Tour
Database file: /home/db/atp_tour.duckdb

You're now ready to browse the data! Raw data is in the raw schema and the final dimensional model is in mart

For consuming the data from other apps, run the following command to start the API service

uvicorn atp_tour_api:app --reload

About

This is a dbt, DuckDB and Python project which uses the [ATP Tennis Rankings, Results, and Stats](https://github.com/JeffSackmann/tennis_atp) dataset that is publicly available on Github

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published