Update README.md
jairus-m authored Dec 21, 2024
1 parent d7eac54 commit 1a19dad
Showing 1 changed file with 19 additions and 7 deletions.
Utilizing dltHub, dbt, + dagster as a framework for developing data products wit

While the short-term goal is to learn these tools, the greater goal is to understand and flesh out what the full development and deployment cycle looks like for orchestrating a data platform and deploying custom pipelines. dbt already has a great process for this: local development, testing, versioning/branching, CI/CD, code review, separation of dev and prod, project structure/cohesion, etc. But how can we apply that to the entire data platform, and especially to the 10-20% of ingestion jobs that cannot be done in a managed tool like Airbyte and/or are best done with a custom solution?

# Current Status
<img width="1512" alt="Screenshot 2024-12-13 at 11 00 14 PM" src="https://github.com/user-attachments/assets/a29f1da9-2d6c-46f7-b3ed-3ed6679c88e0" />

- Built a dltHub EL pipeline via the RESTAPIConfig class in `dagster_proj/assets/activities.py`
- Declaratively extracts my raw activity data from Strava's REST API and loads it into DuckDB
- Built a dbt-core project to transform the staged activities data in `analytics_dbt/models`
- WIP but the general flow of preprocessing, building the ML model, training, testing/evaluation, and prediction can be found in `dagster_proj/assets/energy_prediction.py`
- This is a downstream dependency of a dbt asset materialized in DuckDB
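
The declarative extraction above boils down to a REST API config passed to dlt. A minimal sketch of its shape, assuming illustrative resource names, paths, and params (the repo's actual config lives in `dagster_proj/assets/activities.py`):

```python
# Illustrative shape of a dlt RESTAPIConfig-style dict for Strava.
# Resource names, params, and the auth wiring here are assumptions,
# not the repo's actual configuration.
strava_config = {
    "client": {
        "base_url": "https://www.strava.com/api/v3/",
        # an OAuth access_token would be injected here via dlt secrets
    },
    "resources": [
        {
            "name": "activities",
            "endpoint": {
                "path": "athlete/activities",
                "params": {"per_page": 200},
            },
        }
    ],
}

# A dlt pipeline would then load this source into DuckDB, roughly:
# pipeline = dlt.pipeline(destination="duckdb", dataset_name="strava")
# pipeline.run(rest_api_source(strava_config))
```

Because the config is just data, adding a new Strava endpoint is a matter of appending another entry to `resources` rather than writing new extraction code.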
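
The train/test/predict flow can be sketched with scikit-learn on synthetic data — the feature names, model choice, and target are stand-ins, not the repo's actual implementation in `energy_prediction.py`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dbt-materialized activities table
# (features loosely analogous to distance and moving time; target is made up)
rng = np.random.default_rng(0)
X = rng.uniform(1, 100, size=(200, 2))
y = 30 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 10, 200)

# Preprocess -> train -> evaluate -> predict
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)
prediction = model.predict(X_test[:1])
```

In the actual pipeline, the training data would come from the upstream dbt asset in DuckDB rather than a random array.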

## Deployment Status
- Officially deployed this project to Dagster+!
- CI/CD w/ branching deployments for every PR
- Separated execution environments ([implementation details](https://github.com/jairus-m/dagster-dlt/pull/9)) into:
- dev
- branch
- prod
- Added `ruff` Python linter
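
One common way to implement that kind of split is to branch on an environment variable when configuring resources — a minimal sketch, where the `DAGSTER_ENVIRONMENT` variable and per-environment DuckDB paths are assumptions (see the linked PR for the real implementation):

```python
import os

# Hypothetical env var and paths for illustration only;
# the repo's actual approach is described in the PR linked above.
ENVIRONMENT = os.getenv("DAGSTER_ENVIRONMENT", "dev")

DUCKDB_PATHS = {
    "dev": "data/dev.duckdb",
    "branch": "data/branch.duckdb",
    "prod": "data/prod.duckdb",
}

# Fall back to dev so a missing/unknown env var never touches prod data
duckdb_path = DUCKDB_PATHS.get(ENVIRONMENT, DUCKDB_PATHS["dev"])
```

Keeping the environment name as the only switch means dev, branch, and prod deployments can share one code path while writing to isolated databases.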

## TODO:
- Add unit tests
- Add additional CI checks to run unit tests, Python linting, etc.
- Beef up the ML pipeline with `dagster-mlflow` for experiment tracking, model versioning, better model observability, etc.
- Add new Strava endpoints / dbt models

# Getting Started:
1. Clone this repo locally
5. Run the dagster daemon locally via `dagster dev`
6. Materialize the pipeline!

__Additional Notes:__
- The `refresh_token` in the Strava UI produces an `access_token` that is limited in scope. Please follow these [Strava Dev Docs](https://developers.strava.com/docs/getting-started/#oauth) to generate the proper `refresh_token` which will then produce an `access_token` with the proper scopes.
- If you want to run the dbt project locally, outside of Dagster, you need to add a `DBT_PROFILES_DIR` environment variable to the `.env` file and export it
- For example, my local env var is: `DBT_PROFILES_DIR=/Users/jairusmartinez/Desktop/dlt-strava/analytics_dbt`
- Yours will be: `DBT_PROFILES_DIR=/PATH_TO_YOUR_CLONED_REPO_DIR/analytics_dbt`
