An Extract, Transform, Load (ETL) pipeline with analytics components.
The aim of magicalytics is to provide insights from trend data (generated by Google Analytics) and to enable interaction between the data and AI. For this initial test, I use Google trend data obtained from BigQuery.
The example data is obtained from Google BigQuery and the link can be found here.
The tools that will be used are:
- Poetry
- JupySQL
- Google BigQuery API
- Voila
- Ploomber
- will add more later....
The outcome of this project is an analytical dashboard that can be used to process Google Analytics results and communicate with a Retrieval-Augmented Generation (RAG) pipeline.
For this project, I use Google Analytics data that is tabulated in BigQuery. However, you should also be able to use a single .csv file or any other type of database.
The methods that I use are shown in the figure below.
The workflow figure covers the ETL diagram (steps 1-5); the remaining points will be added later.
- The workflow starts with environment preparation (Conda, Poetry, APIs, etc.).
- Downloading the data by querying the necessary tables from Google BigQuery via its API (see the first sketch after this list).
- Splitting the data into different tables based on ranking and depositing them in a .duckdb database.
- Building the Exploratory Data Analysis (EDA) pipeline (see the JupySQL sketch after this list).
- Pushing the clean data back to the database so that it is processing-ready.
- Performing unique analyses using NLP and other machine learning algorithms (see the clustering sketch after this list).
- Adding a RAG feature to bridge the communication between users and the analyzed data (see the retrieval sketch after this list).
- Building a dashboard for overall visualization and data summary.
- Deployment via Ploomber.
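
As a rough illustration of the extract and load steps, the sketch below pulls a Google Trends table from BigQuery and splits it by ranking into a local .duckdb file. This is only a minimal sketch: the GCP project ID, the public dataset (`bigquery-public-data.google_trends.top_terms`), the column names, and the file name are assumptions for illustration, not the project's actual configuration.

```python
# Minimal extract-and-load sketch: BigQuery -> DuckDB.
# The project ID, dataset, columns, and file name are assumptions for illustration.
from google.cloud import bigquery
import duckdb

# Extract: query an (assumed) Google Trends public table via the BigQuery API.
client = bigquery.Client(project="my-gcp-project")  # hypothetical project ID
sql = """
    SELECT term, rank, week, dma_name
    FROM `bigquery-public-data.google_trends.top_terms`
    WHERE week >= DATE '2024-01-01'
"""
df = client.query(sql).to_dataframe()

# Load: split the rows by ranking and deposit them as tables in a .duckdb database.
con = duckdb.connect("trends.duckdb")
con.register("trends_df", df)  # expose the pandas DataFrame to DuckDB SQL
con.execute('CREATE OR REPLACE TABLE top_5 AS SELECT * FROM trends_df WHERE "rank" <= 5')
con.execute('CREATE OR REPLACE TABLE other_ranks AS SELECT * FROM trends_df WHERE "rank" > 5')
con.close()
```

Running this requires the `google-cloud-bigquery` and `duckdb` packages (and typically `db-dtypes` for the DataFrame conversion), plus BigQuery credentials configured on the machine.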
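For the EDA and clean-table steps, JupySQL can point a notebook at the same .duckdb file. The queries below are placeholders for whatever exploration and cleaning the project actually does; the table and column names follow the hypothetical ones from the previous sketch, and the connection assumes `jupysql` and `duckdb-engine` are installed.

```python
# Notebook cells sketching the EDA step with JupySQL (magics run inside Jupyter).
# Connection string, table, and column names are placeholders.

# Cell 1: connect JupySQL to the DuckDB file created in the previous step.
%load_ext sql
%sql duckdb:///trends.duckdb

# Cell 2: a quick exploratory query on the ranking table.
%sql SELECT term, COUNT(*) AS weeks_on_list FROM top_5 GROUP BY term ORDER BY weeks_on_list DESC LIMIT 10

# Cell 3: push a cleaned, deduplicated table back for downstream processing.
%sql CREATE OR REPLACE TABLE top_terms_clean AS SELECT DISTINCT term, "rank", week, dma_name FROM top_5 WHERE term IS NOT NULL
```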
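The "unique analyses" step is not pinned down yet; as one possible direction, the sketch below clusters the trending terms with TF-IDF character n-grams and k-means from scikit-learn. Both the algorithm choice and the `top_terms_clean` table are my assumptions, not the project's final method.

```python
# One possible NLP analysis: cluster trending terms by character n-grams.
# scikit-learn, the table name, and the cluster count are assumptions for illustration.
import duckdb
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

con = duckdb.connect("trends.duckdb")
terms = [row[0] for row in con.execute("SELECT DISTINCT term FROM top_terms_clean").fetchall()]
con.close()

# Character n-grams work reasonably well for short search terms.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
X = vectorizer.fit_transform(terms)

kmeans = KMeans(n_clusters=8, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Print a few terms per cluster as a sanity check.
for cluster_id in range(8):
    members = [t for t, lab in zip(terms, labels) if lab == cluster_id][:5]
    print(cluster_id, members)
```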
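The RAG bridge is likewise only planned. The sketch below covers just the retrieval half: each analyzed row is turned into a short text summary, and the summaries most similar to a user's question are returned. The summary wording and the TF-IDF retriever are assumptions, and the generation step (feeding the retrieved context to an LLM) is deliberately left out.

```python
# Retrieval-only sketch for the planned RAG feature: index short textual summaries
# of the analyzed data and fetch the most relevant ones for a user question.
# Table name, summary format, and retriever choice are assumptions for illustration.
import duckdb
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

con = duckdb.connect("trends.duckdb")
rows = con.execute(
    'SELECT term, "rank", week FROM top_terms_clean ORDER BY week DESC LIMIT 500'
).fetchall()
con.close()

# Turn each row into a small text "document" that can be retrieved later.
documents = [f"In the week of {week}, '{term}' ranked {rank} on Google Trends."
             for term, rank, week in rows]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

def retrieve(question: str, k: int = 3) -> list[str]:
    """Return the k summaries most similar to the question."""
    scores = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]

print(retrieve("Which terms were trending recently?"))
```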
The interface will be a Voila dashboard that hosts default data from the sample data mentioned above. The dashboard can also host uploaded data. However, uploaded data will need to be selected first, since the dashboard requires .....
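
A minimal sketch of what that Voila interface could look like: it loads the default sample table from the .duckdb file and exposes an ipywidgets `FileUpload` for a user's own .csv, which replaces the default once selected. The file name, table name, and widget layout are placeholders, and the upload handling assumes ipywidgets 8.x.

```python
# Minimal Voila-style notebook sketch: default data plus an optional CSV upload.
# The .duckdb file, table name, and widget layout are placeholders for illustration.
import io
import duckdb
import pandas as pd
import ipywidgets as widgets
from IPython.display import display

# Default data: the sample table prepared in the ETL step.
con = duckdb.connect("trends.duckdb", read_only=True)
default_df = con.execute("SELECT * FROM top_terms_clean LIMIT 1000").df()
con.close()

uploader = widgets.FileUpload(accept=".csv", multiple=False)
output = widgets.Output()

def show_data(change=None):
    """Render either the uploaded CSV or the default sample data."""
    with output:
        output.clear_output()
        if uploader.value:
            # ipywidgets 8.x exposes uploads as a tuple of dicts with a 'content' field.
            uploaded = uploader.value[0]
            df = pd.read_csv(io.BytesIO(uploaded["content"]))
        else:
            df = default_df
        display(df.head())

uploader.observe(show_data, names="value")
display(uploader, output)
show_data()
```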
Dashboard figure will be added here later...
This is a solo project, so I am the only one working on it! 😮💨
Feel free to visit my site as well: https://www.sanka.studio