This repository provides a framework for setting up a relational MySQL database using Python, as well as building a Text-to-SQL Streamlit app that converts plain language questions into SQL queries. The application interacts with a SQLite database populated with AdventureWorks sample data.
Two approaches are used: Prompt Engineering and Retrieval-Augmented Generation (RAG). Both utilize the same LLM, Gemini 1.5 Flash, via Google AI Studio to execute SQL queries on the database and retrieve results. For more details, see the publication.
[Publication
] [Dataset
] [Schema
] [Streamlit App
]
- Clone this repository:
git clone https://github.com/ahmedsalim3/AdventureWorks-Database.git
- Navigate to the repository directory:
cd AdventureWorks-Database
- Create a Virtual Environment (Recommended):
python3 -m venv env
source env/bin/activate
- Install Requirements:
pip install -r requirements.txt
-
Configure the database at config
-
run
MySQLDatabaseManager
class to create the database, follow TODO file for more details
python3 -m src.rdbms.csv2mysql
- To explore the database tables run:
python3 -m src.rdbms.table_info
-
Get your API key and set it in the config. You can obtain a free one from here, and make sure to uncomment the local run lines
-
Paste it in the .env_example file and rename that file to
.env
.
You can run either the Prompt Engineering or the RAG approach from the main file. Just ask your question and choose the method.
python3 -m src.main
To run the Streamlit app, use the following command:
python3 -m streamlit run app.py
To run the app using Docker, follow these steps:
- Build the Docker image:
docker build -t image_name .
- Run the Docker container:
docker run -p 8501:8501 image_name
.
├── app <- Streamlit related fils
│ ├── _pages
│ │ ├── prompt_page.py <- Prompting app page
│ │ ├── rag_page.py <- RAG app page
│ │ └── utils.py
│ └── ui <- User interface components
│
├── data <- Data directory containing databases and raw data
│ ├── database
│ └── raw
│
└── src
│ ├── rag
│ │ ├── documents.py <- Document handling for RAG
│ │ ├── example <- Vector database example
│ │ └── vector_db.py <- sqlite-vec database
│ │
│ ├── rdbms <- Relational Database Management System tools
│ │ ├── TODO.md
│ │ ├── csv2mysql.py <- Module for converting CSV to MySQL
│ │ ├── mysql2sqlite <- Submodule for converting MySQL to SQLite
│ │ └── table_info.py
│ │
│ ├── config.py <- Configuration settings for the project
│ ├── constants.py <- Constant values related to AdventureWorks database
│ ├── main.py <- Main file to run which approach
│ ├── prompting_text_to_sql.py <- Prompting implementation for text-to-SQL
│ ├── rag_text_to_sql.py <- RAG implementation for text-to-SQL
│ └── utils.py
│
├── app.py <- Main Streamlit app
├── requirements.txt