WikiQA is a Retrieval-Augmented Generation (RAG) pipeline that answers questions by retrieving the most relevant Wikipedia articles through the Wikipedia API and the Google Search API.
- Gemini: Used to generate an evaluation dataset and to act as a judge.
- LlamaIndex: Core framework for building the RAG pipeline.
- HuggingFace: Open-source Phi-3.5 small language model (SLM) and bge-base-en embeddings.
- ChromaDB: Open-source persistent vector DB.
- Ragas: Used to evaluate the RAG pipeline.
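The retrieve-then-generate flow described above can be illustrated with a minimal, dependency-free Python sketch. It is not the code in `rag.py`: a toy bag-of-words vector stands in for bge-base-en embeddings, a plain list stands in for the ChromaDB collection, and generation stops at building the prompt that a model such as Phi-3.5 would receive. The names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: term counts of lowercase word tokens.

    Assumption for illustration only; a real pipeline would use a dense
    embedding model such as bge-base-en instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (brute-force search)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the augmented prompt that would be passed to the generator."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}\nAnswer:"
    )


if __name__ == "__main__":
    docs = [
        "Paris is the capital and largest city of France.",
        "The Eiffel Tower was completed in 1889.",
        "Berlin is the capital of Germany.",
    ]
    question = "Which city is the capital of France?"
    print(build_prompt(question, retrieve(question, docs, k=1)))
```

In WikiQA these steps go through LlamaIndex, with ChromaDB persisting the embeddings; with dense embeddings the similarity scores come from the embedding model rather than from word overlap as in this toy version.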
- Clone the repository: `git clone https://github.com/ahmedo42/wikiqa.git && cd wikiqa`
- Install dependencies: `pip install -r requirements.txt`
- Run the pipeline demo: `jupyter notebook demo.ipynb`
- `rag.py`: Core RAG pipeline implementation.
- `helpers.py`: Utility functions for data preprocessing and query handling.
- `build_eval_dataset.ipynb`: Notebook to construct a dataset for testing using Google's Gemini.
- `demo.ipynb`: Interactive demo of the pipeline.
- `eval.ipynb`: Evaluate the RAG pipeline using the generated eval dataset.
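One way to sanity-check the retrieval side, alongside the Ragas metrics, is to measure hit rate and mean reciprocal rank (MRR) over the generated questions. The sketch below is a hypothetical illustration, not what `eval.ipynb` does: it assumes each eval record stores the question and the title of the Wikipedia article it was generated from, and that the retriever can be called as a function returning ranked article titles. `EvalRecord`, `reciprocal_rank`, and `evaluate` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical retriever signature: (question, k) -> ranked article titles.
Retriever = Callable[[str, int], list[str]]


@dataclass
class EvalRecord:
    """One eval example (hypothetical layout, not the repo's actual schema)."""
    question: str
    source_title: str  # Wikipedia article the question was generated from


def reciprocal_rank(titles: list[str], gold: str) -> float:
    """1/rank of the gold title in a ranked list, or 0.0 if it is absent."""
    for rank, title in enumerate(titles, start=1):
        if title == gold:
            return 1.0 / rank
    return 0.0


def evaluate(records: list[EvalRecord], retrieve: Retriever, k: int = 3) -> dict[str, float]:
    """Compute hit rate@k and MRR@k of a retriever over the eval records."""
    if not records:
        return {"hit_rate": 0.0, "mrr": 0.0}
    ranks = [reciprocal_rank(retrieve(r.question, k)[:k], r.source_title) for r in records]
    return {
        "hit_rate": sum(rr > 0 for rr in ranks) / len(records),
        "mrr": sum(ranks) / len(records),
    }


if __name__ == "__main__":
    def toy_retriever(question: str, k: int) -> list[str]:
        # Canned rankings standing in for the real pipeline's retriever.
        canned = {
            "When was the Eiffel Tower completed?": ["Paris", "Eiffel Tower"],
            "What is the capital of France?": ["France", "Berlin"],
        }
        return canned.get(question, [])[:k]

    records = [
        EvalRecord("When was the Eiffel Tower completed?", "Eiffel Tower"),
        EvalRecord("What is the capital of France?", "Paris"),
    ]
    print(evaluate(records, toy_retriever))  # {'hit_rate': 0.5, 'mrr': 0.25}
```

Hit rate only asks whether the source article was retrieved at all, while MRR also rewards ranking it near the top, so the two together show whether failures come from missing articles or from poor ordering.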