Building an End-to-End Retrieval-Augmented Generation System

Welcome to the Building an End-to-End Retrieval-Augmented Generation System repository. It guides you through building a complete Retrieval-Augmented Generation (RAG) system from scratch, following a structured curriculum.

Setup Instructions

To get started with the course:

Clone this repository:

git clone https://github.com/CarlosCaris/practicos-rag.git

Create a virtual environment:
```
python -m venv .venv
```

Activate the environment:

 # On macOS/Linux
 source .venv/bin/activate
 # On Windows
 .venv\Scripts\activate

Install requirements:
```
pip install -r requirements.txt
```

Introduction

This repository contains the materials and code needed to build a complete Retrieval-Augmented Generation (RAG) system. A RAG system combines the strengths of large language models with an external knowledge base to improve the accuracy and relevance of generated responses. Throughout this course, you'll gain hands-on experience with the various components of a RAG system, from document chunking to deployment in the cloud.
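
To make the retrieve-then-generate idea concrete, here is a minimal sketch in plain Python. It is not the course's implementation: the knowledge base, the bag-of-words cosine similarity, and the names `retrieve` and `build_prompt` are all assumptions chosen for illustration, and the final call to a language model is left out. A real system would typically use learned embeddings and a vector database instead.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base; a real system would load and chunk documents.
KNOWLEDGE_BASE = [
    "RAG combines a retriever with a large language model.",
    "Chunking splits long documents into smaller passages.",
    "Vector databases store embeddings for similarity search.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query (assumed scoring)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True
    )
    return ranked[:k]


def build_prompt(query, passages):
    """Combine retrieved passages and the question into a single prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


question = "What do vector databases store?"
prompt = build_prompt(question, retrieve(question, KNOWLEDGE_BASE))
print(prompt)  # this prompt would then be sent to a language model
```

The generation step is deliberately omitted, since the choice of model and client library is outside what this README specifies.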

Course Outline

Lesson 1: Introduction to Retrieval-Augmented Generation (RAG)

Objective: Understand the fundamentals of RAG and its applications.
Topics:
- Overview of RAG systems
- Challenges in large language models (e.g., hallucinations, outdated information)
- Basic components of a RAG system
Practical Task: Set up your development environment and familiarize yourself with the basic concepts.
Resources:
- Basics
- More concepts

Lesson 2: Document Chunking Strategies

Objective: Learn how to effectively segment documents for better retrieval performance.
Topics:
- Chunking techniques: token-level, sentence-level, semantic-level
- Balancing context preservation with retrieval precision
- Small2Big and sliding window techniques
Practical Task: Implement chunking strategies on a sample dataset.
Resources:
- The five levels of chunking
- A guide to chunking
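
As a starting point for the practical task, two of the techniques listed above can be sketched in a few lines of Python. This is an illustrative sketch, not the notebook's code: the function names are hypothetical, "tokens" are approximated by whitespace-separated words rather than a model tokenizer, and sentences are split with a simple regular expression that will mishandle abbreviations.

```python
import re


def sliding_window_chunks(tokens, size, overlap):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        # Stop once a window has reached the end of the input.
        if start + size >= len(tokens):
            break
    return chunks


def sentence_chunks(text, max_sentences):
    """Group consecutive sentences into chunks of at most `max_sentences`."""
    # Naive split after ., ! or ? followed by whitespace (an assumption).
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()
    ]
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


words = "a b c d e f g".split()
print(sliding_window_chunks(words, size=4, overlap=2))
print(sentence_chunks("One. Two! Three? Four.", max_sentences=2))
```

A larger overlap preserves more context across chunk boundaries at the cost of storing and retrieving more redundant text, which is the trade-off between context preservation and retrieval precision noted in the topics.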
