Skip to content

CarlosCaris/practicos-rag

Repository files navigation

Building an End-to-End Retrieval-Augmented Generation System

Welcome to the Building an End-to-End Retrieval-Augmented Generation System repository. This repository is designed to guide you through the process of creating a complete Retrieval-Augmented Generation (RAG) system from scratch, following a structured curriculum.

Setup Instructions

To get started with the course:

  1. Clone this repository:
    git clone https://github.com/CarlosCaris/practicos-rag.git
  2. Create a virtual environment
    python -m venv .venv
  3. Activate the environment
     # On Mac
     .venv/bin/activate
     # On Windows
     .venv\Scripts\activate
  4. Install requirements
    pip install -r requirements.txt

Table of Contents

Introduction

This repository contains the materials and code needed to build a complete Retrieval-Augmented Generation (RAG) system. A RAG system combines the strengths of large language models with an external knowledge base to improve the accuracy and relevance of generated responses. Throughout this course, you'll gain hands-on experience with the various components of a RAG system, from document chunking to deployment in the cloud.

Course Outline

Lesson 1: Introduction to Retrieval-Augmented Generation (RAG)

  • Objective: Understand the fundamentals of RAG and its applications.
  • Topics:
    • Overview of RAG systems
    • Challenges in large language models (e.g., hallucinations, outdated information)
    • Basic components of a RAG system
  • Practical Task: Set up your development environment and familiarize yourself with the basic concepts.
  • Resources:
    • Basics
    • More concepts

Lesson 2: Document Chunking Strategies

  • Objective: Learn how to effectively segment documents for better retrieval performance.
  • Topics:
    • Chunking techniques: token-level, sentence-level, semantic-level
    • Balancing context preservation with retrieval precision
    • Small2Big and sliding window techniques
  • Practical Task: Implement chunking strategies on a sample dataset.
  • Resources:
    • The five levels of chunking
    • A guide to chunking