FileRAG: A File-based Multimodal Retrieval-Augmented Generation System

Demo

FileRAG.Multimodal.Demo.mp4

Motivation

Traditional Retrieval-Augmented Generation (RAG) systems often struggle to maintain context and coherence when dealing with large documents or complex information structures. FileRAG was born out of the need for a more efficient, context-aware document retrieval system.

The primary motivations for developing FileRAG are:

Preserving Document Context: Unlike traditional RAG systems that often retrieve fragmented text snippets, FileRAG maintains the integrity of entire documents, ensuring that the context and coherence of information are preserved.
Multimodal Indexing and Retrieval: By using frontier models with vision and audio capabilities, FileRAG is able to index and retrieve text, images, audio files, and video files, providing a comprehensive file retrieval system.
Improved Precision: By summarizing and indexing entire documents and media files, FileRAG can retrieve relevant information with higher precision, which is especially important in fields such as academia, legal research, technical documentation, and multimedia content management.
Scalability: The file-based approach allows for easier management and updating of the knowledge base, making it more scalable for growing document and media collections.

By addressing these challenges, FileRAG aims to provide a more robust and effective solution for document retrieval and information extraction across various media types.

Features

Dual Model Support: Choose between Anthropic's Claude and OpenAI's GPT-4 for document summarization and retrieval.
Multiple File Format Support: Handles PDF, DOCX, TXT, MD, various image files (JPEG, PNG, GIF, WEBP), audio files (MP3, WAV, OGG, FLAC, AAC, OPUS, M4A), and video files (MP4, AVI, MOV, MKV).
Intelligent Summarization: Generates concise summaries of files for efficient indexing, including specialized summarization for audio transcripts and video content.
Video Processing: Extracts key frames from videos and summarizes both visual and audio content for comprehensive video understanding.
Context-Aware Retrieval: Retrieves relevant files based on user queries using advanced language models, including video content.
Flexible API Integration: Switch between AI providers for summarization (Anthropic or OpenAI) and for audio transcription (OpenAI or Lemonfox.ai).
Organized Results: Stores retrieval results in a structured folder system, separating text, image, audio, and video results for easy access and review.

Structure

Components

File Indexer (indexer.py): Indexes and summarizes files in a specified folder, including text, images, audio, and video files.
File Retriever (retriever.py): Retrieves relevant files based on user queries, handling text, images, audio files, and video files.
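
To make the indexer's role concrete, here is a minimal sketch of how a folder could be turned into an overview file. It is not the code in indexer.py: the JSON layout (`type` and `summary` per file), the helper names `classify`, `build_overview`, and `write_overview`, and the pluggable `summarize` callable are all assumptions made for illustration. Only the extension groups and the `folder_overview.json` file name come from this README.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Optional

# Extension groups taken from the supported formats listed under Features.
EXTENSIONS = {
    "text": {".pdf", ".docx", ".txt", ".md"},
    "image": {".jpg", ".jpeg", ".png", ".gif", ".webp"},
    "audio": {".mp3", ".wav", ".ogg", ".flac", ".aac", ".opus", ".m4a"},
    "video": {".mp4", ".avi", ".mov", ".mkv"},
}


def classify(path: Path) -> Optional[str]:
    """Return the media type for a file, or None if the format is unsupported."""
    suffix = path.suffix.lower()
    for kind, extensions in EXTENSIONS.items():
        if suffix in extensions:
            return kind
    return None


def build_overview(folder: Path, summarize: Callable[[Path, str], str]) -> Dict[str, dict]:
    """Summarize every supported file in `folder` (non-recursive).

    `summarize` is a hypothetical hook that would wrap the model call.
    """
    overview = {}
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        kind = classify(path)
        if kind is None:
            continue  # skip unsupported files, including the overview itself
        overview[path.name] = {"type": kind, "summary": summarize(path, kind)}
    return overview


def write_overview(folder: Path, overview: Dict[str, dict]) -> Path:
    """Write the overview next to the indexed files (assumed schema)."""
    out = folder / "folder_overview.json"
    out.write_text(json.dumps(overview, indent=2, ensure_ascii=False), encoding="utf-8")
    return out
```

Passing the summarizer as a callable keeps the sketch independent of any one provider, so the same loop would work with either Anthropic or OpenAI behind it.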

Prerequisites

Python 3.6+
anthropic library
openai library
PyPDF2 library
python-docx library
Pillow library
opencv-python library

Installation

Clone the repository:

git clone https://github.com/richards199999/FileRAG.git
cd FileRAG

Install the required dependencies:

pip install anthropic openai PyPDF2 python-docx Pillow opencv-python

Usage

File Indexer

Run the indexer:
```
python indexer.py
```
Choose the AI model (Anthropic or OpenAI) for summarization when prompted.
Choose the API (OpenAI or Lemonfox.ai) for audio transcription when prompted.
Enter your API key(s) for the chosen provider(s).
Specify the folder path containing the documents, images, audio files, and video files you want to index.
The script will generate a folder_overview.json file in the specified folder.
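
The API key step above can be sketched as a lookup that prefers an environment variable and falls back to a prompt. This is an illustrative sketch rather than the scripts' actual logic: the function name `resolve_api_key` and the provider labels are hypothetical, while the variable names are the ones listed under Configuration.

```python
import os
from getpass import getpass
from typing import Callable, Mapping

# Environment variable names as listed in the Configuration section.
ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "lemonfox": "LEMONFOX_API_KEY",
}


def resolve_api_key(
    provider: str,
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the key from the environment, or ask for it interactively."""
    variable = ENV_VARS[provider.lower()]  # KeyError for unknown providers
    key = env.get(variable, "").strip()
    if key:
        return key
    # getpass hides the typed key so it does not end up in the terminal scrollback.
    key = prompt("Enter your {} API key: ".format(provider)).strip()
    if not key:
        raise ValueError("No API key provided for {}".format(provider))
    return key
```

Taking `env` and `prompt` as parameters is a design choice for testability; a real script could simply call `resolve_api_key("openai")`.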

File Retriever

Run the retriever:
```
python retriever.py
```
Choose the AI model (Anthropic or OpenAI) when prompted.
Enter your API key for the chosen provider.
Specify the path to the folder_overview.json file created by the indexer.
Enter your queries when prompted. The script will retrieve relevant documents, images, audio files, and video files, saving them in the filerag_results folder with the following structure:
```
filerag_results/
├── api_response_log.txt
├── YYYYMMDD_HHMMSS/
│   ├── image_results/
│   ├── text_results/
│   │   └── retrieved_text_results.txt
│   ├── audio_results/
│   └── video_results/
└── ...
```
Each query session creates a new timestamped folder (YYYYMMDD_HHMMSS) containing the results for that session.
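
As an illustration of the layout above, the following sketch creates one timestamped session folder with the four result subfolders. The function name `create_session_dir` and its parameters are hypothetical; the folder names and the YYYYMMDD_HHMMSS pattern are taken from the tree shown above.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

SUBFOLDERS = ("image_results", "text_results", "audio_results", "video_results")


def create_session_dir(root: Path, now: Optional[datetime] = None) -> Path:
    """Create filerag_results/<YYYYMMDD_HHMMSS>/ with one subfolder per media type."""
    now = now or datetime.now()
    session = root / "filerag_results" / now.strftime("%Y%m%d_%H%M%S")
    for name in SUBFOLDERS:
        # parents=True also creates filerag_results/ on the first run.
        (session / name).mkdir(parents=True, exist_ok=True)
    return session
```

Because the timestamp has one-second resolution, two sessions started within the same second would share a folder in this sketch.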

Configuration

API keys can be set as environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, LEMONFOX_API_KEY) or entered when prompted.
Adjust the max_tokens and temperature parameters in the API calls to fine-tune the model outputs.
For video processing, you can modify the number of key frames extracted by changing the num_frames parameter in the extract_key_frames() function.
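
To show what `num_frames` could control, here is a hedged sketch of evenly spaced key frame selection with OpenCV. It is an assumption about the approach, not the body of the repository's `extract_key_frames()`: the helper `key_frame_indices`, the default of 5, and the even-spacing strategy are all illustrative.

```python
from typing import List


def key_frame_indices(total_frames: int, num_frames: int) -> List[int]:
    """Pick up to `num_frames` indices, one from the middle of each equal segment."""
    if total_frames <= 0 or num_frames <= 0:
        return []
    count = min(num_frames, total_frames)
    # Integer arithmetic avoids floating-point rounding surprises.
    return [(2 * i + 1) * total_frames // (2 * count) for i in range(count)]


def extract_key_frames(video_path: str, num_frames: int = 5) -> list:
    """Read the selected frames from a video file as BGR arrays."""
    import cv2  # opencv-python; imported here so the index helper works without it

    capture = cv2.VideoCapture(video_path)
    try:
        total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
        frames = []
        for index in key_frame_indices(total, num_frames):
            capture.set(cv2.CAP_PROP_POS_FRAMES, index)
            ok, frame = capture.read()
            if ok:  # some containers report inaccurate frame counts
                frames.append(frame)
        return frames
    finally:
        capture.release()
```

Raising `num_frames` gives the model more visual context per video at the cost of more image input per API call.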

Limitations

The system currently processes only the first 5 pages of PDF documents to manage processing time and API usage.
Large files may be truncated to fit within API token limits.
Audio transcription may be inaccurate because of model limitations, low-quality recordings, or noisy environments.
Video processing may be time-consuming for large video files or when processing many videos at once.
The quality of video summarization depends on the extracted key frames and may not capture all nuances of the video content.
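
The first two limitations can be pictured with a short sketch that caps PDF extraction at five pages and truncates long text. This is illustrative only: `read_pdf_text`, `truncate_text`, and the `[truncated]` marker are hypothetical names, the character limit is an assumed stand-in for a real token count, and `PdfReader` assumes a PyPDF2 version that provides it (older releases used a different class name).

```python
MAX_PDF_PAGES = 5  # page cap stated in the first limitation above


def read_pdf_text(path: str, max_pages: int = MAX_PDF_PAGES) -> str:
    """Extract text from at most `max_pages` pages of a PDF."""
    from PyPDF2 import PdfReader  # imported lazily; requires PyPDF2 to be installed

    reader = PdfReader(path)
    parts = []
    for number, page in enumerate(reader.pages):
        if number >= max_pages:
            break
        # extract_text() can return an empty result for scanned pages.
        parts.append(page.extract_text() or "")
    return "\n".join(parts)


def truncate_text(text: str, max_chars: int, marker: str = "\n[truncated]") -> str:
    """Cut text to `max_chars` characters and flag that content was dropped."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + marker
```

Appending a visible marker lets the summarizing model know that the input is incomplete instead of silently ending mid-sentence.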

Contributing

Contributions to FileRAG are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgement

Many thanks to Claude 3.5 Sonnet from Anthropic for helping bring this idea to life! 🤗
