This project demonstrates the creation of a Natural Language Query Agent capable of answering questions based on a small set of lecture notes from Stanford's LLM lectures and a table of milestone LLM architectures. The system leverages LLMs and open-source vector indexing and storage frameworks to provide conversational answers, with an emphasis on follow-up queries and conversational memory.
Stanford LLMs Lecture Notes:
- Introduction: Lecture Link
- Capabilities: Lecture Link
- Harm-1: Lecture Link
- Harm-2: Lecture Link
- Data: Lecture Link
- Modeling: Lecture Link
- Training: Lecture Link
Milestone Papers: Table of model architectures from Awesome LLM.
- `data_loading.py`: Contains functions to load data from the web and PDF.
- `processing.py`: Functions to split text into chunks and generate embeddings.
- `model_initialization.py`: Code to initialize the model and retrieval chain.
- `main.py`: Streamlit application for the chatbot interface.
Data Organization and Embedding:
Raw Data Loading:
- Web pages and PDF files are loaded using `WebBaseLoader` and `PyPDFLoader`, respectively.
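A minimal sketch of how `data_loading.py` might wire these loaders together, assuming the LangChain community loaders named in this README; the lecture URL and PDF path below are placeholders, not the project's actual sources:

```python
from langchain_community.document_loaders import WebBaseLoader, PyPDFLoader
from langchain_community.document_loaders.merge import MergedDataLoader

# Placeholder sources -- substitute the actual lecture URLs and PDF path.
LECTURE_URLS = ["https://example.com/llm-lecture-notes/introduction"]
PDF_PATH = "milestone_papers.pdf"

web_loader = WebBaseLoader(LECTURE_URLS)  # fetches and parses the lecture pages
pdf_loader = PyPDFLoader(PDF_PATH)        # parses the milestone-papers PDF
merged = MergedDataLoader(loaders=[web_loader, pdf_loader])

docs = merged.load()  # a single list of Document objects from both sources
```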
Text Splitting:
- Documents are split into manageable chunks using `RecursiveCharacterTextSplitter` with a chunk size of 1200 characters and an overlap of 200 characters.
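A sketch of the splitting step with the parameters stated above (depending on the LangChain version, the import may live in `langchain.text_splitter` instead):

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1200,    # maximum characters per chunk
    chunk_overlap=200,  # overlap preserves context across chunk boundaries
)
chunks = splitter.split_documents(docs)  # `docs` from the loading step above
```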
Embedding:
- Text chunks are converted into embeddings using the Hugging Face model `BAAI/bge-small-en`.
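A sketch of the embedding setup; the `encode_kwargs` normalization flag is an assumption (a common choice for BGE models), not something this README specifies:

```python
from langchain_community.embeddings import HuggingFaceBgeEmbeddings

embeddings = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-small-en",
    encode_kwargs={"normalize_embeddings": True},  # assumed; suits cosine similarity
)
```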
Vector Store:
- The embeddings are stored in a Chroma vector store, making them searchable.
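Indexing the chunks might look like the following sketch; the retriever's `k` value is an assumed default, not taken from this project:

```python
from langchain_community.vectorstores import Chroma

# Build a searchable index over the embedded chunks.
vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})  # k=4 is an assumption
```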
Data Loading:
- `WebBaseLoader`: Fetches and loads web pages.
- `PyPDFLoader`: Loads and parses the PDF containing the milestone papers.
- `MergedDataLoader`: Merges the documents from the web and PDF loaders.
Text Splitting:
- `RecursiveCharacterTextSplitter` divides the loaded text into smaller, overlapping chunks to ensure that context is preserved.
Embedding Generation:
- `HuggingFaceBgeEmbeddings` generates embeddings for the text chunks using a pre-trained model.
Vector Store:
- The Chroma vector store is used to store and index these embeddings, enabling efficient retrieval.
LLM Initialization:
- `ChatGroq` initializes the chosen LLM using the provided API key.
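A sketch of the initialization in `model_initialization.py`; the model name is an assumption (any Groq-hosted chat model would do), and the API key is read from the environment:

```python
import os
from langchain_groq import ChatGroq

llm = ChatGroq(
    groq_api_key=os.environ["GROQ_API_KEY"],  # provided API key
    model_name="llama3-8b-8192",              # assumed model; swap in the one used
    temperature=0,
)
```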
Prompt Templates:
- Custom prompt templates are created to reformulate user queries and generate responses based on the retrieved context.
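The two templates might look like this sketch; the exact system wording is hypothetical, but the shape (a history-aware reformulation prompt plus a context-grounded answer prompt) matches the description above:

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Rewrites a follow-up question into a standalone query, using the chat history.
contextualize_prompt = ChatPromptTemplate.from_messages([
    ("system", "Given the chat history and the latest question, rephrase the "
               "question so it can be understood without the history."),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# Answers the reformulated question from the retrieved context only.
qa_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question using only this context:\n\n{context}"),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])
```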
Retrieval Chain:
- A retrieval chain is created that uses a history-aware retriever to provide context-aware answers.
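A sketch of assembling the chain from the pieces above, using LangChain's standard helpers (names such as `rag_chain` are illustrative):

```python
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

# The retriever first rewrites the query with the chat history, then searches.
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_prompt
)
qa_chain = create_stuff_documents_chain(llm, qa_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever, qa_chain)

result = rag_chain.invoke({"input": "What is a transformer?", "chat_history": []})
print(result["answer"])   # generated response
print(result["context"])  # retrieved source documents, useful for citations
```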
A Streamlit application allows users to interact with the chatbot. Key features include:
- Input Query: Users can enter natural language queries.
- Chat History: The system maintains context across multiple queries.
- Display of Sources: The sources used to generate answers are displayed, ensuring transparency.
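A minimal sketch of what `main.py` might look like, assuming the `rag_chain` from the previous section; the widget layout and wording are illustrative:

```python
import streamlit as st
from langchain_core.messages import AIMessage, HumanMessage

st.title("LLM Lecture Notes Q&A")

# Session state preserves the chat history across Streamlit reruns.
if "chat_history" not in st.session_state:
    st.session_state.chat_history = []

if query := st.chat_input("Ask a question about the lectures"):
    result = rag_chain.invoke(
        {"input": query, "chat_history": st.session_state.chat_history}
    )
    st.chat_message("user").write(query)
    st.chat_message("assistant").write(result["answer"])

    # Display the sources behind the answer for transparency.
    with st.expander("Sources"):
        for doc in result["context"]:
            st.write(doc.metadata.get("source", "unknown"))

    st.session_state.chat_history += [
        HumanMessage(content=query),
        AIMessage(content=result["answer"]),
    ]
```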
Deployment Plan:
- The app can be deployed directly to Streamlit Cloud for public access.
- Containerize the application using Docker for easy, repeatable deployment.
- Use cloud services such as AWS or GCP for scalability.
Scaling:
- Utilize GPU acceleration to reduce the latency of response generation.
- As the number of lectures or papers grows, keep retrieval efficient through an improved vector store configuration (e.g., approximate nearest-neighbor indexing).
- Implement caching strategies to improve response times for frequently asked questions; a sketch follows this list.
- Enhanced Conversational Memory: Improve the system's ability to handle complex, multi-turn conversations.
- Citation and Reference Handling: Add more sophisticated citation mechanisms that link answers to the specific sections of the texts used.
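One way to approach the caching item above, sketched under the assumption that answers to standalone (history-free) questions are safe to reuse; the function name is hypothetical:

```python
import streamlit as st

@st.cache_data(show_spinner=False)
def cached_answer(query: str) -> str:
    # Identical standalone questions are served from Streamlit's cache instead
    # of re-running retrieval and generation. History-dependent queries are
    # deliberately excluded: a cached answer cannot reflect conversation state.
    result = rag_chain.invoke({"input": query, "chat_history": []})
    return result["answer"]
```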
Clone the Repository:
- `git clone <repository-url>`
- `cd <repository-folder>`

Install Dependencies:
- `pip install -r requirements.txt`

Run the Application:
- `streamlit run main.py`