Natural Language Processing (NLP), Large Language Models (LLMs), and the Power of Vector Embeddings and Databases
Converting text into embedding vectors is the first step in any text processing pipeline. As the volume of text grows, it becomes useful to persist these embedding vectors in a dedicated vector index or library, so that developers don't have to recompute the embeddings and retrieval is faster. We can then search for documents relevant to our intended query and pass these documents to a language model (LM) as additional context. This context is sometimes described as supplying the LM with "state" or "memory". The LM then generates a response grounded in the additional context it receives!
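As a concrete preview of the vectorization step, here is a minimal sketch. It assumes the `sentence-transformers` package and uses the `all-MiniLM-L6-v2` model, both illustrative choices rather than necessarily what this notebook uses:

```python
# A minimal sketch of text vectorization and similarity scoring.
# The sentence-transformers package and model name are assumptions.
from sentence_transformers import SentenceTransformer
import numpy as np

docs = [
    "FAISS is a library for efficient similarity search.",
    "ChromaDB is an open-source embedding database.",
    "Hugging Face hosts thousands of pretrained models.",
]

# Encode each document into a fixed-size embedding vector.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(docs)           # shape: (num_docs, embedding_dim)

# The query is embedded the same way, so it lives in the same vector space.
query_vec = model.encode(["What is a vector database?"])

# Cosine similarity scores the query against every document.
scores = (embeddings @ query_vec.T).squeeze() / (
    np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query_vec)
)
print(docs[int(scores.argmax())])         # most similar document
```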
In this notebook, we will implement the full workflow of text vectorization, vector search, and question answering. While we use FAISS (a vector library), ChromaDB (a vector database), and a Hugging Face model, know that you can easily swap these out for your preferred tools or models! Specifically, we will:
- Implement the workflow of reading text, converting text to embeddings, and saving them to FAISS and ChromaDB
- Query for similar documents using FAISS and ChromaDB
- Apply a Hugging Face language model for question answering (see the end-to-end sketch after this list)
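The sketch below previews these three steps end to end. It assumes the `faiss-cpu`, `chromadb`, `sentence-transformers`, and `transformers` packages are installed; the model names are illustrative stand-ins, not necessarily the ones this notebook uses:

```python
# An end-to-end sketch: embed text, index it with FAISS and ChromaDB,
# then answer a question with a Hugging Face model.
# Model names here are illustrative assumptions.
import faiss
import numpy as np
import chromadb
from sentence_transformers import SentenceTransformer
from transformers import pipeline

docs = [
    "FAISS is a library for efficient similarity search over dense vectors.",
    "ChromaDB is an open-source embedding database for LLM applications.",
    "Hugging Face hosts thousands of pretrained language models.",
]
question = "Which tool is an embedding database?"

# 1. Convert text to embeddings.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(docs).astype("float32")
query_vec = encoder.encode([question]).astype("float32")

# 2a. Save the embeddings to a FAISS index and query it.
index = faiss.IndexFlatL2(embeddings.shape[1])   # exact L2 search
index.add(embeddings)
_, ids = index.search(query_vec, 2)              # top-2 nearest documents
faiss_hits = [docs[i] for i in ids[0]]

# 2b. The same data in ChromaDB, which can embed documents for us.
client = chromadb.Client()                       # in-memory client
collection = client.create_collection("demo_docs")
collection.add(documents=docs, ids=[str(i) for i in range(len(docs))])
chroma_hits = collection.query(query_texts=[question], n_results=2)

# 3. Pass the retrieved documents to a Hugging Face QA model as context.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
answer = qa(question=question, context=" ".join(faiss_hits))
print(answer["answer"])
```

Note the design difference this sketch highlights: FAISS stores only raw vectors (we manage the mapping back to documents ourselves), while ChromaDB manages documents, IDs, and embeddings together.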