This project demonstrates how to build a Knowledge Graph Retrieval-Augmented Generation (RAG) pipeline and a Question Answering system using Indexify.
Features:
- Entity and relationship extraction using spaCy NER
- Knowledge graph construction and storage in Neo4j
- Text embedding generation using Sentence Transformers
- Natural language question to Cypher query conversion using Google's Gemini AI
- Question answering based on the knowledge graph
Prerequisites:
- Python 3.9+
- Google Cloud account with Gemini API access
- Docker and Docker Compose (for containerized setup)
Local setup:
- Clone this repository:
    git clone https://github.com/tensorlakeai/indexify
    cd indexify/examples/knowledge_graph
- Create a virtual environment and activate it:
    python -m venv venv
    source venv/bin/activate
- Install the required dependencies:
    pip install -r requirements.txt
- Install and start a Neo4j database locally.
- Set up environment variables:
    export NEO4J_URI=bolt://localhost:7687
    export NEO4J_USER=neo4j
    export NEO4J_PASSWORD=your_password
    export GOOGLE_API_KEY=your_google_api_key
- Run the main script:
    python workflow.py --mode in-process-run
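Before running the script, it can help to confirm that the four environment variables from the setup step are visible to Python. The following is a minimal sketch and not part of the repository; the helper name missing_settings is hypothetical, and only the variable names come from the steps above.

```python
import os

# Variable names taken from the environment setup step above.
REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD", "GOOGLE_API_KEY")


def missing_settings(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the process environment; a plain dict can be
    passed instead, which makes the check easy to try out.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running this in the same shell session as the export commands should report nothing missing; a fresh shell will typically report all four.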
Containerized setup (Docker Compose):
- Clone this repository:
    git clone https://github.com/tensorlakeai/indexify
    cd indexify/examples/knowledge_graph
- Create a virtual environment and activate it:
    python -m venv venv
    source venv/bin/activate
- Install indexify:
    pip install indexify
- Ensure Docker and Docker Compose are installed on your system.
- Create a .env file in the project directory and add your Google API key:
    GOOGLE_API_KEY=your_google_api_key_here
- Build the images for the functions, then start the services:
    indexify-cli build-image workflow.py NLPFunction
    indexify-cli build-image workflow.py generate_embeddings
    indexify-cli build-image workflow.py build_knowledge_graph
    indexify-cli build-image workflow.py store_in_neo4j
    indexify-cli build-image workflow.py generate_answer
    docker-compose up --build
- Deploy and run the main script:
    python workflow.py --mode remote-deploy
    python workflow.py --mode remote-run
- Knowledge Graph Creation:
  - Entity Extraction: Uses spaCy to identify named entities in the input text.
  - Relationship Extraction: Creates simple relationships between entities extracted from the text.
  - Knowledge Graph Construction: Builds a graph structure from the extracted entities and relationships.
  - Neo4j Storage: Stores the knowledge graph in a Neo4j database for later querying and analysis.
  - Embedding Generation: Creates embeddings of the input text using Sentence Transformers.
- Question Answering:
  - Question to Cypher: Converts a natural language question to a Cypher query using Google's Gemini AI.
  - Query Execution: Executes the Cypher query on the Neo4j database.
  - Answer Generation: Uses Gemini AI to generate a natural language answer based on the query results.
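The knowledge graph creation steps can be illustrated without spaCy or a running Neo4j instance. The sketch below is not the project's code and rests on several assumptions: entities arrive as (text, label) pairs such as an NER step might produce, a "simple relationship" is taken to mean that every pair of distinct entities is linked with a generic RELATED_TO edge, and storage is shown only as Cypher query strings with parameters rather than actual database calls. The function names and the Entity node label are hypothetical.

```python
from itertools import combinations


def cooccurrence_relationships(entities):
    """Link every pair of distinct entity names with a generic relation.

    `entities` is a list of (text, label) pairs.
    """
    names = sorted({text for text, _ in entities})
    return [(a, "RELATED_TO", b) for a, b in combinations(names, 2)]


def assemble_graph(entities, relationships):
    """Collect nodes (name -> label) and edges into a plain dictionary."""
    nodes = {text: label for text, label in entities}
    return {"nodes": nodes, "edges": list(relationships)}


def graph_to_cypher(graph):
    """Render (query, parameters) pairs that a Neo4j session could run."""
    statements = []
    for name, label in graph["nodes"].items():
        statements.append((
            "MERGE (e:Entity {name: $name}) SET e.label = $label",
            {"name": name, "label": label},
        ))
    for source, _relation, target in graph["edges"]:
        # Relationship types cannot be supplied as Cypher parameters,
        # so this sketch hard-codes the single type it produces.
        statements.append((
            "MATCH (a:Entity {name: $source}), (b:Entity {name: $target}) "
            "MERGE (a)-[:RELATED_TO]->(b)",
            {"source": source, "target": target},
        ))
    return statements


# Illustrative input only; a real run would take entities from the NER step.
example_entities = [("Alice", "PERSON"), ("Acme", "ORG"), ("Paris", "GPE")]
example_graph = assemble_graph(
    example_entities, cooccurrence_relationships(example_entities)
)
for query, params in graph_to_cypher(example_graph):
    print(query, params)
```

Using MERGE rather than CREATE keeps the statements idempotent, so processing the same text twice would not duplicate nodes or edges.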
The project uses two Indexify graphs:
- Knowledge Graph RAG Pipeline:
    extract_entities_and_text -> extract_relationships -> build_knowledge_graph -> store_in_neo4j -> generate_embeddings
- Question Answering Pipeline:
    question_to_cypher -> execute_cypher_query -> generate_answer
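The data flow of the Question Answering pipeline can be sketched as three plain Python functions chained together. This is only an illustration of how each stage's output feeds the next: it does not use the Indexify API, the function bodies and prompt wording are hypothetical, and the (:Entity)-[:RELATED_TO]->(:Entity) schema mentioned in the prompt is an assumption. The language model and the database are passed in as callables so the sketch runs offline with stand-ins.

```python
def question_to_cypher(question, llm):
    """Ask the language model to translate a question into a Cypher query."""
    prompt = (
        "Convert the following question into a Cypher query for a graph "
        "of (:Entity)-[:RELATED_TO]->(:Entity) nodes.\n"
        f"Question: {question}\nCypher:"
    )
    return llm(prompt).strip()


def execute_cypher_query(cypher, run_query):
    """Run the query and return its rows as a list of dictionaries."""
    return list(run_query(cypher))


def generate_answer(question, rows, llm):
    """Ask the language model to phrase the query results as an answer."""
    prompt = f"Question: {question}\nQuery results: {rows}\nAnswer:"
    return llm(prompt).strip()


def answer_question(question, llm, run_query):
    """Chain the three stages in the order of the pipeline."""
    cypher = question_to_cypher(question, llm)
    rows = execute_cypher_query(cypher, run_query)
    return generate_answer(question, rows, llm)


# Stand-ins for Gemini and Neo4j, used only to exercise the flow.
def fake_llm(prompt):
    if prompt.endswith("Cypher:"):
        return "MATCH (e:Entity) RETURN e.name AS name"
    return "The graph contains Alice."


def fake_db(cypher):
    return [{"name": "Alice"}]


print(answer_question("Who is in the graph?", fake_llm, fake_db))
```

Passing the model and database in as arguments is a design choice for the sketch: it keeps each stage testable in isolation and makes it clear which stages depend on an external service.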
Customization:
- Modify the sample_doc in the main() function of kg_rag_qa_pipeline.py to process different texts.
- Adjust the relationship extraction logic in extract_relationships() for more sophisticated relationship identification.
- Change the embedding model in generate_embeddings() to use different pre-trained models.
- Fine-tune the prompts in question_to_cypher() and generate_answer() functions for better results.
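As one possible direction for more selective relationship extraction, entities could be linked only when they appear in the same sentence. The sketch below is an assumption about what such a refinement might look like, not the project's extract_relationships() logic: the function name, the CO_OCCURS_WITH relation type, and the regex-based sentence splitting are all hypothetical, and a real implementation would more likely rely on the sentence boundaries provided by the NLP library.

```python
import re
from itertools import combinations


def sentence_relationships(text, entity_names):
    """Link entities only when they occur in the same sentence."""
    relationships = set()
    # Naive split after sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Plain substring matching; names that are substrings of other
        # words would need stricter matching in practice.
        present = sorted({name for name in entity_names if name in sentence})
        for a, b in combinations(present, 2):
            relationships.add((a, "CO_OCCURS_WITH", b))
    return sorted(relationships)


# Illustrative input only.
sample = "Alice works at Acme. Bob lives in Paris."
for triple in sentence_relationships(sample, ["Alice", "Acme", "Bob", "Paris"]):
    print(triple)
```

Compared with linking every pair of entities in a document, this produces fewer edges, at the cost of missing relationships that span sentences.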