The goal of this project is to develop a RAG system using Agent from LangGraph to improve the travelling experience of tourists.

enesbesinci/travel-guide-adaptive-rag

Travel Guide Chatbot with Adaptive RAG Using Agent from LangGraph

Introduction:

Hello everyone! In this project we'll develop a Retrieval-Augmented Generation (RAG) workflow using agents from LangGraph. This project will help you understand the basics of Adaptive RAG and agents in LangGraph, and it also provides a real-world example of agents in action.

Goal of The Project:

The aim of this project is to improve the travelling experience of tourists visiting Istanbul by providing general and up-to-date information about the city. With this application, tourists will be able to get answers to general questions such as "Where is the Hagia Sophia Mosque?" or "What should I eat in Eminönü?", as well as up-to-date questions such as "What are the bus ticket fares in Istanbul?" or "What is the weather like in Istanbul?"

Tech Stack:

  • OpenAI for LLM and Embeddings (gpt-4o-mini for LLM and text-embedding-3-small for Embeddings)
  • LangGraph and LangChain for creating Agent and RAG workflow
  • Chroma for VectorStore
  • Tavily for online-searching
  • Gradio for user interface

Note:

This guide assumes that you are already familiar with the basics of LLMs, RAG, and agents.

Dataset:

For this Proof of Concept (PoC), we selected three publicly available Istanbul travel guides. While numerous guides exist, these were chosen for their comprehensive and detailed information about the city. The selected guides are available in the project repository.

Coding:

Ensure that your environment variables are correctly configured if you are using an API to access LLM and Embedding models from providers like OpenAI, Google, Anthropic, or Mistral. Additionally, you'll need an API key to integrate Tavily, which offers a free trial version.

Click here to get a Tavily API key

Yes, we can build it now.

In this project, two primary data sources will be used to answer tourists' queries:

  • A VectorStore containing information from the selected Istanbul guides.
  • The Tavily Web Search Tool, for real-time online searching.

Creating an Index for the Documents:

A vectorstore is created to store the guides. Vectorstores enable semantic searches by retrieving documents based on the meaning of the query, rather than exact word matches, improving the relevance and accuracy of results. This project uses Chroma for document storage and OpenAI's "text-embedding-3-small" model for embeddings. However, other LLMs and vectorstore solutions can also be used.

For the PoC, three PDF guides about Istanbul are split and stored in a vectorstore to create a retriever for querying.

image

Let's try the vectorstore by providing a query.

image

Create a Tavily Web Search tool for online searching:

We create a web search tool, which is our second data source for accessing current information, to use in the agent.

image

Let's check if it's working.

image

The results look good. So far, we have created the two data sources required for the application. Now we can create our nodes and edges, which are the most important components of the agent.

Nodes and Edges: Key Components of the Agent

In this section, we will define various functions. Some of these functions will act as nodes, and others will serve as edges.

  • Nodes: These represent individual processing units or operations within the agent. Each node performs a specific task, such as handling a query, performing a search, or generating a response.

  • Edges: These define the connections between nodes, determining the flow of information within the agent. An edge represents the transition from one node to the next, allowing for the sequential execution of tasks.

TranslateQuery:

The first node we'll create will translate the user's question into English. We have three reasons for doing this.

  • The documents (guides about Istanbul) that we use in the project are written in English.
  • The language model we will use gives more reliable results in English than in other languages.
  • English is the most widely spoken language in the world, so many tourists can use this app easily.

Let's create the first function.

We use an LLM to translate the query/question into English. To do this, we create a prompt that tells the LLM what to do.

Prompt > Translate the question into English if the question is in another language. \n If the question is in English, do nothing.

After creating the prompt, we chain the prompt, the LLM, and StrOutputParser() together.

image

Let's check if the node called TranslateQuery is working.

image

As you can see above, we have passed two questions in different languages to the node. The node called TranslateQuery translated the Turkish question into English. It works well.

Router:

Once the user's query has been translated into English, we define a function called RouteQuery. This function analyses the user query and routes it to the most relevant data source according to the content of the query.

There are two available data sources in this project:

  • Web-Search
  • Retriever

Let us analyse this function in more depth. This function is used as a 'conditional edge'. It returns a structured output whose value is constrained by a Literal type annotation, i.e. either 'vectorstore' or 'web_search'.

Prompt > You are an expert at routing a user question to a vectorstore or web search. Vectorstore contains documents about the history of Istanbul, touristic and historical places of Istanbul, and food and travel tips for tourists. Use vectorstore for questions on these topics. If the question is about transportation, weather, and other things, use web search.

image

Let's try the function that we created for routing.

image

The router works well: the first question was about current events and the second was about general information about Istanbul, so the router routed both questions correctly.

Note the output of the router function: it can only return two different values.

Retrieval Grader:

After retrieving the data related to the user query/question from the retriever, we add an additional layer to determine how relevant the retrieved data is to the question.

Prompt > You are a grader assessing relevance of a retrieved document to a user question. \n If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant. \n It does not need to be a stringent test. The goal is to filter out erroneous retrievals. \n Give a binary score 'yes' or 'no' to indicate whether the document is relevant to the question.

image

Let's try the Retrieval Grader function by passing two different questions.

image

The same documents were passed through the Retrieval Grader function using two distinct questions. For the query about Hagia Sophia, the function returned ‘relevant - yes’, whereas for the query about LeBron James, it returned ‘irrelevant - no’.

Generate:

After retrieving the data most relevant to the user's question/query, we are ready to generate an answer. For this use case, I created a custom prompt for generation.

Click here to see the prompt I designed for this project

Let's write the full code and ask a question.

image

It looks good, we can skip to the next step.

Hallucination Grader:

This section introduces a system that evaluates whether a language model’s generated response is based on factual information. The model uses a binary scoring system—either 'yes' or 'no'—to determine if the answer is grounded in the provided facts. The process involves setting up a grader using a specific LLM (language model) and a prompt that compares the generated output against a set of retrieved documents.

Prompt > You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts. \n Give a binary score 'yes' or 'no'. 'Yes' means that the answer is grounded in / supported by the set of facts.

Let us then use the function to assess whether the generated answer is consistent with the real data and to help identify possible hallucinations (i.e., fabricated or unsupported content).

image

As illustrated in the image above, the documents retrieved by the retriever align with the model's output, indicating that the response is grounded in factual data. This confirms that the model is not generating hallucinated or unsupported information.

Answer Grader:

This section defines a system that evaluates whether a generated response effectively addresses the user's question. Using a binary score—'yes' or 'no'—the grader determines if the answer resolves the query. The process involves setting up a language model (LLM) to generate structured output, then comparing the user's question against the model’s generated answer using a specified prompt.

Prompt > You are a grader assessing whether an answer addresses / resolves a question \n Give a binary score 'yes' or 'no'. 'Yes' means that the answer resolves the question.

Let's pass the answer and the user's question/query generated with the Generate function in the previous step to this function and see the results.

image

The result is ‘yes’. This means that the generated answer answers the user's query/question.

Question Re-writer:

This function optimises user questions/queries to improve their suitability for vector-based retrieval. Using a language model, the function interprets the semantic intent of the input question and formulates a more optimised version. The aim is to improve the efficiency of the retrieval process by providing clearer and more precise questions.

Prompt > You are a question re-writer that converts an input question to a better version that is optimized \n for vectorstore retrieval. Look at the input and try to reason about the underlying semantic intent / meaning.

Notice that the variable named "question" contains the question "Where is the Hagia Sophia?" Let's pass the same question to the function and see what happens.

image

As you can see in the image above, the function optimised the question for vector-based search and rewrote it.

New Rewritten Question > "What is the location of the Hagia Sophia?"

Output Guardrails:

As you know, this application will be used by people from many different cultures, countries, races and genders. Therefore, we want to filter answers that contain negative/inappropriate content about these people.

This function checks whether the answer contains negative content for users.

image

Let's check the function.

image

We have created all the functions we need for the agent. Next, we'll create nodes and edges using these functions. Now we can move on to the next step.

GraphState Definition:

The first thing we should do when defining a graph is to define the State of the graph. Now we'll define a State and specify the fields it will hold. For this RAG flow we have three fields.

  • question > User question/query
  • generation > LLM generation
  • documents > List of documents

image
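Sketched as a TypedDict, the State might look like this (field names follow the list above; the example values are illustrative):

```python
# The graph State as a TypedDict; field names follow the list above.
from typing import List, TypedDict

class GraphState(TypedDict):
    question: str          # user question/query (translated to English)
    generation: str        # LLM generation
    documents: List[str]   # list of retrieved documents

# Example of an initial state passed into the graph:
initial_state: GraphState = {
    "question": "Where is Hagia Sophia?",
    "generation": "",
    "documents": [],
}
```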

After creating the State, we can create the most important components of an Agent: Nodes and Edges.

Creating Nodes and Edges:

Let's start by creating nodes.

In short, Nodes receive the current State as input and return a value that updates that state.

Retrieve Node:

image

The Retrieve Node takes the question from the State as input and returns the documents relevant to that question as output. As you can see in the image above, we use the "retriever" we created earlier.
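The node is just a plain function over the state. In the sketch below I inject the retriever through a factory (an illustrative choice; the project presumably uses a module-level `retriever`) and use a stub so the example runs standalone:

```python
# Sketch of the Retrieve node; the factory shape and stub retriever
# are illustrative, standing in for the real Chroma retriever.
def make_retrieve_node(retriever):
    def retrieve(state):
        """Node: look up documents relevant to the question in the state."""
        question = state["question"]
        documents = retriever.invoke(question)
        return {"documents": documents, "question": question}
    return retrieve

class StubRetriever:
    """Stand-in for the real vectorstore retriever."""
    def invoke(self, query):
        return ["Hagia Sophia is in the Sultanahmet district."]

retrieve = make_retrieve_node(StubRetriever())
out = retrieve({"question": "Where is Hagia Sophia?", "generation": "", "documents": []})
```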

Translator Node:

image

The Translator Node takes the question from the State as input, translates it using the question_translater function, and outputs the translated question.

Generate Node:

image

The Generate Node takes the question and the list of documents from the State as input and returns a response as output. It does this with the rag_chain we defined earlier.

Grade Documents:

image

The Grade Documents Node takes the question and documents from the State, evaluates each document using the retrieval_grader function, and returns only the relevant documents.
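The filtering logic can be sketched as below. The stub grader here is a keyword-overlap stand-in for the real LLM-based retrieval_grader, so the example runs standalone; the `binary_score` field mirrors the grader's 'yes'/'no' output:

```python
# Sketch of the Grade Documents node, with a stub grader standing in
# for the real retrieval_grader chain (which returns a binary_score).
from dataclasses import dataclass

@dataclass
class Grade:
    binary_score: str  # 'yes' or 'no'

class StubGrader:
    """Marks a document relevant if it shares a word with the question."""
    def invoke(self, inputs):
        q_words = inputs["question"].lower().split()
        hit = any(w in inputs["document"].lower() for w in q_words)
        return Grade("yes" if hit else "no")

def make_grade_documents_node(retrieval_grader):
    def grade_documents(state):
        question, documents = state["question"], state["documents"]
        filtered = [
            doc for doc in documents
            if retrieval_grader.invoke(
                {"question": question, "document": doc}
            ).binary_score == "yes"
        ]
        return {"documents": filtered, "question": question}
    return grade_documents

grade_documents = make_grade_documents_node(StubGrader())
state = {"question": "hagia sophia", "documents": ["Hagia Sophia history", "LeBron James stats"]}
result = grade_documents(state)  # only the Hagia Sophia document survives
```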

Transform Query Node:

image

The Transform Query Node transforms the query to produce a better question, using the question_rewriter function we defined above.

Web Search Node:

image

The Web Search Node takes the question from the State, runs it through the Tavily web search tool, and returns the search results as documents.

Output Guardrails Node:

Finally, we will add a Node that takes the generated output as input. If the generated output contains any content that is racist, sexist or against human rights, we do not want to show the answer to the user.

That's it. We have defined all of the Nodes of the Agent. Now we can continue by defining the Edges.

image

Route Question Edge:

image

Route Question Edge takes the user question as input and redirects it to ‘web_search’ or ‘vectorstore’ based on the content of the question. The question_router function is used for this.

Decide To Generate Edge:

image

The Decide To Generate Edge takes the question and the filtered documents. If there are any relevant documents, it returns 'generate' so an answer is generated; if there are no relevant documents, it returns 'transform_query'.
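Stripped to its essentials, this edge is just a check on whether any documents survived grading; a minimal sketch:

```python
# Sketch of the conditional edge: the return value names the next node.
def decide_to_generate(state):
    """Pick the next node based on whether any relevant documents remain."""
    if state["documents"]:
        return "generate"         # at least one relevant document survived
    return "transform_query"      # nothing relevant: rewrite the question
```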

Grade Generation and Documents and Question Edge:

image

This edge determines whether the generation is grounded in the documents and answers the question.

We have defined all Nodes and Edges. Now, we can continue by building the Graph.

Building Graph:

Firstly, we define the nodes in the Graph we created.

image

We have defined all the nodes. Now let's create a logic for our Agent and add Edges.

image

As you can see in the code, we determined the flow of the graph with the help of various edges. The nodes are simple Python functions that process the state passed along the edges. This composability is a good reason to work with LangGraph instead of LangChain agents.

Finally, let's compile the Graph we created. This is a required step.

image

Visualization of the Agent Flow:

We can visualise Agent Flow for better understanding and error checking.

image

Let's see the graph.

image

Run the Agent:

Now that everything's ready, we can ask the Agent our questions.

Firstly, I will ask a question to use the web-search node.

image

The response looks good. Here is the full response:

As of now, the weather in Istanbul is quite pleasant. The current temperature is approximately 28.2°C (82.8°F), and it is sunny, making it a great day to explore the city. The humidity level is around 70%, which can make it feel a bit warmer, with a "feels like" temperature of 28.3°C (83.0°F).

There is a moderate breeze coming from the west-southwest at about 11.9 mph (19.1 kph), which can provide some relief from the heat. The atmospheric pressure is 1008 mb, and there are no precipitation forecasts, so you can expect a dry day. Visibility is good at 10 km (6 miles), and the UV index is relatively high at 8.0, so it's advisable to wear sunscreen if you're spending time outdoors.

Overall, it's a beautiful sunny day in Istanbul, perfect for sightseeing and enjoying the city's rich history and vibrant culture!

Note also that we can see the stages through which the answer is generated.

image

Now let's ask another question where the agent will use the vector-store, but I will ask this question in Turkish.

Question: İstanbul'da gezmek için 3 tane tarihi mekan önerebilir misin? Bu mekanlardan çok kısaca bahset?

image

Response: Istanbul is rich in history and offers numerous historical sites to explore. Here are three must-visit historical places:

1. Hagia Sophia (Ayasofya): Originally constructed as a Christian basilica in 537 AD, Hagia Sophia is renowned for its massive dome and stunning mosaics. It served as a cathedral for nearly a thousand years before being converted into a mosque after the Ottoman conquest of Constantinople in 1453. Today, it stands as a museum, showcasing its architectural grandeur and historical significance.

2. Topkapı Palace: This opulent palace was the primary residence of the Ottoman sultans for approximately 400 years. Located in the Historic Peninsula, Topkapı Palace is a sprawling complex that includes beautiful courtyards, exquisite tile work, and a rich collection of artifacts, including the famous Topkapi Dagger and the Prophet Muhammad's cloak. Visitors can explore the various sections of the palace, including the Harem, which offers a glimpse into the private lives of the sultans and their families.

3. Blue Mosque (Sultan Ahmed Mosque): Known for its striking blue tiles that adorn its interior, the Blue Mosque is an iconic symbol of Istanbul. Built between 1609 and 1616, it features six minarets and a grand courtyard. The mosque is still an active place of worship, so visitors are encouraged to dress modestly and respect prayer times. Its stunning architecture and serene atmosphere make it a must-see for anyone visiting the city.

These sites not only reflect Istanbul's rich history but also its cultural and architectural diversity, making them essential stops on your journey through the city.

Now let's ask Agent a question in Korean.

image

Question: 쉴레이마니예 모스크의 역사에 대한 정보를 제공하세요. (Provide some information about the history of the Süleymaniye Mosque.)

Response: The Suleymaniye Mosque is one of the most significant architectural masterpieces in Istanbul, dominating the city's skyline from its prominent position on one of the highest hills. It was commissioned by Sultan Suleyman the Magnificent, who ruled during the height of the Ottoman Empire, and was completed in 1557. The mosque was designed by the renowned Ottoman chief architect Mimar Sinan, who is celebrated for his contributions to the architectural landscape of Istanbul, having designed numerous buildings throughout the city.

The Suleymaniye Mosque was conceived as an Islamic response to the grandeur of the Byzantine Hagia Sophia, showcasing the Ottoman Empire's architectural prowess. The mosque features a vast central dome, which is surrounded by tiered domes, creating a stunning visual effect both from the outside and within. The interior of the mosque is particularly impressive, characterized by its spaciousness and the intricate details of its design.

Mimar Sinan's work on the Suleymaniye Mosque is often regarded as a culmination of his architectural achievements, as he perfected the use of the central dome in mosque design. The mosque not only serves as a place of worship but also as a complex that includes a madrasa (educational institution), a hospital, a public kitchen, and a caravanserai, reflecting the multifaceted role of mosques in Ottoman society.

Overall, the Suleymaniye Mosque stands as a testament to the artistic and cultural zenith of the Ottoman Empire, embodying both religious significance and architectural innovation.

Conclusion:

In this project, we developed a Travel Guide Chatbot using Retrieval-Augmented Generation (RAG) with Agents from LangGraph. The chatbot integrates multiple technologies, such as OpenAI models for the LLM and embeddings, Chroma for document storage, and Tavily for online search capabilities. By combining vector-based retrieval with real-time search, the chatbot effectively addresses various tourist queries, from historical landmarks to current weather conditions.

The implementation demonstrates how nodes and edges can structure an agent's decision-making process, ensuring accurate and relevant responses. This approach enhances tourists' experience by providing comprehensive, up-to-date information about Istanbul in a user-friendly manner. The project showcases the power of adaptive RAG workflows in creating efficient and interactive tools for real-world applications.
