From a6d99e799054ff5249b56de4804a3a5b101e74b5 Mon Sep 17 00:00:00 2001
From: robertturner <143536791+robertdhayanturner@users.noreply.github.com>
Date: Wed, 31 Jan 2024 19:11:05 -0500
Subject: [PATCH] Update scaling_rag_for_production.md

re-inserted `` around Get_num...

---
 docs/use_cases/scaling_rag_for_production.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/use_cases/scaling_rag_for_production.md b/docs/use_cases/scaling_rag_for_production.md
index e1958a6bd..5245a4566 100644
--- a/docs/use_cases/scaling_rag_for_production.md
+++ b/docs/use_cases/scaling_rag_for_production.md
@@ -465,7 +465,7 @@ for content in response:
     print(content, end='', flush=True)
 ```
 
-To **make using our application even more convenient**, we can simply adapt Ray's official documentation to **implement our workflow within a single QueryAgent class**, which bundles together and takes care of all of the steps we implemented above - retrieving embeddings, embedding the search query, performing vector search, processing the results, and querying the LLM to generate a response. Using this single class approach, we no longer need to sequentially call all of these functions, and can also include utility functions. (Specifically, _Get_num_tokens_ encodes our text and gets the number of tokens, to calculate the length of the input. To maintain our standard 50:50 ratio to allocate space to each of input and generation, we use _(text, max_context_length)_ to trim input text if it's too long.)
+To **make using our application even more convenient**, we can simply adapt Ray's official documentation to **implement our workflow within a single QueryAgent class**, which bundles together and takes care of all of the steps we implemented above - retrieving embeddings, embedding the search query, performing vector search, processing the results, and querying the LLM to generate a response. Using this single class approach, we no longer need to sequentially call all of these functions, and can also include utility functions. (Specifically, `Get_num_tokens` encodes our text and gets the number of tokens, to calculate the length of the input. To maintain our standard 50:50 ratio to allocate space to each of input and generation, we use `(text, max_context_length)` to trim input text if it's too long.)
 
 ```python
 import tiktoken
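
For reference, a minimal sketch of what the two utility functions described in the patched paragraph might look like. The names `get_num_tokens` and `trim(text, max_context_length)` follow the paragraph above; the function bodies and the `cl100k_base` encoding are illustrative assumptions, not the exact code from Ray's documentation:

```python
import tiktoken

# Assumption: an OpenAI-style encoding; the actual encoding used may differ.
ENCODING = tiktoken.get_encoding("cl100k_base")


def get_num_tokens(text: str) -> int:
    """Encode the text and return its token count, to measure input length."""
    return len(ENCODING.encode(text))


def trim(text: str, max_context_length: int) -> str:
    """Trim the input text so it fits within max_context_length tokens,
    preserving the 50:50 input/generation space allocation described above."""
    tokens = ENCODING.encode(text)
    return ENCODING.decode(tokens[:max_context_length])
```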