Commit

Update scaling_rag_for_production.md

re-inserted `` around Get_num...
robertdhayanturner authored Feb 1, 2024
1 parent 966f298 commit a6d99e7
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/use_cases/scaling_rag_for_production.md
@@ -465,7 +465,7 @@ for content in response:
     print(content, end='', flush=True)
 ```
 
-To **make using our application even more convenient**, we can simply adapt Ray's official documentation to **implement our workflow within a single QueryAgent class**, which bundles together and takes care of all of the steps we implemented above - retrieving embeddings, embedding the search query, performing vector search, processing the results, and querying the LLM to generate a response. Using this single class approach, we no longer need to sequentially call all of these functions, and can also include utility functions. (Specifically, _Get_num_tokens_ encodes our text and gets the number of tokens, to calculate the length of the input. To maintain our standard 50:50 ratio to allocate space to each of input and generation, we use _(text, max_context_length)_ to trim input text if it's too long.)
+To **make using our application even more convenient**, we can simply adapt Ray's official documentation to **implement our workflow within a single QueryAgent class**, which bundles together and takes care of all of the steps we implemented above - retrieving embeddings, embedding the search query, performing vector search, processing the results, and querying the LLM to generate a response. Using this single class approach, we no longer need to sequentially call all of these functions, and can also include utility functions. (Specifically, `Get_num_tokens` encodes our text and gets the number of tokens, to calculate the length of the input. To maintain our standard 50:50 ratio to allocate space to each of input and generation, we use `(text, max_context_length)` to trim input text if it's too long.)
 
 ```python
 import tiktoken
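For context, a rough sketch of the kind of `QueryAgent` class the edited paragraph describes (this is not the code from the doc itself): the `embedding_model`, `search_index`, and `llm_client` interfaces are hypothetical placeholders, the lowercase `get_num_tokens` name and the `trim(text, max_context_length)` helper name are assumptions (the paragraph gives only the signature), and only the `tiktoken`-based token counting and the 50:50 input/generation split mirror what the paragraph states.

```python
import tiktoken

# Shared encoding for token counting and trimming (encoding choice is an assumption).
ENCODING = tiktoken.get_encoding("cl100k_base")


def get_num_tokens(text: str) -> int:
    """Encode the text and return its token count."""
    return len(ENCODING.encode(text))


def trim(text: str, max_context_length: int) -> str:
    """Truncate the text to at most max_context_length tokens.

    The name `trim` is hypothetical; the source names only the signature.
    """
    return ENCODING.decode(ENCODING.encode(text)[:max_context_length])


class QueryAgent:
    """Bundles the pipeline steps: embed the query, run vector search,
    assemble the retrieved context, and query the LLM for a response."""

    def __init__(self, embedding_model, search_index, llm_client,
                 max_context_length: int = 4096, system_content: str = ""):
        # embedding_model, search_index, and llm_client are hypothetical
        # stand-ins for the components built earlier in the article.
        self.embedding_model = embedding_model
        self.search_index = search_index
        self.llm_client = llm_client
        # 50:50 split of the context window between input and generation.
        self.context_length = max_context_length // 2 - get_num_tokens(system_content)
        self.max_tokens = max_context_length // 2
        self.system_content = system_content

    def __call__(self, query: str, num_chunks: int = 5) -> str:
        # Embed the search query.
        embedding = self.embedding_model.embed(query)
        # Retrieve the most relevant chunks via vector search.
        chunks = self.search_index.search(embedding, k=num_chunks)
        # Trim the assembled context so the input stays within its half
        # of the context window.
        context = trim("\n\n".join(chunks), self.context_length)
        # Query the LLM to generate the final response.
        return self.llm_client.generate(
            system_content=self.system_content,
            user_content=f"query: {query}, context: {context}",
            max_tokens=self.max_tokens,
        )
```

Because half the window is reserved for generation, trimming the context up front keeps the retrieved input from crowding out the model's output budget.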
