[DOCS] Adding GenAI Use Cases #27062

Merged
@@ -9,6 +9,7 @@ Run LLM Inference on OpenVINO with the GenAI Flavor
:hidden:

NPU inference of LLMs <genai-guide-npu>
genai-guide/genai-use-cases


This guide will show you how to integrate the OpenVINO GenAI flavor into your application, covering
@@ -174,59 +175,6 @@ You can also create your custom streamer for more sophisticated processing:
pipe.generate("The Sun is yellow because", ov::genai::streamer(custom_streamer), ov::genai::max_new_tokens(100));
}
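
For reference, the same streaming call can also be written in Python. The sketch below
assumes the ``openvino_genai`` package and a ``model_path`` pointing to an exported GenAI
model; the streamer is a plain callable that receives decoded subwords and returns
``False`` to let generation continue.

.. code-block:: python

   import openvino_genai as ov_genai

   pipe = ov_genai.LLMPipeline(model_path, "CPU")

   def streamer(subword):
       # Print each decoded subword as soon as it is produced.
       print(subword, end='', flush=True)
       # Returning False tells the pipeline to continue generating.
       return False

   pipe.generate("The Sun is yellow because", streamer=streamer, max_new_tokens=100)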

Using GenAI in Chat Scenario
################################

For chat scenarios, where inputs and outputs represent a conversation, preserving the KV cache across inputs
can improve performance, since previous turns do not need to be recomputed. The chat-specific methods **start_chat** and **finish_chat**
mark the beginning and end of a conversation session, as shown in these simple examples:

.. tab-set::

   .. tab-item:: Python
      :sync: py

      .. code-block:: python

         import openvino_genai as ov_genai
         pipe = ov_genai.LLMPipeline(model_path)

         pipe.set_generation_config({'max_new_tokens': 100})

         pipe.start_chat()
         while True:
             print('question:')
             prompt = input()
             if prompt == 'Stop!':
                 break
             print(pipe.generate(prompt))
         pipe.finish_chat()


   .. tab-item:: C++
      :sync: cpp

      .. code-block:: cpp

         #include "openvino/genai/llm_pipeline.hpp"
         #include <iostream>
         #include <string>

         int main(int argc, char* argv[]) {
            std::string prompt;

            std::string model_path = argv[1];
            ov::genai::LLMPipeline pipe(model_path, "CPU");

            ov::genai::GenerationConfig config = pipe.get_generation_config();
            config.max_new_tokens = 100;
            pipe.set_generation_config(config);

            pipe.start_chat();
            while (true) {
               std::cout << "question:\n";
               std::getline(std::cin, prompt);
               if (prompt == "Stop!")
                  break;

               std::cout << pipe.generate(prompt) << std::endl;
            }
            pipe.finish_chat();
         }

Optimizing Generation with Grouped Beam Search
#######################################################
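
As a rough illustration of how grouped beam search can be configured through the Python API,
the sketch below assumes the ``num_beams``, ``num_beam_groups``, and ``diversity_penalty``
fields of ``GenerationConfig``; exact names and defaults should be checked against the
released API.

.. code-block:: python

   import openvino_genai as ov_genai

   pipe = ov_genai.LLMPipeline(model_path, "CPU")

   config = pipe.get_generation_config()
   config.max_new_tokens = 100
   # Split the beams into groups and penalize beams that duplicate other groups,
   # so the returned candidates are more diverse.
   config.num_beams = 4
   config.num_beam_groups = 2
   config.diversity_penalty = 1.0

   print(pipe.generate("The Sun is yellow because", config))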