[DOCS] Adding GenAI Use Cases #27062

Merged
@@ -9,6 +9,7 @@ Run LLM Inference on OpenVINO with the GenAI Flavor
:hidden:

NPU inference of LLMs <genai-guide-npu>
genai-guide/genai-use-cases


This guide will show you how to integrate the OpenVINO GenAI flavor into your application, covering
@@ -174,59 +175,6 @@ You can also create your custom streamer for more sophisticated processing:
pipe.generate("The Sun is yellow because", ov::genai::streamer(custom_streamer), ov::genai::max_new_tokens(100));
}
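
For reference, the same streaming call can also be written in Python. The sketch below
assumes the ``openvino_genai`` package and a ``model_path`` pointing to an exported GenAI
model; the streamer is a plain callable that receives decoded subwords and returns
``False`` to let generation continue.

.. code-block:: python

   import openvino_genai as ov_genai

   pipe = ov_genai.LLMPipeline(model_path, "CPU")

   def streamer(subword):
       # Print each decoded subword as soon as it is produced.
       print(subword, end='', flush=True)
       # Returning False tells the pipeline to continue generating.
       return False

   pipe.generate("The Sun is yellow because", streamer=streamer, max_new_tokens=100)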

Using GenAI in Chat Scenario
################################

For chat scenarios, where inputs and outputs represent a conversation, preserving the KV cache across inputs
can improve performance, since previous turns do not need to be recomputed. The chat-specific methods **start_chat** and **finish_chat**
mark the beginning and end of a conversation session, as shown in these simple examples:

.. tab-set::

   .. tab-item:: Python
      :sync: py

      .. code-block:: python

         import openvino_genai as ov_genai
         pipe = ov_genai.LLMPipeline(model_path)

         pipe.set_generation_config({'max_new_tokens': 100})

         pipe.start_chat()
         while True:
             print('question:')
             prompt = input()
             if prompt == 'Stop!':
                 break
             print(pipe.generate(prompt))
         pipe.finish_chat()


   .. tab-item:: C++
      :sync: cpp

      .. code-block:: cpp

         #include "openvino/genai/llm_pipeline.hpp"
         #include <iostream>
         #include <string>

         int main(int argc, char* argv[]) {
            std::string prompt;

            std::string model_path = argv[1];
            ov::genai::LLMPipeline pipe(model_path, "CPU");

            ov::genai::GenerationConfig config = pipe.get_generation_config();
            config.max_new_tokens = 100;
            pipe.set_generation_config(config);

            pipe.start_chat();
            while (true) {
               std::cout << "question:\n";
               std::getline(std::cin, prompt);
               if (prompt == "Stop!")
                  break;

               std::cout << pipe.generate(prompt) << std::endl;
            }
            pipe.finish_chat();
         }

Optimizing Generation with Grouped Beam Search
#######################################################
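
As a rough illustration of how grouped beam search can be configured through the Python API,
the sketch below assumes the ``num_beams``, ``num_beam_groups``, and ``diversity_penalty``
fields of ``GenerationConfig``; exact names and defaults should be checked against the
released API.

.. code-block:: python

   import openvino_genai as ov_genai

   pipe = ov_genai.LLMPipeline(model_path, "CPU")

   config = pipe.get_generation_config()
   config.max_new_tokens = 100
   # Split the beams into groups and penalize beams that duplicate other groups,
   # so the returned candidates are more diverse.
   config.num_beams = 4
   config.num_beam_groups = 2
   config.diversity_penalty = 1.0

   print(pipe.generate("The Sun is yellow because", config))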