Readme draft (#930)
Updates to describe:
- latest changes in the library, with a focus on the GenAI API
- expose more of the functionality we have (continuous batching, speculative decoding, etc.)
- provide lightweight samples as a starting point

---------

Co-authored-by: Ilya Lavrenov <[email protected]>
yury-gorbachev and ilya-lavrenov authored Oct 9, 2024
1 parent 6d2763a commit 5018f73
Showing 3 changed files with 302 additions and 45 deletions.
67 changes: 67 additions & 0 deletions .gitattributes
@@ -0,0 +1,67 @@
###############################################################################
# Set default behavior to automatically normalize line endings.
###############################################################################
* text=auto
###############################################################################
# Set default behavior for command prompt diff.
#
# This is needed for earlier builds of msysgit that do not have it on by
# default for csharp files.
# Note: This is only used by the command line
###############################################################################
#*.cs diff=csharp
*.py text eol=lf
###############################################################################
# Set the merge driver for project and solution files
#
# Merging from the command prompt will add diff markers to the files if there
# are conflicts (Merging from VS is not affected by the settings below, in VS
# the diff markers are never inserted). Diff markers may cause the following
# file extensions to fail to load in VS. An alternative would be to treat
# these files as binary, so they will always conflict and require user
# intervention with every merge. To do so, just uncomment the entries below
###############################################################################
#*.sln merge=binary
#*.csproj merge=binary
#*.vbproj merge=binary
#*.vcxproj merge=binary
#*.vcproj merge=binary
#*.dbproj merge=binary
#*.fsproj merge=binary
#*.lsproj merge=binary
#*.wixproj merge=binary
#*.modelproj merge=binary
#*.sqlproj merge=binary
#*.wwaproj merge=binary
###############################################################################
# behavior for image files
#
# image files are treated as binary by default.
###############################################################################
#*.jpg binary
#*.png binary
#*.gif binary
###############################################################################
# diff behavior for common document formats
#
# Convert binary document formats to text before diffing them. This feature
# is only available from the command line. Turn it on by uncommenting the
# entries below.
###############################################################################
#*.doc diff=astextplain
#*.DOC diff=astextplain
#*.docx diff=astextplain
#*.DOCX diff=astextplain
#*.dot diff=astextplain
#*.DOT diff=astextplain
#*.pdf diff=astextplain
#*.PDF diff=astextplain
#*.rtf diff=astextplain
#*.RTF diff=astextplain
*.PNG filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
*.jpg filter=lfs diff=lfs merge=lfs -text
*.gif filter=lfs diff=lfs merge=lfs -text
*.vsdx filter=lfs diff=lfs merge=lfs -text
*.bmp filter=lfs diff=lfs merge=lfs -text
*.svg filter=lfs diff=lfs merge=lfs -text
280 changes: 235 additions & 45 deletions README.md
@@ -1,50 +1,240 @@
# OpenVINO™ GenAI

The OpenVINO™ GenAI repository consists of the GenAI library and additional GenAI samples.

## OpenVINO™ GenAI Library

OpenVINO™ GenAI is a flavor of OpenVINO, aiming to simplify running inference of generative AI models.
It hides the complexity of the generation process and minimizes the amount of code required.

For installation and usage instructions, refer to the [GenAI Library README](./src/README.md).

## OpenVINO™ GenAI Samples

The OpenVINO™ GenAI repository contains pipelines that implement image and text generation tasks.
The implementation uses OpenVINO capabilities to optimize the pipelines. Each sample covers
a family of models and suggests certain modifications to adapt the code to specific needs.
It includes the following pipelines:

1. [Benchmarking script for large language models](./llm_bench/python/README.md)
2. Text generation samples that support most popular models like LLaMA 2:
   - Python:
     1. [beam_search_causal_lm](./samples/python/beam_search_causal_lm/README.md)
     2. [benchmark_genai](./samples/python/benchmark_genai/README.md)
     3. [chat_sample](./samples/python/chat_sample/README.md)
     4. [greedy_causal_lm](./samples/python/greedy_causal_lm/README.md)
     5. [multinomial_causal_lm](./samples/python/multinomial_causal_lm/README.md)
   - C++:
     1. [beam_search_causal_lm](./samples/cpp/beam_search_causal_lm/README.md)
     2. [benchmark_genai](./samples/cpp/benchmark_genai/README.md)
     3. [chat_sample](./samples/cpp/chat_sample/README.md)
     4. [continuous_batching_accuracy](./samples/cpp/continuous_batching_accuracy)
     5. [continuous_batching_benchmark](./samples/cpp/continuous_batching_benchmark)
     6. [greedy_causal_lm](./samples/cpp/greedy_causal_lm/README.md)
     7. [multinomial_causal_lm](./samples/cpp/multinomial_causal_lm/README.md)
     8. [prompt_lookup_decoding_lm](./samples/cpp/prompt_lookup_decoding_lm/README.md)
     9. [speculative_decoding_lm](./samples/cpp/speculative_decoding_lm/README.md)
3. [Stable Diffusion and Latent Consistency Model (with LoRA) C++ image generation pipeline](./samples/cpp/text2image/README.md)

### Requirements

Requirements may vary for different samples. See respective readme files for more details,
and make sure to install the OpenVINO version listed there. Refer to documentation to see
[how to install OpenVINO](https://docs.openvino.ai/install).

The supported devices are CPU and GPU including Intel discrete GPU.

See also: https://docs.openvino.ai/2023.3/gen_ai_guide.html.
OpenVINO™ GenAI is a library of the most popular Generative AI model pipelines, optimized execution methods, and samples, running on top of the highly performant [OpenVINO Runtime](https://github.com/openvinotoolkit/openvino).

The library is friendly to PC and laptop execution, optimized for resource consumption, and requires no external dependencies to run generative models, as it includes all required functionality (e.g. tokenization via openvino-tokenizers).

![Text generation using LLaMa 3.2 model running on Intel ARC770 dGPU](./samples/generation.gif)

## Supported Generative AI scenarios

The OpenVINO™ GenAI library provides very lightweight C++ and Python APIs to run the following generative scenarios:
- Text generation using Large Language Models, for example, chat with a local LLaMa model
- Image generation using Diffuser models, for example, generation with Stable Diffusion models
- Speech recognition using Whisper family models
- Text generation using Large Visual Models, for instance, image analysis using the LLaVa or miniCPM model families

The library efficiently supports LoRA adapters for text and image generation scenarios (a short sketch follows this list):
- Load multiple adapters per model
- Select active adapters for every generation
- Mix multiple adapters with coefficients via alpha blending
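
A minimal Python sketch of how adapter usage can look with `LLMPipeline`, modeled on the LoRA samples; the `Adapter`/`AdapterConfig` names, keyword arguments, alpha-blending constructor and file paths below are assumptions, so check the samples for the exact signatures:

```python
import openvino_genai

# Assumption: the adapter path and the alpha weight are illustrative only.
adapter = openvino_genai.Adapter("./lora_adapter.safetensors")

# Register the adapter when the pipeline is created...
pipe = openvino_genai.LLMPipeline(
    "./TinyLlama-1.1B-Chat-v1.0/",
    "CPU",
    adapters=openvino_genai.AdapterConfig(adapter),
)

# ...then select the active adapters (and their blending weights) per generation call.
print(pipe.generate("A story about love", max_new_tokens=100,
                    adapters=openvino_genai.AdapterConfig(adapter, 0.75)))

# Passing an empty config disables all adapters for this call.
print(pipe.generate("A story about love", max_new_tokens=100,
                    adapters=openvino_genai.AdapterConfig()))
```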

All scenarios run on top of OpenVINO Runtime, which supports inference on CPU, GPU and NPU. See [here](https://docs.openvino.ai/2024/about-openvino/release-notes-openvino/system-requirements.html) for the platform support matrix.

## Supported Generative AI optimization methods

The OpenVINO™ GenAI library provides a transparent way to use state-of-the-art generation optimizations:
- Speculative decoding, which employs two models of different sizes and uses the large model to periodically correct the results of the small model (a sketch follows this list). See [here](https://pytorch.org/blog/hitchhikers-guide-speculative-decoding/) for a more detailed overview
- KVCache token eviction algorithm, which reduces the size of the KVCache by pruning less impactful tokens.
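
For illustration, a sketch of how speculative decoding can be wired into `LLMPipeline`, following the `speculative_decoding_lm` samples; the `draft_model` helper, the `num_assistant_tokens` field and the model folders are assumptions here, so refer to the samples for the exact API:

```python
import openvino_genai

# Assumption: both folders hold models already exported with optimum-cli,
# a large "main" model and a small "draft" model sharing the same tokenizer.
pipe = openvino_genai.LLMPipeline(
    "./Llama-2-7b-chat-hf/",  # hypothetical main (large) model folder
    "CPU",
    draft_model=openvino_genai.draft_model("./TinyLlama-1.1B-Chat-v1.0/", "CPU"),
)

config = openvino_genai.GenerationConfig()
config.max_new_tokens = 100
config.num_assistant_tokens = 5  # tokens proposed by the draft model per verification step

print(pipe.generate("The Sun is yellow because", config))
```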

Additionally, the OpenVINO™ GenAI library implements a continuous batching approach for using OpenVINO within LLM serving. The continuous batching library can be used in LLM serving frameworks and supports the following features:
- Prefix caching, which internally caches fragments of previous generation requests together with the corresponding KVCache entries and reuses them for repeated queries. See [here](https://google.com) for a more detailed overview

Continuous batching functionality is used within OpenVINO Model Server (OVMS) to serve LLMs; see [here](https://docs.openvino.ai/2024/ovms_docs_llm_reference.html) for more details.
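
A rough sketch of configuring continuous batching with prefix caching from Python; the `ContinuousBatchingPipeline` and `SchedulerConfig` names and fields follow the continuous batching samples and should be treated as assumptions that may differ between releases:

```python
import openvino_genai

# Assumption: scheduler knobs are exposed via SchedulerConfig.
scheduler_config = openvino_genai.SchedulerConfig()
scheduler_config.cache_size = 2                # KVCache size, in GB
scheduler_config.enable_prefix_caching = True  # reuse KVCache entries for repeated prompt prefixes

pipe = openvino_genai.ContinuousBatchingPipeline(
    "./TinyLlama-1.1B-Chat-v1.0/",  # model folder exported with optimum-cli
    scheduler_config,
    "CPU",
)

config = openvino_genai.GenerationConfig()
config.max_new_tokens = 100

prompts = ["The Sun is yellow because", "The Sun is yellow because of"]
# Each returned result holds the generated sequence(s) for the prompt at the same index.
results = pipe.generate(prompts, [config] * len(prompts))
```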

## Installing OpenVINO GenAI

```sh
# Installing OpenVINO GenAI via pip
pip install openvino-genai

# Install optimum-intel to be able to download, convert and optimize LLMs from Hugging Face
# Optimum is not required to run models, only to convert and compress
pip install optimum[openvino]

# (Optional) Install (TBD) to be able to download models from Model Scope
#pip install optimum[openvino]
```

## Performing text generation
<details>

For more examples, check out our [LLM Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).

### Converting and compressing a text generation model from the Hugging Face library

```sh
# (Basic) Download TinyLlama-1.1B-Chat-v1.0 and convert it to OpenVINO IR in FP16
optimum-cli export openvino --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --weight-format fp16 --trust-remote-code "TinyLlama-1.1B-Chat-v1.0"

# (Recommended) Download TinyLlama-1.1B-Chat-v1.0, convert it to OpenVINO IR and compress the weights to INT4
optimum-cli export openvino --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --weight-format int4 --trust-remote-code "TinyLlama-1.1B-Chat-v1.0"
```

### Run generation using LLMPipeline API in Python

```python
import openvino_genai as ov_genai
# Will run the model on CPU; GPU and NPU are possible options as well
pipe = ov_genai.LLMPipeline("./TinyLlama-1.1B-Chat-v1.0/", "CPU")
print(pipe.generate("The Sun is yellow because", max_new_tokens=100))
```
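
The same pipeline also covers the chat scenario mentioned above. Below is a short sketch with token streaming, modeled on the `chat_sample`; the `start_chat`/`finish_chat` calls and the `streamer` callback follow that sample, while the prompts and model folder are illustrative:

```python
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("./TinyLlama-1.1B-Chat-v1.0/", "CPU")

config = ov_genai.GenerationConfig()
config.max_new_tokens = 100

def streamer(subword: str) -> bool:
    print(subword, end="", flush=True)
    return False  # returning False keeps generation running

# start_chat() keeps the conversation history (and its KVCache) between turns.
pipe.start_chat()
for prompt in ["What is OpenVINO?", "Summarize that in one sentence."]:
    pipe.generate(prompt, config, streamer)
    print()
pipe.finish_chat()
```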

### Run generation using LLM Pipeline in C++

The code below requires installation of the C++ compatible package (see [here](https://docs.openvino.ai/2024/get-started/install-openvino/install-openvino-genai.html#archive-installation) for more details).

```cpp
#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");
    std::cout << pipe.generate("The Sun is yellow because", ov::genai::max_new_tokens(100));
}
```
### Sample notebooks using this API

(TBD)

</details>

## Performing image generation

<details>

For more examples, check out our [LLM Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).

### Converting and compressing an image generation model from the Hugging Face library

```sh
# Download the dreamlike-anime-1.0 model and convert it to OpenVINO format
optimum-cli export openvino --model dreamlike-art/dreamlike-anime-1.0 --task stable-diffusion --weight-format fp16 dreamlike_anime_1_0_ov/FP16
```

### Run generation using Text2Image API in Python

```python

#WIP

```
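
While the Python snippet above is still marked as work in progress, here is a rough sketch of what it could look like, assuming the Python bindings mirror the C++ `Text2ImagePipeline` shown below; the output tensor layout and the Pillow-based saving are assumptions:

```python
import openvino_genai
from PIL import Image  # Pillow is used here only to save the output; an assumption, not a library dependency

pipe = openvino_genai.Text2ImagePipeline("./dreamlike_anime_1_0_ov/FP16", "CPU")  # GPU can be used as well
image_tensor = pipe.generate(
    "cute cat",
    width=512,
    height=512,
    num_inference_steps=20,
)

# Assumption: the returned ov.Tensor has shape [1, height, width, 3] with uint8 values.
Image.fromarray(image_tensor.data[0]).save("image.bmp")
```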

### Run generation using Text2Image API in C++

The code below requires installation of the C++ compatible package (see [here](https://docs.openvino.ai/2024/get-started/install-openvino/install-openvino-genai.html#archive-installation) for more details).

```cpp
#include "openvino/genai/text2image/pipeline.hpp"
#include "imwrite.hpp"
int main(int argc, char* argv[]) {

const std::string models_path = argv[1], prompt = argv[2];
const std::string device = "CPU"; // GPU, NPU can be used as well

ov::genai::Text2ImagePipeline pipe(models_path, device);
ov::Tensor image = pipe.generate(prompt,
ov::genai::width(512),
ov::genai::height(512),
ov::genai::num_inference_steps(20));

imwrite("image.bmp", image, true);
}
```
### Sample notebooks using this API

(TBD)

</details>

## Speech-to-text processing using Whisper Pipeline

<details>

For more examples, check out our [LLM Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).

NOTE: Whisper Pipeline requires preprocessing of the audio input (to adjust the sampling rate and normalize it).

### Converting and compressing a speech-to-text model from the Hugging Face library

```sh
# Download the whisper-base model and convert it to OpenVINO format
optimum-cli export openvino --trust-remote-code --model openai/whisper-base whisper-base
```

### Run generation using Whisper Pipeline API in Python

NOTE: This sample is a simplified version of the full sample available [here](./samples/python/whisper_speech_recognition/whisper_speech_recognition.py).

```python
import argparse
import openvino_genai
import librosa

def read_wav(filepath):
    raw_speech, samplerate = librosa.load(filepath, sr=16000)
    return raw_speech.tolist()


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("model_dir")
    parser.add_argument("wav_file_path")
    args = parser.parse_args()

    raw_speech = read_wav(args.wav_file_path)

    pipe = openvino_genai.WhisperPipeline(args.model_dir)

    def streamer(word: str) -> bool:
        print(word, end="")
        return False

    pipe.generate(
        raw_speech,
        max_new_tokens=100,
        # 'task' and 'language' parameters are supported for multilingual models only
        language="<|en|>",
        task="transcribe",
        streamer=streamer,
    )

    print()


if __name__ == "__main__":
    main()
```


### Run generation using Whisper Pipeline API in C++

NOTE: This sample is a simplified version of the full sample available [here](./samples/cpp/whisper_speech_recognition/whisper_speech_recognition.cpp).

```cpp
#include "audio_utils.hpp"
#include "openvino/genai/whisper_pipeline.hpp"

int main(int argc, char* argv[]) try {

std::string model_path = argv[1];
std::string wav_file_path = argv[2];

ov::genai::RawSpeechInput raw_speech = utils::audio::read_wav(wav_file_path);

ov::genai::WhisperPipeline pipeline{model_path};

ov::genai::WhisperGenerationConfig config{model_path + "/generation_config.json"};
config.max_new_tokens = 100;
// 'task' and 'language' parameters are supported for multilingual models only
config.language = "<|en|>";
config.task = "transcribe";

auto streamer = [](std::string word) {
std::cout << word;
return false;
};

pipeline.generate(raw_speech, config, streamer);

std::cout << std::endl;
}
```
### Sample notebooks using this API

(TBD)

</details>

## Additional materials

- [List of supported models](https://github.com/openvinotoolkit/openvino.genai/blob/master/src/docs/SUPPORTED_MODELS.md) (NOTE: models may work, but have not been tried yet)
- [OpenVINO LLM inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html)
- [Optimum-intel and OpenVINO](https://huggingface.co/docs/optimum/intel/openvino/export)
## License
Binary file added samples/generation.gif
