Commit
Merge branch 'master' into tj/rt_info/serialize-rt_map
mlukasze authored Dec 16, 2024
2 parents cd48083 + 357eb54 commit 00bdf12
Showing 61 changed files with 1,606 additions and 683 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/job_python_api_tests.yml
@@ -101,10 +101,10 @@ jobs:
--junitxml=${INSTALL_TEST_DIR}/TEST-Pyngraph.xml \
--ignore=${INSTALL_TEST_DIR}/tests/pyopenvino/tests/test_utils/test_utils.py
- name: Python API Tests -- numpy>=2.0.0
- name: Python API Tests -- numpy<2.0.0
run: |
python3 -m pip uninstall -y numpy
python3 -m pip install "numpy~=2.0.0"
python3 -m pip install "numpy~=1.26.0"
python3 -m pip install -r ${INSTALL_TEST_DIR}/tests/bindings/python/requirements_test.txt
# for 'template' extension
export LD_LIBRARY_PATH=${INSTALL_TEST_DIR}/tests/:$LD_LIBRARY_PATH
@@ -83,7 +83,7 @@ For setting up a relevant configuration, refer to the
:doc:`Integrate with Customer Application <../../openvino-workflow/running-inference/integrate-openvino-with-your-application>`
topic (step 3 "Configure input and output").

.. dropdown:: Device support across OpenVINO 2024.5 distributions
.. dropdown:: Device support across OpenVINO 2024.6 distributions

=============== ========== ====== =============== ======== ============ ========== ========== ==========
Device Archives PyPI APT/YUM/ZYPPER Conda Homebrew vcpkg Conan npm
626 changes: 336 additions & 290 deletions docs/articles_en/about-openvino/release-notes-openvino.rst

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/articles_en/documentation/openvino-extensibility.rst
@@ -45,7 +45,7 @@ The first part is required for inference. The second part is required for succes
Definition of Operation Semantics
#################################

If the custom operation can be mathematically represented as a combination of exiting OpenVINO operations and such decomposition gives desired performance, then low-level operation implementation is not required. Refer to the latest OpenVINO operation set, when deciding feasibility of such decomposition. You can use any valid combination of exiting operations. The next section of this document describes the way to map a custom operation.
If the custom operation can be mathematically represented as a combination of existing OpenVINO operations and such decomposition gives desired performance, then low-level operation implementation is not required. Refer to the latest OpenVINO operation set, when deciding feasibility of such decomposition. You can use any valid combination of existing operations. The next section of this document describes the way to map a custom operation.

If such decomposition is not possible or appears too bulky with a large number of constituent operations that do not perform well, then a new class for the custom operation should be implemented, as described in the :doc:`Custom Operation Guide <openvino-extensibility/custom-openvino-operations>`.

@@ -4,12 +4,12 @@ OpenVINO™ GenAI Dependencies
OpenVINO™ GenAI depends on both `OpenVINO <https://github.com/openvinotoolkit/openvino>`__ and
`OpenVINO Tokenizers <https://github.com/openvinotoolkit/openvino_tokenizers>`__. During OpenVINO™
GenAI installation from PyPi, the same versions of OpenVINO and OpenVINO Tokenizers
are used (e.g. ``openvino==2024.5.0`` and ``openvino-tokenizers==2024.5.0.0`` are installed for
``openvino-genai==2024.5.0``).
are used (e.g. ``openvino==2024.6.0`` and ``openvino-tokenizers==2024.6.0.0`` are installed for
``openvino-genai==2024.6.0``).

Trying to update any of the dependency packages might result in a version incompatiblibty
Trying to update any of the dependency packages might result in a version incompatibility
due to different Application Binary Interfaces (ABIs), which will result in errors while running
OpenVINO GenAI. Having package version in the ``<MAJOR>.<MINOR>.<PATCH>.<REVISION>`` format, allows
OpenVINO GenAI. Having package version in the ``<MAJOR>.<MINOR>.<PATCH>.<REVISION>`` format, enables
changing the ``<REVISION>`` portion of the full version to ensure ABI compatibility. Changing
``<MAJOR>``, ``<MINOR>`` or ``<PATCH>`` part of the version may break ABI.
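The version rule in this hunk can be sketched in Python. This is a minimal illustration rather than part of the commit or of any OpenVINO package: the helper names `split_version` and `abi_compatible` are hypothetical, and the sketch assumes plain dotted numeric versions with an optional fourth `<REVISION>` component.

```python
from typing import Tuple


def split_version(version: str) -> Tuple[int, ...]:
    """Split 'MAJOR.MINOR.PATCH[.REVISION]' into four integers.

    Hypothetical helper: a missing REVISION is treated as 0.
    """
    parts = [int(p) for p in version.split(".")]
    if not 3 <= len(parts) <= 4:
        raise ValueError(f"unexpected version format: {version!r}")
    return tuple(parts + [0] * (4 - len(parts)))


def abi_compatible(installed: str, candidate: str) -> bool:
    """Return True when at most the REVISION part differs.

    Mirrors the rule in the text: changing MAJOR, MINOR or PATCH may break ABI.
    """
    return split_version(installed)[:3] == split_version(candidate)[:3]


# Only REVISION differs, so the update is expected to stay ABI compatible.
print(abi_compatible("2024.6.0.0", "2024.6.0.1"))
# PATCH differs, so the update may break ABI.
print(abi_compatible("2024.6.0.0", "2024.6.1.0"))
```

A check like this only encodes the naming convention; it does not inspect the binaries themselves.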

7 changes: 4 additions & 3 deletions docs/articles_en/get-started/install-openvino.rst
@@ -1,4 +1,4 @@
Install OpenVINO™ 2024.5
Install OpenVINO™ 2024.6
==========================


@@ -23,10 +23,11 @@ Install OpenVINO™ 2024.5
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<iframe id="selector" src="../_static/selector-tool/selector-2a63478.html" style="width: 100%; border: none" title="Download Intel® Distribution of OpenVINO™ Toolkit"></iframe>

OpenVINO 2024.5, described here, is not a Long-Term-Support version!
OpenVINO 2024.6, described here, is a Long-Term-Support version!
All currently supported versions are:

* 2024.5 (development)
* 2025.0 (in development)
* 2024.6 (LTS)
* 2023.3 (LTS)


63 changes: 25 additions & 38 deletions docs/articles_en/learn-openvino/llm_inference_guide.rst
@@ -20,12 +20,12 @@ Generative AI workflow
Generative AI is a specific area of Deep Learning models used for producing new and “original”
data, based on input in the form of image, sound, or natural language text. Due to their
complexity and size, generative AI pipelines are more difficult to deploy and run efficiently.
OpenVINO simplifies the process and ensures high-performance integrations, with the following
OpenVINO simplifies the process and ensures high-performance integrations, with the following
options:

.. tab-set::

.. tab-item:: OpenVINO GenAI
.. tab-item:: OpenVINO GenAI

| - Suggested for production deployment for the supported use cases.
| - Smaller footprint and fewer dependencies.
@@ -39,6 +39,8 @@ options:
text generation loop, tokenization, and scheduling, offering ease of use and high
performance.

`Check out the OpenVINO GenAI Quick-start Guide [PDF] <https://docs.openvino.ai/nightly/_static/download/GenAI_Quick_Start_Guide.pdf>`__

.. tab-item:: Hugging Face integration

| - Suggested for prototyping and, if the use case is not covered by OpenVINO GenAI, production.
@@ -54,49 +56,34 @@ options:
as well as conversion on the fly. For integration with the final product it may offer
lower performance, though.

`Check out the GenAI Quick-start Guide [PDF] <https://docs.openvino.ai/2024/_static/download/GenAI_Quick_Start_Guide.pdf>`__

The advantages of using OpenVINO for LLM deployment:

.. dropdown:: Fewer dependencies and smaller footprint
:animate: fade-in-slide-down
:color: secondary

Less bloated than frameworks such as Hugging Face and PyTorch, with a smaller binary size and reduced
memory footprint, makes deployments easier and updates more manageable.

.. dropdown:: Compression and precision management
:animate: fade-in-slide-down
:color: secondary

Techniques such as 8-bit and 4-bit weight compression, including embedding layers, and storage
format reduction. This includes fp16 precision for non-compressed models and int8/int4 for
compressed models, like GPTQ models from `Hugging Face <https://huggingface.co/models>`__.

.. dropdown:: Enhanced inference capabilities
:animate: fade-in-slide-down
:color: secondary
The advantages of using OpenVINO for generative model deployment:

Advanced features like in-place KV-cache, dynamic quantization, KV-cache quantization and
encapsulation, dynamic beam size configuration, and speculative sampling, and more are
available.
| **Fewer dependencies and smaller footprint**
| Less bloated than frameworks such as Hugging Face and PyTorch, with a smaller binary size and reduced
memory footprint, makes deployments easier and updates more manageable.
.. dropdown:: Stateful model optimization
:animate: fade-in-slide-down
:color: secondary
| **Compression and precision management**
| Techniques such as 8-bit and 4-bit weight compression, including embedding layers, and storage
format reduction. This includes fp16 precision for non-compressed models and int8/int4 for
compressed models, like GPTQ models from `Hugging Face <https://huggingface.co/models>`__.
Models from the Hugging Face Transformers are converted into a stateful form, optimizing
inference performance and memory usage in long-running text generation tasks by managing past
KV-cache tensors more efficiently internally. This feature is automatically activated for
many supported models, while unsupported ones remain stateless. Learn more about the
:doc:`Stateful models and State API <../openvino-workflow/running-inference/stateful-models>`.
| **Enhanced inference capabilities**
| Advanced features like in-place KV-cache, dynamic quantization, KV-cache quantization and
encapsulation, dynamic beam size configuration, and speculative sampling, and more are
available.
.. dropdown:: Optimized LLM inference
:animate: fade-in-slide-down
:color: secondary
| **Stateful model optimization**
| Models from the Hugging Face Transformers are converted into a stateful form, optimizing
inference performance and memory usage in long-running text generation tasks by managing past
KV-cache tensors more efficiently internally. This feature is automatically activated for
many supported models, while unsupported ones remain stateless. Learn more about the
:doc:`Stateful models and State API <../openvino-workflow/running-inference/stateful-models>`.
Includes a Python API for rapid development and C++ for further optimization, offering
better performance than Python-based runtimes.
| **Optimized LLM inference**
| Includes a Python API for rapid development and C++ for further optimization, offering
better performance than Python-based runtimes.

Proceed to guides on:
@@ -28,6 +28,10 @@ make sure to :doc:`install OpenVINO with GenAI <../../get-started/install-openvi
.. dropdown:: Text-to-Image Generation

OpenVINO GenAI introduces the openvino_genai.Text2ImagePipeline for inference of text-to-image
models such as Stable Diffusion 1.5, 2.1, XL, LCM, Flex, and more.
See the following usage example for reference.

.. tab-set::

.. tab-item:: Python
@@ -130,7 +134,7 @@ make sure to :doc:`install OpenVINO with GenAI <../../get-started/install-openvi
image_write("baseline.bmp", image)
For more information, refer to the
`Python sample <https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/python/text2image/>`__
`Python sample <https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/python/image_generation>`__

.. tab-item:: C++
:sync: cpp
@@ -579,8 +583,9 @@ compression is done by NNCF at the model export stage. The exported model contai
information necessary for execution, including the tokenizer/detokenizer and the generation
config, ensuring that its results match those generated by Hugging Face.

The `LLMPipeline` is the main object used for decoding and handles all the necessary steps.
You can construct it directly from the folder with the converted model.
The `LLMPipeline` is the main object to set up the model for text generation. You can provide the
converted model to this object, specify the device for inference, and provide additional
parameters.


.. tab-set::
@@ -911,7 +916,7 @@ running the following code:
GenAI API
#######################################

The use case described here uses the following OpenVINO GenAI API methods:
The use case described here uses the following OpenVINO GenAI API classes:

* generation_config - defines a configuration class for text generation,
enabling customization of the generation process such as the maximum length of
@@ -921,7 +926,6 @@ The use case described here uses the following OpenVINO GenAI API methods:
text generation, and managing outputs with configurable options.
* streamer_base - an abstract base class for creating streamers.
* tokenizer - the tokenizer class for text encoding and decoding.
* visibility - controls the visibility of the GenAI library.

Learn more from the `GenAI API reference <https://docs.openvino.ai/2024/api/genai_api/api.html>`__.

@@ -7,8 +7,8 @@ Generative Model Preparation



Since generative AI models tend to be big and resource-heavy, it is advisable to store them
locally and optimize for efficient inference. This article will show how to prepare
Since generative AI models tend to be big and resource-heavy, it is advisable to
optimize them for efficient inference. This article will show how to prepare
LLM models for inference with OpenVINO by:

* `Downloading Models from Hugging Face <#download-generative-models-from-hugging-face-hub>`__
2 changes: 1 addition & 1 deletion docs/dev/ov_dependencies.txt
@@ -1,6 +1,6 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#This file provides a comprehensive list of all dependencies of OpenVINO 2024.5
#This file provides a comprehensive list of all dependencies of OpenVINO 2024.6
#The file is part of the automation pipeline for posting OpenVINO IR models on the HuggingFace Hub, including OneBOM dependency checks.


Binary file modified docs/sphinx_setup/_static/download/GenAI_Quick_Start_Guide.pdf
Binary file not shown.
10 changes: 5 additions & 5 deletions docs/sphinx_setup/index.rst
@@ -25,16 +25,16 @@ hardware and environments, on-premises and on-device, in the browser or in the c
<section class="splide" aria-label="Splide Banner Carousel">
<div class="splide__track">
<ul class="splide__list">
<li id="ov-homepage-slide2" class="splide__slide">
<p class="ov-homepage-slide-title">New GenAI API</p>
<p class="ov-homepage-slide-subtitle">Generative AI in only a few lines of code!</p>
<a class="ov-homepage-banner-btn" href="https://docs.openvino.ai/nightly/learn-openvino/llm_inference_guide/genai-guide.html">Check out our guide</a>
</li>
<li id="ov-homepage-slide1" class="splide__slide">
<p class="ov-homepage-slide-title">OpenVINO models on Hugging Face!</p>
<p class="ov-homepage-slide-subtitle">Get pre-optimized OpenVINO models, no need to convert!</p>
<a class="ov-homepage-banner-btn" href="https://huggingface.co/OpenVINO">Visit Hugging Face</a>
</li>
<li id="ov-homepage-slide2" class="splide__slide">
<p class="ov-homepage-slide-title">New Generative AI API</p>
<p class="ov-homepage-slide-subtitle">Generate text with LLMs in only a few lines of code!</p>
<a class="ov-homepage-banner-btn" href="https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/genai-guide.html">Check out our guide</a>
</li>
<li id="ov-homepage-slide3" class="splide__slide">
<p class="ov-homepage-slide-title">Improved model serving</p>
<p class="ov-homepage-slide-subtitle">OpenVINO Model Server has improved parallel inferencing!</p>
13 changes: 12 additions & 1 deletion src/bindings/js/node/package.json
@@ -51,6 +51,17 @@
"host": "https://storage.openvinotoolkit.org"
},
"keywords": [
"OpenVINO"
"OpenVINO",
"openvino",
"openvino-node",
"openvino npm",
"openvino binding",
"openvino node.js",
"openvino library",
"intel openvino",
"openvino toolkit",
"openvino API",
"openvino SDK",
"openvino integration"
]
}
2 changes: 1 addition & 1 deletion src/bindings/python/constraints.txt
@@ -1,5 +1,5 @@
# used in multiple components
numpy>=1.16.6,<2.2.0 # Python bindings, frontends
numpy>=1.16.6,<2.3.0 # Python bindings, frontends

# pytest
pytest>=5.0,<8.4
2 changes: 1 addition & 1 deletion src/bindings/python/requirements.txt
@@ -1,3 +1,3 @@
numpy>=1.16.6,<2.2.0
numpy>=1.16.6,<2.3.0
openvino-telemetry>=2023.2.1
packaging
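The effect of raising the upper bound from `<2.2.0` to `<2.3.0` can be sketched as a range check. This is an illustrative sketch only: `parse_release` and `satisfies_numpy_pin` are hypothetical helpers, and the code assumes plain numeric release versions (no pre-release or local suffixes). Real tooling would normally rely on a PEP 440 aware library rather than hand-rolled parsing.

```python
from typing import Tuple

# Bounds copied from the new pin: numpy>=1.16.6,<2.3.0
LOWER = (1, 16, 6)
UPPER = (2, 3, 0)


def parse_release(version: str) -> Tuple[int, int, int]:
    """Parse 'X.Y[.Z]' into a 3-tuple, padding missing parts with 0 (hypothetical helper)."""
    parts = [int(p) for p in version.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported version: {version!r}")
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def satisfies_numpy_pin(version: str) -> bool:
    """Return True when the version lies in the half-open range [LOWER, UPPER)."""
    return LOWER <= parse_release(version) < UPPER


# 2.2.x was rejected by the old '<2.2.0' bound and is accepted by the new one.
print(satisfies_numpy_pin("2.2.1"))
# The upper bound is exclusive, so 2.3.0 itself is still rejected.
print(satisfies_numpy_pin("2.3.0"))
```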
15 changes: 8 additions & 7 deletions src/bindings/python/src/openvino/__init__.py
@@ -27,11 +27,11 @@
from openvino import properties as properties

# Import most important classes and functions from openvino.runtime
from openvino.runtime import Model
from openvino.runtime import Core
from openvino.runtime import CompiledModel
from openvino.runtime import InferRequest
from openvino.runtime import AsyncInferQueue
from openvino._ov_api import Model
from openvino._ov_api import Core
from openvino._ov_api import CompiledModel
from openvino._ov_api import InferRequest
from openvino._ov_api import AsyncInferQueue

from openvino.runtime import Symbol
from openvino.runtime import Dimension
@@ -43,12 +43,13 @@
from openvino.runtime import Tensor
from openvino.runtime import OVAny

from openvino.runtime import compile_model
# Helper functions for openvino module
from openvino.runtime.utils.data_helpers import tensor_from_file
from openvino._ov_api import compile_model
from openvino.runtime import get_batch
from openvino.runtime import set_batch
from openvino.runtime import serialize
from openvino.runtime import shutdown
from openvino.runtime import tensor_from_file
from openvino.runtime import save_model
from openvino.runtime import layout_helpers

File renamed without changes.
2 changes: 1 addition & 1 deletion src/bindings/python/src/openvino/opset8/ops.py
@@ -7,7 +7,7 @@
from typing import List, Optional, Tuple

import numpy as np
from openvino.runtime.exceptions import UserInputError
from openvino.exceptions import UserInputError
from openvino.op import Constant, Parameter, if_op
from openvino.runtime import Node
from openvino.runtime.opset_utils import _get_node_factory
@@ -0,0 +1,7 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2018-2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from openvino.exceptions import OVError
from openvino.exceptions import UserInputError
from openvino.exceptions import OVTypeError
12 changes: 12 additions & 0 deletions src/bindings/python/src/openvino/runtime/ie_api/__init__.py
@@ -0,0 +1,12 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2018-2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from openvino._ov_api import Core
from openvino._ov_api import CompiledModel
from openvino._ov_api import InferRequest
from openvino._ov_api import Model
from openvino._ov_api import AsyncInferQueue

from openvino._ov_api import tensor_from_file
from openvino._ov_api import compile_model
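The new `__init__.py` files above appear to keep the old import paths working by re-exporting names from their new location. The pattern can be sketched without OpenVINO installed. Everything here is a stand-in: the module names `demo_pkg._new_api` and `demo_pkg.legacy_api` and the placeholder `Core` class are hypothetical and only imitate the layout shown in the diff.

```python
import sys
import types

# Hypothetical parent package so the dotted imports below resolve cleanly.
pkg = types.ModuleType("demo_pkg")
pkg.__path__ = []  # marks the module as a package
sys.modules["demo_pkg"] = pkg


class Core:
    """Placeholder class; not the real openvino Core."""


# Stand-in for the new home of the class (the diff uses openvino._ov_api).
new_api = types.ModuleType("demo_pkg._new_api")
new_api.Core = Core
sys.modules["demo_pkg._new_api"] = new_api

# Stand-in for the legacy path; its only job is to re-export the same object,
# which is what `from demo_pkg._new_api import Core` inside an __init__.py does.
legacy_api = types.ModuleType("demo_pkg.legacy_api")
legacy_api.Core = new_api.Core
sys.modules["demo_pkg.legacy_api"] = legacy_api

from demo_pkg._new_api import Core as NewCore
from demo_pkg.legacy_api import Core as OldCore

# Both import paths hand back the very same class object.
print(OldCore is NewCore)
```

Because the legacy module holds a reference to the same object instead of a copy, `isinstance` checks and exception handling behave identically regardless of which path a caller imported from.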
@@ -191,7 +191,7 @@ static void regclass_graph_PreProcessSteps(py::module m) {
:param pads_end: Number of elements matches the number of indices in data attribute. Specifies the number of padding elements at the ending of each axis.
:type pads_end: 1D tensor of type T_INT.
:param value: All new elements are populated with this value or with 0 if input not provided. Shouldn’t be set for other pad_mode values.
:type value: scalar tensor of type T.
:type value: scalar tensor of type T.
:param mode: pad_mode specifies the method used to generate new element values.
:type mode: string
:return: Reference to itself, allows chaining of calls in client's code in a builder-like manner.
Expand Down Expand Up @@ -219,7 +219,7 @@ static void regclass_graph_PreProcessSteps(py::module m) {
:param pads_end: Number of elements matches the number of indices in data attribute. Specifies the number of padding elements at the ending of each axis.
:type pads_end: 1D tensor of type T_INT.
:param value: All new elements are populated with this value or with 0 if input not provided. Shouldn’t be set for other pad_mode values.
:type value: scalar tensor of type T.
:type value: scalar tensor of type T.
:param mode: pad_mode specifies the method used to generate new element values.
:type mode: string
:return: Reference to itself, allows chaining of calls in client's code in a builder-like manner.
Expand Down Expand Up @@ -308,7 +308,8 @@ static void regclass_graph_InputTensorInfo(py::module m) {
},
py::arg("layout"),
R"(
Set layout for input tensor info
Set layout for input tensor info
:param layout: layout to be set
:type layout: Union[str, openvino.runtime.Layout]
)");
Expand Down Expand Up @@ -422,7 +423,8 @@ static void regclass_graph_OutputTensorInfo(py::module m) {
},
py::arg("layout"),
R"(
Set layout for output tensor info
Set layout for output tensor info
:param layout: layout to be set
:type layout: Union[str, openvino.runtime.Layout]
)");
Expand Down Expand Up @@ -475,7 +477,8 @@ static void regclass_graph_OutputModelInfo(py::module m) {
},
py::arg("layout"),
R"(
Set layout for output model info
Set layout for output model info
:param layout: layout to be set
:type layout: Union[str, openvino.runtime.Layout]
)");
Expand Down
Loading

0 comments on commit 00bdf12

Please sign in to comment.