diff --git a/docs/articles_en/about-openvino/release-notes-openvino.rst b/docs/articles_en/about-openvino/release-notes-openvino.rst
index 16bfdab100ac2f..7268bd7d629b45 100644
--- a/docs/articles_en/about-openvino/release-notes-openvino.rst
+++ b/docs/articles_en/about-openvino/release-notes-openvino.rst
@@ -1,9 +1,10 @@
+OpenVINO Release Notes
+=============================
+
 .. meta::
    :description: See what has changed in OpenVINO with the latest release, as well
                  as all previous releases in this year's cycle.
 
-OpenVINO Release Notes
-=============================
 
 .. toctree::
    :maxdepth: 1
@@ -14,7 +15,7 @@ OpenVINO Release Notes
 
-2024.3 - 30 July 2024
+2024.3 - 31 July 2024
 #############################
 
 :doc:`System Requirements <./release-notes-openvino/system-requirements>` | :doc:`Release policy <./release-notes-openvino/release-policy>` | :doc:`Installation Guides <./../get-started/install-openvino>`
@@ -23,21 +24,21 @@ OpenVINO Release Notes
 What's new
 +++++++++++++++++++++++++++++
 
-More Gen AI coverage and framework integrations to minimize code changes.
+* More Gen AI coverage and framework integrations to minimize code changes.
 
-* OpenVINO pre-optimized models are now available in Hugging Face making it easier for developers
-  to get started with these models.
+  * OpenVINO pre-optimized models are now available on Hugging Face, making it easier for
+    developers to get started with these models.
 
-Broader Large Language Model (LLM) support and more model compression techniques.
+* Broader Large Language Model (LLM) support and more model compression techniques.
 
-* Significant improvement in LLM performance on Intel built-in and discrete GPUs with the addition
-  of dynamic quantization, Multi-Head Attention (MHA), and OneDNN enhancements.
+  * Significant improvement in LLM performance on Intel discrete GPUs with the addition of
+    Multi-Head Attention (MHA) and oneDNN enhancements.
 
-More portability and performance to run AI at the edge, in the cloud, or locally.
+* More portability and performance to run AI at the edge, in the cloud, or locally.
 
-* Improved CPU performance when serving LLMs with the inclusion of vLLM and continuous batching
-  in the OpenVINO Model Server (OVMS). vLLM is an easy-to-use open-source library that supports
-  efficient LLM inferencing and model serving.
+  * Improved CPU performance when serving LLMs with the inclusion of vLLM and continuous batching
+    in the OpenVINO Model Server (OVMS). vLLM is an easy-to-use open-source library that supports
+    efficient LLM inference and model serving.
@@ -59,7 +60,7 @@ Common
 
   * Increasing support for models like YoloV10 or PixArt-XL-2, thanks to enabling Squeeze and
     Concat layers.
-  * Performance of precision conversion fp16/bf16 -> fp32.
+  * Performance of precision conversion FP16/BF16 -> FP32.
@@ -97,9 +98,6 @@ GPU Device Plugin
 
   * LLMs and Stable Diffusion on discrete GPUs, due to latency decrease, through optimizations
     such as Multi-Head Attention (MHA) and oneDNN improvements.
-  * First token latency of LLMs for large input cases on Core Ultra integrated GPU. It can be
-    further improved with dynamic quantization enabled with an application
-    `interface `__.
 
   * Whisper models on discrete GPU.
@@ -191,7 +189,7 @@ Neural Network Compression Framework
   Act->MatMul and Act->Multiply->MatMul to cover the Phi family models.
 * The representation of symmetrically quantized weights has been updated to a signed data type
   with no zero point. This allows NPU to support compressed LLMs with the symmetric mode.
-* bf16 models in Post-Training Quantization are now supported; nncf.quantize().
+* BF16 models in Post-Training Quantization are now supported; nncf.quantize().
 * `Activation Sparsity `__ (Contextual Sparsity) algorithm in the Weight Compression
   method is now supported (preview), speeding up LLM inference. The algorithm is enabled by
   setting the ``target_sparsity_by_scope`` option in
@@ -431,7 +429,7 @@ Previous 2024 releases
    compression of LLMs. Enabled by ``gptq=True`` in nncf.compress_weights().
  * Scale Estimation algorithm for more accurate 4-bit compressed LLMs. Enabled by
    ``scale_estimation=True`` in nncf.compress_weights().
-  * Added support for models with bf16 weights in nncf.compress_weights().
+  * Added support for models with BF16 weights in nncf.compress_weights().
  * nncf.quantize() method is now the recommended path for quantization initialization of PyTorch
    models in Quantization-Aware Training. See example for more details.
  * compressed_model.nncf.get_config() and nncf.torch.load_from_config() API have been added to
diff --git a/docs/sphinx_setup/_static/js/graphs.js b/docs/sphinx_setup/_static/js/graphs.js
index ee2e412a0ba04e..4fa50da99ce94b 100644
--- a/docs/sphinx_setup/_static/js/graphs.js
+++ b/docs/sphinx_setup/_static/js/graphs.js
@@ -65,7 +65,7 @@ class Filter {
 
   // param: GraphData[], ieType
   static FilterByIeType(graphDataArr, value) {
-    return graphDataArr.filter((data) => data.ieType.includes(value));
+    return graphDataArr.filter((data) => data.ieType && data.ieType.includes(value));
   }
 
   // param: GraphData[], clientPlatforms[]
diff --git a/docs/sphinx_setup/index.rst b/docs/sphinx_setup/index.rst
index 54a5bb7cf2f0df..4b3c376152c860 100644
--- a/docs/sphinx_setup/index.rst
+++ b/docs/sphinx_setup/index.rst
@@ -5,11 +5,11 @@ OpenVINO 2024.3
 .. meta::
    :google-site-verification: _YqumYQ98cmXUTwtzM_0WIIadtDc6r_TMYGbmGgNvrk
 
-**OpenVINO is an open-source toolkit** for optimizing and deploying deep learning models from cloud
-to edge. It accelerates deep learning inference across various use cases, such as generative AI, video,
-audio, and language with models from popular frameworks like PyTorch, TensorFlow, ONNX, and more.
-Convert and optimize models, and deploy across a mix of Intel® hardware and environments, on-premises
-and on-device, in the browser or in the cloud.
+**OpenVINO is an open-source toolkit** for optimizing and deploying deep learning models from
+cloud to edge. It accelerates deep learning inference across various use cases, such as
+generative AI, video, audio, and language with models from popular frameworks like PyTorch,
+TensorFlow, ONNX, and more. Convert and optimize models, and deploy across a mix of Intel®
+hardware and environments, on-premises and on-device, in the browser or in the cloud.
 
 Check out the `OpenVINO Cheat Sheet. `__
 
 
@@ -26,16 +26,21 @@ Check out the `OpenVINO Cheat Sheet.
 [Banner carousel hunk: adds a new item, "OpenVINO models on Hugging Face!"
 ("Get pre-optimized OpenVINO models, no need to convert!" / "Visit Hugging Face"),
 alongside the existing items "New Generative AI API" ("Generate text with LLMs in
 only a few lines of code!" / "Check out our guide"), "Improved model serving"
 ("OpenVINO Model Server has improved parallel inferencing!" / "Learn more"), and
 "OpenVINO via PyTorch 2.0 torch.compile()" ("Use OpenVINO directly in
 PyTorch-native applications!" / "Learn more").]
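
For reference, the GPTQ and Scale Estimation options named in the NNCF notes above are plain
keyword arguments to nncf.compress_weights(). A minimal sketch, assuming NNCF's public Python
API; the model path and calibration samples are placeholders, not part of this PR:

.. code-block:: python

   import nncf
   import openvino as ov

   # Placeholder model and calibration data; GPTQ and Scale Estimation are
   # data-aware algorithms, so compress_weights() also needs a dataset.
   model = ov.Core().read_model("llm/openvino_model.xml")
   calibration_samples = [...]  # inputs shaped like the model's inputs

   compressed = nncf.compress_weights(
       model,
       mode=nncf.CompressWeightsMode.INT4_SYM,  # symmetric 4-bit weights, no zero point
       gptq=True,               # GPTQ method, per the 2024.2 notes above
       scale_estimation=True,   # Scale Estimation algorithm, also per the notes
       dataset=nncf.Dataset(calibration_samples),
   )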
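
The BF16 items follow the same pattern: nncf.quantize() and nncf.compress_weights() now accept
BF16 models directly, with no up-front conversion step. A sketch reusing the placeholders above:

.. code-block:: python

   # Post-training quantization of a BF16 model through the standard entry point.
   quantized = nncf.quantize(model, nncf.Dataset(calibration_samples))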
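
The "New Generative AI API" banner item refers to the OpenVINO GenAI package. A minimal sketch,
assuming openvino-genai is installed and a hypothetical local folder holds an LLM already
exported to OpenVINO format:

.. code-block:: python

   import openvino_genai

   # Load an exported OpenVINO LLM from a local directory and run it on CPU.
   pipe = openvino_genai.LLMPipeline("TinyLlama-1.1B-Chat-v1.0-ov", "CPU")
   print(pipe.generate("What is OpenVINO?", max_new_tokens=100))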
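
Likewise, the torch.compile() banner item maps to the "openvino" backend that installing
OpenVINO makes available to PyTorch. A sketch with a toy model:

.. code-block:: python

   import torch

   model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())

   # Compile through the OpenVINO backend; later calls run the optimized graph.
   compiled = torch.compile(model, backend="openvino")
   output = compiled(torch.randn(1, 8))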