Commit

Merge branch 'master' into support_offline_cpu
wangleis authored Dec 17, 2024
2 parents 6937385 + 2c2ac3d commit 3a48813
Showing 6 changed files with 13 additions and 72 deletions.
@@ -17,27 +17,30 @@ The tables below list the key performance indicators for inference on built-in G

.. tab-item:: 9-288V

.. csv-table::
.. data-table::
:class: modeldata stripe
:name: supportedModelsTable_V1
:header-rows: 1
:file: ../../_static/benchmarks_files/llm_models_9-288V.csv
:hidden: [3,4,6]

.. tab-item:: 7-268V

.. csv-table::
.. data-table::
:class: modeldata stripe
:name: supportedModelsTable_V2
:header-rows: 1
:file: ../../_static/benchmarks_files/llm_models_7-258V.csv
:hidden: [3,4,6]

.. tab-item:: 7-155H

.. csv-table::
.. data-table::
:class: modeldata stripe
:name: supportedModelsTable_V3
:header-rows: 1
:file: ../../_static/benchmarks_files/llm_models_7-155H.csv
:hidden: [3,4,6]


.. grid:: 1 1 2 2
5 changes: 3 additions & 2 deletions docs/articles_en/about-openvino/release-notes-openvino.rst
@@ -39,8 +39,9 @@ OpenVINO™ Runtime
CPU Device Plugin
-----------------------------------------------------------------------------------------------

* KV cache now uses asymmetric U8 as the default precision, reducing memory stress for LLMs and
increasing their performance. This option can be controlled by model meta data.
* KV cache now uses asymmetric 8-bit unsigned integer (U8) as the default precision, reducing
memory stress for LLMs and increasing their performance. This option can be controlled by
model meta data.
* Quality and accuracy has been improved for selected models with several bug fixes.

GPU Device Plugin
30 changes: 1 addition & 29 deletions docs/articles_en/assets/snippets/ov_caching.cpp
@@ -90,42 +90,14 @@ auto compiled = core.compile_model(model, device, config); // Step 5:
}
}

void part5() {
    std::string modelPath = "/tmp/myModel.xml";
    std::string device = "GPU";
    ov::Core core; // Step 1: create ov::Core object
    core.set_property(ov::cache_dir("/path/to/cache/dir")); // Step 1b: Enable caching
    //! [ov:caching:part5]
    static const char codec_key[] = {0x30, 0x60, 0x70, 0x02, 0x04, 0x08, 0x3F, 0x6F, 0x72, 0x74, 0x78, 0x7F};
    auto codec_xor = [&](const std::string& source_str) {
        auto key_size = sizeof(codec_key);
        int key_idx = 0;
        std::string dst_str = source_str;
        for (char& c : dst_str) {
            c ^= codec_key[key_idx % key_size];
            key_idx++;
        }
        return dst_str;
    };
    auto compiled = core.compile_model(modelPath,
                                       device,
                                       ov::cache_encryption_callbacks(ov::EncryptionCallbacks{codec_xor, codec_xor}),
                                       ov::cache_mode(ov::CacheMode::OPTIMIZE_SIZE)); // Step 5: Compile model
    //! [ov:caching:part5]
    if (!compiled) {
        throw std::runtime_error("error");
    }
}

int main() {
try {
part0();
part1();
part2();
part3();
part4();
part5();
} catch (...) {
}
return 0;
}
}
17 changes: 0 additions & 17 deletions docs/articles_en/assets/snippets/ov_caching.py
@@ -59,20 +59,3 @@ def decrypt_base64(src):
model = core.read_model(model=model_path)
compiled_model = core.compile_model(model=model, device_name=device_name, config=config_cache)
# ! [ov:caching:part4]

# ! [ov:caching:part5]
import base64

def encrypt_base64(src):
    return base64.b64encode(bytes(src, "utf-8"))

def decrypt_base64(src):
    return base64.b64decode(bytes(src, "utf-8"))

core = ov.Core()
core.set_property({props.cache_dir: path_to_cache_dir})
config_cache = {}
config_cache["CACHE_ENCRYPTION_CALLBACKS"] = [encrypt_base64, decrypt_base64]
config_cache["CACHE_MODE"] = "OPTIMIZE_SIZE"
compiled_model = core.compile_model(model=model_path, device_name='GPU', config=config_cache)
# ! [ov:caching:part5]
@@ -139,7 +139,7 @@ To check in advance if a particular device supports model caching, your applicat
Set "cache_encryption_callbacks" config option to enable cache encryption
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

If model caching is enabled in the CPU Plugin, the model topology can be encrypted while it is saved to the cache and decrypted when it is loaded from the cache. Currently, this property can be set only in ``compile_model``.
If model caching is enabled, the model topology can be encrypted when saving to the cache and decrypted when loading from the cache. This property can currently be set only in ``compile_model``.

.. tab-set::

@@ -157,24 +157,6 @@ If model caching is enabled in the CPU Plugin, the model topology can be encrypt
:language: cpp
:fragment: [ov:caching:part4]

If model caching is enabled in the GPU Plugin, the model topology can be encrypted while it is saved to the cache and decrypted when it is loaded from the cache. Full encryption only works when the ``CacheMode`` property is set to ``OPTIMIZE_SIZE``.

.. tab-set::

.. tab-item:: Python
:sync: py

.. doxygensnippet:: docs/articles_en/assets/snippets/ov_caching.py
:language: py
:fragment: [ov:caching:part5]

.. tab-item:: C++
:sync: cpp

.. doxygensnippet:: docs/articles_en/assets/snippets/ov_caching.cpp
:language: cpp
:fragment: [ov:caching:part5]

.. important::

Currently, this property is supported only by the CPU and GPU plugins. For other HW plugins, setting this property will not encrypt/decrypt the model topology in cache and will not affect performance.
Currently, this property is supported only by the CPU plugin. For other HW plugins, setting this property will not encrypt/decrypt the model topology in cache and will not affect performance.
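
Note on the removed ``part5`` snippet: the C++ version passed the same ``codec_xor`` function as both the encryption and the decryption callback. That works because XOR with a repeating key is its own inverse. The sketch below illustrates only that round-trip property. It is a stand-alone illustration, not OpenVINO code: it reuses the key bytes from the removed snippet, but the ``bytes``-based signature and the sample ``blob`` are assumptions made for this example.

```python
# Stand-alone sketch; does not import or call OpenVINO.
# Key bytes copied from the removed C++ part5 snippet.
CODEC_KEY = bytes([0x30, 0x60, 0x70, 0x02, 0x04, 0x08,
                   0x3F, 0x6F, 0x72, 0x74, 0x78, 0x7F])


def codec_xor(data: bytes) -> bytes:
    """XOR every byte with the repeating key.

    Applying the function twice returns the original input, so one
    function can play both the 'encrypt' and the 'decrypt' role.
    """
    return bytes(b ^ CODEC_KEY[i % len(CODEC_KEY)] for i, b in enumerate(data))


# Hypothetical payload standing in for a cached model topology.
blob = b'<net name="example"/>'
encrypted = codec_xor(blob)
restored = codec_xor(encrypted)
```

A XOR codec like this only obfuscates the cached data; it is shown because the removed snippet used it, not as a recommendation for real encryption.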
2 changes: 1 addition & 1 deletion docs/sphinx_setup/_static/js/custom.js
@@ -416,7 +416,7 @@ document.addEventListener('DOMContentLoaded', function () {
}

await element.initialize({
accessToken: "xx1f2aebd3-4307-4632-aeea-17c13378b237",
accessToken: "xx2b580d60-addf-451d-94fd-06effafb7686",
organizationId: "intelcorporationproductione78n25s6"
});
