
[CPU] Add interface to release compiled model internal memory #26390

Conversation

maxnick
Contributor

@maxnick maxnick commented Sep 3, 2024

Details:

This PR introduces an ov::CompiledModel-level interface that allows releasing the memory allocated by the compiled model. In this PR, the interface is only supported by the CPU plugin.
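
For illustration, a minimal usage sketch of the new interface (the release_memory() call is the interface added by this PR; the model file name and the rest of the snippet are placeholders):

#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Compile a model for the CPU plugin, the only plugin supporting the new call in this PR
    ov::CompiledModel compiled_model = core.compile_model("model.xml", "CPU");
    ov::InferRequest request = compiled_model.create_infer_request();

    request.infer();                 // intermediate buffers are allocated during inference

    compiled_model.release_memory(); // release memory held for intermediate structures

    request.infer();                 // still valid; buffers are re-allocated as needed
    return 0;
}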

Tickets:

@maxnick maxnick requested review from a team as code owners September 3, 2024 12:08
@github-actions github-actions bot added category: inference OpenVINO Runtime library - Inference category: CPU OpenVINO CPU plugin category: CPP API OpenVINO CPP API bindings labels Sep 3, 2024
@maxnick maxnick added this to the 2024.4 milestone Sep 3, 2024
@maxnick maxnick requested review from a team as code owners September 3, 2024 17:04
@github-actions github-actions bot added the category: IE Tests OpenVINO Test: plugins and common label Sep 3, 2024
Contributor

@ilya-lavrenov ilya-lavrenov left a comment


for future releases, we need to add behavior tests for this feature:

  • release_memory throws when called during running inferences
  • inference request can execute w/o issues after release_memory was called (see the sketch below)
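
A rough sketch of the second test (GoogleTest-style; not the actual test code from this PR, and the model file name is a placeholder):

#include <gtest/gtest.h>
#include <openvino/openvino.hpp>

TEST(ReleaseMemoryBehavior, InferWorksAfterRelease) {
    ov::Core core;
    auto compiled_model = core.compile_model("model.xml", "CPU");
    auto request = compiled_model.create_infer_request();

    EXPECT_NO_THROW(request.infer());                 // warm-up run allocates intermediate memory
    EXPECT_NO_THROW(compiled_model.release_memory()); // drop the intermediate memory
    EXPECT_NO_THROW(request.infer());                 // must still execute without issues
}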

for (auto&& graph : m_graphs) {
    GraphGuard::Lock graph_lock{graph};
    auto ctx = graph_lock._graph.getGraphContext();
    ctx->getNetworkMemoryControl()->releaseMemory();
}
Contributor


can we release memory only for graphs which don't have running inference requests?

Contributor Author


In that case we would have to add a delayed release for the others. I'm not sure we really want to have nonuniform memory release across streams.

Contributor

@dmitry-gorokhov dmitry-gorokhov left a comment


No blocking comments from my side

/**
 * @brief Release intermediate memory.
 *
 * This methods forces the Compiled model to release memory allocated for intermediate structures, e.g. caches,
Contributor


methods -> method

@@ -77,6 +77,14 @@ bool Multinomial::needPrepareParams() const {
    return true;
}

void Multinomial::createPrimitive() {
    if (!m_const_inputs[NUM_SAMPLES_PORT]) {
        CPU_NODE_ASSERT(isDynamicNode(), "is static while the samples input is a variable");
Contributor


Can't we just move the m_samples_count computation into execute()? I suppose it shouldn't introduce any overhead.

Contributor Author


Probably yes. Actually, I'm going to prepare a list of leftovers for this PR, which will then be put into the backlog and planned accordingly, so I'll include this one too.

Comment on lines -837 to -838
// !! Fallback to individual memory allocation !!
// if you like to check infer without reuse just call this function without arguments.
Contributor


Just wondering if the same debug functionality is still available

Contributor Author


Honestly, it isn't, since to have such an option we would need to replace the memory manager used for dynamic tensor allocation. However, such a change is rather trivial.

@maxnick
Contributor Author

maxnick commented Sep 4, 2024

  • inference request can execute w/o issues after release_memory was called

This test has already been introduced; please refer to 297e65c

@moslex moslex added the priority: high High priority label Sep 4, 2024
@dmitry-gorokhov dmitry-gorokhov added this pull request to the merge queue Sep 5, 2024
Merged via the queue into openvinotoolkit:releases/2024/4 with commit 8c9d4be Sep 5, 2024
146 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Sep 5, 2024