
[CPU] Add interface to release compiled model internal memory #26390

Conversation

maxnick
Contributor

@maxnick maxnick commented Sep 3, 2024

Details:

This PR introduces an ov::CompiledModel-level interface that allows releasing the memory allocated by the compiled model. In this PR, the interface is only supported by the CPU plugin.
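
For illustration, a minimal usage sketch of the new interface (the release_memory() call is the interface added by this PR; the model file name and the rest of the snippet are placeholders):

#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Compile a model for the CPU plugin, the only plugin supporting the new call in this PR
    ov::CompiledModel compiled_model = core.compile_model("model.xml", "CPU");
    ov::InferRequest request = compiled_model.create_infer_request();

    request.infer();                 // intermediate buffers are allocated during inference

    compiled_model.release_memory(); // release memory held for intermediate structures

    request.infer();                 // still valid; buffers are re-allocated as needed
    return 0;
}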

Tickets:

@maxnick maxnick requested review from a team as code owners September 3, 2024 12:08
@github-actions github-actions bot added category: inference OpenVINO Runtime library - Inference category: CPU OpenVINO CPU plugin category: CPP API OpenVINO CPP API bindings labels Sep 3, 2024
@maxnick maxnick added this to the 2024.4 milestone Sep 3, 2024
@maxnick maxnick requested review from a team as code owners September 3, 2024 17:04
@github-actions github-actions bot added the category: IE Tests OpenVINO Test: plugins and common label Sep 3, 2024
Contributor

@ilya-lavrenov ilya-lavrenov left a comment


for future releases, we need to add behavior tests for this feature:

  • release_memory throws when called during running inferences
  • inference request can execute w/o issues after release_memory was called (see the sketch below)
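
A rough sketch of the second test (GoogleTest-style; not the actual test code from this PR, and the model file name is a placeholder):

#include <gtest/gtest.h>
#include <openvino/openvino.hpp>

TEST(ReleaseMemoryBehavior, InferWorksAfterRelease) {
    ov::Core core;
    auto compiled_model = core.compile_model("model.xml", "CPU");
    auto request = compiled_model.create_infer_request();

    EXPECT_NO_THROW(request.infer());                 // warm-up run allocates intermediate memory
    EXPECT_NO_THROW(compiled_model.release_memory()); // drop the intermediate memory
    EXPECT_NO_THROW(request.infer());                 // must still execute without issues
}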

for (auto&& graph : m_graphs) {
    GraphGuard::Lock graph_lock{graph};
    auto ctx = graph_lock._graph.getGraphContext();
    ctx->getNetworkMemoryControl()->releaseMemory();
}
Contributor


can we release memory only for graphs which don't have running inference requests?

Contributor Author


In that case we would have to add a delayed release for the others. I'm not sure we really want to have nonuniform memory release across streams.

Contributor

@dmitry-gorokhov dmitry-gorokhov left a comment


No blocking comments from my side

/**
 * @brief Release intermediate memory.
 *
 * This methods forces the Compiled model to release memory allocated for intermediate structures, e.g. caches,
Contributor


methods -> method

@@ -77,6 +77,14 @@ bool Multinomial::needPrepareParams() const {
    return true;
}

void Multinomial::createPrimitive() {
    if (!m_const_inputs[NUM_SAMPLES_PORT]) {
        CPU_NODE_ASSERT(isDynamicNode(), "is static while the samples input is a variable");
Contributor


Can't we just move the m_samples_count computation into execute()? I suppose it shouldn't introduce any overhead.

Contributor Author


Probably yes. Actually, I'm going to prepare a list of leftovers for this PR, which will then be put into the backlog and planned accordingly, so I'll include this one too.

Comment on lines -837 to -838
// !! Fallback to individual memory allocation !!
// if you like to check infer without reuse just call this function without arguments.
Contributor


Just wondering if the same debug functionality is still available

Contributor Author


Honestly, it isn't, since to have such an option we would need to replace the memory manager used for dynamic tensor allocation. However, such a change is rather trivial.

@maxnick
Contributor Author

maxnick commented Sep 4, 2024

  • inference request can execute w/o issues after release_memory was called

This test has already been introduced; please refer to 297e65c

@moslex moslex added the priority: high High priority label Sep 4, 2024
@dmitry-gorokhov dmitry-gorokhov added this pull request to the merge queue Sep 5, 2024
Merged via the queue into openvinotoolkit:releases/2024/4 with commit 8c9d4be Sep 5, 2024
146 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Sep 5, 2024