
[CPU] Add interface to release compiled model internal memory #26262

Merged on Sep 5, 2024 (42 commits; changes shown from 33 commits)

Commits
f211b01
Rename mem manager to mem block
maxnick Aug 5, 2024
f723360
Remove isAllocated check from the memory class
maxnick Aug 6, 2024
5ced5f4
Fix tests build
maxnick Aug 7, 2024
e2440df
Remove incorrect checks from Input
maxnick Aug 7, 2024
d1b544e
Remove incorrect isDefined checks
maxnick Aug 7, 2024
0bf53b8
Redefine Split createPrimitive method
maxnick Aug 7, 2024
e20e7f2
Memory subsystem refactoring
maxnick Aug 12, 2024
3db1617
Merge remote-tracking branch 'origin/master' into flush_interim_tensors
maxnick Aug 12, 2024
bbbcdf4
Avoid using key word
maxnick Aug 13, 2024
692c02f
Bug fixes
maxnick Aug 13, 2024
ece559e
Fix linear offset calculation in static block
maxnick Aug 14, 2024
b9da47a
Fix output edges processing
maxnick Aug 14, 2024
8304f77
Add allocate and free actions
maxnick Aug 14, 2024
e58f01a
Add flushing intermediate tensors
maxnick Aug 14, 2024
334b757
Linter fixes
maxnick Aug 16, 2024
b0d0964
Avoid calling getData to not allocated memory
maxnick Aug 16, 2024
bfe54a0
Merge remote-tracking branch 'origin/master' into flush_interim_tensors
maxnick Aug 16, 2024
f6a8dee
Reallocate only defined mem
maxnick Aug 16, 2024
01bb93c
Refactor the Reorder node
maxnick Aug 16, 2024
1adc24c
Refactor set output default ptr
maxnick Aug 16, 2024
fd3b99a
Adapt FC executor
maxnick Aug 19, 2024
548e2d7
Fix dynamic memory allocation
maxnick Aug 19, 2024
2722aeb
Adapt Multimodal node
maxnick Aug 20, 2024
906d649
Skip memory refresh for inPlace up
maxnick Aug 20, 2024
5c41e7c
Skip string tensors in memory refresh
maxnick Aug 20, 2024
f570550
Avoid reading uninit data in Loop initialization
maxnick Aug 20, 2024
5b9d05a
Merge remote-tracking branch 'origin/master' into flush_interim_tensors
maxnick Aug 20, 2024
da1f184
Fix loop trip count reading in Loop
maxnick Aug 21, 2024
adeeb02
Introduce memory block stub
maxnick Aug 21, 2024
e4dbf00
WA in the Pad node to prevent reading uninit data
maxnick Aug 22, 2024
8ad88fb
Introduce network level memory control unit
maxnick Aug 22, 2024
4f38db6
Fix Loop for dynamic shape applications
maxnick Aug 26, 2024
8704e61
Add an interface call releasing intermediate memory
maxnick Aug 22, 2024
18795a1
Merge commit '98188ad2efa74d3b73ec6d2e5bd6ac80c4fdb570' into release_…
maxnick Sep 2, 2024
ba03485
Code cleanup
maxnick Sep 2, 2024
8fa7847
Merge remote-tracking branch 'origin/master' into release_interface
maxnick Sep 2, 2024
7cbffb4
Rename release_buffers to release_memory
maxnick Sep 3, 2024
5177db5
Trivial behavior test
maxnick Sep 3, 2024
af6e6eb
Fix clang format
maxnick Sep 3, 2024
8f71310
Modify behavior test to avoid recompilation
maxnick Sep 4, 2024
18485db
Fix mem size calculation for half byte types
maxnick Sep 4, 2024
92b7244
Merge remote-tracking branch 'origin/master' into release_interface
maxnick Sep 4, 2024
6 changes: 6 additions & 0 deletions src/inference/dev_api/openvino/runtime/icompiled_model.hpp
@@ -134,6 +134,12 @@ class OPENVINO_RUNTIME_API ICompiledModel : public std::enable_shared_from_this<
*/
ov::SoPtr<ov::IRemoteContext> get_context() const;

/**
* @brief Release intermediate memory
*
*/
virtual void release_buffers();
Contributor Author:

Think about a better name. This one doesn't really clarify which buffers are implied.

Contributor:

What if several requests are still running when I call this method?

Contributor Author (@maxnick, Aug 29, 2024):

That is a really good question. This is merely a POC to evaluate the memory footprint in a specific application, so the exact interface-level solution is subject to an architecture review.


virtual ~ICompiledModel() = default;

private:
9 changes: 9 additions & 0 deletions src/inference/include/openvino/runtime/compiled_model.hpp
@@ -200,6 +200,15 @@ class OPENVINO_RUNTIME_API CompiledModel {
return get_property(property.name()).template as<T>();
}

/**
* @brief Release intermediate memory.
*
* This method forces the compiled model to release memory allocated for intermediate structures, e.g. caches,
* tensors, temporary buffers, etc.
*
*/
void release_buffers();

/**
* @brief Returns pointer to device-specific shared context
* on a remote accelerator device that was used to create this CompiledModel.
4 changes: 4 additions & 0 deletions src/inference/src/cpp/compiled_model.cpp
@@ -145,6 +145,10 @@ Any CompiledModel::get_property(const std::string& name) const {
});
}

void CompiledModel::release_buffers() {
OV_COMPILED_MODEL_CALL_STATEMENT(_impl->release_buffers());
}

RemoteContext CompiledModel::get_context() const {
OV_COMPILED_MODEL_CALL_STATEMENT({
auto ctx = _impl->get_context();
4 changes: 4 additions & 0 deletions src/inference/src/dev/icompiled_model.cpp
@@ -147,3 +147,7 @@ ov::SoPtr<ov::IRemoteContext> ov::ICompiledModel::get_context() const {
void ov::ICompiledModel::set_model_shared_object(ov::Model& model, const std::shared_ptr<void>& shared_object) {
model.m_shared_object = shared_object;
}

void ov::ICompiledModel::release_buffers() {
OPENVINO_THROW("ov::ICompiledModel::release_buffers() is not implemented");
}
8 changes: 8 additions & 0 deletions src/plugins/intel_cpu/src/compiled_model.cpp
@@ -342,5 +342,13 @@ void CompiledModel::export_model(std::ostream& modelStream) const {
serializer << m_model;
}

void CompiledModel::release_buffers() {
for (auto&& graph : m_graphs) {
GraphGuard::Lock graph_lock{graph};
auto ctx = graph_lock._graph.getGraphContext();
ctx->getNetworkMemoryControl()->releaseMemory();
}
Contributor Author:

Should we add releasing the oneDNN caches here?

}

} // namespace intel_cpu
} // namespace ov
2 changes: 2 additions & 0 deletions src/plugins/intel_cpu/src/compiled_model.h
@@ -49,6 +49,8 @@ class CompiledModel : public ov::ICompiledModel {
"Set property to Core::compile_model during compilation");
};

void release_buffers() override;

private:
std::shared_ptr<ov::ISyncInferRequest> create_sync_infer_request() const override;
friend class SyncInferRequest;
1 change: 1 addition & 0 deletions src/plugins/intel_cpu/src/config.h
@@ -46,6 +46,7 @@ struct Config {

bool collectPerfCounters = false;
bool exclusiveAsyncRequests = false;
bool flushIntermediateTensors = true; //TODO: change to false by default
SnippetsMode snippetsMode = SnippetsMode::Enable;
std::string dumpToDot = {};
std::string device_id = {};