
[GPU] Graph serialization for GPU #13801

Merged: 26 commits, Nov 14, 2022
Conversation

@e-ddykim (Contributor) commented Nov 2, 2022

Details:

  • This PR adds model caching for the GPU plugin as a preview feature.
    • It reduces first-inference latency by skipping the graph optimization passes.
    • The main components to be serialized are primitive_inst and primitive_impl.
    • primitive, program, and program_node are not serialized.
  • To enable it, set the environment variable OV_GPU_CACHE_MODEL to 1.
    • If it is not set, the current kernel caching feature is activated.
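For instance, from a shell (only the variable name comes from this PR; the rest is illustrative):

```shell
# Enable the GPU graph-serialization cache (preview feature from this PR).
export OV_GPU_CACHE_MODEL=1
echo "OV_GPU_CACHE_MODEL=$OV_GPU_CACHE_MODEL"   # prints: OV_GPU_CACHE_MODEL=1
# Leaving the variable unset keeps the existing kernel caching behavior.
```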

Tickets:

  • 57672

@e-ddykim force-pushed the gpu-serial_poc branch 8 times, most recently from 37daf88 to e72ffa4 on November 7, 2022 11:51
@e-ddykim added this to the 2022.3 milestone Nov 7, 2022
@e-ddykim added the "category: GPU" (OpenVINO GPU plugin) label Nov 7, 2022
@e-ddykim marked this pull request as ready for review November 7, 2022 14:43
@e-ddykim requested review from a team as code owners November 7, 2022 14:43
, _impl(nullptr)
, _outputs({memory::ptr()})
, _output_changed(false)
, _mem_allocated(false) {}
Contributor:
Can this flag be recognized by can_be_optimized?

@e-ddykim (Author) replied Nov 9, 2022:

In most cases, can_be_optimized() is true when _mem_allocated is false. But in the case of the "implicit concat" for onednn, can_be_optimized() is false while _mem_allocated is also false.

@yeonbok yeonbok merged commit f488e6c into openvinotoolkit:master Nov 14, 2022
@vladimir-paramuzov (Contributor) left a comment:

@e-ddykim @yeonbok Guys, you merged this PR too quickly, I haven't finished my review yet.
I'm submitting the comments I have at the moment, so please address them in a separate PR.

@@ -901,4 +941,218 @@ std::string primitive_inst::get_implementation_name() const {
return "undef";
}

void primitive_inst::save(cldnn::BinaryOutputBuffer& ob) const {
if (type() == cldnn::data::type_id() ||
(type() == cldnn::mutable_data::type_id() && _impl == nullptr)) {
Contributor:
Why not override the save method for data/mutable_data instead of having a branch in the common impl?

@e-ddykim (Author):

https://github.com/openvinotoolkit/openvino/pull/13986/files#r1027065103
I overrode the save and load methods for data/mutable_data and removed the branch. Thank you.
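The shape of that refactor can be sketched as follows (simplified stand-in types, not the real cldnn classes):

```cpp
#include <string>
#include <vector>

// The data-specific serialization moves from an if-branch inside the common
// save() into a virtual override on the data primitive type.
struct BinaryOutputBuffer {
    std::vector<std::string> fields;
};

struct primitive_inst {
    virtual ~primitive_inst() = default;
    virtual void save(BinaryOutputBuffer& ob) const {
        ob.fields.push_back("common-state"); // state every primitive writes
    }
};

struct data_inst : primitive_inst {
    void save(BinaryOutputBuffer& ob) const override {
        primitive_inst::save(ob);           // reuse the common part
        ob.fields.push_back("data-memory"); // then the data-only payload
    }
};
```

Callers keep invoking save() through the base pointer, so no type check is needed at the call site.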

if (!_mem_allocated) {
for (size_t dep_idx = 0; dep_idx < _deps.size(); ++dep_idx) {
for (size_t m_idx = 0; m_idx < _deps[dep_idx]->_deps.size(); ++m_idx) {
if (get_network().get_engine().is_the_same_buffer(*_outputs[0], *_deps[dep_idx]->_deps[m_idx]->_outputs[0])) {
Contributor:
That logic looks weird and unsafe. If I understand correctly, it assumes that mutable_data is used only as a WA (workaround) for multiple outputs.


@@ -181,6 +181,11 @@ void CompileModelCacheTestBase::run() {
}
Contributor:

Such a big patch with 0 unit tests is unacceptable.

@e-ddykim (Author):

I did not add unit tests that do not use the IE interfaces (export and import), because serialization needs to be saved to a file and then loaded again to see if it works. And the number of newly enabled functional tests related to serialization is more than 200.

@vladimir-paramuzov (Contributor) replied Nov 15, 2022:

I checked those functional tests and can't say that we can be sure everything works fine based on them alone. E.g., if I change CompiledModel::Export to return at the very beginning, then the tests still pass.
Basically, the tests check that 1. the properties are supported and 2. the blob file is created; there is no guarantee that the blob file is actually used or contains the expected content.

> because serialization needs to be saved to a file

Why? As I can see, the objects work with ostream/istream, so you can probably use some stream type that operates in memory.
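The suggestion can be sketched with std::stringstream; save_blob/load_blob here are hypothetical stand-ins for the real export/import entry points:

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical stand-ins for export/import routines that take
// std::ostream / std::istream.
void save_blob(std::ostream& os, const std::string& blob) { os << blob; }

std::string load_blob(std::istream& is) {
    return std::string(std::istreambuf_iterator<char>(is), {});
}

// A unit test can round-trip entirely in memory: the stringstream's
// independent get/put positions make it a drop-in replacement for a file.
std::string roundtrip(const std::string& blob) {
    std::stringstream buf;   // in-memory stream instead of a file on disk
    save_blob(buf, blob);
    return load_blob(buf);
}
```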

@e-ddykim (Author):

The caching functional tests run the same case three times:

  1. run without cache at L200
  2. run to create the cache at L219 with i = 0
  3. run with the cache at L219 with i = 1

Then it compares the results 1 vs. 2 and 1 vs. 3 at L223:

compare(originalOutputs, get_plugin_outputs());

But I agree with you that we can't be sure everything works fine based on them alone. I'll add unit tests working in memory as you suggested. Thank you.

@e-ddykim (Author):

I added 38 unit tests for serialization. These tests have export_import in the test case name. Thank you.

}

if (idx == _deps.size())
std::cout << "[get_index_in_deps]: not found" << std::endl;
Contributor:

Why? It should be either removed or changed to exception.
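The exception variant can be sketched like this (the signature is simplified to plain values; the real code walks _deps):

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// A lookup miss here indicates a programming error, so throw instead of
// printing to std::cout and silently continuing.
std::size_t get_index_in_deps(const std::vector<int>& deps, int target) {
    for (std::size_t idx = 0; idx < deps.size(); ++idx) {
        if (deps[idx] == target)
            return idx;
    }
    throw std::runtime_error("[get_index_in_deps]: not found");
}
```

The caller either gets a valid index or an exception; there is no "not found" sentinel to forget to check.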


@@ -762,7 +762,9 @@ void program::cleanup() {
}
}
}
_kernels_cache->reset();

if (_engine.configuration().kernels_cache_path.empty())
Contributor:

Why?


@@ -425,6 +426,9 @@ void primitive_inst::set_arguments() {
}

void primitive_inst::build_deps() {
if (_node == nullptr)
return;
Contributor:

Shouldn't exception be thrown here?

@e-ddykim (Author):

https://github.com/openvinotoolkit/openvino/pull/13986/files#r1024718320
Updated to throw an exception when _node is null. Thank you.

int num_data_nodes;
ib >> num_data_nodes;

_memory_pool->clear_pool_for_network(net_id);
Contributor:

Why is it needed? A new mem pool object is created above, so I think the clear is redundant.


, _internal(false)
, _is_primary_stream(false)
, _reset_arguments(true) {
net_id += 1;
Contributor:

I believe net_id is always 1 here which is unexpected.

@e-ddykim (Author):

https://github.com/openvinotoolkit/openvino/pull/13986/files#r1025253714
I added a new function, get_new_net_id(), to emit a unique id, and applied it to the network ctors. Thank you.
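A minimal sketch of such an id generator (the atomic counter is illustrative; the actual implementation is in the linked PR):

```cpp
#include <atomic>
#include <cstdint>

// A plain `net_id += 1` on a fresh member (as in the original code) always
// yields the same value for every network; a shared atomic counter hands out
// process-wide unique ids instead, and is safe across threads.
static std::atomic<std::uint32_t> net_id_counter{0};

std::uint32_t get_new_net_id() {
    return ++net_id_counter;  // first caller gets 1, then 2, 3, ...
}
```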

std::string type;
std::string _primitive_id;
ib >> type >> _primitive_id;
std::shared_ptr<cldnn::primitive_inst> new_primitive_inst = cldnn::get_type_id(type)->create_instance(*this);
Contributor:

Why do we need to separate data nodes and other node types here?

@e-ddykim (Author):

During deserialization, output memory is allocated whenever a primitive_inst is restored. A primitive_inst that does not allocate memory on its own (_mem_allocated is false) uses the memory address of another primitive_inst. For non-data types there is no problem as long as they are restored in the order of _exec_order. But since data types are not in _exec_order, their memory addresses may not be known unless they are allocated in advance.
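The ordering constraint can be modeled with toy types (Node and the two-phase loop are illustrative, not the real cldnn deserializer):

```cpp
#include <map>
#include <memory>
#include <string>

// Nodes that do not allocate their own memory alias another node's buffer,
// so data nodes (which are absent from the execution order) must have their
// buffers allocated up front, before the remaining nodes are restored.
struct Node {
    bool is_data = false;
    bool mem_allocated = true;
    std::string alias_of;         // non-empty when aliasing another node
    std::shared_ptr<int> buffer;  // stand-in for cldnn memory
};

void restore(std::map<std::string, Node>& nodes) {
    // Phase 1: allocate every data node in advance.
    for (auto& kv : nodes) {
        if (kv.second.is_data)
            kv.second.buffer = std::make_shared<int>(0);
    }
    // Phase 2: restore the rest; aliasing nodes reuse an existing buffer.
    for (auto& kv : nodes) {
        Node& n = kv.second;
        if (n.is_data)
            continue;
        n.buffer = n.mem_allocated ? std::make_shared<int>(0)
                                   : nodes.at(n.alias_of).buffer;
    }
}
```

Collapsing the two phases into one loop would break whenever an aliasing node is visited before the data node it points at.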

@e-ddykim (Author):

> @e-ddykim @yeonbok Guys, you merged this PR too quickly, I haven't finished my review yet. I submit comments that I have at the moment, so please address them in separate PR.

@vladimir-paramuzov Thank you for taking your valuable time to review. I will submit a new PR to address your review comments.

@vladimir-paramuzov (Contributor):

@e-ddykim one more issue in cmake output:

Checking patch include/oneapi/dnnl/dnnl.hpp...
error: while searching for:
    struct desc {
        dnnl_convolution_desc_t data;

        /// Constructs a descriptor for a convolution forward propagation
        /// primitive with bias.
        ///

error: patch failed: include/oneapi/dnnl/dnnl.hpp:4686
error: include/oneapi/dnnl/dnnl.hpp: patch does not apply

@e-ddykim (Author):

> @e-ddykim one more issue in cmake output: (cmake patch error quoted above)

https://github.com/openvinotoolkit/openvino/pull/13986/files#r1021667835
That error message occurs when trying to patch code that has already been patched. I updated it so that the message is not displayed, to prevent confusion.

@e-ddykim deleted the gpu-serial_poc branch February 8, 2024 05:29
Labels
category: GPU OpenVINO GPU plugin

4 participants