Initial PR for fixing issues on existing TCs for gpufunctest (openvinotoolkit#71)

* Fix typo to set begin/end mask to kernel param correctly for strided_slice

Signed-off-by: Andrew Park <[email protected]>

* Fix exception to try to reuse input with empty deps and _exec_deps when creating reshape_inst

Signed-off-by: Andrew Park <[email protected]>

* Fix strided_slice to support static shape

Signed-off-by: Andrew Park <[email protected]>

* Fix input layout representation for byxf and nv12 on parameter

Signed-off-by: Andrew Park <[email protected]>

* Fix broadcast size check logic

Signed-off-by: Andrew Park <[email protected]>

* Fix layout initialization to convert tensor (cldnn format) to PartialShape (IE format) for weight reorder

Signed-off-by: Andrew Park <[email protected]>

* Fix kernel data conversion to convert PartialShape to ordered dims with output format

Signed-off-by: Andrew Park <[email protected]>

* Fix to not check whether the new layout is identical to an empty tensor

Signed-off-by: Andrew Park <[email protected]>

* Enable cases to reorder/reshape + gemm where op is not FC

Signed-off-by: Andrew Park <[email protected]>

* Update AsyncInferRequest for InferRequestLegacy compatibility

Signed-off-by: Andrew Park <[email protected]>

* Apply PR#11073: update eltwise calc_output_layout function and replace PartialShape::broadcast_merge_into with tensor::max

Signed-off-by: Andrew Park <[email protected]>

* Fix reduce calc_output_layout to apply reduce_axes correctly

Signed-off-by: Andrew Park <[email protected]>

* Fix scatter_update calc_output_layout to get the number of dims from dependencies correctly

Signed-off-by: Andrew Park <[email protected]>

* Fix gather_nd calc_output_layout to calculate the final output tensor

Signed-off-by: Andrew Park <[email protected]>

* Fix calc_body_input_layout to adjust cropped input shape correctly and update operator== for layout comparison

Signed-off-by: Andrew Park <[email protected]>

* Fix PartialShape representation when input rank is 2 or 3 for the reshape preprocess of gemm

Signed-off-by: Andrew Park <[email protected]>

* Add condition to check whether input layout is dynamic or not in gather calc_output_layout

Signed-off-by: Andrew Park <[email protected]>

* Fix ScatterUpdate issue

* Align scatter update axis format with IE

Signed-off-by: Andrew Park <[email protected]>

* Revert "Enable cases to reorder/reshape + gemm  where op is not FC"

This reverts commit 16a60b5.

Signed-off-by: Andrew Park <[email protected]>

* Revert "Update AsyncInferRequest for InferRequestLegacy compatibility"

This reverts commit f57a7e4.

Signed-off-by: Andrew Park <[email protected]>

* Update scatter_update to propagate axis with integer type instead of scatter_update_axis

Signed-off-by: Andrew Park <[email protected]>

* Revert "Fix not to check the new layout is identical empty tensor"

This reverts commit 1215c70.

Signed-off-by: Andrew Park <[email protected]>

Co-authored-by: Ahn, Paul Y <[email protected]>

Final PR for fixing issues on existing TCs for gpufunctest (openvinotoolkit#72)

* Initial integration into InferRequest for RemoteBlob and DynamicBatch

Signed-off-by: Andrew Park <[email protected]>

* Enable DynamicBatch-related logic

Signed-off-by: Andrew Park <[email protected]>

* Fix PartialShape comparison related issues on TensorIterator/LSTMSequenceTest

Signed-off-by: Andrew Park <[email protected]>

* Fix feature representation for slope layout when shape has a dimension = 1

Signed-off-by: Andrew Park <[email protected]>

* Revert "Fix feature representation for slope layout when shape has a dimension = 1"

This reverts commit 1169fbb.

* Revert "Fix PartialShape comparison related issues on TensorIterator/LSTMSequenceTest"

This reverts commit 664175c.

Signed-off-by: Andrew Park <[email protected]>
andrew-k-park authored and vladimir-paramuzov committed Jul 26, 2022
1 parent b9555f5 commit 58a3001
Showing 19 changed files with 552 additions and 188 deletions.
36 changes: 36 additions & 0 deletions src/plugins/intel_gpu/include/intel_gpu/plugin/async_infer_request_legacy.hpp (new file)
@@ -0,0 +1,36 @@
// Copyright (C) 2018-2022 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <string>
#include <map>
#include <cpp_interfaces/impl/ie_infer_async_request_thread_safe_default.hpp>
#include "intel_gpu/plugin/infer_request_legacy.hpp"

namespace ov {
namespace runtime {
namespace intel_gpu {

class AsyncInferRequestLegacy : public InferenceEngine::AsyncInferRequestThreadSafeDefault {
public:
using Parent = InferenceEngine::AsyncInferRequestThreadSafeDefault;
AsyncInferRequestLegacy(const InferRequestLegacy::Ptr &inferRequest,
const InferenceEngine::ITaskExecutor::Ptr& taskExecutor,
const InferenceEngine::ITaskExecutor::Ptr& waitExecutor,
const InferenceEngine::ITaskExecutor::Ptr& callbackExecutor);

~AsyncInferRequestLegacy();

void Infer_ThreadUnsafe() override;
void StartAsync_ThreadUnsafe() override;

private:
InferRequestLegacy::Ptr _inferRequest;
InferenceEngine::ITaskExecutor::Ptr _waitExecutor;
};

} // namespace intel_gpu
} // namespace runtime
} // namespace ov
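
The matching source file is not shown in this commit view. As a plausible sketch only (not code from this commit), such a class is typically implemented by filling the protected _pipeline of AsyncInferRequestThreadSafeDefault with two stages, assuming InferRequestLegacy exposes the same preprocess/enqueue/wait split declared for InferRequest below:

// Hypothetical sketch, not part of this commit.
AsyncInferRequestLegacy::AsyncInferRequestLegacy(const InferRequestLegacy::Ptr &inferRequest,
                                                 const InferenceEngine::ITaskExecutor::Ptr& taskExecutor,
                                                 const InferenceEngine::ITaskExecutor::Ptr& waitExecutor,
                                                 const InferenceEngine::ITaskExecutor::Ptr& callbackExecutor)
    : Parent(inferRequest, taskExecutor, callbackExecutor),
      _inferRequest(inferRequest),
      _waitExecutor(waitExecutor) {
    _pipeline = {
        // Stage 1: preprocessing and kernel enqueue on the plugin task executor.
        { taskExecutor, [this] { _inferRequest->preprocess(); _inferRequest->enqueue(); } },
        // Stage 2: block until device results are ready, on the dedicated wait executor.
        { _waitExecutor, [this] { _inferRequest->wait(); } }
    };
}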
23 changes: 15 additions & 8 deletions src/plugins/intel_gpu/include/intel_gpu/plugin/infer_request.hpp
@@ -15,6 +15,11 @@
namespace ov {
namespace intel_gpu {

struct buf_info {
size_t buf_offset;
size_t buf_size;
};

class CompiledModel;

class InferRequest : public InferenceEngine::IInferRequestInternal {
@@ -41,11 +46,8 @@ class InferRequest : public InferenceEngine::IInferRequestInternal {
void SetBlob(const std::string& name, const InferenceEngine::Blob::Ptr &data) override;
void SetBlobs(const std::string& name, const std::vector<InferenceEngine::Blob::Ptr> &data) override;

<<<<<<< HEAD
void SetBatch(int batch = -1) override;
std::vector<std::shared_ptr<InferenceEngine::IVariableStateInternal>> QueryState() override;
=======
>>>>>>> e0b515419c... [GPU] First func test works
void SetGraph(std::shared_ptr<Graph> graph);
void EnableProfiling() { m_useProfiling = true; }
void EnableStreams() { m_useStreams = true; }
@@ -59,6 +61,9 @@ class InferRequest : public InferenceEngine::IInferRequestInternal {
void enqueue();
void wait();

void preprocess_dynamic();
void enqueue_dynamic();
void wait_dynamic();

bool use_external_queue() const { return m_useExternalQueue; }
void enable_external_queue() { m_useExternalQueue = true; }
@@ -73,10 +78,11 @@ class InferRequest : public InferenceEngine::IInferRequestInternal {
bool m_useProfiling = false;
bool m_useStreams = false;
bool m_useExternalQueue = false;
bool is_allocated = false;
std::shared_ptr<Graph> m_graph;

// dynamic batch stuff
std::map<std::string, std::vector<buf_info>> batchInputs;
std::map<std::string, std::vector<buf_info>> batchOutputs;
InferenceEngine::IStreamsExecutor* streamExecutor = nullptr;

void prepare_input(const cldnn::primitive_id &inputName, InferenceEngine::Blob::Ptr &inputBlob,
@@ -87,16 +93,17 @@ class InferRequest : public InferenceEngine::IInferRequestInternal {
std::shared_ptr<InferenceEngine::IAllocator> alloc = nullptr);
InferenceEngine::Blob::Ptr create_device_blob(const InferenceEngine::TensorDesc& desc);

void copy_output_data(cldnn::memory::ptr outputMemory, InferenceEngine::Blob::Ptr bptr);
void copy_output_data(cldnn::memory::ptr outputMemory, InferenceEngine::Blob::Ptr bptr, buf_info* bi = nullptr);
void copy_input_data(std::shared_ptr<cldnn::network> network, const cldnn::primitive_id &inputName,
const cldnn::layout& inputLayout, const InferenceEngine::Blob &inputBlob);
const cldnn::layout& inputLayout, const InferenceEngine::Blob &inputBlob,
buf_info* bi = nullptr);

InferenceEngine::Blob::Ptr create_shared_device_blob(const InferenceEngine::TensorDesc& desc, const cldnn::layout& layout, void* usm_host_mem);
void allocate_inputs();
void allocate_outputs();
void allocate_inputs_dynamic();
void allocate_outputs_dynamic();

void set_input(const std::string& name, const InferenceEngine::Blob::Ptr& data);
void set_output(const std::string& name, const InferenceEngine::Blob::Ptr& data);
InferenceEngine::Blob::Ptr reinterpret_device_blob(InferenceEngine::Blob::Ptr data, const InferenceEngine::TensorDesc& new_desc);

std::map<cldnn::primitive_id, cldnn::network_output> internal_outputs;
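
The buf_info struct reintroduced above describes one per-batch slice of a blob (byte offset plus size), and batchInputs/batchOutputs keep one slice list per blob name for the dynamic-batch path. A minimal sketch of how such slices could be derived (the equal-chunk scheme and helper name are assumptions for illustration, not code from this commit):

#include <cstddef>
#include <vector>

struct buf_info { std::size_t buf_offset; std::size_t buf_size; }; // as declared above

// Hypothetical helper: split a blob spanning max_batch batches into
// equal per-batch buf_info slices.
std::vector<buf_info> make_batch_slices(std::size_t total_bytes, std::size_t max_batch) {
    std::vector<buf_info> slices;
    const std::size_t bytes_per_batch = total_bytes / max_batch;
    for (std::size_t b = 0; b < max_batch; ++b)
        slices.push_back({b * bytes_per_batch, bytes_per_batch}); // {buf_offset, buf_size}
    return slices;
}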
src/plugins/intel_gpu/include/intel_gpu/plugin/infer_request_legacy.hpp
@@ -16,7 +16,7 @@ namespace ov {
namespace runtime {
namespace intel_gpu {

struct buf_info {
struct buf_info_legacy {
size_t buf_offset;
size_t buf_size;
};
Expand Down Expand Up @@ -81,8 +81,8 @@ class InferRequestLegacy : public InferenceEngine::IInferRequestInternal {
std::shared_ptr<Graph> m_graph;

// dynamic batch stuff
std::map<std::string, std::vector<buf_info>> batchInputs;
std::map<std::string, std::vector<buf_info>> batchOutputs;
std::map<std::string, std::vector<buf_info_legacy>> batchInputs;
std::map<std::string, std::vector<buf_info_legacy>> batchOutputs;
InferenceEngine::IStreamsExecutor* streamExecutor = nullptr;

void prepare_input(const cldnn::primitive_id &inputName, InferenceEngine::Blob::Ptr &inputBlob,
@@ -93,10 +93,10 @@ class InferRequestLegacy : public InferenceEngine::IInferRequestInternal {
std::shared_ptr<InferenceEngine::IAllocator> alloc = nullptr);
InferenceEngine::Blob::Ptr create_device_blob(const InferenceEngine::TensorDesc& desc, const cldnn::layout& layout);

void copy_output_data(cldnn::memory::ptr outputMemory, InferenceEngine::Blob::Ptr bptr, buf_info* bi = nullptr);
void copy_output_data(cldnn::memory::ptr outputMemory, InferenceEngine::Blob::Ptr bptr, buf_info_legacy* bi = nullptr);
void copy_input_data(std::shared_ptr<cldnn::network> network, const cldnn::primitive_id &inputName,
const cldnn::layout& inputLayout, const InferenceEngine::Blob &inputBlob,
buf_info* bi = nullptr);
buf_info_legacy* bi = nullptr);

InferenceEngine::Blob::Ptr create_shared_device_blob(const InferenceEngine::TensorDesc& desc, const cldnn::layout& layout, void* usm_host_mem);
void allocate_inputs();
src/plugins/intel_gpu/include/intel_gpu/primitives/scatter_update.hpp
@@ -38,13 +38,13 @@ struct scatter_update : public primitive_base<scatter_update> {
const primitive_id& dict,
const primitive_id& idx,
const primitive_id& idupd,
const scatter_update_axis axis,
const int axis,
const primitive_id& ext_prim_id = "",
const padding& output_padding = padding())
: primitive_base(id, {dict, idx, idupd}, ext_prim_id, output_padding), axis(axis) {}

/// @brief ScatterUpdate axis
scatter_update_axis axis;
int axis;
};
/// @}
/// @}
44 changes: 30 additions & 14 deletions src/plugins/intel_gpu/src/graph/eltwise.cpp
@@ -31,23 +31,39 @@ layout eltwise_inst::calc_output_layout(eltwise_node const& node) {

auto output_type = node.get_primitive()->output_data_type ? *node.get_primitive()->output_data_type : input_node_layout.data_type;

ov::PartialShape out_pshape;
auto format = input_node_layout.format;
for (size_t i = 0; i < node.inputs_count(); i++) {
if (i == primary_input_idx)
continue;
auto get_output_layout = [&](){
auto format = input_node_layout.format;
if (input_node_layout.is_static()) {
auto size = input_node_layout.get_tensor();
for (size_t i = 0; i < node.inputs_count(); i++) {
if (i == primary_input_idx)
continue;

auto l = node.input(i).get_non_padded_output_layout();
if (!ov::PartialShape::broadcast_merge_into(out_pshape, l.size, ov::op::AutoBroadcastSpec(ov::op::AutoBroadcastType::NUMPY))) {
IE_THROW() << "incorrect input shapes\n";
auto l = node.input(i).get_non_padded_output_layout();
size = tensor::max(size, l.get_tensor());
if (l.format == format::b_fs_zyx_fsv16) // use optimized 5D
format = format::b_fs_zyx_fsv16;
else if (l.format == format::bs_fs_zyx_bsv16_fsv16)
format = format::bs_fs_zyx_bsv16_fsv16;
}
return layout(output_type, format, size);
} else {
ov::PartialShape out_pshape;
for (size_t i = 0; i < node.inputs_count(); i++) {
auto l = node.input(i).get_non_padded_output_layout();
if (!ov::PartialShape::broadcast_merge_into(out_pshape, l.size, ov::op::AutoBroadcastSpec(ov::op::AutoBroadcastType::NUMPY))) {
IE_THROW() << "incorrect input shapes\n";
}
if (l.format == format::b_fs_zyx_fsv16) // use optimized 5D
format = format::b_fs_zyx_fsv16;
else if (l.format == format::bs_fs_zyx_bsv16_fsv16)
format = format::bs_fs_zyx_bsv16_fsv16;
}
return layout(output_type, format, out_pshape);
}
};

if (l.format == format::b_fs_zyx_fsv16) // use optimized 5D
format = format::b_fs_zyx_fsv16;
else if (l.format == format::bs_fs_zyx_bsv16_fsv16)
format = format::bs_fs_zyx_bsv16_fsv16;
}
auto output_layout = layout(output_type, format, out_pshape);
auto output_layout = get_output_layout();

auto mode = node.get_primitive()->mode;
// list of operations supported for integer types
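
For reference, the static branch above swaps PartialShape::broadcast_merge_into for tensor::max, i.e. a per-dimension maximum over all input extents. For NUMPY-broadcastable static shapes (each dimension equal across inputs or 1), the two give the same result. A standalone illustration with plain integers (not code from this commit):

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Per-dimension maximum, mirroring the tensor::max call in the static path.
std::vector<int> broadcast_shape(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size()); // cldnn tensors share a fixed rank
    std::vector<int> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = std::max(a[i], b[i]); // dims are equal or 1, so max() is the broadcast extent
    return out;
}
// broadcast_shape({1, 3, 224, 224}, {1, 3, 1, 1}) -> {1, 3, 224, 224}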
59 changes: 57 additions & 2 deletions src/plugins/intel_gpu/src/graph/impls/ocl/gemm.cpp
@@ -11,6 +11,8 @@
#include "gemm/gemm_kernel_base.h"
#include "intel_gpu/runtime/error_handler.hpp"

#include "matmul_shape_inference.hpp"

namespace cldnn {
namespace ocl {

@@ -29,8 +31,61 @@ struct gemm_impl : typed_primitive_impl_ocl<gemm> {
auto gemm_optional_params =
get_default_optional_params<kernel_selector::gemm_optional_params>(arg.get_program());

for (size_t i = 1; i < arg.inputs_count(); i++) {
gemm_params.inputs.push_back(convert_data_tensor(impl_param->input_layouts[i]));
auto gemmSpecificPartialShape = [](ov::PartialShape& pshape) {
switch (pshape.rank().get_length()) {
case 2: { // batch, feature representation (rank == 2)
pshape.insert(pshape.begin(), 1ul);
pshape.insert(pshape.begin(), 1ul);
break;
}
case 3 : { // feature representation (rank == 3)
pshape.insert(pshape.begin(), 1, 1ul);
break;
}
}
};
auto output_layout = arg.get_output_layout();
auto output_pshape = output_layout.size;
auto output_rank = output_pshape.rank().get_length();
std::vector<ov::PartialShape> input_shapes;
for (size_t i = 0; i < arg.inputs_count(); i++) {
auto input_layout = arg.input(i).get_output_layout();
auto input_pshape = input_layout.get_partial_shape();
auto input_rank = input_pshape.rank().get_length();
if (input_rank != output_rank || input_rank < 4) {
if (input_rank == 1) {
bool transpose = false;
if (i == 0) {
transpose = arg.get_primitive()->transpose_input0;
input_pshape.insert(input_pshape.begin(), 1);
} else {
transpose = arg.get_primitive()->transpose_input1;
input_pshape.insert(input_pshape.end(), 1);
}
if (transpose) {
std::swap(input_pshape[0], input_pshape[1]);
}
}
if (input_rank < output_rank)
input_pshape.insert(input_pshape.begin(), output_rank - input_rank, 1ul);

gemmSpecificPartialShape(input_pshape);
}
input_layout.size = input_pshape;
input_shapes.push_back(input_pshape);
if (i == 0)
gemm_params.inputs[0] = convert_data_tensor(input_layout);
else
gemm_params.inputs.push_back(convert_data_tensor(input_layout));
}
if (output_rank < 4) {
ov::op::v0::MatMul op;
op.set_transpose_a(arg.get_primitive()->transpose_input0);
op.set_transpose_b(arg.get_primitive()->transpose_input1);
std::vector<ov::PartialShape> output_shapes = {ov::PartialShape()};
shape_infer(&op, input_shapes, output_shapes);
output_layout.size = output_shapes[0];
gemm_params.outputs[0] = convert_data_tensor(output_layout);
}

gemm_params.alpha = desc->alpha;
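
The loop above normalizes each gemm input to a rank the kernel selector can consume: rank-1 inputs first get a unit dimension (prepended for input0, appended for input1, with a swap when the corresponding transpose flag is set), lower-rank inputs are left-padded with 1s toward the output rank, and gemmSpecificPartialShape then lifts rank-2/3 shapes into the 4D batch/feature layout. A standalone mirror of that last step on plain dimension vectors (illustrative only, not code from this commit):

#include <cstddef>
#include <vector>

// Mirror of gemmSpecificPartialShape above, on plain dimension vectors.
void to_gemm_rank4(std::vector<std::size_t>& dims) {
    switch (dims.size()) {
    case 2: // {M, K} -> {1, 1, M, K}: prepend batch and feature unit dims
        dims.insert(dims.begin(), 2, 1);
        break;
    case 3: // {B, M, K} -> {1, B, M, K}: prepend one unit dim
        dims.insert(dims.begin(), 1, 1);
        break;
    default: // rank >= 4 passes through unchanged
        break;
    }
}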
34 changes: 18 additions & 16 deletions src/plugins/intel_gpu/src/graph/impls/ocl/scatter_update.cpp
@@ -14,22 +14,24 @@ using namespace cldnn;

namespace cldnn {
namespace ocl {
kernel_selector::scatter_update_axis convert_axis(scatter_update::scatter_update_axis axis, const scatter_update_node& arg) {
switch (axis) {
case scatter_update::along_x:
return kernel_selector::scatter_update_axis::X;
case scatter_update::along_y:
return kernel_selector::scatter_update_axis::Y;
case scatter_update::along_z:
return kernel_selector::scatter_update_axis::Z;
case scatter_update::along_w:
return kernel_selector::scatter_update_axis::W;
case scatter_update::along_f:
return kernel_selector::scatter_update_axis::FEATURE;
case scatter_update::along_b:
return kernel_selector::scatter_update_axis::BATCH;
default:
CLDNN_ERROR_MESSAGE(arg.id(), "Unsupported Axis");
kernel_selector::scatter_update_axis convert_axis(const int axis, const scatter_update_node& arg) {
auto rank = arg.input(0).get_output_layout().get_rank();
auto cldnn_axis = axis;
if (axis >= 2) {
auto spatial_axis = axis - 2;
const size_t default_dims = 4; // Default and minimum number of dimensions is 4
auto spatial_size = std::max(rank, default_dims) - 2;
cldnn_axis = spatial_size - spatial_axis - 1 + 2;
}

switch (cldnn_axis) {
case 0: return kernel_selector::scatter_update_axis::BATCH;
case 1: return kernel_selector::scatter_update_axis::FEATURE;
case 2: return kernel_selector::scatter_update_axis::X;
case 3: return kernel_selector::scatter_update_axis::Y;
case 4: return kernel_selector::scatter_update_axis::Z;
case 5: return kernel_selector::scatter_update_axis::W;
default: CLDNN_ERROR_MESSAGE(arg.id(), "Unsupported Axis");
}
return kernel_selector::scatter_update_axis::X;
}
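
The rewritten convert_axis accepts a plain IE-style integer axis (batch, feature, then spatial dimensions outermost-first) and remaps spatial axes into cldnn's reversed spatial order (X innermost) before selecting the kernel-selector enum. A standalone check of the arithmetic (not code from this commit):

#include <algorithm>
#include <cstddef>
#include <cstdio>

// Mirror of the remapping logic in convert_axis above.
int remap_axis(int axis, std::size_t rank) {
    if (axis < 2)
        return axis;                               // batch / feature unchanged
    const std::size_t default_dims = 4;            // default and minimum rank is 4
    std::size_t spatial_size = std::max(rank, default_dims) - 2;
    int spatial_axis = axis - 2;                   // 0-based index among spatial dims
    return static_cast<int>(spatial_size) - spatial_axis - 1 + 2;
}

int main() {
    // rank-4 (bfyx): IE axis 2 (H) -> 3 (Y), IE axis 3 (W) -> 2 (X)
    std::printf("%d %d\n", remap_axis(2, 4), remap_axis(3, 4)); // prints "3 2"
    // rank-5 (bfzyx): IE axis 2 (D) -> 4 (Z)
    std::printf("%d\n", remap_axis(2, 5));                      // prints "4"
    return 0;
}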
57 changes: 40 additions & 17 deletions src/plugins/intel_gpu/src/graph/impls/ocl/strided_slice.cpp
@@ -32,24 +32,47 @@ struct strided_slice_impl : typed_primitive_impl_ocl<strided_slice> {
auto op_params = get_default_optional_params<kernel_selector::strided_slice_optional_params>(arg.get_program());
const size_t dims_num = params.inputs[0].Dimentions();

// Getting data from constant inputs. There are 3 args: Begin, End, Stride
for (size_t i = 0; i < arg.const_mem.size(); ++i) {
auto mem = arg.const_mem[i];
std::vector<int32_t> sizes;
if (mem->get_layout().data_type == cldnn::data_types::i64) {
mem_lock<int64_t, mem_lock_type::read> lock{mem, arg.get_program().get_stream()};
int64_t* data = lock.data();
std::vector<int64_t> sizes_i64 = std::vector<int64_t>(data, data + mem->get_layout().count());
sizes.resize(sizes_i64.size());
for (size_t j = 0; j < sizes.size(); j++)
sizes[j] = static_cast<int32_t>(sizes_i64[j]);
} else {
mem_lock<int32_t, mem_lock_type::read> lock{mem, arg.get_program().get_stream()};
int32_t* data = lock.data();
sizes = std::vector<int32_t>(data, data + mem->get_layout().count());
if (!arg.const_mem.empty()) {
// Getting data from constant inputs. There are 3 args: Begin, End, Stride
for (size_t i = 0; i < arg.const_mem.size(); ++i) {
auto mem = arg.const_mem[i];
std::vector<int32_t> sizes;
if (mem->get_layout().data_type == cldnn::data_types::i64) {
mem_lock<int64_t, mem_lock_type::read> lock{mem, arg.get_program().get_stream()};
int64_t* data = lock.data();
std::vector<int64_t> sizes_i64 = std::vector<int64_t>(data, data + mem->get_layout().count());
sizes.resize(sizes_i64.size());
for (size_t j = 0; j < sizes.size(); j++)
sizes[j] = static_cast<int32_t>(sizes_i64[j]);
} else {
mem_lock<int32_t, mem_lock_type::read> lock{mem, arg.get_program().get_stream()};
int32_t* data = lock.data();
sizes = std::vector<int32_t>(data, data + mem->get_layout().count());
}
pad_vector_to_size(sizes, dims_num, i != 1); // for "begin" completion used 0 value, for other - 1
params.striding_params.push_back(sizes);
}
} else {
// Getting data from constant inputs. There are 3 args: Begin, End, Stride
for (size_t i = 1; i < arg.get_dependencies().size(); ++i) {
auto& input = arg.get_dependency(i).as<data>();
auto mem = input.get_attached_memory_ptr();
std::vector<int32_t> sizes;
if (input.get_output_layout().data_type == cldnn::data_types::i64) {
mem_lock<int64_t> lock{mem, arg.get_program().get_stream()};
int64_t* data = lock.data();
std::vector<int64_t> sizes_i64 = std::vector<int64_t>(data, data + input.get_output_layout().count());
sizes.resize(sizes_i64.size());
for (size_t j = 0; j < sizes.size(); j++)
sizes[j] = static_cast<int32_t>(sizes_i64[j]);
} else {
mem_lock<int32_t> lock{mem, arg.get_program().get_stream()};
int32_t* data = lock.data();
sizes = std::vector<int32_t>(data, data + input.get_output_layout().count());
}
pad_vector_to_size(sizes, dims_num, i != 1); // for "begin" completion used 0 value, for other - 1
params.striding_params.push_back(sizes);
}
pad_vector_to_size(sizes, dims_num, i != 1); // for "begin" completion used 0 value, for other - 1
params.striding_params.push_back(sizes);
}

auto begin_mask_ = prim->begin_mask;
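
Both branches above repeat the same per-argument steps for Begin/End/Stride: lock the constant memory, narrow i64 data to i32 when needed, and pad the vector up to the tensor rank, using 0 as the pad value for Begin and 1 for End/Stride (per the inline comment). A simplified standalone version of the narrow-and-pad step (helper name and pad semantics assumed from that comment, not code from this commit):

#include <cstddef>
#include <cstdint>
#include <vector>

// Narrow an i64 buffer to i32 and pad to dims_num.
// use_one selects the pad value: false -> 0 (Begin), true -> 1 (End/Stride).
std::vector<int32_t> to_sizes(const int64_t* data, std::size_t count,
                              std::size_t dims_num, bool use_one) {
    std::vector<int32_t> sizes(count);
    for (std::size_t j = 0; j < count; ++j)
        sizes[j] = static_cast<int32_t>(data[j]); // narrowing cast, as above
    sizes.resize(dims_num, use_one ? 1 : 0);      // pad up to tensor rank
    return sizes;
}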
2 changes: 1 addition & 1 deletion src/plugins/intel_gpu/src/graph/kernel_selector_helper.cpp
@@ -720,7 +720,7 @@ kernel_selector::dev_type get_device_type(cldnn::device_type type) {

kernel_selector::data_tensor convert_data_tensor(const layout& l, uint32_t split, const tensor view_offset) {
const auto& pad = l.data_padding;
const auto& vals = l.get_dims();
const auto& vals = l.get_tensor().sizes(l.format);
const auto& add_offsets = view_offset.sizes(l.format);
const auto& lower_pad = pad.lower_size().sizes(l.format);
const auto& upper_pad = pad.upper_size().sizes(l.format);
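
The one-line change above affects dimension ordering: per the commit message, the kernel-selector tensor needs dims ordered by the cldnn output format rather than in logical PartialShape order, which is what get_tensor().sizes(l.format) provides. An illustrative trace (the exact byxf ordering shown is an assumption):

// For a byxf layout with logical shape b=1, f=3, y=224, x=224:
//   l.get_dims()                    -> {1, 3, 224, 224}   (logical b,f,y,x order)
//   l.get_tensor().sizes(l.format)  -> {1, 224, 224, 3}   (byxf memory order)
// The kernel-selector data_tensor expects the format-ordered view.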