Es/lpt/lpt to ngraph fixes2 with master (openvinotoolkit#2671)
* [LPT] Replace creation of dequantization with factory

* [ngraph][LPT] Add ScaleShift replace for dequantization operations

* [LPT] SubtractMultiplyToMultiplyAdd refactoring

* [LPT] Code style fix

* [LPT] Edit SubtractMultiplyToMultiplyAdd transformation for dequantization

* [LPT] Linux compilation quick fix

* [LPT] [WIP] runtime info applying

* [LPT] Concat transformation functional tests extending

* [LPT] MultiplyToConvolution + Subtract to add fusing + improvements in LowPrecisionTransformer

* [LPT] linux compilation error fix

* [LPT] compilation error

* [LPT] MultiplyToGroupConvolution fix: 5D support

* [LPT] Multiply transformation extending: FQ weights support - wip

* [LPT] FQ folding & precision selection

* [LPT] code style fixes

* [LPT] code style fixes

* [LPT] Linux compilation error fix

* [LPT] SubtractMultiplyToMultiplyAdd: refactoring

* [LPT] Tests fixes

* [LPT] MultiplyToGroupConvolution tests

* [LPT] Convert subtract with int inputs to Eltwise sub

* [LPT] Constant folding fix for quant models

* [LPT] 1) Asymmetric quantization improvement 2) tests extending

* [LPT] 2 fixes for se_resnext_50

* [LPT] Add transformation priority branch selection test

* [LPT] AddMultiplyFusion: legacy transformation quick fix

* [LPT] nGraph tests temporary disabling

* [LPT] Fix for eltwise inputs with multiple outputs

* [LPT] Fix for FQ fuse

* [LPT] Reshape by channel, batch temporary disabled

* [nGraph][LPT] MatMul fix for reading FP16 models

* [LPT] 1) Add (not after Convolution/GroupConvolution/MatMul with Constant) to Subtract 2) precision selection fix: MultiplyToGroupConvolution quick fix

* [LPT] DenseNet improvements: AddTransformation: Add to Subtract + tests

* [LPT] AddTransformation refactoring

* [LPT] AddTransformation tests temporarily disabled

* [LPT] ReshapeTransformation improvements: degradation fix

* [LPT] code style fix

* [LPT] Concat tests temporary disabling

* [LPT] tests unification
1) plugin tests: added test-cases and nGraph-validation for clamp, split and variadic split
2) func tests: added test-cases
3) transformNGraph: added the ability to run additional transformations

* [LPT] split & variadic split merge fix

* [LPT] Clamp: added support for asymmetric quantization

* [LPT] added DequantizationAttr run-time attribute

* [LPT] debug info removal

* [LPT] ConcatTransformation: zero point fix

* [LPT] CNNNetwork ReLU transformation quick fix

* [LPT]
1) Concat fix
2) ConcatMultiChannels fix
3) Added "Concat with Split" test-cases
4) Subgraph fix

* [LPT]
1) Concat fix
2) Added "Concat with different precision on childs" test-case

* [LPT] concat fix Ubuntu18

* [LPT] Concat test fixes

* [LPT] Non-FP32 FQ input support

* [LPT] MatMul Fix + separateInStandaloneBranch Fix

* [LPT] Fix reference input types in mish fusion tests

* [LPT] Fix cpuFuncTests on CentOS building

* [nGraph][LPT] ScaleShift 2d, 3d nGraph conversion enabling

* [LPT] 1) FullyConnected workaround removing 2) validate_nodes_and_infer_types for LPT

* [ngraph] Add check for children for ConvertSubtract

* [LPT] Squeeze/Unsqueeze tests unification

* [LPT] Squeeze/Unsqueeze change signature for getReference/getOriginal

* [LPT] Mul & Add -> ScaleShift quick fix

* [LPT] nGraph tests temporary disabling

* [LPT] code style fix

* [LPT] code style fix #2

* [LPT] nGraph tests temporary disabling

* [LPT] code style fix #3

* [LPT] shared plugin tests temporary disabling

* [LPT] cleanup

* [LPT] nGraph unit tests temporary disabling

* [LPT] nGraph unit tests disabling #2

* [LPT] nGraph tests disabling

* [LPT] nGraph tests temporary disabling

* [LPT] WA removing

* [LPT] CentOS compilation fix

* [LPT] KMB WA to avoid compilation error

* [LPT] functional test temporary disabling

* [nGraph] code style fixes

* [LPT] ConcatTransformation: data movement operation as intermediate handling

* [LPT] FuseSubtractToFakeQuantize after VariadicSplit

* [LPT] ConcatWithSplitTransformation functional test temporary disabling

* [LPT] Clamp and ConcatWithDifferentPrecisionsOnChilds: tests fix

* [LPT] MatMul: bert-nv-mlperf-quantized fix

* [LPT] Add to convolution biases fuse fix

* [LPT] GPU plugin tests fixes

* [LPT] Normalize GPU plugin tests fix

* [LPT] test-commit

* [LPT] CLDNN Plugin FP16 conversion

* [LPT] AvgPool: update precision if there is no FQ after it + convolution precision limitation on activations

* [LPT] Convolution fixes

* [LPT] FuseSubtractToFakeQuantize & FuseMultiplyToFakeQuantize improvements

* [LPT] FuseSubtractToFakeQuantize test fix

* [LPT] FuseSubtractToFakeQuantizeTransformation tests

* [LPT] code style fix

* [LPT] AvgPool child recursive extend

* [LPT] AvgPool tests + fix

* [LPT] compilation quick fix

* [LPT] Add to convolution biases fuse fix

* [LPT] Linux issues: MatMulWithOptimizedConstantFakeQuantizeTransformation temporary disabled

* [LPT] Normalize GPU plugin tests fix

* [LPT] test-commit

* [LPT]
1) added the ability to create sub without dequantizationAttribute
2) fixed optimizeMulAfter: added copying rt_info
3) Tests Unification: Convolution transformation
4) added cleanRunTimeInfo into Network Helper

* [LPT] Tests Unification: GroupConvolution

* [LPT] removed debug info

* [LPT] functional tests for Convolution & GroupConvolution extending

* [LPT] [MatMul] Quick fix for Ubuntu error

* [LPT] MatMulTransformation quick test fix: one constant for both intervals

* [nGraph] code style fix

* [LPT] added output_precision to NormalizeIE

* [nGraph] NormalizeIE fix for LPT support

* [LPT] nGraph WA removal

* [LPT] fixed fillSubgraph for concat multi channels

* [LPT] MatMul fix

* [nGraph] WA removal: 1) nGraph tests enabling 2) LPT extending: do not handle in FP32

* [LPT] nGraph WA removal: function tests skip config rollback

* [LPT] WA removal: precision propagation fix

* [LPT] ConvertMulOrAddFinally transformation extending

* [nGraph] ConvolutionMultiplyFusion rollback (move from legacy to common)

* [nGraph] ConvertMulAddToScaleShiftOrPower: WA removal

* [nGraph] TypeRelaxed: WA removal

* [nGraph] WA removal: TypeRelaxed

* [LPT] WA removal: ConcatTransformation

* [nGraph] WA removal: Eltwise & ConvertMulOrAddFinally fixes to support LPT

* [nGraph] MulAddConversion fix: 2D & 3D ScaleShift are supported

* [nGraph] VisualizeTree extending

* [LPT] FakeQuantizeDequantization extending: check element wise dequantization operation

* [LPT] FakeQuantizeDequantization extending: SubtractMultiplyToMultiplyAddTransformation & WeightableLayerTransformation

* [LPT] Convolution + test infrastructure update

* [LPT] GPU compilation error

* [nGraph] BatchNorm plugin tests: input tensor definition

* [LPT] LowPrecisionTransformer::isFunctionQuantized was added

* [nGraph] WA final cleanup

* [nGraph] ScaleShiftIE quick fix

* [LPT] Functional tests: added test-cases "Concat with intermediate with constant"

* [LPT] Transformer::isNetworkquantized fix

* [LPT] SubtractMultiplyToMultiplyAdd zero Add remove: fix for ssd300 on gpu

* [LPT] MultiplyToGroupConvolution not transform on Const

* [LPT] workaround for negative scales

* [LPT] Convert standalone dequantization Mul,Sub,Add to ScaleShift

* [LPT] SubtractMultiplyToMultiplyAdd test fix

* [LPT] Clamp transformation: GPU tests fix

* [LPT] Transformer tests

* [LPT] FakeQuantizePrecisionSelectionTransformation was disabled for GPU

* [LPT] TransformerIsFunctionQuantized refactoring

* [nGraph] code style fix

* [LPT] mobilenet_v2_tf_depthwise test update

* [LPT] TMP: dequantization folding

* [LPT] Elementwise transformation fix: dequantization operations constant folding

* [LPT] cleanup

* [LPT] denormal values fix

* [LPT] FuseFakeQuantize test fixed + negative multiply case

* [LPT] FP32 -> FP16 conversion info

* [LPT] FQ dot interval support + swapMultiplyAdd safe division

* [LPT] test fix

* [LPT] Tests for dot interval on FQ + tests for addTransformation enabling

* [LPT] Clamp transformation fix

* [LPT] FQ prec selection test fix

* [LPT] Clamp test case

* [LPT] Concat division precision fix

* [LPT] cleanup

* [LPT] merge fix

* [LPT] WIP: MatMul asymmetric quantization fix (BERT)

* [LPT] MatMulWithOptimizedConstantFakeQuantizeTransformation disabled

* [LPT] GPU Plugin set config fix

* [LPT] Fix merge mistakes

* [LPT] Rollback device specific INT8

* [LPT] ReshapeFullyConnected fix: FullyConnected output fix

* [LPT] bert-base-chinese GPU fix

* [ngraph/LPT] Tests for the convert_mul_or_add_finally fix with dequantization

[ngraph/LPT] Fix convert mul_or_add_finally with dequantization

* [LPT] ScaleShift dim < 4 only dequantization conversion

* [LPT] MatMul transformation tests extending

* [LPT] ReshapeFullyConnected legacy transformation: LPT test case addition

* [nGraph] VisualizeTree extending: property names displaying to simplify search

* [LPT] getDequantization extending

* [LPT] MulAddToScaleshiftOrPower: out precision fix & tests

* [LPT] Multiply to ScaleShiftIE: Multiply transformation: remove DEQUANTIZATION if not valid

* [LPT] Concat test case

* [nGraph] try to fix opencv compatibility

* [nGraph] nGraph code style fix

* [LPT] InPlace dequantization folding

* [LPT] Multiply constant folding test

* [LPT] Fix plugin test case for MatMulWithOptimizedConstantFakeQuantize

[LPT] Enable MatMulWithOptimizedConstantFakeQuantize plugin test

* [LPT] Convolution transformation: mulConst shape fix

* [LPT] INT8 Constant folding branch for elementwise ops optimization removal

* [LPT] eltwise for const branch fix

* [LPT] linux fix

* [LPT] Multiply test refactoring

* [LPT] Convert Fuse in Constant + tests

* [LPT] function comparison: runtime info comparison rollback

* [LPT] linux build fix

* [LPT] linux build fix2

* [LPT] MatMul transformation limitation was added to be similar as CNNNetwork LPT

* [LPT] Reshape transformation update: don't broadcast by batch

* [LPT] MatMul transformation limitation was added to be similar as CNNNetwork LPT - refactoring

* [LPT] MatMul transformation: transpose input tensors fix

* [LPT] checkElementwise for AddTransformation WA: should be moved to getDequantization

* [LPT] merge fix

* [LPT] MatMul fix & tests

* [LPT] AddTransformation tests

* [LPT] Interpolate transformation enabled

* [LPT] constant folding before LPT

* [LPT] WIP: not completed tests

* [LPT] GPU degradation fix

* [LPT] FuseConvert workaround

* [LPT] code cleanup

* [LPT] Interpolate GPU test quick fix

* [LPT] GroupConvolution fix

* [LPT] Fix fusing multiply for non-dequantization layers

* [LPT] GPU pipeline update: enableInt8 initialization place update

* [LPT] tests compilation fix

* [LPT] merge fix

* [LPT] tests enabling

* [LPT] merge issue resolving

* [LPT] LPT CNNNetwork usage macros: part #1: source code

* [LPT] LPT CNNNetwork usage macros: part #2: CMake files update and tests adoption

* [LPT] LPT workaround from nGraph core removing

* [LPT] previous LPT version tests

* [LPT] inference_engine_lp_transformations was returned back

* [LPT] replace_node rollback

* [LPT] ConvertSubtract fix

* [LPT] GPU: baselineIsFP16 reuse fix

* [LPT] FakeQuantizeTransformation: GPU workaround: I32 -> FP32 Convert is not fused

* [LPT] AvgPool output precision workaround

* [LPT] Group convolution precision + Subtract to ScaleShift const fix

* [LPT] SubMulToMulAdd & Transpose: action-recognition-0001 fix

* [LPT] Transpose: added test with per-tensor quantization

Co-authored-by: Aleksandr Pertovsky <[email protected]>
Co-authored-by: Zinoviev, Vladimir <[email protected]>
Co-authored-by: Vladislav Golubev <[email protected]>
Co-authored-by: Gorokhov Dmitriy <[email protected]>
5 people authored Oct 23, 2020
1 parent ca95240 commit c2271da
Showing 537 changed files with 37,328 additions and 2,422 deletions.
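
The core of the change is visible in the cldnn_engine.cpp diff below: when the legacy CNNNetwork-based LPT is not selected via the new USE_CNNNETWORK_LPT switch, the plugin runs the nGraph low precision transformations directly on the ngraph::Function before the legacy conversion passes. A condensed sketch of that flow, assembled from the hunks below (the free-standing helper name is illustrative, not part of the PR):

```cpp
#include <memory>

#include <ngraph/function.hpp>
#include <ngraph/opsets/opset1.hpp>
#include <transformations/low_precision/transformer.hpp>
#include <transformations/low_precision/mat_mul.hpp>

// Illustrative helper: roughly what cldnn_engine.cpp now does before ConvertOpSet1ToLegacy.
void runNGraphLPT(const std::shared_ptr<ngraph::Function>& nGraphFunc, bool configEnableInt8) {
    using namespace ngraph::pass::low_precision;

    // LPT only runs when INT8 is enabled in the plugin config and the function
    // actually contains quantized (FakeQuantize-based) subgraphs.
    const bool enableInt8 = configEnableInt8 &&
                            LowPrecisionTransformer::isFunctionQuantized(nGraphFunc);
    if (!enableInt8)
        return;

    auto params = LayerTransformation::Params(
        true,                                                        // updatePrecisions
        LayerTransformation::QuantizedTensorAlignment::UpdateLevel,  // alignment on activations
        LayerTransformation::QuantizedTensorAlignment::None,         // alignment on weights
        true);                                                       // supportAsymmetricQuantization

    // MatMul gets its own parameter set with asymmetric quantization disabled.
    LowPrecisionTransformer transformer(
        LowPrecisionTransformer::getAllTransformations(params)
            .add<MatMulTransformation, ngraph::opset1::MatMul>(
                LayerTransformation::Params(params).setSupportAsymmetricQuantization(false)));

    transformer.transform(nGraphFunc);
}
```

For quantized FP16 models the function is first converted to FP32 (the ConvertPrecision pass in the diff) to avoid possible overflow and mixed-precision errors during these rewrites.
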
6 changes: 5 additions & 1 deletion inference-engine/src/cldnn_engine/CMakeLists.txt
@@ -21,9 +21,13 @@ ie_add_plugin(NAME ${TARGET_NAME}
SOURCES ${MAIN_SRC} ${LIBRARY_HEADERS}
VERSION_DEFINES_FOR cldnn_engine.cpp)

target_link_libraries(${TARGET_NAME} PRIVATE inference_engine inference_engine_lp_transformations
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine
clDNN_lib pugixml inference_engine_transformations)

if (USE_CNNNETWORK_LPT)
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine_lp_transformations)
endif()

set (CLDNN_TOP_FOLDER ${IE_MAIN_SOURCE_DIR}/thirdparty/clDNN)
target_include_directories(${TARGET_NAME} PRIVATE
${CMAKE_CURRENT_SOURCE_DIR}
108 changes: 86 additions & 22 deletions inference-engine/src/cldnn_engine/cldnn_engine.cpp
@@ -34,7 +34,9 @@
#include <transformations/opset_conversions/convert_opset2_to_opset1.hpp>
#include <transformations/opset_conversions/convert_opset3_to_opset2.hpp>
#include <transformations/init_node_info.hpp>
#include <transformations/convert_precision.hpp>
#include <transformations/rt_info/fused_names_attribute.hpp>

#include <legacy/convert_function_to_cnn_network.hpp>
#include <legacy/ie_util_internal.hpp>
#include <legacy/graph_transformer.h>
@@ -43,6 +45,9 @@
#include "cldnn_executable_network.h"
#include "cldnn_custom_layer.h"

#include <transformations/low_precision/transformer.hpp>
#include <transformations/low_precision/mat_mul.hpp>

#ifdef __linux__
#include <dlfcn.h>
#endif
@@ -73,8 +78,10 @@ cldnn::device_info clDNNEngine::GetDeviceInfo(const std::map<std::string, std::s
return device_info;
}

InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network) const {
InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network, CLDNNPlugin::Config config) const {
std::shared_ptr<ICNNNetwork> clonedNetwork = cloneNetwork(network);
bool baselineIsFP16 = false;

if (clonedNetwork->getFunction()) {
const auto transformations_callback = [](const std::shared_ptr<const ::ngraph::Node> &node) -> bool {
// Reshape->Permute->Reshape pattern in theory can change output rank, so this check is added to be sure
@@ -113,6 +120,12 @@ InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const In
return can_use_reduce;
}

if (auto add_op = std::dynamic_pointer_cast<const ngraph::opset1::Add>(node)) {
return ngraph::is_type<ngraph::opset1::Convolution>(add_op->get_input_node_shared_ptr(0)) ||
ngraph::is_type<ngraph::opset1::GroupConvolution>(add_op->get_input_node_shared_ptr(0)) ||
ngraph::is_type<ngraph::opset1::MatMul>(add_op->get_input_node_shared_ptr(0));
}

return std::dynamic_pointer_cast<const ::ngraph::opset2::Gelu>(node) ||
std::dynamic_pointer_cast<const ::ngraph::opset3::ShuffleChannels>(node) ||
std::dynamic_pointer_cast<const ::ngraph::opset2::BatchToSpace>(node) ||
@@ -128,24 +141,64 @@ InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const In
// Disable shape inference (WA for generic operations)
::ngraph::op::GenericIE::DisableReshape noReshape(nGraphFunc);

// Note: instead of running all Conversion Transformations you can make up your own transformation pipeline
ngraph::pass::Manager manager;
manager.register_pass<ngraph::pass::InitNodeInfo>();
// WA: ConvertPriorBox must be executed before the 1st ConstantFolding pass
manager.register_pass<ngraph::pass::ConvertPriorBox>();
manager.register_pass<ngraph::pass::CommonOptimizations>();
manager.register_pass<ngraph::pass::ConvertOpSet3ToOpSet2>();
manager.register_pass<ngraph::pass::ConvertOpSet2ToOpSet1>();
manager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();

manager.set_callback(transformations_callback);
manager.run_passes(nGraphFunc);

ngraph::pass::Manager ti_manager;
// Unroll will be called after all conversions
// temporarily switch back to plugin unroller from NGraph unroller until TI output names are corrected
// ti_manager.register_pass<ngraph::pass::UnrollTensorIterator>();
ti_manager.run_passes(nGraphFunc);
#ifndef USE_CNNNETWORK_LPT
bool enableInt8;
#endif

{
// Note: instead of running all Conversion Transformations you can make up your own transformation pipeline
ngraph::pass::Manager manager;
manager.register_pass<ngraph::pass::InitNodeInfo>();
// WA: ConvertPriorBox must be executed before the 1st ConstantFolding pass
manager.register_pass<ngraph::pass::ConvertPriorBox>();
manager.register_pass<ngraph::pass::CommonOptimizations>();
manager.register_pass<ngraph::pass::ConvertOpSet3ToOpSet2>();
manager.register_pass<ngraph::pass::ConvertOpSet2ToOpSet1>();

manager.set_callback(transformations_callback);
manager.run_passes(nGraphFunc);

#ifndef USE_CNNNETWORK_LPT
enableInt8 = config.enableInt8 && ngraph::pass::low_precision::LowPrecisionTransformer::isFunctionQuantized(nGraphFunc);
if (enableInt8) {
const auto fp16_callback = [&baselineIsFP16](const std::shared_ptr<const ::ngraph::Node> &node) -> bool {
if (!baselineIsFP16 && node->get_output_element_type(0) == ngraph::element::f16) {
baselineIsFP16 = true;
}

return true;
};

ngraph::pass::Manager conversion_manager;
// [WA part1] Convert quantized FP16 model to FP32 to avoid possible overflow and mixed precision errors
conversion_manager.register_pass<ngraph::pass::ConvertPrecision>(ngraph::element::f16, ngraph::element::f32);
conversion_manager.set_callback(fp16_callback);
conversion_manager.run_passes(nGraphFunc);
}
#endif
}

#ifndef USE_CNNNETWORK_LPT
using namespace ngraph::pass::low_precision;
if (enableInt8) {
auto params = LayerTransformation::Params(
true, // updatePrecisions
LayerTransformation::QuantizedTensorAlignment::UpdateLevel, // quantizedTensorAlignmentOnActivations
LayerTransformation::QuantizedTensorAlignment::None, // quantizedTensorAlignmentOnWeights
true); // supportAsymmetricQuantization
LowPrecisionTransformer transformer(LowPrecisionTransformer::getAllTransformations(params)
.add<MatMulTransformation, ngraph::opset1::MatMul>(LayerTransformation::Params(params).setSupportAsymmetricQuantization(false)));

transformer.transform(nGraphFunc);
}
#endif

{
ngraph::pass::Manager manager = ngraph::pass::Manager();
manager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();
manager.set_callback(transformations_callback);
manager.run_passes(nGraphFunc);
}

clonedNetwork = InferenceEngine::details::convertFunctionToICNNNetwork(nGraphFunc, *clonedNetwork);
}
@@ -157,6 +210,17 @@ InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const In
transformator.fullTrim();
}

if (baselineIsFP16) {
// [WA part1] Store 'lpt_back_to_fp16' flag to convert FP32 operations to original FP16 after LPT
InputsDataMap inputsMap;
clonedNetwork->getInputsInfo(inputsMap);

if (!inputsMap.empty()) {
auto input0 = getInputTo(inputsMap.begin()->second->getInputData());
input0.begin()->second->params["lpt_back_to_fp16"];
}
}

return clonedNetwork;
}

@@ -259,7 +323,7 @@ ExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceEn

context = m_defaultContext;

return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network), context, conf);
return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network, conf), context, conf);
}

ExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceEngine::ICNNNetwork &network,
@@ -283,7 +347,7 @@ ExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceEn
conf.max_dynamic_batch = static_cast<int>(network.getBatchSize());
}

return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network), casted, conf);
return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network, conf), casted, conf);
}

RemoteContext::Ptr clDNNEngine::CreateContext(const ParamMap& params) {
@@ -326,7 +390,7 @@ QueryNetworkResult clDNNEngine::QueryNetwork(const ICNNNetwork& network,
for (auto&& node : function->get_ops()) {
originalOps.emplace(node->get_friendly_name());
}
auto clonedNetwork = CloneAndTransformNetwork(network);
auto clonedNetwork = CloneAndTransformNetwork(network, _impl->m_config);
std::unordered_set<std::string> supported;
std::unordered_set<std::string> unsupported;

3 changes: 2 additions & 1 deletion inference-engine/src/cldnn_engine/cldnn_engine.h
@@ -27,7 +27,8 @@ class clDNNEngine : public InferenceEngine::InferencePluginInternal,
CLDNNRemoteCLContext::Ptr m_defaultContext;

cldnn::device_info GetDeviceInfo(const std::map<std::string, std::string> &config) const;
InferenceEngine::ICNNNetwork::Ptr CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network) const;
InferenceEngine::ICNNNetwork::Ptr CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network,
CLDNNPlugin::Config config) const;
public:
clDNNEngine();

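
One workaround spans the engine and program sources: when the original model was FP16, cldnn_engine.cpp stores an empty "lpt_back_to_fp16" runtime parameter on the first input's consumer after LPT, and cldnn_program.cpp (next diff) reads it back so non-quantized layers can be converted back to FP16. A hedged sketch of that hand-off, condensed from the two hunks (helper names and header paths are assumptions):

```cpp
#include <ie_icnn_network.hpp>   // InferenceEngine::ICNNNetwork, InputsDataMap (path assumed)
#include <legacy/ie_layers.h>    // getInputTo (path assumed)

// Writer side (after LPT in the engine): mark the network so the program builder
// knows the original model was FP16.
inline void markBackToFp16(InferenceEngine::ICNNNetwork& network) {
    InferenceEngine::InputsDataMap inputsMap;
    network.getInputsInfo(inputsMap);
    if (inputsMap.empty())
        return;
    auto consumers = getInputTo(inputsMap.begin()->second->getInputData());
    if (!consumers.empty())
        consumers.begin()->second->params["lpt_back_to_fp16"];  // empty param acts as a flag
}

// Reader side (Program constructor): detect the flag left by the writer.
inline bool wasBaselineFp16(InferenceEngine::ICNNNetwork& network) {
    InferenceEngine::InputsDataMap inputsMap;
    network.getInputsInfo(inputsMap);
    if (inputsMap.empty())
        return false;
    auto consumers = getInputTo(inputsMap.begin()->second->getInputData());
    return !consumers.empty() &&
           consumers.begin()->second->params.count("lpt_back_to_fp16") != 0;
}
```

Storing the flag as an empty layer parameter keeps the workaround inside the existing CNNNetwork representation without introducing new API.
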
65 changes: 41 additions & 24 deletions inference-engine/src/cldnn_engine/cldnn_program.cpp
@@ -88,9 +88,11 @@
#include <sys/stat.h>
#include <exec_graph_info.hpp>

#ifdef USE_CNNNETWORK_LPT
#include "low_precision_transformations/transformer.hpp"
#include "low_precision_transformations/fully_connected.hpp"
#include "low_precision_transformations/gemm.hpp"
#endif

#include <iostream>
#include <iomanip>
@@ -397,6 +399,41 @@ Program::Program(InferenceEngine::ICNNNetwork& network, std::shared_ptr<const cl
, p_currentOutputs({}) {
InitFormat(network);

bool fqFound = false;

bool baselineIsFP16 = false;
InputsDataMap inputsMap;
network.getInputsInfo(inputsMap);
if (!inputsMap.empty()) {
auto input0 = getInputTo(inputsMap.begin()->second->getInputData());
if (!input0.empty() && (input0.begin()->second->params.count("lpt_back_to_fp16") != 0)) {
baselineIsFP16 = true;
fqFound = true;
}
}

#ifdef USE_CNNNETWORK_LPT
bool allFQareSupported = true;
if (config.enableInt8) {
auto it = details::CNNNetworkIterator(&network);
auto end = details::CNNNetworkIterator();
while (it != end) {
auto& layer = *it;
if (layer->precision == Precision::FP16) {
baselineIsFP16 = true;
}

if (CaselessEq<std::string>()(layer->type, "FakeQuantize")) {
fqFound = true;
auto levels = layer->GetParamAsUInt("levels");
if (levels != 255 && levels != 256) {
allFQareSupported = false;
}
}
it++;
}
}

if (config.enableInt8) {
auto params = LayerTransformation::Params(true, // updatePrecisions
true, // quantizeOutputs
@@ -413,38 +450,18 @@ Program::Program(InferenceEngine::ICNNNetwork& network, std::shared_ptr<const cl
.add<FullyConnectedTransformation>(LayerTransformation::Params(params).setSupportAsymmetricQuantization(false), "FullyConnected")
.add<GemmTransformation>(LayerTransformation::Params(params).setSupportAsymmetricQuantization(false), "GEMM");

bool fqFound = false;
bool allFQareSupported = true;
bool baselineIsFP16 = false;
{
auto it = details::CNNNetworkIterator(&network);
auto end = details::CNNNetworkIterator();
while (it != end) {
auto& layer = *it;
if (layer->precision == Precision::FP16) {
baselineIsFP16 = true;
}

if (CaselessEq<std::string>()(layer->type, "FakeQuantize")) {
fqFound = true;
auto levels = layer->GetParamAsUInt("levels");
if (levels != 255 && levels != 256) {
allFQareSupported = false;
}
}
it++;
}
}

// [WA part1] Convert quantized FP16 model to FP32 to avoid possible overflow and mixed precision errors
if (fqFound && allFQareSupported) {
NetPass::ConvertPrecision(network, Precision::FP16, Precision::FP32);
}

LowPrecisionTransformer transformer(transforms);
transformer.transform(network);
}
#endif

// [WA part2] Try to find non-quantized layers and convert them back to FP16
// [WA part2] Try to find non-quantized layers and convert them back to FP16
if (config.enableInt8) {
if (fqFound && baselineIsFP16 && config.enable_fp16_for_quantized_models) {
auto layersSorted = BFSSort(network);

7 changes: 6 additions & 1 deletion inference-engine/src/gna_plugin/CMakeLists.txt
@@ -57,7 +57,12 @@ target_compile_definitions(${TARGET_NAME}_test_static
INTEGER_LOW_P
USE_STATIC_IE)

target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_preproc_s inference_engine_lp_transformations libGNA::API)
target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_preproc_s libGNA::API)

if (USE_CNNNETWORK_LPT)
target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_lp_transformations)
endif()

target_include_directories(${TARGET_NAME}_test_static PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
set_target_properties(${TARGET_NAME}_test_static PROPERTIES COMPILE_PDB_NAME ${TARGET_NAME}_test_static)

@@ -22,13 +22,17 @@ class INFERENCE_ENGINE_API_CLASS(Eltwise) : public Op {

Eltwise(const Output<Node>& data1,
const Output<Node>& data2,
const ELTWISE_TYPE eltwise_type);
const ELTWISE_TYPE eltwise_type,
const element::Type output_type = element::undefined);

void validate_and_infer_types() override;

std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;

ELTWISE_TYPE eltwise_type;

private:
element::Type m_output_type;
};

} // namespace op
@@ -29,17 +29,21 @@ class INFERENCE_ENGINE_API_CLASS(FullyConnected) : public Op {
FullyConnected(const Output<Node> & A,
const Output<Node> & B,
const Output<Node> & C,
const Shape & output_shape);
const Shape & output_shape,
const element::Type output_type = element::undefined);

void validate_and_infer_types() override;

std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;

size_t get_out_size() { return m_output_size; }
size_t get_out_size() const { return m_output_size; }

element::Type get_output_type() const { return m_output_type; }

private:
size_t m_output_size = 0;
Shape m_output_shape = {};
element::Type m_output_type;
};

} // namespace op
@@ -25,7 +25,8 @@ class INFERENCE_ENGINE_API_CLASS(NormalizeIE) : public Op {
const Output<Node>& weights,
float eps,
bool across_spatial,
bool channel_shared);
bool channel_shared,
const ngraph::element::Type output_type);

float get_eps() const { return m_eps; }
bool get_channel_shared() const { return m_channel_shared;}
@@ -39,6 +40,7 @@ class INFERENCE_ENGINE_API_CLASS(NormalizeIE) : public Op {
float m_eps;
bool m_across_spatial;
bool m_channel_shared;
ngraph::element::Type m_output_type;
};

} // namespace op
@@ -19,13 +19,16 @@ class INFERENCE_ENGINE_API_CLASS(PowerIE) : public Op {
const NodeTypeInfo& get_type_info() const override { return type_info; }

PowerIE(const Output<Node>& data_batch,
const float power, const float scale, const float shift);
const float power, const float scale, const float shift, const element::Type output_type = element::undefined);

void validate_and_infer_types() override;

std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;

float scale, power, shift;

private:
element::Type m_output_type;
};

} // namespace op
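
The legacy-op header changes above (Eltwise, FullyConnected, NormalizeIE, PowerIE) all add an optional output element type, so that when LPT rewrites dequantization into these legacy ops the original output precision can still be expressed. A minimal sketch of constructing one of them with an explicit output type (the values and the header path are illustrative assumptions):

```cpp
#include <memory>

#include <ngraph/node.hpp>
#include <legacy/ngraph_ops/power.hpp>  // header path assumed

// Illustrative: build a PowerIE (y = (scale * x + shift) ^ power) whose output
// stays FP32 even if LPT lowered the surrounding subgraph to a narrower type.
std::shared_ptr<ngraph::Node> makeFp32Power(const ngraph::Output<ngraph::Node>& data) {
    const float power = 1.0f, scale = 2.0f, shift = 0.5f;  // example values
    return std::make_shared<ngraph::op::PowerIE>(data, power, scale, shift,
                                                 ngraph::element::f32);
}
```
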