Es/lpt/lpt to ngraph fixes2 with master (openvinotoolkit#2671)
* [LPT] Replace creation of dequantization with factory

* [ngraph][LPT] Add ScaleShift replace for dequantization operations

* [LPT] SubtractMultiplyToMultiplyAdd refactoring

* [LPT] Code style fix

* [LPT] Edit SubtractMultiplyToMultiplyAdd transformation for dequantization

* [LPT] Linux compilation quick fix

* [LPT] [WIP] runtime info applying

* [LPT] Concat transformation functional tests extending

* [LPT] MultiplyToConvolution + Subtract to add fusing + improvements in LowPrecisionTransformer

* [LPT] linux compilation error fix

* [LPT] compilation error

* [LPT] MultiplyToGroupConvolution fix: 5D support

* [LPT] Multiply transformation extending: FQ weights support - wip

* [LPT] FQ folding & precision selection

* [LPT] code style fixes

* [LPT] code style fixes

* [LPT] Linux compilation error fix

* [LPT] SubtractMultiplyToMultiplyAdd: refactoring

* [LPT] Tests fixes

* [LPT] MultiplyToGroupConvolution tests

* [LPT] Convert subtract with int inputs to Eltwise sub

* [LPT] Constant folding fix for quant models

* [LPT] 1) Asymmetric quantization improvement 2) tests extending

* [LPT] 2 fixes for se_resnext_50

* [LPT] Add transformation priority branch selection test

* [LPT] AddMultiplyFusion: legacy transformation quick fix

* [LPT] nGraph tests temporary disabling

* [LPT] Fix for eltwise inputs with multiple outputs

* [LPT] Fix for FQ fuse

* [LPT] Reshape by channel, batch temporary disabled

* [nGraph][LPT] MatMul fix for reading FP16 models

* [LPT] 1) Add (not after Convolution/GroupConvolution/MatMul with Constant) to Subtract 2) precision selection fix: MultiplyToGroupConvolution quick fix

* [LPT] DenseNet improvements: AddTransformation: Add to Subtract + tests

* [LPT] AddTransformation refactoring

* [LPT] AddTransformation tests temporarily disabled

* [LPT] ReshapeTransformation improvements: degradation fix

* [LPT] code style fix

* [LPT] Concat tests temporary disabling

* [LPT] tests unification
1) plugin tests: added test-cases and nGraph-validation for clamp, split and variadic split
2) func tests: added test-cases
3) transformNGraph: added the ability to run additional transformations

* [LPT] split & variadic split merge fix

* [LPT] Clamp: added support for asymmetric quantization

* [LPT] added DequantizationAttr run-time attribute

* [LPT] debug info removal

* [LPT] ConcatTransformation: zero point fix

* [LPT] CNNNetwork ReLU transformation quick fix

* [LPT]
1) Concat fix
2) ConcatMultiChannels fix
3) Added "Concat with Split" test-cases
4) Subgraph fix

* [LPT]
1) Concat fix
2) Added "Concat with different precision on childs" test-case

* [LPT] concat fix Ubuntu18

* [LPT] Concat test fixes

* [LPT] Non-FP32 FQ input support

* [LPT] MatMul Fix + separateInStandaloneBranch Fix

* [LPT] Fix reference input types in mish fusion tests

* [LPT] Fix cpuFuncTests on CentOS building

* [nGraph][LPT] ScaleShift 2d, 3d nGraph conversion enabling

* [LPT] 1) FullyConnected workaround removing 2) validate_nodes_and_infer_types for LPT

* [ngraph] Add check for children for ConvertSubtract

* [LPT] Squeeze/Unsqueeze tests unification

* [LPT] Squeeze/Unsqueeze change signature for getReference/getOriginal

* [LPT] Mul & Add -> ScaleShift quick fix

* [LPT] nGraph tests temporary disabling

* [LPT] code style fix

* [LPT] code style fix #2

* [LPT] nGraph tests temporary disabling

* [LPT] code style fix #3

* [LPT] shared plugin tests temporary disabling

* [LPT] cleanup

* [LPT] nGraph unit tests temporary disabling

* [LPT] nGraph unit tests disabling #2

* [LPT] nGraph tests disabling

* [LPT] nGraph tests temporary disabling

* [LPT] WA removing

* [LPT] CentOS compilation fix

* [LPT] KMB WA to avoid compilation error

* [LPT] functional test temporary disabling

* [nGraph] code style fixes

* [LPT] ConcatTransformation: data movement operation as intermediate handling

* [LPT] FuseSubtractToFakeQuantize after VariadicSplit

* [LPT] ConcatWithSplitTransformation functional test temporary disabling

* [LPT] Clamp and ConcatWithDifferentPrecisionsOnChilds: tests fix

* [LPT] MatMul: bert-nv-mlperf-quantized fix

* [LPT] Add to convolution biases fuse fix

* [LPT] GPU plugin tests fixes

* [LPT] Normalize GPU plugin tests fix

* [LPT] test-commit

* [LPT] CLDNN Plugin FP16 conversion

* [LPT] AvgPool: update precision if there is no FQ after it + convolution precision limitation on activations

* [LPT] Convolution fixes

* [LPT] FuseSubtractToFakeQuantize & FuseMultiplyToFakeQuantize improvements

* [LPT] FuseSubtractToFakeQuantize test fix

* [LPT] FuseSubtractToFakeQuantizeTransformation tests

* [LPT] code style fix

* [LPT] AvgPool child recursive extend

* [LPT] AvgPool tests + fix

* [LPT] compilation quick fix

* [LPT] Add to convolution biases fuse fix

* [LPT] Linux issues: MatMulWithOptimizedConstantFakeQuantizeTransformation temporary disabled

* [LPT] Normalize GPU plugin tests fix

* [LPT] test-commit

* [LPT]
1) added the ability to create sub without dequantizationAttribute
2) fixed optimizeMulAfter: added copying rt_info
3) Tests Unification: Convolution transformation
4) added cleanRunTimeInfo into Network Helper

* [LPT] Tests Unification: GroupConvolution

* [LPT] removed debug info

* [LPT] functional tests for Convolution & GroupConvolution extending

* [LPT] [MatMul] Quick fix for Ubuntu error

* [LPT] MatMulTransformation quick test fix: one constant for both intervals

* [nGraph] code style fix

* [LPT] added output_precision to NormalizeIE

* [nGraph] NormalizeIE fix for LPT support

* [LPT] nGraph WA removal

* [LPT] fixed fillSubgraph for concat multi channels

* [LPT] MatMul fix

* [nGraph] WA removal: 1) nGraph tests enabling 2) LPT extending: do not handle in FP32

* [LPT] nGraph WA removal: function tests skip config rollback

* [LPT] WA removal: precision propagation fix

* [LPT] ConvertMulOrAddFinally transformation extending

* [nGraph] ConvolutionMultiplyFusion rollback (move from legacy to common)

* [nGraph] ConvertMulAddToScaleShiftOrPower: WA removal

* [nGraph] TypeRelaxed: WA removal

* [nGraph] WA removal: TypeRelaxed

* [LPT] WA removal: ConcatTransformation

* [nGraph] WA removal: Eltwise & ConvertMulOrAddFinally fixes to support LPT

* [nGraph] MulAddConversion fix: 2D & 3D ScaleShift are supported

* [nGraph] VisualizeTree extending

* [LPT] FakeQuantizeDequantization extending: check element wise dequantization operation

* [LPT] FakeQuantizeDequantization extending: SubtractMultiplyToMultiplyAddTransformation & WeightableLayerTransformation

* [LPT] Convolution + test infrastructure update

* [LPT] GPU compilation error

* [nGraph] BatchNorm plugin tests: input tensor definition

* [LPT] LowPrecisionTransformer::isFunctionQuantized was added

* [nGraph] WA final cleanup

* [nGraph] ScaleShiftIE quick fix

* [LPT] Functional tests: added test-cases "Concat with intermediate with constant"

* [LPT] Transformer::isNetworkquantized fix

* [LPT] SubtractMultiplyToMultiplyAdd zero Add remove: fix for ssd300 on gpu

* [LPT] MultiplyToGroupConvolution not transform on Const

* [LPT] workaround for negative scales

* [LPT] Convert standalone dequantization Mul,Sub,Add to ScaleShift

* [LPT] SubtractMultiplyToMultiplyAdd test fix

* [LPT] Clamp transformation: GPU tests fix

* [LPT] Transformer tests

* [LPT] FakeQuantizePrecisionSelectionTransformation was disabled for GPU

* [LPT] TransformerIsFunctionQuantized refactoring

* [nGraph] code style fix

* [LPT] mobilenet_v2_tf_depthwise test update

* [LPT] TMP: dequantization folding

* [LPT] Elementwise transformation fix: dequantization operations constant folding

* [LPT] cleanup

* [LPT] denormal values fix

* [LPT] FuseFakeQuantize test fixed + negative multiply case

* [LPT] FP32 -> FP16 conversion info

* [LPT] FQ dot interval support + swapMultiplyAdd safe division

* [LPT] test fix

* [LPT] Tests for dot interval on FQ + tests for addTransformation enabling

* [LPT] Clamp transformation fix

* [LPT] FQ prec selection test fix

* [LPT] Clamp test case

* [LPT] Concat division precision fix

* [LPT] cleanup

* [LPT] merge fix

* [LPT] WIP: MatMul asymmetric quantization fix (BERT)

* [LPT] MatMulWithOptimizedConstantFakeQuantizeTransformation disabled

* [LPT] GPU Plugin set config fix

* [LPT] Fix merge mistakes

* [LPT] Rollback device specific INT8

* [LPT] ReshapeFullyConnected fix: FullyConnected output fix

* [LPT] bert-base-chinese GPU fix

* [ngraph/LPT] Tests for the convert_mul_or_add_finally fix with dequantization

[ngraph/LPT] Fix convert mul_or_add_finally with dequantization

* [LPT] ScaleShift dim < 4 only dequantization conversion

* [LPT] MatMul transformation tests extending

* [LPT] ReshapeFullyConnected legacy transformation: LPT test case addition

* [nGraph] VisualizeTree extending: property names displaying to simplify search

* [LPT] getDequantization extending

* [LPT] MulAddToScaleshiftOrPower: out precision fix & tests

* [LPT] Multiply to ScaleShiftIE: Multiply transformation: remove DEQUANTIZATION if not valid

* [LPT] Concat test case

* [nGraph] try to fix opencv compatibility

* [nGraph] nGraph code style fix

* [LPT] InPlace dequantization folding

* [LPT] Multiply constant folding test

* [LPT] Fix plugin test case for MatMulWithOptimizedConstantFakeQuantize

[LPT] Enable MatMulWithOptimizedConstantFakeQuantize plugin test

* [LPT] Convolution transformation: mulConst shape fix

* [LPT] INT8 Constant folding branch for elementwise ops optimization removal

* [LPT] eltwise for const branch fix

* [LPT] linux fix

* [LPT] Multiply test refactoring

* [LPT] Convert Fuse in Constant + tests

* [LPT] function comparison: runtime info comparison rollback

* [LPT] linux build fix

* [LPT] linux build fix2

* [LPT] MatMul transformation limitation was added to be similar as CNNNetwork LPT

* [LPT] Reshape transformation update: don't broadcast by batch

* [LPT] MatMul transformation limitation was added to be similar as CNNNetwork LPT - refactoring

* [LPT] MatMul transformation: transpose input tensors fix

* [LPT] checkElementwise for AddTransformation WA: should be moved to getDequantization

* [LPT] merge fix

* [LPT] MatMul fix & tests

* [LPT] AddTransformation tests

* [LPT] Interpolate transformation enabled

* [LPT] constant folding before LPT

* [LPT] WIP: not completed tests

* [LPT] GPU degradation fix

* [LPT] FuseConvert workaround

* [LPT] code cleanup

* [LPT] Interpolate GPU test quick fix

* [LPT] GroupConvolution fix

* [LPT] Fix fusing multiply for non-dequantization layers

* [LPT] GPU pipeline update: enableInt8 initialization place update

* [LPT] tests compilation fix

* [LPT] merge fix

* [LPT] tests enabling

* [LPT] merge issue resolving

* [LPT] LPT CNNNetwork usage macros: part #1: source code

* [LPT] LPT CNNNetwork usage macros: part #2: CMake files update and tests adoption

* [LPT] LPT workaround from nGraph core removing

* [LPT] previous LPT version tests

* [LPT] inference_engine_lp_transformations was returned back

* [LPT] replace_node rollback

* [LPT] ConvertSubtract fix

* [LPT] GPU: baselineIsFP16 reuse fix

* [LPT] FakeQuantizeTransformation: GPU workaround: I32 -> FP32 Convert is not fused

* [LPT] AvgPool output precision workaround

* [LPT] Group convolution precision + Subtract to ScaleShift const fix

* [LPT] SubMulToMulAdd & Transpose: action-recognition-0001 fix

* [LPT] Transpose: added test with per-tensor quantization

Co-authored-by: Aleksandr Pertovsky <[email protected]>
Co-authored-by: Zinoviev, Vladimir <[email protected]>
Co-authored-by: Vladislav Golubev <[email protected]>
Co-authored-by: Gorokhov Dmitriy <[email protected]>
5 people authored Oct 23, 2020
1 parent ca95240 commit c2271da
Showing 537 changed files with 37,328 additions and 2,422 deletions.
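
The core of the change is visible in the cldnn_engine.cpp diff below: when the legacy CNNNetwork-based LPT is not selected via the new USE_CNNNETWORK_LPT switch, the plugin runs the nGraph low precision transformations directly on the ngraph::Function before the legacy conversion passes. A condensed sketch of that flow, assembled from the hunks below (the free-standing helper name is illustrative, not part of the PR):

```cpp
#include <memory>

#include <ngraph/function.hpp>
#include <ngraph/opsets/opset1.hpp>
#include <transformations/low_precision/transformer.hpp>
#include <transformations/low_precision/mat_mul.hpp>

// Illustrative helper: roughly what cldnn_engine.cpp now does before ConvertOpSet1ToLegacy.
void runNGraphLPT(const std::shared_ptr<ngraph::Function>& nGraphFunc, bool configEnableInt8) {
    using namespace ngraph::pass::low_precision;

    // LPT only runs when INT8 is enabled in the plugin config and the function
    // actually contains quantized (FakeQuantize-based) subgraphs.
    const bool enableInt8 = configEnableInt8 &&
                            LowPrecisionTransformer::isFunctionQuantized(nGraphFunc);
    if (!enableInt8)
        return;

    auto params = LayerTransformation::Params(
        true,                                                        // updatePrecisions
        LayerTransformation::QuantizedTensorAlignment::UpdateLevel,  // alignment on activations
        LayerTransformation::QuantizedTensorAlignment::None,         // alignment on weights
        true);                                                       // supportAsymmetricQuantization

    // MatMul gets its own parameter set with asymmetric quantization disabled.
    LowPrecisionTransformer transformer(
        LowPrecisionTransformer::getAllTransformations(params)
            .add<MatMulTransformation, ngraph::opset1::MatMul>(
                LayerTransformation::Params(params).setSupportAsymmetricQuantization(false)));

    transformer.transform(nGraphFunc);
}
```

For quantized FP16 models the function is first converted to FP32 (the ConvertPrecision pass in the diff) to avoid possible overflow and mixed-precision errors during these rewrites.
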
6 changes: 5 additions & 1 deletion inference-engine/src/cldnn_engine/CMakeLists.txt
@@ -21,9 +21,13 @@ ie_add_plugin(NAME ${TARGET_NAME}
SOURCES ${MAIN_SRC} ${LIBRARY_HEADERS}
VERSION_DEFINES_FOR cldnn_engine.cpp)

target_link_libraries(${TARGET_NAME} PRIVATE inference_engine inference_engine_lp_transformations
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine
clDNN_lib pugixml inference_engine_transformations)

if (USE_CNNNETWORK_LPT)
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine_lp_transformations)
endif()

set (CLDNN_TOP_FOLDER ${IE_MAIN_SOURCE_DIR}/thirdparty/clDNN)
target_include_directories(${TARGET_NAME} PRIVATE
${CMAKE_CURRENT_SOURCE_DIR}
108 changes: 86 additions & 22 deletions inference-engine/src/cldnn_engine/cldnn_engine.cpp
@@ -34,7 +34,9 @@
#include <transformations/opset_conversions/convert_opset2_to_opset1.hpp>
#include <transformations/opset_conversions/convert_opset3_to_opset2.hpp>
#include <transformations/init_node_info.hpp>
#include <transformations/convert_precision.hpp>
#include <transformations/rt_info/fused_names_attribute.hpp>

#include <legacy/convert_function_to_cnn_network.hpp>
#include <legacy/ie_util_internal.hpp>
#include <legacy/graph_transformer.h>
@@ -43,6 +45,9 @@
#include "cldnn_executable_network.h"
#include "cldnn_custom_layer.h"

#include <transformations/low_precision/transformer.hpp>
#include <transformations/low_precision/mat_mul.hpp>

#ifdef __linux__
#include <dlfcn.h>
#endif
@@ -73,8 +78,10 @@ cldnn::device_info clDNNEngine::GetDeviceInfo(const std::map<std::string, std::s
return device_info;
}

InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network) const {
InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network, CLDNNPlugin::Config config) const {
std::shared_ptr<ICNNNetwork> clonedNetwork = cloneNetwork(network);
bool baselineIsFP16 = false;

if (clonedNetwork->getFunction()) {
const auto transformations_callback = [](const std::shared_ptr<const ::ngraph::Node> &node) -> bool {
// Reshape->Permute->Reshape pattern in theory can change output rank, so this check is added to be sure
@@ -113,6 +120,12 @@ InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const In
return can_use_reduce;
}

if (auto add_op = std::dynamic_pointer_cast<const ngraph::opset1::Add>(node)) {
return ngraph::is_type<ngraph::opset1::Convolution>(add_op->get_input_node_shared_ptr(0)) ||
ngraph::is_type<ngraph::opset1::GroupConvolution>(add_op->get_input_node_shared_ptr(0)) ||
ngraph::is_type<ngraph::opset1::MatMul>(add_op->get_input_node_shared_ptr(0));
}

return std::dynamic_pointer_cast<const ::ngraph::opset2::Gelu>(node) ||
std::dynamic_pointer_cast<const ::ngraph::opset3::ShuffleChannels>(node) ||
std::dynamic_pointer_cast<const ::ngraph::opset2::BatchToSpace>(node) ||
@@ -128,24 +141,64 @@ InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const In
// Disable shape inference (WA for generic operations)
::ngraph::op::GenericIE::DisableReshape noReshape(nGraphFunc);

// Note: instead of running all Conversion Transformations you can make up your own transformation pipeline
ngraph::pass::Manager manager;
manager.register_pass<ngraph::pass::InitNodeInfo>();
// WA: ConvertPriorBox must be executed before the 1st ConstantFolding pass
manager.register_pass<ngraph::pass::ConvertPriorBox>();
manager.register_pass<ngraph::pass::CommonOptimizations>();
manager.register_pass<ngraph::pass::ConvertOpSet3ToOpSet2>();
manager.register_pass<ngraph::pass::ConvertOpSet2ToOpSet1>();
manager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();

manager.set_callback(transformations_callback);
manager.run_passes(nGraphFunc);

ngraph::pass::Manager ti_manager;
// Unroll will be called after all conversions
// temporarily switch back to plugin unroller from NGraph unroller until TI output names are corrected
// ti_manager.register_pass<ngraph::pass::UnrollTensorIterator>();
ti_manager.run_passes(nGraphFunc);
#ifndef USE_CNNNETWORK_LPT
bool enableInt8;
#endif

{
// Note: instead of running all Conversion Transformations you can make up your own transformation pipeline
ngraph::pass::Manager manager;
manager.register_pass<ngraph::pass::InitNodeInfo>();
// WA: ConvertPriorBox must be executed before the 1st ConstantFolding pass
manager.register_pass<ngraph::pass::ConvertPriorBox>();
manager.register_pass<ngraph::pass::CommonOptimizations>();
manager.register_pass<ngraph::pass::ConvertOpSet3ToOpSet2>();
manager.register_pass<ngraph::pass::ConvertOpSet2ToOpSet1>();

manager.set_callback(transformations_callback);
manager.run_passes(nGraphFunc);

#ifndef USE_CNNNETWORK_LPT
enableInt8 = config.enableInt8 && ngraph::pass::low_precision::LowPrecisionTransformer::isFunctionQuantized(nGraphFunc);
if (enableInt8) {
const auto fp16_callback = [&baselineIsFP16](const std::shared_ptr<const ::ngraph::Node> &node) -> bool {
if (!baselineIsFP16 && node->get_output_element_type(0) == ngraph::element::f16) {
baselineIsFP16 = true;
}

return true;
};

ngraph::pass::Manager conversion_manager;
// [WA part1] Convert quantized FP16 model to FP32 to avoid possible overflow and mixed precision errors
conversion_manager.register_pass<ngraph::pass::ConvertPrecision>(ngraph::element::f16, ngraph::element::f32);
conversion_manager.set_callback(fp16_callback);
conversion_manager.run_passes(nGraphFunc);
}
#endif
}

#ifndef USE_CNNNETWORK_LPT
using namespace ngraph::pass::low_precision;
if (enableInt8) {
auto params = LayerTransformation::Params(
true, // updatePrecisions
LayerTransformation::QuantizedTensorAlignment::UpdateLevel, // quantizedTensorAlignmentOnActivations
LayerTransformation::QuantizedTensorAlignment::None, // quantizedTensorAlignmentOnWeights
true); // supportAsymmetricQuantization
LowPrecisionTransformer transformer(LowPrecisionTransformer::getAllTransformations(params)
.add<MatMulTransformation, ngraph::opset1::MatMul>(LayerTransformation::Params(params).setSupportAsymmetricQuantization(false)));

transformer.transform(nGraphFunc);
}
#endif

{
ngraph::pass::Manager manager = ngraph::pass::Manager();
manager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();
manager.set_callback(transformations_callback);
manager.run_passes(nGraphFunc);
}

clonedNetwork = InferenceEngine::details::convertFunctionToICNNNetwork(nGraphFunc, *clonedNetwork);
}
@@ -157,6 +210,17 @@ InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const In
transformator.fullTrim();
}

if (baselineIsFP16) {
// [WA part1] Store 'lpt_back_to_fp16' flag to convert FP32 operations to original FP16 after LPT
InputsDataMap inputsMap;
clonedNetwork->getInputsInfo(inputsMap);

if (!inputsMap.empty()) {
auto input0 = getInputTo(inputsMap.begin()->second->getInputData());
input0.begin()->second->params["lpt_back_to_fp16"];
}
}

return clonedNetwork;
}

@@ -259,7 +323,7 @@ ExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceEn

context = m_defaultContext;

return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network), context, conf);
return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network, conf), context, conf);
}

ExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceEngine::ICNNNetwork &network,
@@ -283,7 +347,7 @@ ExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceEn
conf.max_dynamic_batch = static_cast<int>(network.getBatchSize());
}

return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network), casted, conf);
return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network, conf), casted, conf);
}

RemoteContext::Ptr clDNNEngine::CreateContext(const ParamMap& params) {
@@ -326,7 +390,7 @@ QueryNetworkResult clDNNEngine::QueryNetwork(const ICNNNetwork& network,
for (auto&& node : function->get_ops()) {
originalOps.emplace(node->get_friendly_name());
}
auto clonedNetwork = CloneAndTransformNetwork(network);
auto clonedNetwork = CloneAndTransformNetwork(network, _impl->m_config);
std::unordered_set<std::string> supported;
std::unordered_set<std::string> unsupported;

3 changes: 2 additions & 1 deletion inference-engine/src/cldnn_engine/cldnn_engine.h
@@ -27,7 +27,8 @@ class clDNNEngine : public InferenceEngine::InferencePluginInternal,
CLDNNRemoteCLContext::Ptr m_defaultContext;

cldnn::device_info GetDeviceInfo(const std::map<std::string, std::string> &config) const;
InferenceEngine::ICNNNetwork::Ptr CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network) const;
InferenceEngine::ICNNNetwork::Ptr CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network,
CLDNNPlugin::Config config) const;
public:
clDNNEngine();

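
One workaround spans the engine and program sources: when the original model was FP16, cldnn_engine.cpp stores an empty "lpt_back_to_fp16" runtime parameter on the first input's consumer after LPT, and cldnn_program.cpp (next diff) reads it back so non-quantized layers can be converted back to FP16. A hedged sketch of that hand-off, condensed from the two hunks (helper names and header paths are assumptions):

```cpp
#include <ie_icnn_network.hpp>   // InferenceEngine::ICNNNetwork, InputsDataMap (path assumed)
#include <legacy/ie_layers.h>    // getInputTo (path assumed)

// Writer side (after LPT in the engine): mark the network so the program builder
// knows the original model was FP16.
inline void markBackToFp16(InferenceEngine::ICNNNetwork& network) {
    InferenceEngine::InputsDataMap inputsMap;
    network.getInputsInfo(inputsMap);
    if (inputsMap.empty())
        return;
    auto consumers = getInputTo(inputsMap.begin()->second->getInputData());
    if (!consumers.empty())
        consumers.begin()->second->params["lpt_back_to_fp16"];  // empty param acts as a flag
}

// Reader side (Program constructor): detect the flag left by the writer.
inline bool wasBaselineFp16(InferenceEngine::ICNNNetwork& network) {
    InferenceEngine::InputsDataMap inputsMap;
    network.getInputsInfo(inputsMap);
    if (inputsMap.empty())
        return false;
    auto consumers = getInputTo(inputsMap.begin()->second->getInputData());
    return !consumers.empty() &&
           consumers.begin()->second->params.count("lpt_back_to_fp16") != 0;
}
```

Storing the flag as an empty layer parameter keeps the workaround inside the existing CNNNetwork representation without introducing new API.
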
65 changes: 41 additions & 24 deletions inference-engine/src/cldnn_engine/cldnn_program.cpp
@@ -88,9 +88,11 @@
#include <sys/stat.h>
#include <exec_graph_info.hpp>

#ifdef USE_CNNNETWORK_LPT
#include "low_precision_transformations/transformer.hpp"
#include "low_precision_transformations/fully_connected.hpp"
#include "low_precision_transformations/gemm.hpp"
#endif

#include <iostream>
#include <iomanip>
@@ -397,6 +399,41 @@ Program::Program(InferenceEngine::ICNNNetwork& network, std::shared_ptr<const cl
, p_currentOutputs({}) {
InitFormat(network);

bool fqFound = false;

bool baselineIsFP16 = false;
InputsDataMap inputsMap;
network.getInputsInfo(inputsMap);
if (!inputsMap.empty()) {
auto input0 = getInputTo(inputsMap.begin()->second->getInputData());
if (!input0.empty() && (input0.begin()->second->params.count("lpt_back_to_fp16") != 0)) {
baselineIsFP16 = true;
fqFound = true;
}
}

#ifdef USE_CNNNETWORK_LPT
bool allFQareSupported = true;
if (config.enableInt8) {
auto it = details::CNNNetworkIterator(&network);
auto end = details::CNNNetworkIterator();
while (it != end) {
auto& layer = *it;
if (layer->precision == Precision::FP16) {
baselineIsFP16 = true;
}

if (CaselessEq<std::string>()(layer->type, "FakeQuantize")) {
fqFound = true;
auto levels = layer->GetParamAsUInt("levels");
if (levels != 255 && levels != 256) {
allFQareSupported = false;
}
}
it++;
}
}

if (config.enableInt8) {
auto params = LayerTransformation::Params(true, // updatePrecisions
true, // quantizeOutputs
@@ -413,38 +450,18 @@ Program::Program(InferenceEngine::ICNNNetwork& network, std::shared_ptr<const cl
.add<FullyConnectedTransformation>(LayerTransformation::Params(params).setSupportAsymmetricQuantization(false), "FullyConnected")
.add<GemmTransformation>(LayerTransformation::Params(params).setSupportAsymmetricQuantization(false), "GEMM");

bool fqFound = false;
bool allFQareSupported = true;
bool baselineIsFP16 = false;
{
auto it = details::CNNNetworkIterator(&network);
auto end = details::CNNNetworkIterator();
while (it != end) {
auto& layer = *it;
if (layer->precision == Precision::FP16) {
baselineIsFP16 = true;
}

if (CaselessEq<std::string>()(layer->type, "FakeQuantize")) {
fqFound = true;
auto levels = layer->GetParamAsUInt("levels");
if (levels != 255 && levels != 256) {
allFQareSupported = false;
}
}
it++;
}
}

// [WA part1] Convert quantized FP16 model to FP32 to avoid possible overflow and mixed precision errors
if (fqFound && allFQareSupported) {
NetPass::ConvertPrecision(network, Precision::FP16, Precision::FP32);
}

LowPrecisionTransformer transformer(transforms);
transformer.transform(network);
}
#endif

// [WA part2] Try to find non-quantized layers and convert them back to FP16
// [WA part2] Try to find non-quantized layers and convert them back to FP16
if (config.enableInt8) {
if (fqFound && baselineIsFP16 && config.enable_fp16_for_quantized_models) {
auto layersSorted = BFSSort(network);

7 changes: 6 additions & 1 deletion inference-engine/src/gna_plugin/CMakeLists.txt
@@ -57,7 +57,12 @@ target_compile_definitions(${TARGET_NAME}_test_static
INTEGER_LOW_P
USE_STATIC_IE)

target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_preproc_s inference_engine_lp_transformations libGNA::API)
target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_preproc_s libGNA::API)

if (USE_CNNNETWORK_LPT)
target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_lp_transformations)
endif()

target_include_directories(${TARGET_NAME}_test_static PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
set_target_properties(${TARGET_NAME}_test_static PROPERTIES COMPILE_PDB_NAME ${TARGET_NAME}_test_static)

@@ -22,13 +22,17 @@ class INFERENCE_ENGINE_API_CLASS(Eltwise) : public Op {

Eltwise(const Output<Node>& data1,
const Output<Node>& data2,
const ELTWISE_TYPE eltwise_type);
const ELTWISE_TYPE eltwise_type,
const element::Type output_type = element::undefined);

void validate_and_infer_types() override;

std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;

ELTWISE_TYPE eltwise_type;

private:
element::Type m_output_type;
};

} // namespace op
@@ -29,17 +29,21 @@ class INFERENCE_ENGINE_API_CLASS(FullyConnected) : public Op {
FullyConnected(const Output<Node> & A,
const Output<Node> & B,
const Output<Node> & C,
const Shape & output_shape);
const Shape & output_shape,
const element::Type output_type = element::undefined);

void validate_and_infer_types() override;

std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;

size_t get_out_size() { return m_output_size; }
size_t get_out_size() const { return m_output_size; }

element::Type get_output_type() const { return m_output_type; }

private:
size_t m_output_size = 0;
Shape m_output_shape = {};
element::Type m_output_type;
};

} // namespace op
@@ -25,7 +25,8 @@ class INFERENCE_ENGINE_API_CLASS(NormalizeIE) : public Op {
const Output<Node>& weights,
float eps,
bool across_spatial,
bool channel_shared);
bool channel_shared,
const ngraph::element::Type output_type);

float get_eps() const { return m_eps; }
bool get_channel_shared() const { return m_channel_shared;}
@@ -39,6 +40,7 @@ class INFERENCE_ENGINE_API_CLASS(NormalizeIE) : public Op {
float m_eps;
bool m_across_spatial;
bool m_channel_shared;
ngraph::element::Type m_output_type;
};

} // namespace op
@@ -19,13 +19,16 @@ class INFERENCE_ENGINE_API_CLASS(PowerIE) : public Op {
const NodeTypeInfo& get_type_info() const override { return type_info; }

PowerIE(const Output<Node>& data_batch,
const float power, const float scale, const float shift);
const float power, const float scale, const float shift, const element::Type output_type = element::undefined);

void validate_and_infer_types() override;

std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;

float scale, power, shift;

private:
element::Type m_output_type;
};

} // namespace op
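
The legacy-op header changes above (Eltwise, FullyConnected, NormalizeIE, PowerIE) all add an optional output element type, so that when LPT rewrites dequantization into these legacy ops the original output precision can still be expressed. A minimal sketch of constructing one of them with an explicit output type (the values and the header path are illustrative assumptions):

```cpp
#include <memory>

#include <ngraph/node.hpp>
#include <legacy/ngraph_ops/power.hpp>  // header path assumed

// Illustrative: build a PowerIE (y = (scale * x + shift) ^ power) whose output
// stays FP32 even if LPT lowered the surrounding subgraph to a narrower type.
std::shared_ptr<ngraph::Node> makeFp32Power(const ngraph::Output<ngraph::Node>& data) {
    const float power = 1.0f, scale = 2.0f, shift = 0.5f;  // example values
    return std::make_shared<ngraph::op::PowerIE>(data, power, scale, shift,
                                                 ngraph::element::f32);
}
```
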