[Op Conformance] Update compare accuracy function #21783

sbalandi · 2023-12-20T09:27:38Z

Details:

item1
...

Tickets:

CVS-124663

sbalandi · 2023-12-20T12:45:43Z

This is part of PR #21347, including the changes to the test code.
That PR was reverted because of a problem with conversion precision. I excluded those changes (they do not have a big impact anyway) and reran the failed post-merge check: https://openvino-ci.toolbox.iotg.sclab.intel.com/job/private-ci/job/ie/job/build-linux-macos_arm64/6276/

iefode · 2023-12-21T09:24:56Z

@sbalandi What is the status of the PR?

sbalandi · 2023-12-21T09:28:10Z

> @sbalandi What is the status of the PR?

The failed check has passed, so I think we can merge it.

iefode · 2023-12-21T10:14:34Z

@sbalandi what about failed macos jobs?

sbalandi · 2023-12-21T10:15:36Z

> @sbalandi what about failed macos jobs?

I reran it here based on that PR: https://openvino-ci.toolbox.iotg.sclab.intel.com/job/private-ci/job/ie/job/build-linux-macos_arm64/6276/ and it passed.

iefode · 2024-01-09T07:41:49Z

src/tests/test_utils/common_test_utils/src/ov_tensor_utils.cpp

@@ -281,13 +294,33 @@ void compare(const ov::Tensor& expected,
    if (abs_threshold == std::numeric_limits<double>::max() && rel_threshold == std::numeric_limits<double>::max()) {
        if (sizeof(ExpectedT) == 1 || sizeof(ActualT) == 1) {
            abs_threshold = 1.;
+            rel_threshold = 1.;


@sbalandi Please check the changes on all platforms and skip tests per platform (in case a test fails on only one platform).

iefode · 2024-01-09T07:42:55Z

In general, PR LGTM. Anyway, we should check all platforms

iefode · 2024-01-09T14:45:27Z

@eshoguli Please add a comment related to rel_threshold and 0 handling

eshoguli

please fix

eshoguli · 2024-01-11T14:13:41Z

src/tests/test_utils/common_test_utils/src/ov_tensor_utils.cpp

@@ -363,14 +409,14 @@ void compare(const ov::Tensor& expected,
            throw std::runtime_error(out_stream.str());
        }
        double abs = std::fabs(expected_value - actual_value);
-        double rel = expected_value ? (abs / std::fabs(expected_value)) : abs;
+        double rel = expected_value && !std::isinf(expected_value) ? (abs / std::fabs(expected_value)) : abs;


In my understanding, we cannot use the relative error if the expected value is zero.

Imagine the following situations:
case #1: expected = 0.0, actual = 0.1, rel = 0.1 <= only 10%, which is not correct
case #2: expected = 0.01 (slightly larger than in case #1), actual = 0.1 (the same), rel = abs(0.01 - 0.1)/0.01 = 9 <= 900%

In case #2 the difference is smaller, but rel is much larger than in case #1. That is not correct. Don't use the relative error if the value is zero.
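
The jump between the two cases can be reproduced with a minimal sketch. The function name `rel_with_abs_fallback` is hypothetical; its body mirrors the added line in the hunk above, which falls back to the absolute error when the expected value is zero or infinite.

```cpp
#include <cmath>

// Hypothetical helper mirroring the reviewed line: the relative error
// silently becomes the absolute error when expected_value is 0 or inf.
double rel_with_abs_fallback(double expected_value, double actual_value) {
    const double abs = std::fabs(expected_value - actual_value);
    return expected_value && !std::isinf(expected_value)
               ? (abs / std::fabs(expected_value))
               : abs;
}
```

With `expected = 0.0` and `actual = 0.1` the function returns 0.1, while with `expected = 0.01` and the same `actual` it returns roughly 9, even though the second pair is closer together.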

sbalandi · 2024-01-16

Changed; the relative error is now calculated as:

double rel = expected_value && actual_value && !std::isinf(expected_value) ? (abs / std::fabs(expected_value)) : 0;
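
A minimal sketch of how this revised expression behaves, wrapped in a hypothetical function `rel_or_zero` (the name is an assumption; only the expression itself comes from the comment above). Note that an element whose expected or actual value is zero, or whose expected value is infinite, contributes a relative error of 0, so for such elements only the absolute-error check can flag a mismatch.

```cpp
#include <cmath>
#include <limits>

// Hypothetical wrapper around the revised expression quoted above.
double rel_or_zero(double expected_value, double actual_value) {
    const double abs = std::fabs(expected_value - actual_value);
    // Relative error is only meaningful for a finite, non-zero expected
    // value and a non-zero actual value; otherwise report 0.
    return expected_value && actual_value && !std::isinf(expected_value)
               ? (abs / std::fabs(expected_value))
               : 0;
}
```

For example, `rel_or_zero(0.0, 0.1)` and `rel_or_zero(0.01, 0.0)` both return 0, whereas `rel_or_zero(2.0, 1.0)` returns 0.5.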

iefode · 2024-01-19T18:59:27Z

@sbalandi Is PR ready to merge?

sbalandi · 2024-01-22T12:15:19Z

> @sbalandi Is PR ready to merge?

Not yet. The post-merge check failed and I am trying to understand why.

ceciliapeng2011 · 2024-03-04T05:12:47Z

src/tests/test_utils/common_test_utils/src/ov_tensor_utils.cpp

        abs_error.update(abs, i);
        rel_error.update(rel, i);
    }
    abs_error.mean /= shape_size_cnt;
    rel_error.mean /= shape_size_cnt;

-    if (!(less_or_equal(abs_error.max, abs_threshold) && less_or_equal(rel_error.max, rel_threshold))) {
+    if (!(less_or_equal(abs_error.max, abs_threshold) || less_or_equal(rel_error.mean, rel_threshold))) {


@iefode This PR leads to some failures of plugin unit test pass. Would you please check again?

iefode · 2024-03-04

@ceciliapeng2011 Please update your branch to the latest master; the commit was reverted.
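
To illustrate why the changed condition in the hunk above is more permissive, here is a small sketch. The `less_or_equal` below is a hypothetical stand-in implemented as a plain `<=` (the real helper may handle special values differently), and the two function names are assumptions introduced only for this example.

```cpp
// Hypothetical stand-in for the less_or_equal helper used in the hunk.
bool less_or_equal(double x, double y) {
    return x <= y;
}

// Condition on the removed line: the max absolute error AND the max
// relative error must both be within their thresholds.
bool passes_removed_line(double abs_max, double rel_max,
                         double abs_threshold, double rel_threshold) {
    return less_or_equal(abs_max, abs_threshold) &&
           less_or_equal(rel_max, rel_threshold);
}

// Condition on the added line: the max absolute error OR the mean
// relative error being within its threshold is enough.
bool passes_added_line(double abs_max, double rel_mean,
                       double abs_threshold, double rel_threshold) {
    return less_or_equal(abs_max, abs_threshold) ||
           less_or_equal(rel_mean, rel_threshold);
}
```

Take relative errors {0, 0, 0, 2} (max 2.0, mean 0.5), a max absolute error of 5.0, and thresholds of 1.0 (absolute) and 0.6 (relative). The removed condition rejects this tensor, while the added condition accepts it because the mean relative error is below the threshold.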

usstq · 2024-03-22T12:20:14Z

src/tests/test_utils/common_test_utils/src/ov_tensor_utils.cpp

+            continue;
+        } else if ((std::isinf(expected_value) || expected_value <= min_type_expected) &&
+                   (std::isinf(actual_value) || actual_value <= min_type_actual)) {
+            continue;


std::numeric_limits<float>::min() is the smallest positive normalized value representable by float, so here all negative values pass the check, which is not correct behavior. Maybe you meant std::numeric_limits<float>::lowest()?
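
The difference can be checked with a short sketch. It assumes, as the comment does, that `min_type_expected` and `min_type_actual` are taken from `std::numeric_limits<...>::min()`; the two function names are hypothetical and model only the `value <= limit` part of the reviewed condition.

```cpp
#include <limits>

// Models `value <= min_type` when the limit comes from min():
// for float this is a tiny positive number, so every negative value matches.
bool skipped_with_min(float value) {
    return value <= std::numeric_limits<float>::min();
}

// Same comparison with lowest(), the most negative finite float:
// only lowest() itself and -inf satisfy the condition.
bool skipped_with_lowest(float value) {
    return value <= std::numeric_limits<float>::lowest();
}
```

Here `skipped_with_min(-1.0f)` is true, so an ordinary negative value would be skipped, while `skipped_with_lowest(-1.0f)` is false.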

sbalandi requested review from a team as code owners December 20, 2023 09:27

github-actions bot added category: IE Tests OpenVINO Test: plugins and common category: CPU OpenVINO CPU plugin labels Dec 20, 2023

sbalandi assigned iefode Dec 20, 2023

iefode reviewed Jan 9, 2024

iefode changed the title ~~[apiConformance] Update compare accuracy function~~ [Op Conformance] Update compare accuracy function Jan 9, 2024

sbalandi force-pushed the accuracyNewTry2 branch from 1256b23 to 6358216 Compare January 9, 2024 15:08

github-actions bot removed the category: CPU OpenVINO CPU plugin label Jan 9, 2024

sbalandi mentioned this pull request Jan 10, 2024

LP Transformation tests use API 2.0 #21677

Merged

sbalandi force-pushed the accuracyNewTry2 branch from df230eb to bc33f4e Compare January 10, 2024 16:37

eshoguli reviewed Jan 11, 2024

sbalandi force-pushed the accuracyNewTry2 branch 7 times, most recently from 7958781 to 9fae912 Compare January 18, 2024 16:34

sbalandi requested review from a team as code owners January 18, 2024 16:34

github-actions bot added the category: GPU OpenVINO GPU plugin label Jan 18, 2024

[Op Conformance] Update compare accuracy function

4533a71

sbalandi force-pushed the accuracyNewTry2 branch from 9fae912 to 4533a71 Compare January 18, 2024 16:35

github-actions bot removed the category: GPU OpenVINO GPU plugin label Jan 18, 2024

iefode approved these changes Jan 23, 2024

iefode merged commit 3872ccf into openvinotoolkit:master Jan 23, 2024
100 checks passed

sbalandi mentioned this pull request Feb 27, 2024

Revert PR 22972 #23110

Merged

github-merge-queue bot pushed a commit that referenced this pull request Feb 29, 2024

Revert PR 22972 (#23110)

45ce9e7

### Details:
- Revert #22972
- Partly revert #21783

### Tickets:
- *ticket-id*

ceciliapeng2011 reviewed Mar 4, 2024

usstq reviewed Mar 22, 2024
