DirectML version 1.7.0 /1.8.0 cause some conv2d cases get wrong result #234

mingmingtasd · 2022-04-22T07:28:50Z

I find that microsoft.ai.directml.1.7.0 and microsoft.ai.directml.1.8.0 cause some conv2d cases to produce wrong results.
I provide one case that reproduces the issue; please refer to my branch for the sample code. It depends on PR#232, which fixes a build issue when running python setup.py install.

huningxin · 2022-04-22T07:51:19Z

@mingmingtasd, could you please elaborate on what your case tests? For example, showing the test code here would probably help (I found that your test case is not big). Also, since you are using Python code, which sits several layers above the DirectML API (DirectMLX, PyDirectML), are you able to investigate which layer causes this issue? Could you please verify whether your case also fails when calling the DirectML API directly from C/C++ code? Do other PyDirectML samples work on 1.7.0 and 1.8.0?

mingmingtasd · 2022-04-22T08:35:48Z

The case below is a depthwise conv2d with group=4, input layout = nchw, filter layout = oihw.
The correct result is [6010, 7046, 11000, 9000].
microsoft.ai.directml.1.5.1 and microsoft.ai.directml.1.6.0 both produce this correct result,
but microsoft.ai.directml.1.7.0 and microsoft.ai.directml.1.8.0 produce the wrong result [6010, 7000, 8000, 9000]:

import numpy as np
import pydirectml as dml

# Input in NCHW layout [1, 4, 2, 2]: one 2x2 plane per channel.
input_data = [10, 10, 10, 10, 21, 22, 23, 24, 10, 20, 30, 40, 0, 0, 0, 0]
input_data_array = np.array(input_data, np.float32)

# Filter in OIHW layout [4, 1, 2, 2]: one 2x2 kernel per channel (depthwise).
weight_data = [0.25, 0.25, 0.25, 0.25, 0.0, 1.0, 0.0, 1.0, 10.0, 20.0, 30.0, 40.0, 50.0,
               50.0, 50.0, 50.0]
weight_data_array = np.array(weight_data, np.float32)

# One bias value per output channel.
bias_data = [6000, 7000, 8000, 9000]
bias_data_array = np.array(bias_data, np.float32)

input_bindings = []
def append_input_tensor(builder: dml.GraphBuilder, input_bindings: list, input_tensor: dml.TensorDesc, tensor_data_array):
    # The binding index is the tensor's position in input_bindings.
    tensor = dml.input_tensor(builder, len(input_bindings), input_tensor)
    input_bindings.append(dml.Binding(tensor, tensor_data_array))
    return tensor

device = dml.Device(True, True)
builder = dml.GraphBuilder(device)
data_type = dml.TensorDataType.FLOAT32
flags = dml.TensorFlags.OWNED_BY_DML

# Graph inputs: the data tensor first, then the filter and bias tensors.
conv_input = append_input_tensor(builder, input_bindings, dml.TensorDesc(
    data_type, [1, 4, 2, 2]), input_data_array)
convolution_weight = append_input_tensor(builder, input_bindings, dml.TensorDesc(
    data_type, flags, [4, 1, 2, 2]), weight_data_array)
convolution_bias = append_input_tensor(builder, input_bindings, dml.TensorDesc(
    data_type, flags, [1, 4, 1, 1]), bias_data_array)

# group_count equals the channel count, which makes this a depthwise convolution.
convolution = dml.convolution(conv_input, convolution_weight, convolution_bias, strides=[
                              1, 1], start_padding=[0, 0], end_padding=[0, 0], group_count=4)
op = builder.build(dml.ExecutionFlags.NONE, [convolution])
output_data = device.compute(op, input_bindings, [convolution])
output_tensor = np.array(output_data[0], np.float32)
print(output_tensor)

I validated directly against the DirectML C/C++ API, without DirectMLX or PyDirectML, and this conv2d issue still occurs. @huningxin

Regarding "Do other PyDirectML samples work on the 1.7.0 and 1.8.0?": yes, I validated that some cases pass on 1.7.0 and 1.8.0, including some other depthwise conv2d cases. The failures have to be examined case by case. I have not yet found what the failing cases have in common, so I need help from the DirectML team. @huningxin
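
The expected values quoted above can be cross-checked without DirectML. The following is a minimal NumPy sketch, not part of the original report: `depthwise_conv2d_nchw` is a hypothetical helper name, and it assumes stride 1, no padding, and a group count equal to the channel count, as in the case above. The `observed` array is the wrong output reported for 1.7.0 and 1.8.0.

```python
import numpy as np

def depthwise_conv2d_nchw(x, w, b):
    """Reference depthwise conv2d (hypothetical helper, not a DirectML API).

    x: input in NCHW layout, w: filter in OIHW layout with I == 1,
    b: one bias per channel. Assumes stride 1 and no padding.
    """
    n, c, h, width = x.shape
    o, i, kh, kw = w.shape
    assert o == c and i == 1, "depthwise: one single-channel filter per channel"
    out = np.zeros((n, c, h - kh + 1, width - kw + 1), dtype=np.float32)
    for ch in range(c):
        for y in range(out.shape[2]):
            for col in range(out.shape[3]):
                window = x[:, ch, y:y + kh, col:col + kw]
                out[:, ch, y, col] = np.sum(window * w[ch, 0], axis=(1, 2)) + b[ch]
    return out

x = np.array([10, 10, 10, 10, 21, 22, 23, 24, 10, 20, 30, 40, 0, 0, 0, 0],
             np.float32).reshape(1, 4, 2, 2)
w = np.array([0.25, 0.25, 0.25, 0.25, 0.0, 1.0, 0.0, 1.0,
              10.0, 20.0, 30.0, 40.0, 50.0, 50.0, 50.0, 50.0],
             np.float32).reshape(4, 1, 2, 2)
b = np.array([6000, 7000, 8000, 9000], np.float32)

expected = depthwise_conv2d_nchw(x, w, b).reshape(-1)
# Output reported in this issue for DirectML 1.7.0 and 1.8.0.
observed = np.array([6010, 7000, 8000, 9000], np.float32)

# Channels where the reported output differs from the reference.
mismatched = ~np.isclose(observed, expected)
print(expected)    # [ 6010.  7046. 11000.  9000.]
print(mismatched)  # [False  True  True False]
```

In this case the reported output for channels 1 and 2 equals the bias alone, while channel 0 matches the reference. Channel 3 cannot distinguish the two, because its input is all zeros and the correct result is also just the bias.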

Jamather · 2022-04-25T21:20:44Z

Can you provide us some information about your hardware and drivers so we can reproduce the issue?
Also, does the issue occur if you use DML_EXECUTION_FLAG_DISABLE_META_COMMANDS?
Thanks!

mingmingtasd · 2022-04-26T02:43:06Z

I tried with op = builder.build(dml.ExecutionFlags.DISABLE_META_COMMANDS, [convolution]), and the issue still exists. Here is the hardware and driver information:
Device name DESKTOP-KF70OVD
Processor Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz 3.60 GHz
Installed RAM 32.0 GB (31.9 GB usable)
Device ID DACF1A09-470F-4B31-AA2D-3A8256124C7A
Product ID 00330-80000-00000-AA440
System type 64-bit operating system, x64-based processor
Pen and touch Pen and touch support with 10 touch points

Edition Windows 10 Pro
Version 21H1
Installed on 2/11/2022
OS build 19043.1645
Experience Windows Feature Experience Pack 120.2212.4170.0

NVIDIA GeForce GT 1030
driver version: 27.21.14.5671
@Jamather Thanks a lot!

fdwr · 2022-05-06T01:26:09Z

@mingmingtasd Fixed in https://www.nuget.org/packages/Microsoft.AI.DirectML/1.8.2. Thanks to @Jamather for diagnosing and fixing it.

mingmingtasd · 2022-05-06T05:40:48Z

Thanks so much! I have verified the fix and am closing this issue. @Jamather @fdwr @huningxin

mingmingtasd mentioned this issue Apr 22, 2022

Upgrade DirectML version to 1.6.0 webmachinelearning/webnn-native#237

Merged

Jamather added the bug label Apr 25, 2022

Jamather self-assigned this Apr 25, 2022

mingmingtasd closed this as completed May 6, 2022

mingmingtasd mentioned this issue May 6, 2022

Upgrade DirectML version to 1.8.2(latest) webmachinelearning/webnn-native#255

Merged

fdwr mentioned this issue May 6, 2022

Update DirectML from 1.8.0 to 1.8.2 for ORT 1.12 microsoft/onnxruntime#11459

Merged
