DirectML version 1.7.0 /1.8.0 cause some conv2d cases get wrong result #234

Closed
mingmingtasd opened this issue Apr 22, 2022 · 6 comments
Labels
bug Something isn't working

Comments

@mingmingtasd
Contributor

I find that microsoft.ai.directml.1.7.0 and microsoft.ai.directml.1.8.0 cause some conv2d cases to produce wrong results.
I provide one case that reproduces the issue; please refer to my branch for the sample code. It depends on PR#232, which fixes a build issue when running python setup.py install.

@huningxin
Contributor

@mingmingtasd, could you please elaborate on what your case tests? Showing the test code here would probably help (your test case is not big). Also, since your Python code sits on several layers above the DirectML API (DirectMLX, PyDirectML), can you investigate which layer causes the issue? Could you verify whether your case also fails against the DirectML API from C/C++ code? Do other PyDirectML samples work on 1.7.0 and 1.8.0?

@mingmingtasd
Contributor Author

mingmingtasd commented Apr 22, 2022

The case below is a depthwise Conv2d with group=4, input layout = nchw, filter layout = oihw.
The correct result is [6010, 7046, 11000, 9000].
microsoft.ai.directml.1.5.1 and microsoft.ai.directml.1.6.0 both produce this correct result,
but microsoft.ai.directml.1.7.0 and microsoft.ai.directml.1.8.0 produce the wrong result [6010, 7000, 8000, 9000]:

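# Imports needed to run this snippet (module name as used by the PyDirectML samples).
import numpy as np
import pydirectml as dml

input_data = [10, 10, 10, 10, 21, 22, 23, 24, 10, 20, 30, 40, 0, 0, 0, 0]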
input_data_array = np.array(input_data, np.float32)

weight_data = [0.25, 0.25, 0.25, 0.25, 0.0, 1.0, 0.0, 1.0, 10.0, 20.0, 30.0, 40.0, 50.0,
               50.0, 50.0, 50.0]
weight_data_array = np.array(weight_data, np.float32)

bias_data = [6000, 7000, 8000, 9000]
bias_data_array = np.array(bias_data, np.float32)

input_bindings = []
def append_input_tensor(builder: dml.GraphBuilder, input_bindings: list, input_tensor: dml.TensorDesc, tensor_data_array):
    tensor = dml.input_tensor(builder, len(input_bindings), input_tensor)
    input_bindings.append(dml.Binding(tensor, tensor_data_array))
    return tensor

device = dml.Device(True, True)
builder = dml.GraphBuilder(device)
data_type = dml.TensorDataType.FLOAT32
input = dml.input_tensor(builder, 0, dml.TensorDesc(data_type, [1, 4, 2, 2]))
flags = dml.TensorFlags.OWNED_BY_DML
input_bindings.append(dml.Binding(input, input_data_array))
convolution_weight = append_input_tensor(builder, input_bindings, dml.TensorDesc(
    data_type, flags, [4, 1, 2, 2]), weight_data_array)
convolution_bias = append_input_tensor(builder, input_bindings, dml.TensorDesc(
    data_type, flags, [1, 4, 1, 1]), bias_data_array)
convolution = dml.convolution(input, convolution_weight, convolution_bias, strides=[
                              1, 1], start_padding=[0, 0], end_padding=[0, 0], group_count=4)
op = builder.build(dml.ExecutionFlags.NONE, [convolution])
output_data = device.compute(op, input_bindings, [convolution])
output_tensor = np.array(output_data[0], np.float32)
print(output_tensor)
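
As a cross-check that does not depend on DirectML at all, the expected values can be reproduced with plain NumPy. This is only a minimal sketch, assuming stride 1, no padding, and one filter per input channel as in the case above; the helper name `depthwise_conv2d_reference` is made up for illustration and is not part of PyDirectML.

```python
import numpy as np

def depthwise_conv2d_reference(x, w, b):
    """Naive depthwise conv2d: NCHW input, OIHW filter with I == 1, stride 1, no padding.

    Hypothetical helper for checking expected values; not a DirectML API.
    """
    n, c, h, width = x.shape
    o, i, kh, kw = w.shape
    assert o == c and i == 1, "depthwise: one single-channel filter per input channel"
    out_h, out_w = h - kh + 1, width - kw + 1
    out = np.zeros((n, c, out_h, out_w), dtype=np.float32)
    for ch in range(c):
        for y in range(out_h):
            for xx in range(out_w):
                window = x[:, ch, y:y + kh, xx:xx + kw]  # shape (n, kh, kw)
                out[:, ch, y, xx] = (window * w[ch, 0]).sum(axis=(1, 2))
    # Add the per-channel bias.
    return out + b.reshape(1, c, 1, 1)

# Same data as the repro case, reshaped to the tensor shapes it declares.
x = np.array([10, 10, 10, 10, 21, 22, 23, 24, 10, 20, 30, 40, 0, 0, 0, 0],
             np.float32).reshape(1, 4, 2, 2)
w = np.array([0.25, 0.25, 0.25, 0.25, 0.0, 1.0, 0.0, 1.0,
              10.0, 20.0, 30.0, 40.0, 50.0, 50.0, 50.0, 50.0],
             np.float32).reshape(4, 1, 2, 2)
b = np.array([6000, 7000, 8000, 9000], np.float32)

reference = depthwise_conv2d_reference(x, w, b).reshape(-1)
print(reference)
```

With the data from this issue, the four output values are 6010, 7046, 11000 and 9000, matching what 1.5.1 and 1.6.0 return.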

I validated with the DirectML C/C++ API directly, without DirectMLX or PyDirectML, and this conv2d issue still occurs. @huningxin

Do other PyDirectML samples work on the 1.7.0 and 1.8.0?
Yes, I validated that some cases pass on 1.7.0 and 1.8.0, including some other depthwise Conv2d cases. The failures have to be looked at case by case; I have not yet found what the failing cases have in common, so I need help from the DirectML team. @huningxin
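
One pattern is visible in the numbers reported for this case: the failing output equals the bias in channels 1 and 2, as if the convolution term were dropped there, while channel 3 says nothing either way because its input is all zeros. The sketch below only makes that comparison explicit using the values quoted in this issue; it is an observation about this single case, not a diagnosis of the underlying bug.

```python
import numpy as np

# Values quoted in this issue.
expected = np.array([6010, 7046, 11000, 9000], np.float32)  # 1.5.1 / 1.6.0
observed = np.array([6010, 7000, 8000, 9000], np.float32)   # 1.7.0 / 1.8.0
bias = np.array([6000, 7000, 8000, 9000], np.float32)

# Channels where the newer versions disagree with the expected result.
wrong_channels = np.flatnonzero(observed != expected)

# Channels where the output is exactly the bias although the convolution
# term should have contributed something.
bias_only = np.flatnonzero((observed == bias) & (expected != bias))

print("channels that differ:", wrong_channels.tolist())
print("channels that equal the bias alone:", bias_only.tolist())
```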

@Jamather
Contributor

Jamather commented Apr 25, 2022

Can you provide us some information about your hardware and drivers so we can reproduce the issue?
Also, does the issue occur if you use DML_EXECUTION_FLAG_DISABLE_META_COMMANDS?
Thanks!

@Jamather Jamather added the bug Something isn't working label Apr 25, 2022
@Jamather Jamather self-assigned this Apr 25, 2022
@mingmingtasd
Contributor Author

Can you provide us some information about your hardware and drivers so we can reproduce the issue? Also, does the issue occur if you use DML_EXECUTION_FLAG_DISABLE_META_COMMANDS? Thanks!

I tried op = builder.build(dml.ExecutionFlags.DISABLE_META_COMMANDS, [convolution]) and the issue still exists. Hardware and driver info:
Device name DESKTOP-KF70OVD
Processor Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz 3.60 GHz
Installed RAM 32.0 GB (31.9 GB usable)
Device ID DACF1A09-470F-4B31-AA2D-3A8256124C7A
Product ID 00330-80000-00000-AA440
System type 64-bit operating system, x64-based processor
Pen and touch Pen and touch support with 10 touch points

Edition Windows 10 Pro
Version 21H1
Installed on 2/11/2022
OS build 19043.1645
Experience Windows Feature Experience Pack 120.2212.4170.0

NVIDIA GeForce GT 1030
driver version: 27.21.14.5671
@Jamather Thanks a lot!

@fdwr
Contributor

fdwr commented May 6, 2022

@mingmingtasd Fixed in https://www.nuget.org/packages/Microsoft.AI.DirectML/1.8.2. Thanks to @Jamather for diagnosing and fixing it.

@mingmingtasd
Contributor Author

Thanks so much! I have verified the fix and am closing this issue. @Jamather @fdwr @huningxin
