Add large tensor nightly tests for MKL-DNN operators #16184
Conversation
Can't we reuse the already existing large tensor tests? From a front-end perspective, it shouldn't matter to the user which backend is being used, right? I understand that in the backend MKLDNN only supports float32 as input for now, but how about we hide that fact and instead do some magic in the backend in the meantime? IMO, the user code should not be backend specific and thus we should use the same tests to enforce that constraint. I'd like to stay away from backend specific tests as much as possible, so I'd appreciate it if we could rather work towards improving these intermediary layers instead of adding a second set of tests.
@marcoabreu Good point! I think this test code is backend independent :)
Sorry for the delayed reply. Maybe I can put such tests for
I'll leave the details to you :) as long as it's backend independent and reused across the different backends, I'm fine with it. Feel free to hit me up once you've made the changes so I can verify them.
@marcoabreu Please take another look and see if these changes resolve your concerns. Thanks.
@apeforest @ChaiBapchya Could you please take a look at this PR? Thanks.
tests/nightly/test_large_array.py
Outdated
nd.waitall()
return mean, stdvar

shape = (MEDIUM_X, MEDIUM_X, SMALL_Y, SMALL_Y)
You are changing the test by getting rid of LARGE_X, is that on purpose?
Thanks. Forgot to change this. At least we need to make sure shape[2]*shape[3] exceeds the index range of int32.
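For context, a minimal sketch of the constraint being referred to (the sizes below are hypothetical, not the PR's constants):

```python
INT32_MAX = 2**31 - 1

def inner_dims_exceed_int32(shape):
    # only the product of the last two dims needs to cross the 32-bit boundary here
    return shape[2] * shape[3] > INT32_MAX

assert inner_dims_exceed_int32((4, 4, 50000, 50000))      # 2.5e9 > 2**31 - 1
assert not inner_dims_exceed_int32((4, 4, 50000, 40000))  # 2.0e9 < 2**31 - 1
```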
@wuxun-zhang Thanks for this PR. Apologies for the delayed review. Can you rebase with the latest master? I've left a few questions too. Thanks once again!
res = nd.FullyConnected(a, b, num_hidden=b.shape[0], no_bias=True)
assert np.sum(res[-1].asnumpy() == a.shape[1]) == b.shape[0]

res = nd.FullyConnected(a, b, c, num_hidden=b.shape[0], no_bias=False)
what's the intuition behind adding this?
Just want to cover both the w/ bias and w/o bias cases.
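As a rough illustration of what covering both paths might look like (small hypothetical shapes; a, b and c here are not the PR's actual tensors):

```python
import mxnet as mx
from mxnet import nd
import numpy as np

a = nd.ones(shape=(2, 3))   # data
b = nd.ones(shape=(4, 3))   # weight
c = nd.ones(shape=(4,))     # bias

# w/o bias: every output element is a dot product of ones, i.e. a.shape[1]
res = nd.FullyConnected(a, b, num_hidden=b.shape[0], no_bias=True)
assert np.sum(res[-1].asnumpy() == a.shape[1]) == b.shape[0]

# w/ bias: every output element is shifted up by exactly 1
res = nd.FullyConnected(a, b, c, num_hidden=b.shape[0], no_bias=False)
assert np.sum(res[-1].asnumpy() == a.shape[1] + 1) == b.shape[0]
```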
tests/nightly/test_large_array.py
Outdated
nd.waitall()
return mean, stdvar

shape = (3, 3, LARGE_X, SMALL_Y)
why do we need to change shape?
Just want to cover the MKL-DNN batch-norm op, which only supports ndim=4 for now.
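For reference, a small sketch of exercising BatchNorm with a 4-D (NCHW) input; the sizes are hypothetical, not the nightly test's shapes:

```python
import mxnet as mx
from mxnet import nd

data = nd.ones((3, 3, 16, 8))      # NCHW layout, ndim == 4
gamma = nd.ones(3)                 # per-channel scale
beta = nd.zeros(3)                 # per-channel shift
moving_mean = nd.zeros(3)
moving_var = nd.ones(3)

out = nd.BatchNorm(data, gamma, beta, moving_mean, moving_var,
                   fix_gamma=False, use_global_stats=True)
assert out.shape == data.shape
```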
@@ -952,11 +1001,11 @@ def test_reshape_like():


def test_flatten():
    a = create_2d_tensor(rows=LARGE_X, columns=SMALL_Y).reshape((LARGE_X//2, 2, SMALL_Y))
    b = nd.flatten(a)
    assert b[-1][-1] == (LARGE_X-1)
why are we removing these asserts?
It is related to the different precision of `int64` and `float32`. The function `create_2d_tensor` generates an NDArray whose elements range from 0 to `LARGE_X-1`. For `float32`, some precision is lost when `LARGE_X` is too large, that is, `LARGE_X - 1` and `LARGE_X - 2` cannot be represented exactly. This is different from `int64`. So I think these asserts can be removed.
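To illustrate the precision point (a standalone numpy check, not code from the PR; 100_000_000 is just a stand-in for a large `LARGE_X`):

```python
import numpy as np

large_x = 100_000_000  # above 2**24, float32 can no longer distinguish consecutive integers
assert np.float32(large_x - 1) == np.float32(large_x - 2)  # collapses to the same float32 value
assert np.int64(large_x - 1) != np.int64(large_x - 2)      # exact in int64
```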
LGTM! It would have been great if those comments had been inlined in the code (for clarity), but thanks for addressing it nonetheless.
@ChaiBapchya @marcoabreu Please take a look again and see if your concerns are properly resolved. Thanks.
tests/nightly/test_large_array.py
Outdated
args_grad=None)
ex.forward(is_train=False)
softmax_out = ex.outputs[0][0].asnumpy()
expected_softmax_out = (1/SMALL_Y)*mx.nd.ones((SMALL_Y)).asnumpy()
nit: add space between operators and operands
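i.e. the same expression, just spaced:

```python
expected_softmax_out = (1 / SMALL_Y) * mx.nd.ones((SMALL_Y)).asnumpy()
```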
tests/nightly/test_large_array.py
Outdated
@@ -782,8 +801,30 @@ def test_activation():
# in future, we could test if mean, var of output
# matches target output's mean, var
def test_batchnorm():
    shape = (LARGE_X, SMALL_Y)
    def get_ref_mean_var(data, running_mean, running_var, eps, use_global_status=True):
Why use ndarray to get_ref_mean_var? Wouldn't it be simpler and more correct to just implement it using numpy?
+1 for numpy instead of nd, and the function could be named get_np_mean_var rather than ref (I'm guessing ref meant reference, but it's not quite clear).
Done
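A rough sketch of what a numpy-based helper could look like (the parameter list mirrors the original signature; numpy inputs, the reduction axes, and returning the inverse standard deviation are assumptions here, not the PR's final code):

```python
import numpy as np

def get_np_mean_var(data, running_mean, running_var, eps, use_global_status=True):
    if use_global_status:
        # in global-stats (inference) mode the running statistics are the reference
        mean, var = running_mean, running_var
    else:
        # otherwise reduce over every axis except the channel axis (axis=1 for NCHW)
        axes = tuple(i for i in range(data.ndim) if i != 1)
        mean = np.mean(data, axis=axes)
        var = np.var(data, axis=axes)
    stdvar = 1.0 / np.sqrt(var + eps)  # inverse std; one common convention, assumed here
    return mean, stdvar
```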
# Here we removed the value asserts due to the different precision of `int64` and `float32`.
# For `float32`, some precision is lost when `LARGE_X` is too large, i.e. `LARGE_X-1`
# and `LARGE_X-2` cannot represent the exact value in the current situation.
assert b.shape == (LARGE_X//2, SMALL_Y*2)
Can we also test one of the values inside tensor b?
Done
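For instance, checking an early element stays exact in float32 (a hypothetical check, not necessarily the assert that was added):

```python
# elements of the first row are small integers, so float32 represents them exactly
assert b[0][0] == 0
```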
Since the current nightly test is broken, could you please run all the tests offline and paste your output into this PR? Thanks!
Please run all the tests with mkldnn on and paste your output in this PR.
Output log of all mkl-dnn operators (with
@apeforest @marcoabreu Please take a look at this PR again.
LGTM. Sorry for the delay.
To track the correctness of MKL-DNN operators when switching to int64 tensor size, we added more nightly tests to the original script. In these changes, we also tried to cover more scenarios, including different data types (float32/int64).
@pengzhao-intel @TaoLv @apeforest @ChaiBapchya
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments