[Feature Request] Improvement Needed for Unit Tests #6633
Comments
Yes, @razorback3. I will follow up and get back to you with a response. This is really not good.
@razorback3 @jliangTT @jvasilje We have updated the datagen and comparison function in PR #6679 for debugging.
@razorback3 @jliangTT @jvasilje @hschoi4448 We are also observing the following scenario (not sure whether this is expected behavior). To handle it we have to add a separate condition in the logic; we are investigating this kind of issue as well.
Scenario 1:
Scenario 2:
A few ops depend on this issue as well: #6676
Status:
I think we can downgrade this to P1.
@jliangTT
Please see this doc: https://docs.google.com/spreadsheets/d/1VV-EwGJn1EgBN3jX3tg4TcX_yDkm5HAO/edit#gid=66577367 (sorry, I had to make a copy because I could not get around access control).
Will close this one for now; we can track the issue there. Please re-open if you have any concerns.
OK. @hschoi4448 will double-check the result when he comes back from his vacation. |
#6583 still has a problem.
```python
# SPDX-FileCopyrightText: © 2023 Tenstorrent Inc.
# SPDX-License-Identifier: Apache-2.0

import torch
import pytest
import tt_lib
from tests.tt_eager.python_api_testing.unit_testing.backward_ops.utility_funcs import compare_results


def data_gen_pt_tt(input_shapes, device, required_grad=False, val=1):
    # Build a constant-valued PyTorch tensor and its TT counterpart (tiled, on device).
    pt_tensor = (torch.ones(input_shapes, requires_grad=required_grad) * val).bfloat16()
    tt_tensor = (
        tt_lib.tensor.Tensor(pt_tensor, tt_lib.tensor.DataType.BFLOAT16).to(tt_lib.tensor.Layout.TILE).to(device)
    )
    return pt_tensor, tt_tensor


@pytest.mark.parametrize(
    "input_shapes",
    ((torch.Size([1, 1, 32, 32])),),
)
def test_bw_acosh(input_shapes, device):
    # val=0.5 is outside acosh's domain [1, inf), so the golden gradient is NaN.
    in_data, input_tensor = data_gen_pt_tt(input_shapes, device, True, val=0.5)
    grad_data, grad_tensor = data_gen_pt_tt(input_shapes, device, False, val=1)
    print("input_tensor", input_tensor)
    print("grad_tensor", grad_tensor)

    pyt_y = torch.acosh(in_data)
    tt_output_tensor_on_device = tt_lib.tensor.acosh_bw(grad_tensor, input_tensor)

    in_data.retain_grad()
    pyt_y.backward(gradient=grad_data)
    golden_tensor = [in_data.grad]

    comp_pass = compare_results(tt_output_tensor_on_device, golden_tensor)
    print("tt_output_tensor_on_device", tt_output_tensor_on_device)
    print("golden_tensor", golden_tensor)
    assert comp_pass
```
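For reference, torch.acosh is defined only for x ≥ 1 and its derivative is 1/sqrt(x² − 1), so with val=0.5 both the forward value and the golden gradient are NaN. A quick plain-PyTorch check (independent of the TT harness) confirms this:

```python
import torch

# Why val=0.5 matters here: acosh's domain is [1, inf) and
# d/dx acosh(x) = 1 / sqrt(x^2 - 1), so both outputs are NaN at 0.5.
x = torch.tensor([0.5], requires_grad=True)
y = torch.acosh(x)
y.backward(torch.ones_like(x))
print(y, x.grad)  # both NaN; the comparison must not pass this silently
```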
Could you please check if the issues below are still problematic?
@hschoi4448 All of the issues tagged above run into hardware and performance limitations: handling/storing NaN/Inf is the problem.
If you check each issue, we have added our observations, and for a few ops we have even raised a PR with a fix; the PR is not merged yet because approval from the code owner is pending. Please see @rtawfik01's comment below.
Also, for the tan op we cannot support a range wider than -1.45 to 1.45. Beyond that we would have to do range reduction with modulo operations, which are not available.
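For context, a minimal plain-PyTorch sketch (illustrative only, not the device kernel) of the range reduction being described: tan is pi-periodic, so an input can be folded into (-pi/2, pi/2) with a modulo before evaluation, which is exactly the primitive that is not available on the device.

```python
import torch

# Fold x into (-pi/2, pi/2) using tan's pi-periodicity, then evaluate.
def tan_with_range_reduction(x: torch.Tensor) -> torch.Tensor:
    reduced = torch.remainder(x + torch.pi / 2, torch.pi) - torch.pi / 2
    return torch.tan(reduced)

x = torch.tensor([3.0, 10.0, -7.5])
print(tan_with_range_reduction(x))
print(torch.tan(x))  # matches (up to float error) away from the poles
```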
Understood. If it's a hardware issue with limitations on performance and functionality, it's not something I can decide on myself, so I'll pass it on to my team. @razorback3
@eyonland @umadevimcw Can you please advise when this item will be unblocked, in addition to a realistic remediation timeline? Thank you. @prajaramanTT FYI
Is your feature request related to a problem? Please describe.
I recently reviewed the backward ops and found several bugs.
I believe there are two main reasons why there were so many bugs in backward ops (concrete sketches for both follow below):

1. The compare_results function often returns a pass even when the correct value and the TT result differ significantly, which lets many bugs go unnoticed. Example: [Bug Report] invalid softplus backward result #6598
2. The input data used in unit tests does not always reflect the characteristics of the ops. For instance, for relu6 the gradient formula varies depending on whether the input falls within the range 0 to 6, so to test all intervals effectively the input data should include values around 0 and 6 and nearby points, such as [-1, 0, 3, 6, 7]. Currently, however, input data is generated with torch.randn, which yields values mostly in [-1, 1] and neglects the vicinity of 6 and its surrounding intervals.
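To make point 2 concrete, here is a plain-PyTorch illustration (outside the TT harness) of relu6's piecewise gradient and why randn-generated inputs miss most of it:

```python
import torch
import torch.nn.functional as F

# Probe the gradient of relu6 on both sides of its breakpoints at 0 and 6.
# torch.randn inputs, mostly within [-1, 1], almost never exercise the
# region around 6.
x = torch.tensor([-1.0, 0.0, 3.0, 6.0, 7.0], requires_grad=True)
y = F.relu6(x)
y.backward(gradient=torch.ones_like(x))
# Gradient is 1 strictly inside (0, 6) and 0 outside; the points at exactly
# 0 and 6 exercise the kernel's boundary convention.
print(x.grad)
```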
I didn't run all unit tests during the review, and only checked the suspicious parts, so I believe there are actually more bugs.
Improving unit tests seems to be a high priority to address recurring issues and find hidden bugs.
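On point 1, one possible direction is a comparison helper that refuses to pass on mismatched NaN/Inf positions and applies explicit tolerances only to the finite entries. A minimal sketch, assuming both results have already been converted to torch.Tensor (the real harness converts the TT tensor first); the name strict_compare and the tolerances are illustrative, not the actual utility_funcs API:

```python
import torch

def strict_compare(tt_result: torch.Tensor, golden: torch.Tensor,
                   rtol: float = 1e-2, atol: float = 1e-2) -> bool:
    # NaN and Inf must appear in exactly the same positions, rather than
    # being silently tolerated.
    if not torch.equal(torch.isnan(tt_result), torch.isnan(golden)):
        return False
    if not torch.equal(torch.isinf(tt_result), torch.isinf(golden)):
        return False
    finite = torch.isfinite(golden)
    # Elementwise tolerance check on the finite entries only.
    return torch.allclose(tt_result[finite], golden[finite], rtol=rtol, atol=atol)
```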