Failures of test_quantized_classification_model[resnet50] #4683

Closed
NicolasHug opened this issue Oct 21, 2021 · 6 comments · Fixed by #4686

NicolasHug (Member) commented Oct 21, 2021

Looks like test_quantized_classification_model[resnet50] is failing in some PRs like https://app.circleci.com/pipelines/github/pytorch/vision/11583/workflows/84517aa3-fa6b-4527-8ee6-8a09ae76199f/jobs/900594

and it's also failing internally: https://www.internalfb.com/intern/tests/search?search_id=757311188293548

It looks like the failure is related to the new expected-value checks introduced in #4597.

I think an easy fix is just to add resnet50 to the quantized_flaky_models list (sketched below), but perhaps there's a better solution. It seems that the errors are consistently the same across executions, with the same atol and rtol differences:

Mismatched elements: 1 / 5 (20.0%)
Greatest absolute difference: 0.2876443862915039 at index (0, 1) (up to 0.1 allowed)
Greatest relative difference: 0.16666666666666666 at index (0, 1) (up to 0.1 allowed)

So there might be a source of variability that we're not controlling?

cc @datumbox @pmeier
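
For context, a minimal sketch of what that workaround could look like; the helper name, the exact tolerances, and the current contents of the list are assumptions rather than the actual code in test/test_models.py:

```python
import pytest
import torch

# Models whose quantized outputs are known to vary across platforms/runs.
# Adding "resnet50" here is the proposed workaround; the real list lives in
# torchvision's test/test_models.py and may differ from this sketch.
quantized_flaky_models = ("inception_v3", "resnet50")


def check_quantized_output(model_name, out, expected):
    # Skip the strict expected-value comparison for models flagged as flaky,
    # otherwise compare with the 0.1 tolerances reported in the failure above.
    if model_name in quantized_flaky_models:
        pytest.skip(f"{model_name} is listed in quantized_flaky_models")
    torch.testing.assert_close(out, expected, rtol=0.1, atol=0.1)
```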

datumbox (Contributor) commented Oct 21, 2021

@NicolasHug Thanks for the ping.

Prior to #4597, the quantized models were not tested for their expected values like the unquantized ones, so this is the first time we're observing issues related to platform differences and the like. Like you said, it's worth investigating more to identify the source of flakiness. The quantized models are expected to be flakier due to the reduced precision and the fact that we are using uninitialised weights; that's one more reason to add the pre-trained weights to the CI cache.

I propose we monitor the situation, and if this becomes more problematic than the already-flaky tests reported in #4506, we should look for a more immediate solution. Thoughts?
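
For reference, a rough sketch of how a quantized classification model ends up running from randomly-initialised weights in this kind of test; the seed, input size, and number of printed logits here are assumptions, not the actual test parameters:

```python
import torch
from torchvision import models

# Fix the seed so the randomly-initialised weights and the input are
# reproducible within a single environment; cross-platform differences in
# the quantized kernels can still change the outputs.
torch.manual_seed(42)
model = models.quantization.resnet50(pretrained=False, quantize=True)
model.eval()

x = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    out = model(x)

# The expected-value check compares a few entries of `out` against values
# stored in the repository, up to the atol/rtol reported above.
print(out[0, :5])
```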

Edit:
I just saw that @prabhat00155 was looking into it. If that's the case, then by all means let's investigate.

NicolasHug (Member, Author) commented:

> I propose we monitor the situation, and if this becomes more problematic than the already-flaky tests reported in #4506, we should look for a more immediate solution. Thoughts?

None of the tests in #4506 have been reported as failing on the internal CI so far, even though they've been around for longer, so this one seems a bit more serious, I think.

datumbox (Contributor) commented:

@NicolasHug I wonder if this is because these tests are disabled already on FBcode?

@prabhat00155 could you clarify if you are looking into this already?

NicolasHug (Member, Author) commented:

> @NicolasHug I wonder if this is because these tests are disabled already on FBcode?

I don't think so: the disabled tests would still show up as "broken" in https://www.internalfb.com/intern/tests/search?search_id=757311188293548

Taking a few of the ones from #4506 at random, they're all green.

datumbox (Contributor) commented:

Thanks for checking. I confirm that the tests fail with:

E           AssertionError: Tensor-likes are not close!
E           
E           Mismatched elements: 1 / 5 (20.0%)
E           Greatest absolute difference: 0.2876443862915039 at index (0, 1) (up to 0.1 allowed)
E           Greatest relative difference: 0.16666666666666666 at index (0, 1) (up to 0.1 allowed)

From the monitor it also seems that the test passed a couple of times, which is also weird because we fix the seed and the platform remains the same, so it's unclear whether the additional randomness comes from hardware or other sources. In any case, I agree that perhaps the easiest fix is to add the model to the quantized_flaky_models list and review our whole testing strategy soon to reduce flakiness.
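
One way to narrow this down (a sketch only; the seed and input shape are assumptions) is to run the same seeded forward pass twice in a single environment and compare:

```python
import torch
from torchvision import models


def run_once(seed: int = 42) -> torch.Tensor:
    # Build a randomly-initialised quantized resnet50 and run one forward pass,
    # seeding before both the weight initialisation and the input creation.
    torch.manual_seed(seed)
    model = models.quantization.resnet50(pretrained=False, quantize=True)
    model.eval()
    torch.manual_seed(seed)
    x = torch.rand(1, 3, 224, 224)
    with torch.no_grad():
        return model(x)


# If two runs differ on the same machine, the variability comes from within
# our own process (e.g. nondeterministic ops); if they match, the failures on
# CI are more likely due to platform, hardware, or library-version differences.
print(torch.equal(run_once(), run_once()))
```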

prabhat00155 (Contributor) commented:

This task, T103498945, blamed D31649969 (#4605) for the failures. However, when I ran the test under fbcode on my devvm, I didn't see the failure. I haven't tried running the test on the external repo.
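
For anyone trying the external repo, one possible way to run just this parametrization from a torchvision checkout; the test path and the -k expression are assumptions based on the test name in the title:

```python
import pytest

# Run only the failing resnet50 parametrization of the quantized
# classification test, with verbose output.
if __name__ == "__main__":
    raise SystemExit(
        pytest.main([
            "test/test_models.py",
            "-k", "test_quantized_classification_model and resnet50",
            "-v",
        ])
    )
```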
