Failures of test_quantized_classification_model[resnet50] #4683

Closed
NicolasHug opened this issue Oct 21, 2021 · 6 comments · Fixed by #4686

NicolasHug (Member) commented Oct 21, 2021

Looks like test_quantized_classification_model[resnet50] is failing in some PRs like https://app.circleci.com/pipelines/github/pytorch/vision/11583/workflows/84517aa3-fa6b-4527-8ee6-8a09ae76199f/jobs/900594

and it's also failing internally: https://www.internalfb.com/intern/tests/search?search_id=757311188293548

It looks like the failure is related to the new expected-value checks introduced in #4597.

I think an easy fix is just to add resnet50 to the quantized_flaky_models list (sketched below), but perhaps there's a better solution. It seems that the errors are consistently the same across executions, with the same atol and rtol differences:

Mismatched elements: 1 / 5 (20.0%)
Greatest absolute difference: 0.2876443862915039 at index (0, 1) (up to 0.1 allowed)
Greatest relative difference: 0.16666666666666666 at index (0, 1) (up to 0.1 allowed)

So there might be a source of variability that we're not controlling?

cc @datumbox @pmeier
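
For context, a minimal sketch of what that workaround could look like; the helper name, the exact tolerances, and the current contents of the list are assumptions rather than the actual code in test/test_models.py:

```python
import pytest
import torch

# Models whose quantized outputs are known to vary across platforms/runs.
# Adding "resnet50" here is the proposed workaround; the real list lives in
# torchvision's test/test_models.py and may differ from this sketch.
quantized_flaky_models = ("inception_v3", "resnet50")


def check_quantized_output(model_name, out, expected):
    # Skip the strict expected-value comparison for models flagged as flaky,
    # otherwise compare with the 0.1 tolerances reported in the failure above.
    if model_name in quantized_flaky_models:
        pytest.skip(f"{model_name} is listed in quantized_flaky_models")
    torch.testing.assert_close(out, expected, rtol=0.1, atol=0.1)
```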

datumbox (Contributor) commented Oct 21, 2021

@NicolasHug Thanks for the ping.

Prior to #4597, the quantized models were not tested for their expected values like the unquantized ones, so this is the first time we're observing issues related to platform differences and the like. Like you said, it's worth investigating more to identify the source of flakiness. The quantized models are expected to be flakier due to the reduced precision and the fact that we are using uninitialised weights; that's one more reason to add the pre-trained weights to the CI cache.

I propose we monitor the situation, and if this becomes more problematic than the already-flaky tests reported in #4506, we should look for a more immediate solution. Thoughts?
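
For reference, a rough sketch of how a quantized classification model ends up running from randomly-initialised weights in this kind of test; the seed, input size, and number of printed logits here are assumptions, not the actual test parameters:

```python
import torch
from torchvision import models

# Fix the seed so the randomly-initialised weights and the input are
# reproducible within a single environment; cross-platform differences in
# the quantized kernels can still change the outputs.
torch.manual_seed(42)
model = models.quantization.resnet50(pretrained=False, quantize=True)
model.eval()

x = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    out = model(x)

# The expected-value check compares a few entries of `out` against values
# stored in the repository, up to the atol/rtol reported above.
print(out[0, :5])
```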

Edit:
I just saw that @prabhat00155 was looking into it. If that's the case, then by all means let's investigate.

NicolasHug (Member, Author) commented:

> I propose we monitor the situation, and if this becomes more problematic than the already-flaky tests reported in #4506, we should look for a more immediate solution. Thoughts?

None of the tests in #4506 have been reported as failing on the internal CI so far, even though they've been around for longer, so this one seems a bit more serious, I think.

datumbox (Contributor) commented:

@NicolasHug I wonder if this is because these tests are disabled already on FBcode?

@prabhat00155 could you clarify if you are looking into this already?

NicolasHug (Member, Author) commented:

> @NicolasHug I wonder if this is because these tests are disabled already on FBcode?

I don't think so: the disabled tests would still show up as "broken" in https://www.internalfb.com/intern/tests/search?search_id=757311188293548

Taking a few of the ones from #4506 at random, they're all green.

datumbox (Contributor) commented:

Thanks for checking. I confirm that the tests fail with:

E           AssertionError: Tensor-likes are not close!
E           
E           Mismatched elements: 1 / 5 (20.0%)
E           Greatest absolute difference: 0.2876443862915039 at index (0, 1) (up to 0.1 allowed)
E           Greatest relative difference: 0.16666666666666666 at index (0, 1) (up to 0.1 allowed)

From the monitor it also seems that the test passed a couple of times, which is also weird because we fix the seed and the platform remains the same, so it's unclear whether the additional randomness comes from hardware or other sources. In any case, I agree that perhaps the easiest fix is to add the model to the quantized_flaky_models list and review our whole testing strategy soon to reduce flakiness.
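
One way to narrow this down (a sketch only; the seed and input shape are assumptions) is to run the same seeded forward pass twice in a single environment and compare:

```python
import torch
from torchvision import models


def run_once(seed: int = 42) -> torch.Tensor:
    # Build a randomly-initialised quantized resnet50 and run one forward pass,
    # seeding before both the weight initialisation and the input creation.
    torch.manual_seed(seed)
    model = models.quantization.resnet50(pretrained=False, quantize=True)
    model.eval()
    torch.manual_seed(seed)
    x = torch.rand(1, 3, 224, 224)
    with torch.no_grad():
        return model(x)


# If two runs differ on the same machine, the variability comes from within
# our own process (e.g. nondeterministic ops); if they match, the failures on
# CI are more likely due to platform, hardware, or library-version differences.
print(torch.equal(run_once(), run_once()))
```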

prabhat00155 (Contributor) commented:

This task, T103498945, blamed D31649969 (#4605) for the failures. However, when I ran the test under fbcode on my devvm, I didn't see the failure. I haven't tried running the test on the external repo.
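
For anyone trying the external repo, one possible way to run just this parametrization from a torchvision checkout; the test path and the -k expression are assumptions based on the test name in the title:

```python
import pytest

# Run only the failing resnet50 parametrization of the quantized
# classification test, with verbose output.
if __name__ == "__main__":
    raise SystemExit(
        pytest.main([
            "test/test_models.py",
            "-k", "test_quantized_classification_model and resnet50",
            "-v",
        ])
    )
```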
