Fix flaky tests that have recently been popping up #4506
As we previously discussed offline, I'll start looking into the tests that rely on p-value testing. I'll change these to just test the cases p == 0 and p == 1 -- for all values in between, we'll just rely on the PyTorch core tests.
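A minimal sketch of that idea, using `RandomHorizontalFlip` as a stand-in (the actual tests cover other transforms; the test name and shapes here are just illustrative):

```python
import pytest
import torch
import torchvision.transforms as T


@pytest.mark.parametrize("p", [0.0, 1.0])
def test_flip_deterministic_cases(p):
    # p == 0 must be a no-op and p == 1 must always flip; everything in between
    # is probabilistic and is left to the PyTorch core RNG tests.
    img = torch.randint(0, 256, (3, 8, 8), dtype=torch.uint8)
    out = T.RandomHorizontalFlip(p=p)(img)

    expected = img if p == 0.0 else img.flip(-1)
    torch.testing.assert_close(out, expected)
```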
Looking into test_randomperspective_fill right now.
Investigating test_frozenbatchnorm2d_eps.
Looking at test_batched_nms_implementations.
Looking at the …
Looking at test.test_ops.TestRoiPool.
Looking into …
Looking back at this ticket, I think we all did a fine intern job. Good for us :P
In the case of …
It would be interesting to figure out whether the 6 failures correspond to a specific edge case, but I wouldn't spend too much time on it either. It could just be some ties in the sorting (which is not a stable sort)?
It's the unstable sort, see #4766 (comment). I think it's worth understanding why the open-source contributor couldn't make the sort stable (he was facing a seg fault, if I remember correctly). Fixing the sort will fix a lot of instability in the detection models, so it's definitely worthwhile.
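For context, a tiny sketch of why ties matter here (this is not the NMS code itself, and it assumes a PyTorch version where `torch.sort` accepts `stable=True`):

```python
import torch

scores = torch.tensor([0.9, 0.5, 0.9])  # two tied scores

# An unstable sort is free to order the two tied boxes either way, so different
# runs or backends can feed NMS a different box first and end up keeping a different one.
unstable = torch.sort(scores, descending=True).indices

# A stable sort preserves the original order of ties, making the ordering deterministic.
stable = torch.sort(scores, descending=True, stable=True).indices
```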
Since #4497 was merged, we're observing a few tests that have started randomly failing.
Before #4497, these tests were almost always run with the same RNG state, which was set in a test that ran earlier in the test suite. Now that all tests are properly independent and the RNG doesn't leak, these tests run with a fresh RNG state at each execution, and if they're unstable they may fail.
(Note: this is a good thing; it's better to find out that they fail now rather than when submitting an unrelated PR, which is what happened in #3032 (comment).)
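For illustration, RNG isolation between tests can conceptually be done with an autouse fixture like the one below (this is only a sketch, not necessarily what #4497 implemented):

```python
import pytest
import torch


@pytest.fixture(autouse=True)
def isolate_rng():
    # Fork the global RNG state for the duration of each test so that whatever a
    # test does to the generator cannot leak into the tests that run after it.
    with torch.random.fork_rng():
        yield
```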
For each of these tests we should find out whether the flakiness is severe or not. A simple solution is to parametrize the test over 100 or 1000 random seeds and check the failure rate (a sketch is included at the end of this comment). If the failure rate is reasonable we can just set a seed with torch.manual_seed(). If not, we should try to fix the test and make it more robust.

The list of tests so far is:
cc @pmeier
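A minimal sketch of the seed sweep described above (the test body is a placeholder, not one of the actual flaky tests):

```python
import pytest
import torch


@pytest.mark.parametrize("seed", range(100))
def test_flaky_candidate(seed):
    # Sweep many seeds to estimate the failure rate of the flaky check. If only a
    # handful of seeds fail, pinning one seed with torch.manual_seed() in the real
    # test is acceptable; otherwise the test itself should be made more robust.
    torch.manual_seed(seed)
    x = torch.rand(1000)
    assert abs(x.mean().item() - 0.5) < 0.05  # placeholder for the actual flaky assertion
```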