[Relay][RFC] Improve testing infrastructure of Relay #2884
Comments
The test code has a lot of repetition and I am strongly in favor of refactoring existing tests (some of which show a fair bit of bitrot) to use common libraries. Testing error cases would also be a good thing to have shared infrastructure for.
+1 for refactoring. BTW, we probably also need to discuss adding regression tests to the CI pipeline, because some passes could noticeably affect perf. But this can be a separate issue.
Realistically, infrastructure is only one way to help make better regression tests; it can never replace the tests themselves. We do need, and should write, more test cases in the meantime :)
Can we add weekly builds for end-to-end test cases that take many hours?
+1 @zhiics Performance regression testing for compiler algorithms is essential. There are too many places where a bad local decision can take the whole system down, and trying to reconstruct these cases two months after they were introduced is a waste of good engineering resources better allocated to enhancing features.
Duplicate of #3154
As an implementor of multiple passes in Relay, I have noticed that we have far too few test cases.
Most test_pass_*.py files do not test all program constructs. For example, test_pass_dead_code_elimination does not test ADTs, and test_pass_fuse_ops does not test closures and references. I have looked into each file, and almost all of them are 'incomplete' in this sense.
The short and obvious answer ('require people to write more tests!') does not work, simply because Relay will likely keep adding more and more language constructs (for example, mutual recursion as of #2468). So even a once-complete test suite will become incomplete.
I propose we create a file common.py in the test directory and put a list of test cases (called test_list) in there.
Each compiler pass test can then loop through test_list and check that the transformed expression is semantically equal to the original one.
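To make this concrete, here is a minimal sketch of what common.py and a pass test could look like. It is illustrative only: the case list and test names are hypothetical, and the exact API calls (relay.var, relay.Function, tvm.IRModule.from_expr, relay.transform.DeadCodeElimination) are assumptions about the current TVM API; semantic_equal is the helper proposed below.

```python
# Hypothetical tests/python/relay/common.py
import tvm
from tvm import relay

def _add_case():
    # A simple elementwise case; the real test_list would also cover
    # ADTs, closures, references, recursion, and other constructs.
    x = relay.var("x", shape=(3, 4), dtype="float32")
    y = relay.var("y", shape=(3, 4), dtype="float32")
    return relay.Function([x, y], x + y)

# One entry per language construct that every pass should be exercised on.
test_list = [_add_case]

# Hypothetical usage in a test_pass_*.py file:
def test_dead_code_elimination_on_common_cases():
    for make_case in test_list:
        mod = tvm.IRModule.from_expr(make_case())
        transformed = relay.transform.DeadCodeElimination()(mod)
        assert semantic_equal(mod["main"], transformed["main"])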
Additionally, I propose we implement a function, semantic_equal, which takes two Relay expressions and does some property-based testing to check whether they are semantically the same.
The reason is that a large portion of the test code simply creates an executor, constructs the right number of inputs of the right types, gets the outputs, deconstructs them into multiple tensors, and checks that each of them is correct.
It is difficult to write a generic helper function that does all of this, because tests have different input/output types. The check_eval function, which sort of does the above, only handles the case of a single output tensor, and the user still has to construct the inputs by hand.
If we implement semantic_equal, test_list can hold tests of different types.
Additionally, it also means we test on more inputs, instead of only the current handwritten ones.
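Here is a minimal sketch of what semantic_equal could look like for the tensor-only case, assuming static shapes and the modern tvm.relay API (create_executor, relay.transform.InferType, NDArray.numpy(); older releases spell some of these differently, e.g. relay.ir_pass and .asnumpy()):

```python
import numpy as np
import tvm
from tvm import relay

def semantic_equal(lhs, rhs, trials=5, rtol=1e-5):
    """Property-test two Relay functions for semantic equality by
    running both on the same random inputs and comparing outputs.
    Sketch only: handles tensor params/results with static shapes."""
    lhs_mod = relay.transform.InferType()(tvm.IRModule.from_expr(lhs))
    rhs_mod = relay.transform.InferType()(tvm.IRModule.from_expr(rhs))
    params = lhs_mod["main"].params
    for _ in range(trials):
        # Draw one random tensor per parameter, matching its type.
        args = []
        for p in params:
            ty = p.checked_type
            shape = [int(dim) for dim in ty.shape]
            args.append(tvm.nd.array(
                np.random.uniform(size=shape).astype(ty.dtype)))
        out_l = relay.create_executor(mod=lhs_mod).evaluate()(*args)
        out_r = relay.create_executor(mod=rhs_mod).evaluate()(*args)
        if not np.allclose(out_l.numpy(), out_r.numpy(), rtol=rtol):
            return False
    return True
```

With such a helper, each entry in test_list only needs to produce the expressions to compare; input construction, execution, and output comparison are shared.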