Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Relay][RFC] Improve testing infrastructure of Relay #2884

Closed
MarisaKirisame opened this issue Mar 24, 2019 · 6 comments
Closed

[Relay][RFC] Improve testing infrastructure of Relay #2884

MarisaKirisame opened this issue Mar 24, 2019 · 6 comments

Comments

@MarisaKirisame
Copy link
Contributor

As an implementor of multiple passes in Relay, I had noticed that we have far too little test case.

Most test_pass_*.py do not test for all program construct. For example, test_pass_dead_code_elimination do not test about adt, or test_pass_fuse_ops do not test about closure and reference. I had looked into each file, and almost all of them are 'incomplete' in this sense.

The short and obvious answer ('require people to write more test!') do not work, simply because relay will likely keep adding more and more language construct (for example, Mutual Recursion as of #2468). So, even once complete test case will also be incomplete.

I propose we create a file common.py in test, and put a list of test_case (called test_list) in there.
Each compiler pass can now loop through the test_list, and see if the transformed expression is semantically equal to the original one.

Additionally, I propose we also implement a function, semantic_equal, which take two relay expression, and do some property testing to see if they are semantically the same.

The reason is that a large portion of the test code is simply: creating an executor, creating the right amount of input, in the right type, get their output, deconstruct it to be multiple tensor, and testing that each of them is correct.

It is difficult to write a generic helper function to do all of them, because they have different input/output type. The check_eval function that sort of does the above only handle the case for a single output tensor, and the user still has to construct the input by hand.

If we implement semantic_equal, the test_list can have test of different type.
Additionally, it also mean we do test on more input, instead of the current handwritten one.

@MarisaKirisame MarisaKirisame changed the title [Relay][RFC] Improving testing infrastructure of Relay [Relay][RFC] Improve testing infrastructure of Relay Mar 24, 2019
@slyubomirsky
Copy link
Contributor

The test code has a lot of repetition and I am strongly in favor of refactoring existing tests (some of which show a fair bit of bitrot) to use common libraries. Testing error cases would also be a good thing to have shared infrastructure for.

@zhiics
Copy link
Member

zhiics commented Mar 24, 2019

+1 for refactoring.

BTW, we probably also need to have some discussion about adding some regression tests in CI pipeline because some passes could noticeably affect perf. But this can be a separate issue.

@tqchen
Copy link
Member

tqchen commented Mar 24, 2019

Realistically, infrastructure is only one way to help making better regression tests but cannot ever replace regression tests.

We do need and should write more testcases in the meanwhile :)

@srkreddy1238
Copy link
Contributor

Can we add weekly builds for end to end test cases which takes long hours ?

@Ravenwater
Copy link

+1 @zhiics Performance regression testing for compiler algorithms is essential. Too many places where a bad local decision can take the whole system down, and trying to reconstruct these cases two months after they were introduced is a waste of good engineering resources, better allocated to enhancing features.

@tqchen
Copy link
Member

tqchen commented Jul 24, 2019

Duplicated with #3154

@tqchen tqchen closed this as completed Jul 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants