[Fix] Fix get_valid_count flaky test for cuda #4901

Merged 10 commits on Feb 21, 2020

@Laurawly Laurawly commented Feb 17, 2020

Re-enabled the get_valid_count test for CUDA in topi. This fix replaces the previous block-sync method with atomic operations.
Tested on V100 and T4 GPUs.
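
For readers unfamiliar with the pattern, here is a minimal sketch of compaction with an atomic counter. It is a CPU analogue written with `std::atomic` and `std::thread`, not the code from this PR: `fetch_add` plays the role that `atomicAdd` plays in a CUDA kernel, and the `Box` struct, the `compact_valid` function, and the thread count are all hypothetical names chosen for illustration. Padding unused slots with -1 is likewise an assumption of the sketch.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical box record; only the score matters for this sketch.
struct Box {
    float class_id;
    float score;
};

// Copies every box whose score exceeds `threshold` to the front of `out`
// and returns how many were kept. Each worker reserves its output slot
// with fetch_add, so no barrier or prefix sum is needed between workers.
int compact_valid(const std::vector<Box>& in, float threshold,
                  std::vector<Box>& out, unsigned num_threads = 4) {
    assert(num_threads > 0);
    out.assign(in.size(), Box{-1.0f, -1.0f});  // pad unused slots with -1
    std::atomic<int> valid_count{0};

    auto worker = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            if (in[i].score > threshold) {
                // fetch_add returns the old value: a slot no other worker gets.
                int slot = valid_count.fetch_add(1);
                out[slot] = in[i];
            }
        }
    };

    // Split the input into contiguous chunks, one per worker thread.
    std::vector<std::thread> threads;
    std::size_t chunk = (in.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        std::size_t begin = std::min(in.size(), t * chunk);
        std::size_t end = std::min(in.size(), begin + chunk);
        threads.emplace_back(worker, begin, end);
    }
    for (auto& th : threads) th.join();
    return valid_count.load();
}
```

In this sketch the count and the set of kept boxes are deterministic, but the order in which they land in `out` depends on thread scheduling, so any check against a reference result should compare them without relying on order.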

@trevor-m @kevinthesun @yzhliu Could you review?

Contributor

@trevor-m trevor-m left a comment

Thanks for fixing this!

topi/python/topi/cuda/nms.py (review thread outdated, resolved)
@Laurawly
Contributor Author

ping @kevinthesun @yzhliu

@yzhliu yzhliu merged commit c4c61cb into apache:master Feb 21, 2020
@yzhliu
Member

yzhliu commented Feb 21, 2020

Thanks @Laurawly @trevor-m

alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 26, 2020
* get_valid_count accuracy issue fixed for individual tests but not for all tests running together

* minor fix

* initialize valid_count and PrefixSum buffers

* test updated

* update relay test as well

* update document

* fix lint

* address comment

* fix lint

* correct atomicAdd identifier name
alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 28, 2020
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Mar 2, 2020
@masahi
Member

masahi commented Mar 5, 2020

@Laurawly I still get a flaky failure from the get_valid_count test in my PR:
https://ci.tvm.ai/blue/organizations/jenkins/tvm/detail/PR-4964/8/pipeline/246

@Laurawly
Contributor Author

Laurawly commented Mar 5, 2020

@masahi That’s weird. You can comment out the test for now and I’ll try to reproduce it on my end.

@masahi
Member

masahi commented Mar 7, 2020

@Laurawly Also reported at a different PR #4931 (comment)

@Laurawly
Contributor Author

Laurawly commented Mar 7, 2020

> @Laurawly Also reported at a different PR #4931 (comment)

Yeah, please feel free to comment out the test: https://github.com/apache/incubator-tvm/blob/master/topi/tests/python/test_topi_vision.py#L106

@masahi
Member

masahi commented Mar 7, 2020

Fortunately my PR is green now without commenting it out :) If the next open PR by somebody else hits the same problem, I'll ask them to comment it out.
