[codegen] Add multiple operands and function support when using fp16 compilation #4056

Merged
merged 7 commits into apache:master Oct 11, 2019

Conversation

zxy844288792
Contributor

Thanks for contributing to TVM! Please refer to the contribution guideline https://docs.tvm.ai/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from reviewers.

As discussed in https://discuss.tvm.ai/t/error-cuda-compilation-error/3816 and https://discuss.tvm.ai/t/relay-automatic-fp16-downcasting/3952/3?u=xyzhou:
CUDA fp16 computation uses “cuda_fp16.h”, which does not support operations on operands carrying the volatile qualifier. For the max function I referred to this PR, but it has not been updated for a month, so I added min function support as well.

I also edited test_op_level1.py to enable an fp16 test case. I will edit more test_op files, but I would first like to gather some feedback.
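
A minimal sketch, not the actual test code, of the fp16 path this change exercises: build a small Relay function with dtype="float16" and run it on CUDA, so that codegen emits half-precision arithmetic and the max helper discussed above. The shapes, the choice of relay.maximum, and the graph executor setup below are illustrative assumptions.

import numpy as np
import tvm
from tvm import relay

dtype = "float16"
x = relay.var("x", relay.TensorType((1, 16), dtype))
y = relay.var("y", relay.TensorType((1, 16), dtype))
# relay.maximum exercises the fp16 max support discussed in this PR.
func = relay.Function([x, y], relay.maximum(x, y))

ctx = tvm.gpu(0)
intrp = relay.create_executor("graph", ctx=ctx, target="cuda")
x_np = np.random.uniform(size=(1, 16)).astype(dtype)
y_np = np.random.uniform(size=(1, 16)).astype(dtype)
out = intrp.evaluate(func)(x_np, y_np)
np.testing.assert_allclose(out.asnumpy(), np.maximum(x_np, y_np), rtol=1e-3)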

@vinx13
Member

vinx13 commented Oct 6, 2019

The error might be caused by an incorrect arch setting. You can use
https://github.com/dmlc/tvm/blob/cffb4fba03ea582417e2630bd163bca773756af6/python/tvm/contrib/nvcc.py#L218-L238
to conditionally skip the test on CI.

Instead of adding these functions, directly overriding the CUDA codegen rules for half might be preferred, since we also want to deal with half2 and avoid repetition.

@zxy844288792
Contributor Author

The error might be caused by an incorrect arch setting. You can use
https://github.com/dmlc/tvm/blob/cffb4fba03ea582417e2630bd163bca773756af6/python/tvm/contrib/nvcc.py#L218-L238
to conditionally skip the test on CI.
Instead of adding these functions, directly overriding the CUDA codegen rules for half might be preferred, since we also want to deal with half2 and avoid repetition.

Thanks for the information! I will try to use have_fp16 to skip the test on CI. I will also start to investigate how to override the codegen rules.
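
A minimal sketch of that kind of skip guard, assuming the have_fp16 helper from the tvm.contrib.nvcc module linked above; the wrapper name below is made up for illustration and the actual test body is omitted:

import tvm
from tvm.contrib import nvcc

def maybe_run_fp16_case(run_case):
    # Skip when the CI worker has no CUDA device at all.
    if not tvm.gpu(0).exist:
        print("Skipping fp16 case: no CUDA device available")
        return
    # have_fp16 checks whether the device's compute version supports fp16.
    if not nvcc.have_fp16(tvm.gpu(0).compute_version):
        print("Skipping fp16 case: GPU does not support float16")
        return
    run_case()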

@blacklong28

I did something similar to what you added, but there were some errors when I loaded the resnet-18-fp16.onnx model on an RTX 2080:
cuda-got-error-cuda-error-launch-out-of-resources (CUDA error: launch out of resources)

@tqchen
Member

tqchen commented Oct 10, 2019

@zxy844288792 @vinx13 please follow up on this.

@zxy844288792
Contributor Author

@vinx13 Can we get this PR merged with the current changes? We can take a look at half and half2 in a separate PR after we have more clarity.

gamma = relay.var("gamma", relay.TensorType((2,), dtype))
moving_mean = relay.var("moving_mean", relay.TensorType((2,), dtype))
moving_var = relay.var("moving_var", relay.TensorType((2,), dtype))
y = relay.nn.batch_norm(data, gamma, beta, moving_mean, moving_var,

fp16 for batch norm is not supported yet; #4088 needs to be merged first.

@vinx13
Member

vinx13 commented Oct 11, 2019

@tqchen fp16 tests on CI are skipped now; any chance of getting CI support for the fp16 type?

@tqchen
Member

tqchen commented Oct 11, 2019

We will need to look into it, because most of the GPU workers we have do not yet have fp16 support, so we have to rely on manual checks for now. I will see if we can get an fp16-enabled worker set up.

vinx13 changed the title from "[codegen] WIP - Add multiple operands and function support when using fp16 compilation" to "[codegen] Add multiple operands and function support when using fp16 compilation" Oct 11, 2019
vinx13 merged commit ce72e9b into apache:master Oct 11, 2019
@vinx13
Member

vinx13 commented Oct 11, 2019

Thanks @zxy844288792, this is now merged.

anijain2305 pushed a commit to anijain2305/tvm that referenced this pull request Oct 17, 2019
…compilation (apache#4056)

* overload half operators for cuda codegen

* add float16 te test_op_level1

* fix test_op_level1.py

* fix lint

* disable fp16 test if gpu does not support

* disable fp16 test if gpu does not support

* bypass float16 test if gpu does not support float16
wweic pushed a commit to neo-ai/tvm that referenced this pull request Oct 18, 2019
…compilation (apache#4056)

* overload half operators for cuda codegen

* add float16 te test_op_level1

* fix test_op_level1.py

* fix lint

* disable fp16 test if gpu does not support

* disable fp16 test if gpu does not support

* bypass float16 test if gpu does not support float16