Fast exponent #4790
Conversation
topi/include/topi/elemwise.h (Outdated)

        std::string name = "T_exp",
        std::string tag = kElementWise) {
      if (x->dtype == DataType::Float(32)) {
        return fast_exp(x, name, tag);
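For context, a typical `fast_exp` of this kind uses Cephes-style range reduction: split x = n·ln2 + r, approximate exp(r) with a short polynomial, and scale by 2^n through direct exponent-field construction. The scalar sketch below is hypothetical (not the actual TVM kernel), just to illustrate the technique and why it vectorizes well:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical scalar sketch of a Cephes-style fast exp, NOT the TVM kernel.
// exp(x) = 2^n * exp(r), where x = n*ln2 + r and |r| <= ~0.5*ln2,
// with exp(r) approximated by a degree-7 Taylor polynomial.
float fast_exp_sketch(float x) {
  // Clamp to avoid overflow/underflow in the 2^n scaling step.
  x = std::fmin(88.0f, std::fmax(-88.0f, x));
  const float log2e = 1.442695040f;
  const float ln2 = 0.6931472f;
  float n = std::floor(x * log2e + 0.5f);  // nearest-integer exponent
  float r = x - n * ln2;                   // reduced argument
  // Horner evaluation of 1 + r + r^2/2 + ... + r^7/5040.
  float p = 1.0f / 5040.0f;
  p = p * r + 1.0f / 720.0f;
  p = p * r + 1.0f / 120.0f;
  p = p * r + 1.0f / 24.0f;
  p = p * r + 1.0f / 6.0f;
  p = p * r + 0.5f;
  p = p * r + 1.0f;
  p = p * r + 1.0f;
  // Build 2^n by writing the biased exponent field of a float directly.
  uint32_t bits = static_cast<uint32_t>(static_cast<int32_t>(n) + 127) << 23;
  float scale;
  std::memcpy(&scale, &bits, sizeof(scale));
  return p * scale;
}
```

Every step here is branch-free straight-line arithmetic, which is what makes the op vectorizable, unlike a libc `exp` call.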
Unless this `fast_exp` is guaranteed to give bit-identical output to libc `exp`, I don't think it is a good idea to use it by default. I recommend using something like an env var to enable this.
@masahi It's not identical.
Relative fast_exp error vs. TensorFlow exp is in [-4.52e-06, 4.17e-06].
Relative fast_exp error vs. NumPy exp is in [-3.11e-06, 3.10e-06].
How about using it only if enabled via a cmake option?
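Error bounds like those quoted above come from comparing the approximation against a reference `exp` over a dense grid. A quick NumPy check along those lines, using a hypothetical polynomial `fast_exp` as a stand-in for the topi kernel:

```python
import numpy as np

def fast_exp(x):
    # Hypothetical stand-in for topi's fast_exp, NOT the actual kernel:
    # exp(x) = 2^n * p(r), where x = n*ln2 + r and p is a Taylor polynomial.
    x = np.clip(x.astype(np.float32), -88.0, 88.0)
    n = np.floor(x * np.float32(1.442695) + np.float32(0.5))
    r = x - n * np.float32(0.6931472)
    p = np.full_like(r, np.float32(1.0 / 5040.0))
    for c in (1/720, 1/120, 1/24, 1/6, 0.5, 1.0, 1.0):
        p = p * r + np.float32(c)
    return p * np.exp2(n.astype(np.float32))

# Measure relative error against NumPy's float32 exp on a dense grid.
x = np.linspace(-10.0, 10.0, 100001).astype(np.float32)
rel_err = (fast_exp(x) - np.exp(x)) / np.exp(x)
print(rel_err.min(), rel_err.max())
```

The exact bounds depend on the polynomial used; the point is that the error is measured relative to a trusted reference over the range of interest.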
Perhaps a better way would be to have a separate operator `fast_exp`, then have a pass (fast-math) in Relay that rewrites `exp` into `fast_exp`.
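The rewrite pass being proposed amounts to a tree traversal that swaps the operator while leaving everything else intact. A minimal sketch of the idea, using an illustrative `Call` node class rather than the actual Relay IR:

```python
from dataclasses import dataclass
from typing import List

# Illustrative expression node, NOT the Relay IR; real passes would use
# Relay's ExprMutator infrastructure instead.
@dataclass
class Call:
    op: str
    args: List["Call"]

def fast_math_rewrite(expr: Call) -> Call:
    """Recursively rewrite exp -> fast_exp, as a fast-math pass would."""
    args = [fast_math_rewrite(a) for a in expr.args]
    op = "fast_exp" if expr.op == "exp" else expr.op
    return Call(op, args)

# add(exp(x), y) becomes add(fast_exp(x), y); the rest is untouched.
e = Call("add", [Call("exp", [Call("x", [])]), Call("y", [])])
print(fast_math_rewrite(e).args[0].op)
```

Keeping the rewrite in a pass (instead of a cmake flag or env var) means the accuracy/speed trade-off stays a per-compilation decision.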
I like @tqchen's solution. If you use a cmake option, it is not configurable after libtvm.so is built. It requires more work, but it can be done in a later PR. This PR can be merged with a topi-only change, including test cases.
I know what I am talking about here because I also implemented fast_exp for my internal work in the past. Accurate exp is very slow, and high accuracy is not required for inference. The biggest benefit is that it enables vectorization when it is written in topi (in my case it was HalideIR). Vectorizing exp was the main reason to introduce the op-fusion improvement in #1548.
How about having 3 new Relay contrib operators: contrib.fast_exp, contrib.fast_tanh, contrib.fast_softmax. We can then add a Relay pass at opt_level 4 that legalizes these functions to their approximate counterparts.
Edit - Sorry, I should have explained why these 3. For softmax, we are essentially playing with the exp op. Softmax takes substantial time in SSD models, where the input shape is very large. For tanh, we already have a fast_tanh that is enabled by default. We should change that.
@alexgl-github Test cases are absolutely required for a new operator like this.
@masahi @anijain2305 @FrozenGene Would you mind reviewing again?
I have some silly questions: when should we switch to fast_exp, given that it lives in topi? Do we expect users to select it? Does this mean the op is only available in topi, but not in Relay?
@zhiics In a separate PR we'll introduce a Relay optimization pass that selects fast_exp when opt_level=4.
LGTM
Can this get in? I will work on the Relay changes.
@tqchen @FrozenGene Can you please check whether the changes you requested are addressed?
Overall looks OK; it would be great if we can decide on a consistent naming convention. In this case, we can have
Right. I think
LGTM. Minor comment. Please fix it.
@tqchen please give an approval.
Let's get this in - @tqchen
ping @tqchen
lgtm from my side
Thanks @alexgl-github @anijain2305 @masahi @FrozenGene !
Thanks for contributing to TVM! Please refer to guideline https://docs.tvm.ai/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers by @ them in the pull request thread.