[TOPI] Fast tanh #3255
Conversation
@ajtulloch, could you review pls?
LGTM, would be good to get a review from @ajtulloch and then we can merge. Does this have any approximation or numerical stability issues?
@jroesch asked me to take a look accuracy-wise; it's not a review, just a quick take. The Eigen tanh is implemented using a rational approximation on [-9, 9] and is set to ±1 outside that range (the comment in the source says so, though the implementation actually does this by clamping the input).

Let's start with single precision. Generally speaking, I expect the Eigen implementation to be faster than the GLIBC one, which I assume is the currently-used implementation: evaluating two polynomials plus one division is going to be much faster than a logarithm! And I assume the polynomials are well chosen, so the accuracy is going to be acceptable (the comment says it's within a few ULPs; that they don't say how many doesn't inspire confidence, but they're using a 13/6 rational approximation, which seems good enough). Plus, I assume you're using this implementation for an activation function, for which exact accuracy is likely unimportant. And the rational approximation is going to be monotonic, which is nice.

Now let's do double precision. Here, the Eigen implementation will only be as accurate as a single-precision computation, because it's missing terms in the polynomial. And while clamping at ±9 is appropriate in single precision, in double precision you have to clamp at ±19 (and so need a rational approximation accurate that far out). I don't know what exactly your users think about accuracy, but I suspect they wouldn't be happy with double precision being no more accurate than single precision. The safe but practical thing, I think, is to use the Eigen implementation for single precision only.
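For reference, here is a rough NumPy sketch of the clamped rational approximation described above. The coefficients are the ones published in the Eigen header linked from this PR; treat the snippet as illustrative, not as the implementation in this PR:

```python
import numpy as np

# Coefficients of the 13/6 rational approximation from Eigen's
# fast_tanh_float, highest degree first (the numerator has only odd
# powers of x, the denominator only even powers).
ALPHA = [-2.76076847742355e-16, 2.00018790482477e-13, -8.60467152213735e-11,
         5.12229709037114e-08, 1.48572235717979e-05, 6.37261928875436e-04,
         4.89352455891786e-03]
BETA = [1.19825839466702e-06, 1.18534705686654e-04,
        2.26843463243900e-03, 4.89352518554385e-03]

def fast_tanh(x):
    # Clamp to [-9, 9]; beyond that, tanh is +/-1 to float32 precision,
    # and the clamp makes the approximation saturate there.
    x = np.clip(np.asarray(x, dtype=np.float32), -9.0, 9.0)
    x2 = x * x
    # Horner evaluation of numerator p(x) and denominator q(x) in x^2.
    p = ALPHA[0]
    for a in ALPHA[1:]:
        p = p * x2 + a
    p = p * x  # odd polynomial: one final multiply by x
    q = BETA[0]
    for b in BETA[1:]:
        q = q * x2 + b
    return p / q

print(fast_tanh(1.0), np.tanh(1.0))  # both ~0.7615942
```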
@pavpanchekha, thanks for the comment. It makes a lot of sense.
Looks great, excellent idea - just a suggestion for a ULP-bound test.
Looks like the tests fail because CUDA's expf error is greater than 1 ULP (IIRC it's something like 5 ULP max), but maybe we should just enable ULP checking for the tanh impl?
topi/tests/python/test_topi_math.py (outdated):

```
    high,
    shape=(20, 3),
    dtype=tvm.float32,
    maxulp=1,
```
Maybe just make this an optional setting that only tanh uses for now?
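Purely for illustration, the optional setting could look something like the sketch below; the verify helper and its default tolerance are hypothetical, just to show how tanh alone could opt into the ULP check:

```python
import numpy as np

def verify(result, expected, maxulp=None):
    # Hypothetical helper: ops that opt in (e.g. tanh) pass a ULP bound;
    # everything else keeps the usual relative-tolerance check.
    if maxulp is not None:
        np.testing.assert_array_almost_equal_nulp(result, expected, nulp=maxulp)
    else:
        np.testing.assert_allclose(result, expected, rtol=1e-5)
```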
I did a bit more testing and compared the error of the original and the Eigen implementations:

```python
import numpy as np
import tvm
import topi
import topi.testing
from topi import util

m = tvm.var('m')
l = tvm.var('l')
A = tvm.placeholder((m, l), name='A')
shape = (20, 3)
B = topi.tanh(A)

for _ in range(10):
    a_np = np.random.uniform(low=-1, high=1, size=shape).astype(A.dtype)
    b_np = np.tanh(a_np)
    device = "llvm"
    ctx = tvm.context(device, 0)
    with tvm.target.create(device):
        s = topi.generic.schedule_injective(B)
    foo = tvm.build(s, [A, B], device, name="tanh")
    a = tvm.nd.array(a_np, ctx)
    b = tvm.nd.array(np.zeros_like(b_np), ctx)
    foo(a, b)
    # Print the ULP mismatch, if any, against NumPy's tanh.
    try:
        np.testing.assert_array_almost_equal_nulp(b.asnumpy(), b_np)
    except AssertionError as error:
        print(error)
```

Original:

Eigen:
Sounds good, looks great then. Thanks for digging into it.
Thanks, @pavpanchekha @jroesch @hlu1 @ajtulloch @antinucleon, this PR is now merged.
Thanks @tqchen
Borrowing the fast_tanh_float implementation from Eigen (https://github.com/eigenteam/eigen-git-mirror/blob/80f488a7bc9b7c64c9d0c0e8fb301fd905ad1b95/Eigen/src/Core/MathFunctionsImpl.h#L26) brings roughly a 28x speedup to tanh.
benchmark:
Results:
The speedup is about the same on Intel Skylake.
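As a rough sketch, the measurement might be reproduced with the TVM Python API of that era via time_evaluator; the shape and iteration count below are illustrative, not the benchmark actually used here:

```python
import numpy as np
import tvm
import topi

# Build and schedule tanh the same way as the accuracy test above.
n = 1 << 20
A = tvm.placeholder((n,), name='A', dtype='float32')
B = topi.tanh(A)

device = "llvm"
ctx = tvm.context(device, 0)
with tvm.target.create(device):
    s = topi.generic.schedule_injective(B)
func = tvm.build(s, [A, B], device, name="tanh")

a = tvm.nd.array(np.random.uniform(-1, 1, size=n).astype('float32'), ctx)
b = tvm.nd.array(np.zeros(n, dtype='float32'), ctx)

# time_evaluator runs the kernel `number` times and reports the mean runtime.
timer = func.time_evaluator(func.entry_name, ctx, number=100)
print("mean time per run: %g s" % timer(a, b).mean)
```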