[TE] reverse-mode autodiff without any optimization #5121
Conversation
// This case is relatively difficult because a reduction expression
// may use an arbitrary combiner.
// The resulting reduction expression will return a tuple containing
// both derivatives and the original results (in exactly this order).
Can you switch the order? Most AD code uses the original result first and the derivatives later.
Looking into it a bit more, the order actually makes a difference. When the original init value differs from its derivative init value, and the two depend on each other during the calculation, we must compute the derivative first (using the original's init value); switching the order in TVM makes the original value be overwritten before it is used, which produces incorrect results.
One example is in the test case:
import tvm
from tvm import te

def fcombine(x, y):
    return x * y
def fidentity(t0):
    return tvm.tir.const(1, t0)
prod = te.comm_reducer(fcombine, fidentity, name='prod')

A0 = te.placeholder((10, 10), name='A0')  # shape implied by the loops below
k = te.reduce_axis((0, 10), name='k')
B = te.compute((10, 10), lambda i, j: prod(A0[i, k] + A0[k, i], axis=k), name='B')
check_grad(B, A0)  # helper defined in the test file
Correct result (derivative first):
produce B.jacobian {
for (i, 0, 10) {
for (j, 0, 10) {
for (jac_i0, 0, 10) {
for (jac_i1, 0, 10) {
B.jacobian.v0[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)] = 0f
B.jacobian.v1[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)] = 1f
for (k, 0, 10) {
B.jacobian.v0[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)] = ((B.jacobian.v0[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)]*(A0[((i*10) + k)] + A0[((k*10) + i)])) + ((float32(((jac_i0 == i) && (jac_i1 == k))) + float32(((jac_i0 == k) && (jac_i1 == i))))*B.jacobian.v1[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)]))
B.jacobian.v1[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)] = (B.jacobian.v1[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)]*(A0[((i*10) + k)] + A0[((k*10) + i)]))
}
}
}
}
}
}
Output B.jacobian.v0
Incorrect result (original first):
produce B.jacobian {
for (i, 0, 10) {
for (j, 0, 10) {
for (jac_i0, 0, 10) {
for (jac_i1, 0, 10) {
B.jacobian.v0[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)] = 1f
B.jacobian.v1[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)] = 0f
for (k, 0, 10) {
B.jacobian.v0[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)] = (B.jacobian.v0[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)]*(A0[((i*10) + k)] + A0[((k*10) + i)]))
B.jacobian.v1[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)] = ((B.jacobian.v1[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)]*(A0[((i*10) + k)] + A0[((k*10) + i)])) + ((float32(((jac_i0 == i) && (jac_i1 == k))) + float32(((jac_i0 == k) && (jac_i1 == i))))*B.jacobian.v0[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)]))
}
}
}
}
}
}
Output B.jacobian.v1
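A quick way to see why the order matters (spelling out the recurrence behind the two dumps above): with the combiner c(x, y) = x*y and reduced elements v_k = A0[i, k] + A0[k, i], the joint per-step update is, by the product rule,

$$J \leftarrow J \cdot v_k + P \cdot \frac{\partial v_k}{\partial A_0[m, n]}, \qquad P \leftarrow P \cdot v_k,$$

where P is the running product (initialized to 1) and J the running Jacobian entry (initialized to 0). The update of J must read the value of P from before P's own update; since the lowered tuple reduction executes the component assignments in order inside the loop body, the derivative component has to come first.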
This looks more like a bug in the lowering of tupled reductions than intended behavior; it might deserve a separate bug report.
@tqchen can you take a quick look and see if it is a bug in tuple reductions?
PrimExpr VisitExpr_(const OrNode* op) NOT_IMPLEMENTED

PrimExpr VisitExpr_(const ReduceNode* op) {
  // This case is relatively difficult because a reduction expression
Is it possible to have a reduce inside another expression as well?
It seems a bit difficult. Do you have a concrete example in mind?
There is no concrete example. If you don't think it will happen, then leave it as is.
head : Tensor
    The adjoint of the output, in other words, some tensor, by which the Jacobians
    will be multiplied. Its shape must be of the form `prefix + output.shape`.
    If `None` is passed, the identity tensor of shape `output.shape + output.shape`
So the default behavior is to return a Jacobian instead of an adjoint, right?
That's right. More precisely, that's because the arguments are one output and multiple inputs, rather than one input and multiple outputs. If y is the only output, dy/dx is the Jacobian; it is also adjoint(x) for the previous layer. Depending on which aspect you want to emphasize, you use different terms.
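For illustration, a minimal sketch of the two usages, assuming the `te.gradient` entry point this PR adds (tensor names and shapes are made up for the example):

import tvm
from tvm import te

# Toy layer Y = A @ W; shapes chosen only for the example.
A = te.placeholder((4, 5), name='A')
W = te.placeholder((5, 3), name='W')
k = te.reduce_axis((0, 5), name='k')
Y = te.compute((4, 3), lambda i, j: te.sum(A[i, k] * W[k, j], axis=k), name='Y')

# Default head=None: the result is a full Jacobian of shape
# Y.shape + W.shape, i.e. (4, 3, 5, 3) here.
[dW_jac] = te.gradient(Y, [W])

# With an explicit head (the adjoint of Y coming from the next layer),
# the Jacobian is contracted over Y's axes and the result is the adjoint
# of W, of shape prefix + W.shape ((5, 3) here, since the prefix is empty).
H = te.placeholder(Y.shape, name='H')
[dW_adj] = te.gradient(Y, [W], head=H)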
Thanks for reviving this. The PR looks good to me, but I'm obviously partial.
@MarisaKirisame @tqchen @hzfan Could you review again?
@yzhliu can you do forward-mode automatic differentiation? It is easy considering you have the Jacobian - you only need to do a JacobianVectorProduct instead of a VectorJacobianProduct. It is useful for higher-order derivatives, a la Hessian-vector products.
(of course, not in this PR)
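As a rough illustration of the difference (not part of this PR, and only for exposition: a real forward mode would avoid materializing the full Jacobian like this):

import tvm
from tvm import te

# Toy layer y = W @ x; shapes are made up for the sketch.
W = te.placeholder((4, 8), name='W')
x = te.placeholder((8,), name='x')
k = te.reduce_axis((0, 8), name='k')
y = te.compute((4,), lambda i: te.sum(W[i, k] * x[k], axis=k), name='y')

# Reverse mode (this PR): Jacobian dy/dx of shape y.shape + x.shape = (4, 8).
[J] = te.gradient(y, [x])

# Forward mode would instead push a tangent v of the input through:
# jvp = J @ v, i.e. contract over the input axes rather than the output axes.
v = te.placeholder((8,), name='v')
r = te.reduce_axis((0, 8), name='r')
jvp = te.compute((4,), lambda i: te.sum(J[i, r] * v[r], axis=r), name='jvp')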
Co-authored-by: Sergei Grechanik <[email protected]>
@MarisaKirisame sure, I will try.
Kindly ping @tqchen - can we merge if it looks good?
* [TE] reverse-mode autodiff without any optimization
  Co-authored-by: Sergei Grechanik <[email protected]>
* address review comments
* add comments and retrigger CI
* move unittest to debug ci
* move test back and add seed
Co-authored-by: Sergei Grechanik <[email protected]>
This is the first PR to bring in the previously-implemented tensor-level autodiff.
This PR does not include any optimization and thus produces bad performance. I will submit the optimization passes in another two or three PRs, so as not to put too much pressure on reviewers.
Also credit to @sgrechanik-h as I mentioned in the header of each file.
RFC: https://discuss.tvm.ai/t/rfc-bring-in-tensor-expression-autodiff
Please help to review @sgrechanik-h @MarisaKirisame @junrushao1994 @tqchen @hzfan