
[TE] reverse-mode autodiff without any optimization #5121

Merged (5 commits) on Mar 31, 2020

Conversation

@yzhliu (Member) commented Mar 22, 2020:

This is the first PR to bring in previously-implemented tensor-level autodiff.

This PR does not include any optimizations, so the generated code performs poorly. I will submit the optimization passes in another two or three PRs, so as not to put too much pressure on reviewers.

Credit also goes to @sgrechanik-h, as mentioned in the header of each file.

RFC: https://discuss.tvm.ai/t/rfc-bring-in-tensor-expression-autodiff

Please help review: @sgrechanik-h @MarisaKirisame @junrushao1994 @tqchen @hzfan
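
For readers skimming the thread, here is a minimal usage sketch of the entry point this PR brings in, assuming the te.gradient API described in the RFC; the tensor names and shapes below are made up for illustration.

import tvm
from tvm import te

# Sketch only: assumes the te.gradient entry point added in this PR.
# A simple elementwise compute: Y[i, j] = X[i, j] * X[i, j]
X = te.placeholder((4, 4), name="X")
Y = te.compute((4, 4), lambda i, j: X[i, j] * X[i, j], name="Y")

# Reverse-mode autodiff: one adjoint tensor is returned per input.
# With the default head=None, the result is the full Jacobian of
# shape Y.shape + X.shape.
[dYdX] = te.gradient(Y, [X])

# No optimization pass is applied yet, so the lowered code is naive.
s = te.create_schedule(dYdX.op)
print(tvm.lower(s, [X, dYdX], simple_mode=True))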

@yzhliu force-pushed the te_autodiff branch 3 times, most recently from 7379717 to 4e94a31 on March 22, 2020 06:43
python/tvm/testing.py (review comments resolved)
include/tvm/te/autodiff.h (review comments resolved)
// This case is relatively difficult because a reduction expression
// may use an arbitrary combiner.
// The resulting reduction expression will return a tuple containing
// both derivatives and the original results (in exactly this order).
Contributor:

Can you switch the order? Most AD code uses the original result first, derivatives later.

@yzhliu (Member, Author) commented Mar 24, 2020:

Looking into it a bit more, the order actually makes a difference. When the original init value differs from its derivative init value, and they depend on each other during the calculation, we must compute the derivative first (using the original's init value); switching the order in TVM causes the original value to be overwritten before it is used, which produces incorrect results.

One example is in the test case:

import tvm
from tvm import te

# A0 and k come from the surrounding test; a (10, 10) placeholder and a
# reduce axis over 0..10 match the lowered IR below.
A0 = te.placeholder((10, 10), name='A0')
k = te.reduce_axis((0, 10), name='k')

def fcombine(x, y):
    return x * y

def fidentity(t0):
    return tvm.tir.const(1, t0)

prod = te.comm_reducer(fcombine, fidentity, name='prod')
B = te.compute((10, 10), lambda i, j: prod(A0[i, k] + A0[k, i], axis=k), name='B')
check_grad(B, A0)  # test helper defined in the test file

Correct result (derivative first):

produce B.jacobian {
  for (i, 0, 10) {
    for (j, 0, 10) {
      for (jac_i0, 0, 10) {
        for (jac_i1, 0, 10) {
          B.jacobian.v0[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)] = 0f
          B.jacobian.v1[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)] = 1f
          for (k, 0, 10) {
            B.jacobian.v0[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)] = ((B.jacobian.v0[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)]*(A0[((i*10) + k)] + A0[((k*10) + i)])) + ((float32(((jac_i0 == i) && (jac_i1 == k))) + float32(((jac_i0 == k) && (jac_i1 == i))))*B.jacobian.v1[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)]))
            B.jacobian.v1[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)] = (B.jacobian.v1[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)]*(A0[((i*10) + k)] + A0[((k*10) + i)]))
          }
        }
      }
    }
  }
}
Output B.jacobian.v0

Incorrect result (original first):

produce B.jacobian {
  for (i, 0, 10) {
    for (j, 0, 10) {
      for (jac_i0, 0, 10) {
        for (jac_i1, 0, 10) {
          B.jacobian.v0[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)] = 1f
          B.jacobian.v1[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)] = 0f
          for (k, 0, 10) {
            B.jacobian.v0[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)] = (B.jacobian.v0[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)]*(A0[((i*10) + k)] + A0[((k*10) + i)]))
            B.jacobian.v1[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)] = ((B.jacobian.v1[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)]*(A0[((i*10) + k)] + A0[((k*10) + i)])) + ((float32(((jac_i0 == i) && (jac_i1 == k))) + float32(((jac_i0 == k) && (jac_i1 == i))))*B.jacobian.v0[((((i*1000) + (j*100)) + (jac_i0*10)) + jac_i1)]))
          }
        }
      }
    }
  }
}
Output B.jacobian.v1

Contributor:

This looks more like a bug in the lowering of tupled reductions than intended behavior; it might deserve a separate bug report.

Contributor:

@tqchen can you take a quick look and see if it is a bug in tuple reductions?

PrimExpr VisitExpr_(const OrNode* op) NOT_IMPLEMENTED

PrimExpr VisitExpr_(const ReduceNode* op) {
// This case is relatively difficult because a reduction expression
Contributor:

Is it possible to have a reduce inside another expression as well?

@yzhliu (Member, Author):

It seems to be a bit difficult. Do you have a concrete example in mind?

Contributor:

There is no concrete example. If you don't think it will happen, then leave it as is.

include/tvm/te/autodiff.h (review comments resolved)
src/te/autodiff/jacobian.cc (review comments resolved)
head : Tensor
The adjoint of the output, in other words, some tensor, by which the Jacobians
will be multiplied. Its shape must be of the form `prefix + output.shape`.
If `None` is passed, the identity tensor of shape `output.shape + output.shape` will be used.
Contributor:

So the default behavior is to return a Jacobian instead of an adjoint, right?

@yzhliu (Member, Author):

That's right. More precisely, that's because the arguments are one output and multiple inputs, rather than one input and multiple outputs. If y is the only output, dy/dx is the Jacobian, and it is also adjoint(x) for the previous layer. Depending on which aspect you want to emphasize, you use a different term.
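
A small sketch of the distinction, assuming the te.gradient signature from the docstring quoted above; the tensor names and shapes are made up for illustration.

import tvm
from tvm import te

A = te.placeholder((3, 4), name="A")
B = te.compute((3, 4), lambda i, j: A[i, j] * A[i, j], name="B")

# head=None: the identity tensor of shape B.shape + B.shape is used,
# so the result is the full Jacobian dB/dA, of shape (3, 4, 3, 4).
[jac] = te.gradient(B, [A])

# An explicit head (e.g. dLoss/dB from the next layer, with shape equal
# to B.shape) multiplies the Jacobian and yields the adjoint of A,
# of shape (3, 4).
head = te.placeholder((3, 4), name="head")
[adj] = te.gradient(B, [A], head=head)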

@sgrechanik-h (Contributor):

Thanks for reviving this. The PR looks good to me, but I'm obviously partial.

@yzhliu force-pushed the te_autodiff branch 2 times, most recently from 24b9d96 to c527ea8 on March 24, 2020 18:41
@yzhliu (Member, Author) commented Mar 24, 2020:

@MarisaKirisame @tqchen @hzfan Could you review again?

@MarisaKirisame (Contributor):

@yzhliu can you do forward-mode automatic differentiation? It should be easy given that you have the Jacobian: you only need a JacobianVectorProduct instead of a VectorJacobianProduct.

It is useful for higher-order derivatives, e.g. Hessian-vector products.
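
For context, a tiny NumPy sketch of the two products (illustrative only, not TVM code): reverse mode computes vector-Jacobian products, forward mode computes Jacobian-vector products.

import numpy as np

# Jacobian of some f: R^3 -> R^2, evaluated at a point.
m, n = 2, 3
J = np.arange(m * n, dtype=float).reshape(m, n)

# Reverse mode (what this PR implements): vector-Jacobian product.
# v plays the role of the head/adjoint coming from the output side.
v = np.ones(m)
vjp = v @ J   # shape (3,)

# Forward mode: Jacobian-vector product.
# u is a tangent vector on the input side.
u = np.ones(n)
jvp = J @ u   # shape (2,)

print(vjp, jvp)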

@MarisaKirisame (Contributor):

(of course, not in this PR)

@yzhliu (Member, Author) commented Mar 25, 2020:

@MarisaKirisame sure, I will try.

@yzhliu (Member, Author) commented Mar 25, 2020:

CI is green. @tqchen @hzfan please check if there's anything that needs to be addressed.

@yzhliu (Member, Author) commented Mar 30, 2020:

Kindly ping @tqchen: can we merge if it looks good?

@tqchen merged commit e4a5441 into apache:master on Mar 31, 2020

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Apr 16, 2020
* [TE] reverse-mode autodiff without any optimization

Co-authored-by: Sergei Grechanik <[email protected]>

* address review comments

* add comments and retrigger CI

* move unittest to debug ci

* move test back and add seed

Co-authored-by: Sergei Grechanik <[email protected]>
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Apr 17, 2020 (same commit message as above)