Porting schedules (except convolutions) to C++ #763

Merged (78 commits, Jan 28, 2018)

Conversation

@alex-weaver (Contributor) commented Jan 8, 2018

Operators

  • broadcast_min, broadcast_max
  • elementwise: identity, negative, log, left_shift, right_shift, clip, cast
  • reduction: sum, max, min, argmax, argmin
  • transforms: expand_dims, transpose, reshape, squeeze, concatenate, split
  • batch_norm_inference
  • binary_dense
  • dense
  • dilate
  • leaky_relu
  • flatten
  • scale_shift_nchw, scale_shift_nhwc
  • pad - add pad_value argument
  • global_pool, pool
  • softmax, log_softmax
  • pow, broadcast_pow

Generic schedules

  • injective
  • _default_schedule
  • extern

CUDA schedules

  • injective
  • dense
  • extern
  • pool, global_pool
  • reduce
  • softmax

Other schedules

@joyalbin (Contributor) commented Jan 9, 2018

How about a "power" operator in the elementwise category?

@alex-weaver (Contributor, Author) replied:

power would definitely be useful. Would it make sense to define two? (See the sketch below the list.)

  1. pow(Tensor, Expr)
  2. pow(Tensor, Tensor) - where this one follows broadcasting rules
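
A minimal sketch of what the two overloads could look like (hypothetical code: the compute-based body and the tvm::pow(Expr, Expr) helper are assumptions here, not the final TOPI implementation):

        // Scalar exponent: raise each element of x to the power y.
        inline Tensor pow(const Tensor& x, const Expr& y) {
          return tvm::compute(x->shape, [&](const Array<Var>& i) {
            return tvm::pow(x(i), y);
          }, "T_pow");
        }

        // Tensor exponent: x and y are broadcast against each other,
        // following the same rules as the other broadcast_* ops.
        Tensor broadcast_pow(const Tensor& x, const Tensor& y);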

@joyalbin (Contributor) commented Jan 9, 2018

Yes, two versions of pow make sense. I am working on an implementation of the basic version first and will post progress here later.

        Tensor CommReduceIdx(const Tensor& data,
                             const std::vector<int>& axis,
                             FCommReduce func,
                             bool keepdims = false) {

Member: Any update on this comment?

         * \return The output tensor.
         */
        inline Tensor dilate(const Tensor& x,
                             std::vector<int> strides,

Member: Change most of the std::vector arguments to Array

@alex-weaver (Contributor, Author) replied Jan 22, 2018:
How should an array of ints be represented? Should it be an Array<Expr> filled with constants? (Array<int> fails to compile)

Member replied:
Array of note that strides can be passed in as symbolic expressions, although not frequently used

@alex-weaver (Contributor, Author) replied Jan 23, 2018:

Sorry, array of what? GitHub may have broken your comment. Do you mean Array of Expr?

I see that strides could be expressions, but nn/dilate.h:69 relies on being able to evaluate strides[i]. That line and its surroundings are:

        if (strides[i] != 1) {
          index_tuple.push_back(indices[i] / strides[i]);
          not_zero.push_back((indices[i] % strides[i]) == 0);
        } else {
          index_tuple.push_back(indices[i]);
        }

If the signature were changed to Array<Expr> strides, would the following be a correct implementation?

        if (IsConstInt(strides[i]) && GetConstInt(strides[i]) == 1) {
          index_tuple.push_back(indices[i]);
        } else {
          index_tuple.push_back(indices[i] / strides[i]);
          not_zero.push_back((indices[i] % strides[i]) == 0);
        }
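
(For reference, a minimal sketch of the constant-folding helpers assumed above; these exact definitions are illustrative, not the real TVM utilities:)

        // Hypothetical helpers: detect and extract a compile-time
        // integer constant from an Expr.
        inline bool IsConstInt(const Expr& e) {
          return e.as<tvm::ir::IntImm>() != nullptr;
        }

        inline int64_t GetConstInt(const Expr& e) {
          const auto* n = e.as<tvm::ir::IntImm>();
          CHECK(n != nullptr) << "Expected a constant integer";
          return n->value;
        }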

Member replied:

yes, let us do that

        using FExtern = std::function<Expr(Array<Buffer>, Array<Buffer>)>;

        /*! \brief Create tensors representing the result of invoking an external function */
        Array<Tensor> make_extern(const Array<Array<Expr>>& out_shapes,

Member: >> -> > > (note the space), for backward compatibility with MSVC.
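
That is, the nested template argument should be written with a space so the parser does not read >> as a right-shift token:

        Array<Tensor> make_extern(const Array<Array<Expr> >& out_shapes,  // note '> >'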

        }

        /*! \brief Pack a buffer object to be used as an argument to a PackedFunc */
        Expr pack_buffer(Buffer buf) {

Member: This belongs in a detail namespace. Add more comments to the argument and return value. This function is used to create a DLTensor structure on the heap so that a symbolic buffer can be passed as an argument to a TVM PackedFunc.
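
A documented declaration along the lines the reviewer asks for might read (wording illustrative):

        namespace detail {
        /*!
         * \brief Pack a buffer object to be used as an argument to a PackedFunc.
         *
         * Creates a DLTensor structure on the heap so that a symbolic buffer
         * can be passed where a PackedFunc expects a tensor argument.
         *
         * \param buf The buffer to pack
         * \return An expression evaluating to the packed buffer argument
         */
        Expr pack_buffer(Buffer buf);
        }  // namespace detail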

        namespace topi {
        using namespace tvm;

        /*! \brief Fuse all of the given args */

Member: document all arguments
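
For example, a fully documented header could read (parameter names are illustrative):

        /*!
         * \brief Fuse all of the given args
         *
         * \param stage The schedule stage in which the fuse is applied
         * \param args The iteration variables to be fused
         *
         * \return The single fused iteration variable
         */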

        namespace topi {
        using namespace tvm;

        /*! \brief Get padding size for each side given padding height and width */

Member: document all arguments, and what is expected in return
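
In particular, the \return line should spell out the layout of the result, for example (ordering illustrative):

        /*!
         * \brief Get padding size for each side given padding height and width
         *
         * \param pad_h The total padding in the height dimension
         * \param pad_w The total padding in the width dimension
         *
         * \return An array of per-side padding sizes (e.g. top, left,
         *         bottom, right) whose per-axis sums equal pad_h and pad_w
         */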

@tqchen (Member) commented Jan 26, 2018

Thanks for all the effort to bring in this set of changes! I like the overall structure, and the code looks good as I skim through.

Since there are a lot of changes, please take a pass over the code to make sure it reads well, and deliberately put comments in the places you think require explanation, so it is easier for others to understand and improve upon this.

@alex-weaver (Contributor, Author) replied:

OK, I've fixed up the comments on some of the more obscure bits of the code. Let me know if there's anything else that needs sorting. I'd like to get some kind of schedule registry working like there is in the Python code, but I think that's best left for a separate PR ;)

@tqchen changed the title from "[WIP] Porting schedules (except convolutions) to C++" to "Porting schedules (except convolutions) to C++" Jan 28, 2018
@tqchen merged commit fda2fa1 into apache:master Jan 28, 2018
@tqchen (Member) commented Jan 28, 2018

Thanks! This is merged.

@tqchen mentioned this pull request May 29, 2018
tqchen pushed a commit to tqchen/tvm that referenced this pull request Jul 6, 2018
* Ported injective schedules to C++. Added some elementwise ops.

* Fix lint errors

* Added reduction ops and schedules

* Fix lint errors

* Fix lint errors

* Fix lint errors

* Added transform ops

* Fix lint errors

* Fix lint errors

* Added softmax, log_softmax, leaky_relu and flatten ops.
Fixed issue where TVM_DECLARE_INTRIN_UNARY used the PureExtern flag
instead of PureIntrinsic.
Added softmax CUDA schedule.

* Fix lint

* Fix lint

* Added binary_dense, batch_norm_inference, dense, dilate, scale_shift_*,
global_pool and pool ops.
Extended pad to allow specifying pad_value.
Fixed issue where pad would throw if padding was zero in all dimensions.

* Fix lint

* Fix lint

* Added CUDA schedules for dense, pool and global_pool

* Added extern schedules for generic and CUDA

* Fix lint

* Added x86 binary schedules

* Fix lint

* Added rocm dense schedule. Added rocBLAS and cuBLAS support to dense ops

* Added pow ops. Added x86 default and injective schedules

* Fix lint

* Fix lint

* Fix lint

* Fix lint

* Fix lint

* Fix indent

* Removed schedules directory

* Changed left_shift, right_shift to operators. Changed pad_value in pad() to remove pointer usage

* Fixed usage of pad in nn/pooling.h. Fixed declaration of operator>>

* Fixed comments for shift operators

* Added comments to utility functions

* Added TOPI C++ library, exporting broadcast_add op

* Fix lint

* Share libinfo.py with TVM

* Fix lint

* Add other broadcast ops

* Fix lint

* Fix imports in topi

* Fix lib names

* Fixed build issue where windows builds don't apply correct definitions

* Removed TVM_EXPORTS from topi library

* Attempted CI build fix

* Add topi lib to tvm_multilib

* Fix Jenkinsfile

* Added TOPI build target to Makefile

* Fix nn op namespaces.

* Fix lint

* Renamed TOPI lib to libtvm_topi

* Removed _ffi/base.py

* Remove _ffi from topi, now shared with tvm.

* Make libtvm_topi loading optional

* Fix compiler warnings

* Fix lint

* Fix lint

* Fix lint

* Fix build error by making new libs argument to Target optional

* Added C++ Target type interop. Added registration of remaining C++ ops and schedules. Added test of broadcast ops

* Fix lint

* Fix lint

* Fix compile error

* Fix compiler warnings

* Fix compiler warnings

* Fixed int vector interop. Fixed argmin incorrectly invoking argmax. Fixed corner case in default schedules of attempting to fuse 0 length axes. Added tests for reduce ops.

* Refactored reduce builders

* Fixed typos in topi.cc. Added basic test.

* Fixed padding size error. Added dense, dilate, pooling tests

* Fixed issue where clip would output a different dtype to the input. Added split_sections op to cover the other mode of the python split op. Added tests.

* Changed extension type numbers to avoid clash with NNVM

* Fix lint

* Fix compiler warnings

* Removed use of std::vector from the public TOPI API

* Fix lint

* Add TOPI C++ tests to CI

* Fixed detail namespacing. Improved comments.
sergei-mironov pushed a commit to sergei-mironov/tvm that referenced this pull request Aug 8, 2018