Conversation
Please move cudnn_* files into nn/cudnn/.
src/operator/nn/convolution.cc
Outdated
}
}

static bool ConvolutionType(const nnvm::NodeAttrs& attrs,
static seems unnecessary?
I guess this is just personal preference. I usually try to expose as few symbols as possible to avoid symbol collision.
Agree as a general rule: use static if it's only used in that file.
src/operator/activation-inl.h
Outdated
@@ -1,198 +0,0 @@
/*
It's kind of hard to review the changes, since the GitHub diff doesn't recognize that files have been renamed when there are too many. Would a smaller PR generate a more readable diff?
break up this pull request?
nvm
According to @reminisce, git mv will make git aware of the files you moved and gives you a much more readable diff.
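A minimal sketch of the rename tracking being discussed (all paths and commit messages here are illustrative, run in a throwaway repo): moving a file with `git mv` lets `git diff -M` report a rename (`R100`) instead of a delete plus an add.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "dev"
mkdir -p src/operator/nn
echo "// activation op" > src/operator/activation-inl.h
git add -A
git commit -qm "initial"
# Move the file so git records the rename
git mv src/operator/activation-inl.h src/operator/nn/activation-inl.h
git commit -qm "move activation into nn/"
# -M asks diff to detect renames; output shows an R100 (100% similarity) entry
git diff -M --name-status HEAD~1 HEAD
```

Note that rename detection is content-similarity based, so heavy edits in the same commit as the move can still make the diff fall back to delete+add, which matches the experience reported later in this thread.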
I agree. I will probably need to redo the refactor and use git mv to move files.
I tried git mv and rebased the entire history, but git still shows the files as removed and added back.
src/operator/nn/fully_connected.cc
Outdated
namespace op {

static bool FullyConnectedShape(const nnvm::NodeAttrs& attrs,
    std::vector<TShape> *in_shape, std::vector<TShape> *out_shape) {
Please fix indentation: const nnvm::NodeAttrs& attrs, and std::vector<TShape> *in_shape should be aligned.
// require data to be known
if (dshape.ndim() == 0) return false;

index_t num_input;
We're replacing index_t with nnvm::dim_t (#7319).
src/operator/nn/fully_connected.cc
Outdated
}

static bool FullyConnectedType(const nnvm::NodeAttrs& attrs,
    std::vector<int> *in_type, std::vector<int> *out_type) {
Same comment about alignment.
@@ -883,7 +883,6 @@ def test_nearest_upsampling():
    shapes = [(1,3,base*root_scale*scale**(num_shape-1-i),base*root_scale*scale**(num_shape-1-i)) for i in range(num_shape)]
    check_nearest_upsampling_with_shape(shapes, scale, root_scale)
Did you check whether all the refactored operators have their own unit tests? I know some operators (e.g. FullyConnected) don't have unit tests. We should add them to make sure the refactoring doesn't break any operator.
FullyConnected may not have its own unit test, but it's used in other operators' unit tests.
I'm just saying that we'd better do a full scan to make sure all the refactored operators are tested in some way.
There's a C++ unit test, fully_conn_perf.cc, in tests/cpp. It's just performance-related for now, but value tests could be added if desired.
@rahul003 @anirudh2290 Could you also help review this PR? Reviewing only a subset of the operators is fine. It involves a lot of operators and we need to divide the work.
op.Init(param);
return op;
}
What's the reasoning behind the static/thread_local object? What calls this? What's the guarantee that this won't ever be called more than once on the same thread, or is there a reason it doesn't matter? It seems like it would call Init() more than once in that case.
static thread_local creates one object per thread. Keeping the statement inside a function helps reference the object from different functions (i.e., the forward and backward functions).
Init() has to be called every time the forward/backward function is invoked, because the current operator interface is stateless. This is the best way to initialize the state for the follow-up operations in forward/backward.
The state is just the params, right? It's almost always better to make an object/call stateless if possible, so since you have the params in hand when this function is called, just pass them to the function as a parameter and work directly off of them without doing a copy or all of the fancy thread_local stuff.
I agree that we can make activation completely stateless.
- I feel that adding static in front of almost every function is not consistent with the existing coding style.
- Better to also run some small benchmarks to ensure that operators using cudnn by default on GPUs still keep the same behavior.
src/operator/nn/batch_norm-inl.h
Outdated
@@ -107,7 +108,7 @@ class BatchNormOp : public Operator {
   * need, epecial case like Batch Norm requires.
   * \sa OpReqType, OpContext
   */
-  virtual void Forward(const OpContext &ctx,
+  void Forward(const OpContext &ctx,
               const std::vector<TBlob> &in_data,
Fix indentation.
src/operator/nn/batch_norm-inl.h
Outdated
@@ -157,7 +158,7 @@ class BatchNormOp : public Operator {
   * \param aux_states Auxiliary states of operator. Normally operator doesn't need
   * \sa OperatorProperty, OpReqType, OpContext
   */
-  virtual void Backward(const OpContext &ctx,
+  void Backward(const OpContext &ctx,
               const std::vector<TBlob> &out_grad,
Indentation.
src/operator/nn/batch_norm-inl.h
Outdated
std::vector<int> *in_type) const override;
template<typename xpu>
void BatchNormCompute(const nnvm::NodeAttrs& attrs,
    const OpContext& ctx, const std::vector<TBlob>& inputs,
Indentation.
src/operator/nn/batch_norm-inl.h
Outdated
}
template<typename xpu>
void BatchNormGradCompute(const nnvm::NodeAttrs& attrs,
    const OpContext& ctx, const std::vector<TBlob>& inputs,
Indentation.
src/operator/nn/batch_norm.cc
Outdated
}
return op;

in_shape->at(1) = TShape(Shape1(channelCount));
Did it use SHAPE_ASSIGN_CHECK before? I feel assigning shapes directly may mask some shape-inconsistency issues.
src/operator/nn/batch_norm.cc
Outdated
Operator *BatchNormProp::CreateOperatorEx(Context ctx, std::vector<TShape> *in_shape,
    std::vector<int> *in_type) const {
  DO_BIND_DISPATCH(CreateOp, param_, (*in_type)[0], (*in_shape)[0]);
static inline std::vector<std::string> ListArguments() {
Why are ListArguments and ListOutputs still here? Aren't they supposed to be registered through the nnvm set_attr interface?
These functions are used by other functions.
You may need to replace LegacyOpRunner with CoreOpRunner in the batch norm unit tests in the C++ gtest tests.
I don't see LegacyOpRunner in cpp batchnorm tests.
In the gtests, in test_legacy_op.h:
template<typename OperatorProp, typename DType, typename AccReal>
using LegacyOpRunner =
mxnet::test::OperatorRunner<OperatorProp, LegacyOperatorExecutor<DType, AccReal>>;
this is where LegacyOpRunner is defined, but it's not used for batch norm.
Sorry, I mean LegacyOperatorExecutor
Do all gtests in batchnorm_test.cc still pass?
src/operator/nn/batch_norm-inl.h
Outdated
@@ -46,7 +46,7 @@ namespace mxnet {
 namespace op {

 namespace batchnorm {
-enum BatchNormOpInputs {kData, kGamma, kBeta};  // kGamma: weights, kBeta: biases
+enum BatchNormOpInputs {kData, kGamma, kBeta, kInMovingMean, kInMovingVar};  // kGamma: weights, kBeta: biases
Is BatchNormOpAuxiliary still needed?
src/operator/nn/batch_norm-inl.h
Outdated
@@ -157,7 +157,7 @@ class BatchNormOp : public Operator {
   * \param aux_states Auxiliary states of operator. Normally operator doesn't need
   * \sa OperatorProperty, OpReqType, OpContext
   */
-  virtual void Backward(const OpContext &ctx,
+  void Backward(const OpContext &ctx,
               const std::vector<TBlob> &out_grad,
               const std::vector<TBlob> &in_data,
               const std::vector<TBlob> &out_data,
nit: formatting
src/operator/nn/batch_norm-inl.h
Outdated
@@ -107,7 +107,7 @@ class BatchNormOp : public Operator {
   * need, epecial case like Batch Norm requires.
   * \sa OpReqType, OpContext
   */
-  virtual void Forward(const OpContext &ctx,
+  void Forward(const OpContext &ctx,
               const std::vector<TBlob> &in_data,
               const std::vector<OpReqType> &req,
               const std::vector<TBlob> &out_data,
nit: formatting
src/operator/nn/batch_norm-inl.h
Outdated
    const OpContext& ctx, const std::vector<TBlob>& inputs,
    const std::vector<OpReqType>& req,
    const std::vector<TBlob>& outputs) {
  const BatchNormParam& param = nnvm::get<BatchNormParam>(attrs.parsed);
nit: formatting
src/operator/nn/batch_norm-inl.h
Outdated
std::vector<TBlob> aux_states(inputs.begin() + 6, inputs.begin() + 8);
std::vector<TBlob> out_data(inputs.begin() + 8, inputs.end());
std::vector<TBlob> in_grad(outputs.begin(), outputs.begin() + 3);
I understand what you're doing here, and I'm not sure there's a better way to do this, but all the +X's look kind of hacky. Can you at least put comments here explaining what you're doing with all the +3's and +6's and why? For example, "BatchNormOp expects a separate aux_states vector, but we receive them as one vector..."
On second thought, probably it's not too hard to just get rid of the aux_states vector for BatchNormOp. I believe a reference is made to it almost immediately.
You are right. I was trying to avoid modifying the original code as much as possible.
I think it's better not to construct the vectors; the overhead is quite significant. It's OK to change the interface of BNBackward.
src/operator/nn/batch_norm.cc
Outdated
in_shape->at(1) = TShape(Shape1(channelCount));
in_shape->at(2) = TShape(Shape1(channelCount));
in_shape->at(3) = TShape(Shape1(channelCount));  // kMovingMean
in_shape->at(4) = TShape(Shape1(channelCount));  // kMovingVar
Why not use the enums here? Would it be more efficient to make one TShape and then assign it, rather than constructing and destructing so many?
src/operator/nn/batch_norm.cc
Outdated
}
}
// TODO is this a right way?
#if 0
?
src/operator/nn/activation-inl.h
Outdated
Stream<xpu> *s = ctx.get_stream<xpu>();
Tensor<xpu, 2, DType> data = in_data.FlatTo2D<xpu, DType>(s);
Tensor<xpu, 2, DType> out = out_data.FlatTo2D<xpu, DType>(s);
Assign(out, req, F<ForwardOp>(data));
Outdated, Activation uses Kernel::Launch now
I'll rebase the code to the new master branch
src/operator/nn/activation-inl.h
Outdated
Tensor<xpu, 2, DType> m_out_grad = out_grad.FlatTo2D<xpu, DType>(s);
Tensor<xpu, 2, DType> m_out_data = out_data.FlatTo2D<xpu, DType>(s);
Tensor<xpu, 2, DType> m_in_grad = in_grad.FlatTo2D<xpu, DType>(s);
Assign(m_in_grad, req, F<BackwardOp>(m_out_data) * m_out_grad);
Outdated, Activation uses Kernel::Launch now
src/operator/nn/activation-inl.h
Outdated
CHECK_EQ(outputs.size(), 1U);
const ActivationParam& param = nnvm::get<ActivationParam>(attrs.parsed);
MSHADOW_REAL_TYPE_SWITCH(inputs[0].type_flag_, DType, {
  switch (param.act_type) {
This is called a lot and is very performance-sensitive. Can the switch statement value be replaced by a template parameter so that there's not extra comparisons/jump tables/execution-path changes?
The switch statement will always exist because we need to find the right code based on inputs[0].type_flag_ and param.act_type. One option is to create a table that references the right code based on the type and activation type. However, the switch statement is very likely to do the same with gcc optimizations turned on.
The activation type isn't going to change, is it? The old implementation took the mshadow_op:: as a template argument. I suppose it's fine, though.
src/operator/nn/activation-inl.h
Outdated
switch (param.act_type) {
  case activation::kReLU:
    get_activation_op<xpu, mshadow_op::relu, mshadow_op::relu_grad, DType>().Backward(
        ctx, inputs[0], inputs[1], req[0], outputs[0]);
Why not just make this Forward()/Backward() function a static member function and call them directly, rather than calling some thread_local static object? It seems to add several levels of complication that don't need to exist.
Agreed. I was trying to reduce the number of lines of code moved as much as possible. Although the code looks a little weird, it doesn't have performance cost. We still need to maintain a bunch of states and initialize them whenever forward/backward is called.
What states? Just the params?
@@ -967,7 +967,7 @@ def check_batchnorm_training(stype):
    stypes = ['row_sparse', 'default']
    for stype in stypes:
        check_batchnorm_training(stype)

"""
?
    const std::vector<TBlob> &aux_args) {
virtual void Backward(const OpContext &ctx, const std::vector<TBlob> &out_grad,
    const std::vector<TBlob> &in_data, const std::vector<OpReqType> &req,
    const std::vector<TBlob> &in_grad) {
nit: Formatting
src/operator/nn/batch_norm-inl.h
Outdated
static thread_local BatchNormOp<xpu, DType, AccReal> op;
op.Init(param);
op.Backward(ctx, out_grad, in_data, out_data, req, in_grad, aux_states);
});
This seems like overkill and extra time. Why not just pass a const reference to params to a static Backward() function?
template<typename DType>
static CuDNNBatchNormOp<DType> &GetCuDNNOp(const BatchNormParam& param) {
  static thread_local CuDNNBatchNormOp<DType> op;
Are we still going to use this thread_local method with a static op rather than just calling a static function? You aren't going to be able to save state in the class anyway, because there's no guarantee that the same thread will pass back here for the forward and backward pass, or even for all backward passes for that matter. I feel this is misleading because people are going to try to save state, plus there's whatever overhead is involved with thread_local (it's not zero on all platforms).
For cuDNN operators, this is necessary because they require a large number of states. Initializing these states has two phases: creating memory and setting values. Although we can't avoid setting values for these states in forward/backward, this design (with thread_local) can at least avoid creating the memory over and over again. @piiswrong thinks this matters a lot. I'll add comments explaining why we want to do this.
So this makes the operator sensitive to which thread calls Forward and Backward? as well as which thread calls it next time? If so, is this a new constraint?
Also, so if CUDNN batch norm saves state in the class (for example), and if there is a network like:
(1)Convolution->(2)BatchNorm->(3)Activation->(4)Convolution->(5)BatchNorm->(6)Activation
...And one thread calls all of these in sequence, will this code mix up the state information between the first forward call of batch norm and the second? Will the backward pass through (2) be using state information from the (5) since that was the last state saved (presumably from (5)'s forward pass, but maybe also from backward pass)?
The values of the state are only valid within a forward/backward function and are never carried to the next function call. As such, the values of the state are always reinitialized at the beginning. What this design wants to avoid is to create memory for the state again and again.
Hello @zheng-da, please rebase your PR.
RETURN_STRING := $(shell ./prepare_mkl.sh $(MKLML_ROOT))
MKLROOT := $(firstword $(RETURN_STRING))
export USE_MKLML = $(lastword $(RETURN_STRING))
ifeq ($(USE_MKLDNN), 1)
Is cmake also adjusted for this type of build?
Not yet. I'll update.
I have changed cmake to compile with MKLDNN.
include/mxnet/ndarray.h
Outdated
@@ -35,12 +35,12 @@
#include <map>
#include <string>
#include <memory>
#if MXNET_USE_MKLDNN == 1
#include <mkldnn.hpp>
Does this need to be installed independently or is it a submodule in '3rdparty', for instance?
I'm not sure. Both seem to work fine. Intel provides a python script to install MKLDNN in the system.
It can be compiled from source, right? We should make it a submodule.
I did a PR on mkl-dnn a few months back to allow it to be included as a submodule, so it should be able to work OK from cmake: oneapi-src/oneDNN#44
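For illustration, the submodule workflow looks like the sketch below. It uses a local stand-in repository instead of the real mkl-dnn URL so it runs offline; all paths and names are illustrative (a real setup would be something like `git submodule add <mkl-dnn URL> 3rdparty/mkldnn`).

```shell
set -e
work=$(mktemp -d)
cd "$work"
# A stand-in "third-party" repo playing the role of mkl-dnn:
git init -q lib
(cd lib && git config user.email "dev@example.com" && git config user.name "dev" \
  && echo "lib" > README && git add -A && git commit -qm "init")
# The main project:
git init -q proj
cd proj
git config user.email "dev@example.com"
git config user.name "dev"
# file:// submodules need an explicit allow on recent git versions
git -c protocol.file.allow=always submodule add "$work/lib" 3rdparty/lib
git commit -qm "add third-party lib as submodule"
git submodule status
```

A consumer then runs `git submodule update --init --recursive` after cloning; CMake can point at the checked-out `3rdparty/` path.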
include/mxnet/ndarray.h
Outdated
/*
 * This function returns mkldnn::memory with the default primitive_desc.
 */
std::shared_ptr<const mkldnn::memory> GetMKLDNNData() const;
How often do these GetMKLXXX() get called, returning shared pointers? Will this affect performance?
We're investigating performance issues. We'll see if shared pointers can cause performance issues.
We probably don't need a shared_ptr, since this is called from inside an operator and the NDArray itself will not be released during that time.
#if MXNET_USE_MKLDNN == 1
// Have MKL memory reference to the data in the default storage
// or create memory for MKLDNN.
void SetMKLMem(const TShape &shape, int dtype);
What is special about MKLDNN memory?
MKLDNN memory has various memory layouts, and it's not compatible with MXNet's default layouts.
thanks
fi

if [ -z $MKLDNNROOT ]; then
    if [ ! -f "$MKLDNN_INSTALLDIR/lib/libmkldnn.so" ]; then
Why not just make this a submodule in 3rdparty?
sure, we can do so.
python/mxnet/ndarray/mkldnn.py
Outdated
return '\n<%s %s @%s>' % (self.__class__.__name__,
                          shape_info, self.context)

# TODO
TODO what?
src/operator/nn/activation-inl.h
Outdated
MSHADOW_REAL_TYPE_SWITCH(out_grad.type_flag_, DType, {
  switch (param.act_type) {
    case activation::kReLU:
      ActivationBackward<xpu, mshadow_op::relu, mshadow_op::relu_grad, DType>(
Didn't I see this change in the other PR about upgrading legacy operators?
src/operator/nn/batch_norm-inl.h
Outdated
 * \sa OpReqType, OpContext
 */
template <typename xpu, typename DType, typename AccReal>
void BNForward(const OpContext &ctx, const BatchNormParam& param,
Are the batch norm unit tests in tests/cpp/operator/batchnorm_test.cc still working?
Why are the operator refactor and MKLDNN in the same PR?
Please make sure all googletest tests in tests/cpp still build and pass.
python/mxnet/ndarray/mkldnn.py
Outdated
"""The base class of an NDArray stored in a MKLDNN storage format. | ||
""" | ||
|
||
def __repr__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be removed and use the method inherited from its parent?
you are right.
python/mxnet/ndarray/mkldnn.py
Outdated
Returns
-------
NDArray or CSRNDArray or RowSparseNDArray
The documentation needs to be updated. Does copying from an mkl-dnn array to a sparse ndarray work?
conversion between mkldnn and sparse array doesn't work right now.
Should we throw an exception in python if the user tries to do so?
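One way to fail fast, as asked above, sketched with a hypothetical helper (the function name, the stype strings, and the supported-pairs set are illustrative, not MXNet's actual API):

```python
def copyto_stype(src_stype, dst_stype):
    """Hypothetical guard: raise instead of silently mis-converting
    between storage types whose conversion isn't implemented."""
    supported = {("mkldnn", "default"), ("default", "mkldnn"),
                 ("default", "default")}
    if (src_stype, dst_stype) not in supported:
        raise NotImplementedError(
            "copy from %s to %s is not supported" % (src_stype, dst_stype))
    return True
```

With such a guard, `copyto_stype("mkldnn", "csr")` raises `NotImplementedError` immediately rather than producing a silently wrong array.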
src/executor/graph_executor.cc
Outdated
@@ -54,6 +54,14 @@ GraphExecutor::~GraphExecutor() {
  }
}

inline bool SharableStorage(NDArrayStorageType stype) {
nit: const NDArrayStorageType?
It's pass by value. I guess it doesn't matter that much?
Not used anymore?
@@ -416,11 +416,6 @@ nnvm::Graph InferStorageType(nnvm::Graph&& graph,
DispatchModeVector dispatch_modes(graph.indexed_graph().num_nodes(), DispatchMode::kUndefined);
graph.attrs["dispatch_mode"] = std::make_shared<any>(std::move(dispatch_modes));
}
// initialize unknown values for dispatch modes
why is this removed?
I remember now: I found there is duplicated code here. The code I deleted is exactly the same as the code above.
src/kvstore/kvstore_local.h
Outdated
@@ -256,7 +256,13 @@ class KVStoreLocal : public KVStore {
auto validator = [this](const int key, const NDArray& nd) -> bool {
auto stype = nd.storage_type();
// valid NDArray
if (stype == kDefaultStorage || stype == kRowSparseStorage) return true;
if (stype == kDefaultStorage || stype == kRowSparseStorage
Injecting a macro into the if statement is kind of hard to read. Is it better if we change it to:
bool valid_stype = stype == kDefaultStorage || stype == kRowSparseStorage;
#if MXNET_USE_MKLDNN == 1
valid_stype = valid_stype || stype == kMKLDNNStorage;
#endif
if (valid_stype) return true;
right. I have updated it.
src/ndarray/ndarray.cc
Outdated
@@ -31,10 +31,12 @@
#include <mxnet/resource.h>
#include <mxnet/imperative.h>
#include <mshadow/tensor.h>
#include <mkldnn.hpp>
Should there be a macro before it's included?
right
@@ -32,11 +32,6 @@
#include "mxnet/engine.h"
#include "ps/ps.h"
#include "./kvstore_dist_server.h"
#if MKL_EXPERIMENTAL == 1
Did you run the distributed training test and verify it's functional?
I haven't tried it. Where is the distributed training test?
You could try running tests/nightly/dist_sync_kvstore.py
cd tests/nightly && ../../tools/launch.py --launcher local -n 3 python dist_sync_kvstore.py
to mock test distributed training locally
Thank you for your instructions. I ran the test on a GPU instance and it seems to pass. It seems the test is specifically written for sparse formats and gradient compression.
Currently, MKLDNN always uses the default layout for weight arrays during training. I guess it may make more sense to actually train a model in a distributed environment?
I tried ../../tools/launch.py --launcher local -n 3 python train_cifar10.py --kv-store dist_sync. I let it run for a long time, and it seems to converge well.
The path to the conv op mentioned here has to be updated, too.
src/operator/nn/convolution.cc
Outdated
}

inline static bool ConvStorageType(const nnvm::NodeAttrs& attrs,
const int dev_mask,
nit: indentation
src/operator/nn/convolution.cc
Outdated
#if MXNET_USE_MKLDNN == 1
if (dev_mask == mshadow::cpu::kDevMask) {
*dispatch_mode = DispatchMode::kFComputeEx;
(*out_attrs)[0] = kMKLDNNStorage;
Does this still work when the stype of output is already set to non-MKL? e.g.
out = mx.nd.sparse.zeros(stype='csr', shape=(2,2))
mx.nd.Convolution(input, weight, out=out)
No, the code will fail with an assertion. I'll add more conditions to make the operators fall back when the input storage isn't supported.
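The fallback logic under discussion can be sketched as a toy dispatch decision (all names here are illustrative stand-ins, not MXNet's real dispatch machinery):

```python
# Toy model of the dispatch decision: prefer the MKLDNN compute path on
# CPU, but fall back to the default implementation whenever any input or
# output storage type is one MKLDNN cannot handle (e.g. csr).
# These names are illustrative, not MXNet's actual API.

MKLDNN_OK = {"default", "mkldnn"}  # storage types the MKLDNN path accepts


def choose_dispatch(dev, in_stypes, out_stypes, mkldnn_enabled=True):
    """Return the compute path for the given device and storage types."""
    if (mkldnn_enabled and dev == "cpu"
            and all(s in MKLDNN_OK for s in in_stypes + out_stypes)):
        return "fcompute_ex"   # MKLDNN path
    return "fallback"          # default implementation


assert choose_dispatch("cpu", ["default"], ["mkldnn"]) == "fcompute_ex"
# an output whose stype is already set to csr forces the fallback path:
assert choose_dispatch("cpu", ["default"], ["csr"]) == "fallback"
```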
@cjolivier01 the reason the code refactoring and MKLDNN are put in the same PR is that we don't want to refactor the original MKL code. We don't want a PR to remove features. A google test for batch norm is going to be difficult because batchnorm and batchnorm_v1 now use different interfaces. To enable this test, I guess the only option is to refactor batchnorm_v1 as well.
include/mxnet/ndarray.h
Outdated

/*
* This function is used inside operators to reshape an array.
* It's used by FullyConnected right now.
for what?
FullyConnected needs to reshape an array to 2D. The current implementation of Reshape for MKLDNN hangs if I call it inside an operator, because I call WaitToRead() at the end. Could you help me check whether the implementation of Reshape is correct/necessary?
https://github.com/zheng-da/incubator-mxnet/blob/refactor/src/ndarray/ndarray.cc#L282
I see. You can't use Engine->Push from inside an operator
python/mxnet/ndarray/mkldnn.py
Outdated
from .ndarray import NDArray
from .sparse import _ndarray_cls

class MKLNDArray(NDArray):
do we need to expose this to front end?
probably not. I'll remove the class.
@piiswrong @eric-haibin-lin @marcoabreu @cjolivier01 @szha @rahul003 @reminisce
I'm fine with merging it. What about others?
LGTM
src/operator/nn/convolution-inl.h
Outdated
@@ -118,6 +118,29 @@ struct ConvolutionParam : public dmlc::Parameter<ConvolutionParam> {
this->cudnn_off == other.cudnn_off &&
this->layout == other.layout;
}
#if MXNET_USE_MKLDNN == 1
static uint64_t ComputeHash(const TShape &shape) {
What is this needed for? Why not use dmlc::HashCombine?
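For context, `dmlc::HashCombine` follows the common boost-style recipe of folding each element's hash into an accumulator. A rough Python transliteration of that idea (the function names are mine, and the magic constant mirrors the boost recipe; this is a sketch, not the dmlc implementation):

```python
def hash_combine(seed, value):
    """Fold hash(value) into seed, boost-style (illustrative sketch)."""
    seed ^= hash(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2)
    return seed & 0xFFFFFFFFFFFFFFFF  # keep the accumulator to 64 bits


def hash_shape(shape):
    """Hash a tuple of dimensions by combining each dim into one value."""
    h = 0
    for dim in shape:
        h = hash_combine(h, dim)
    return h


h1 = hash_shape((3, 224, 224))
h2 = hash_shape((3, 224, 224))
assert h1 == h2  # deterministic for the same shape
```

Because each dimension is folded into the running seed, the result depends on both the values and their order, which is what makes it usable as part of an operator-signature cache key.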
gpu_out = gpu_model(gpu_data)
cpu_loss = softmax_cross_entropy(cpu_out, label)
gpu_loss = softmax_cross_entropy(gpu_out, gpu_label)
assert_almost_equal(cpu_out.asnumpy(), gpu_out.asnumpy(), rtol=1e-2, atol=1e-2)
You can do with autograd.record(train_mode=False) and backward(train_mode=False) to make dropouts identity and then test for all models.
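The reason train_mode=False makes the comparison deterministic is that dropout reduces to the identity outside of training, so the CPU and GPU models agree exactly instead of differing by random masks. A toy sketch of that behavior (plain Python, not the MXNet API):

```python
import random


def dropout(x, p, train_mode):
    """Toy inverted dropout: random scaled masking in train mode,
    exact identity in inference mode."""
    if not train_mode:
        return list(x)  # inference: pass-through, fully deterministic
    keep = 1.0 - p
    # train: keep each element with probability `keep`, scaled by 1/keep
    return [v / keep if random.random() < keep else 0.0 for v in x]


x = [1.0, 2.0, 3.0]
assert dropout(x, 0.5, train_mode=False) == x  # identity, safe to compare
# in train mode the output is stochastic, so two runs generally differ
```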
It seems there is a bug in CuDNNBatchNorm. Let's skip this for now.
terminate called after throwing an instance of 'dmlc::Error'
what(): [00:16:06] src/engine/./threaded_engine.h:359: [00:16:06] src/operator/nn/./cudnn/cudnn_batch_norm-inl.h:173: Check failed: ctx.is_train && !param_.use_global_stats use global statistics is not yet supported in CuDNNBatchNorm
OK
Can you create an issue for this?
@@ -0,0 +1,118 @@
#!/bin/bash

# Licensed to the Apache Software Foundation (ASF) under one
I don't think we can just apply the Apache license, considering that the copyright is held by the University of California and Intel.
@@ -24,11 +24,14 @@
* \author Chris Olivier
*/

#if 0
Why is this disabled?
It should not be. Good catch.

def test_training():
    # We use network models without dropout for testing.
    # TODO(zhengda) mobilenet can't pass this test even without MKLDNN.
TODO?
I'll look into the problem later. It's not related to this PR. The native MXNet operators can't pass the test with mobilenet. Most likely it's caused by depth-wise convolution.
@piiswrong why was this PR merged although there were still reviewers who requested changes?
Are there any objections to reverting this commit? If so, please explain.
+1 for reverting
This reverts commit 2cc2aa2.
…)" This reverts commit afdb839.
* Remove MKL code. * Use NNVM interface. Use NNVM interface for upsampling. Use NNVM interface for convolution. Use NNVM interface for deconvolution. Use NNVM interface for FullyConnected. Move NNVM interface to batch norm. Use NNVM interface for depthwise convolution. Use NNVM interface for softmax activation. Use NNVM interface for pooling. use NNVM interface for dropout. Use NNVM interface for activation. Use NNVM interface for CuDNN batch norm. Use NNVM interface for CuDNN pooling. Use NNVM interface for CuDNN softmax activation. Use NNVM interface for CuDNN activation. Use NNVM interface for CuDNN convolution. Use NNVM interface for CuDNN deconvolution. Move concat to nn/ Use NNVM interface for concat. Fix headers in concat. Move lrn to nn/. Use NNVM interface for LRN. Fix a compilation error in convolution. Fix a compilation error in activation. Fix coding style. Fix coding style for make lint. use enums in batch norm. Use CoreOpRunner for refactored Ops. Make FullyConnected stateless. Make upsampling stateless. Make pooling stateless. Make batchnorm stateless. Make SoftmaxActivation stateless. Fix a code style problem. pass amalgamation test for batch norm. pass amalgamation test for dropout. Get convolution ops from a function. Fix compilation errors for GPU. Fix thread local in diff platforms. Avoid using thread_local for non-CuDNN conv/deconv. Remove TODO in deconv. Fix a bug in batch norm. Fix a bug in fully connected. Don't set #inputs for backward convolution. * Integrate MKLDNN. Update MXNet for MKLDNN. Enable MKLDNN Relu. Fix a compilation error. Change Makefile for MKLDNN. Remove infer storage in convolution. Update MXNet for MKLDNN. Support MKLDNN storage type in python. Update activation. Add MKLDNN base classes. Implement MKLDNN fully connected. Add MKLDNN convolution. Update MKLDNN interface in NDArray. MKLDNN convolution handle CreateMKLDNNData failure. Add another GetMKLDNNData in NDArray. Have mkldnn to define the data format. 
Create output MKLDNN memory explicitly for FC. Fix a bug in NDArray. Fix a bug in GetWeightDesc. Convert data layout if necessary in FC. remove unnecessary print in MKLDNN convolution. Add MKLDNN deconvolution. Add MKLDNNStream to manage primitives and memories. Use MKLDNNStream to register memory in NDArray. Use MKLDNNStream to manage resources in operators. Handle kAddTo in MKLDNN operators. Fix a bug in deconvolution. Fix bugs in NDArray. Revert "Fix bugs in NDArray." This reverts commit f5624a4. Fix a bug in NDArray. Fix a bug in NDArray. Reorder MKLDNN memory to default format in SetTBlob. Disable MKLDNN correctly. Fix a bug in activation. Reshape of NDArray supports MKLDNN. Fix a memory ref bug in NDArray. Reshape NDArray in MKLDNN FullyConnected. Fix data format conversion. Create MKLDNN NDArray in python. Support Slice for MKLDNN NDArray. Reduce the overhead of summing the result to the output array. Avoid unnecessary memory copy in NDArray. Fix a bug in data reordering. Fix a bug in NDArray. Don't hard code MKLDNN type. Support dilation in MKLDNN convolution. Fix a bug in sum results. Rewrite GetMKLDNNData. Add prepare_mkldnn.sh Enable MKLDNN activation. Fix a bug on FullyConnected. Handle 3 dims for MKLDNN NDArray. Fix a bug in MKLDNN FC. Support MKLDNN storage in KV store. Fix a bug in executor for non-default NDArray. Fix a link error in cast_storage.cc. Remove unnecessary function def Fall back to def storage if the type isn't supported by MKLDNN. Use NDArray for MKLDNN in python. Reshape output of MKLDNN convolution. Fix a bug in NDArray. Support more operations in MKLDNN NDArray. Fix a bug in deconvolution. Fix bugs in MKLDNN deconvolution. We still need to compute bias correctly. Have elemwise binary ops to fall to default for MKLDNN. Limit the cases that MKLDNN operations are called. Force the layout of mkldnn::memory from NDArray. Add MKLDNN softmax. Fix output storage type of MKLDNN softmax. Add MKLDNN sum. Fix a bug in elemwise sum. 
Fix a bug in MKLDNN softmax. Fix a bug in imperative. Clean up dispatch modes. Remove redundant code. MKLDNN Pooling Op integration MKLDNN Pooling Op integration add missing file fix mkldnn pooling op workspace issue handle workspace in MKLDNN pooling correctly. Use a non-MKLDNN op for testing. Allow to share arguments and their gradients between executors. Avoid using MKLDNN pooling when it's not supported. Support MKLDNN properly. Choose MKLDNN softmax more carefully. Fix a bug in MKLDNN pooling. Fall back if MKLDNN pooling isn't supported. Fix a bug in Slice of NDArray. Use int32 for workspace memory. Exclude MKLDNN act with tanh. Have two Reshape functions in NDArray. Copy data for NDArray with diff shapes. Add MKLDNN copy. Add MKLDNN version of elemwise_add. Add MKLDNN version of Flatten. add mkldnn surport for concat simplify MKLDNN Flatten. Enalbe MKLDNN deconvolution with bias. Fix a bug in CuDNN deconvolution. avoid using MKLDNNStorage when it's not defined. Remove ./cudnn_lrn-inl.h Fix for make lint. add mkldnn surport for concat fix the coding style for pr of mkldnn concat Only add input data for MKLDNN concat backward Remove unnecessary TODO. remove unnecessary __repr__ in MKLNDArray. better condition check for readability. Use macro when including mkldnn.hpp. Revert "Use CoreOpRunner for refactored Ops." This reverts commit a28586f. Fix a bug in test core. Limit MKLDNN ops being used. Fix complains from "make pylint" Move ContainStorage to common/utils.h Limit MKLDNN concat being used. Add license. Fix amalgamation Fix compilation error in mkldnn_ops-inl.h Fix a bug in deconvolution. Fix a bug in pooling. MKLDNN ops allocates temp mem. Fix a bug in pooling. Allocate align memory from temp space. Have parameter gradients stored in the default storage. Handle all cases in CopyFrom. Ensure NDArray returns memory with right memory descriptors. use auto to define memory in the operator. Use raw pointer for mkldnn memory. 
Move more code to mkldnn_base.cc Fix a compilation error. Address review comments. fix a bug in activation backward. Miss a macro in mkldnn_base.cc Fix a bug in data iterator in examples. Avoid memory allocation in ReshapeMKLDNN. Avoid memory allocation in storage cast. Fix a bug in cast storage. Handle sliced MKLDNN NDArray. Use memcpy if NDArray uses default format. Revert "Limit MKLDNN ops being used." This reverts commit 75e2ae5. Enable mkldnn act backward has the same input layout. Fix a bug in mkldnn activation. Use MKLDNN sum in more cases. Improve perf of reorder. Avoid memory reorder in conv and deconv. Avoid unnecessary storage cast in fallback path. Revert "Use MKLDNN sum in more cases." This reverts commit 7a21ebc. Handle sliced ndarray in more cases. Fix a complain from make lint. Update Jenkins to test MKLDNN. debug compiling mkldnn. Use MKLDNN sum in more cases. Add mkldnn as a submodule. Compile with mkldnn in 3rdparty. Fix some coding styles. write the path to mkldnn lib in libmxnet.so. use rpath with $ORIGIN. Pack all lib files in Jenkins. pack and unpack mxnet with MKLDNN. Update Jenkinsfile Update Jenkinsfile Add mkldnn batch normalization Fix bugs in BN. Avoid memory allocation in MKLDNNCopy. only use MKLDNN BatchNorm for special cases. MKLDNN BatchNorm doesn't work well on the default layout. Add MKL-DNN based LRN Code Style Changes Fix a bug in BN. Fix a bug in LRN. Handle non-default storage in memory plan. Fix coding style. Fix a compilation error without mkldnn. Fix some coding styles for batch norm Improve forward of convolution. Add openmp and simd support to BN operator Retrieve MKLDNN Conv primitive based on signature. Retrieve Act primitive based on its signature. Fix a bug in pooling. Diable some MKLDNN activation and pooling. Cast MKLDNN storage with diff data type. Check if it's a view of NDArray. Reshaped and sliced arrays share the same chunks. Implement caching MKLDNN Act correctly. Fix a bug in check_consistency. 
Fix a potential bug when destroying NDArray. Fix bugs when allocating mem in NDArray. Fix coding style. Add micro when using mkldnn in ndarray. Fix a compilation error. Fix a bug in concat. Remove MKLDNNStorage. handle diff layouts in CopyFromToDnsImpl. Fallback correctly. Force weight grad to use default layout. Reorder weight arrays in (de)conv for faster inference. Avoid caching TBlob from NDArray. This commit may add some overhead of managing NDArray for each fallback. Fix a bug in Flatten. handle ndarray with def layout in mkldnn BN correctly. Align to page when mkldnn is enabled. Use default mem alloc for mkldnn. Reuse NDArrays. Support WriteInplace for sum. fix complains from "make lint". Avoid reallocation in NDArray. Handle weight arrays with special MKLDNN layouts. Remove unnecessary GetWeights. Fix compilation error without MKLDNN. Fix a bug in (de)conv for weight arrays. Fix a minor bug in MKLDNN conv. Fix a bug in MKLDNNOpSignature. Reimplement fallback for MKLDNN ops. Fix a bug in FallbackExecutor. Add params in hashcode. Invalidate data in outputs to accelerate. Fix a minor bug. Update mkldnn_base-inl.h Add primitive caching for Pooling forward computation Add hashcode in pooling parameters. Support NDArray copy with types unsupported by MKLDNN. Avoid using MKLDNN concat for negative dimension. Fix make lint complain. Disable mkldnn avg pooling for now. Fix a compile warning. Fix compile error when MKLDNN is disabled. OP primitive cache: use memory as signature for MKLDNN storage type Remove MKLDNN array in python. Disable Clang tests in Jenkins. Use mklml dockers to test mkldnn. Update MKLDNN repo to zhengda's mkldnn repo. Update MKLDNN repo to ashok's. Fix a bug in fallback. Change avg pooling algorithm to pooling_avg_include_padding Fix a code style in mkldnn pooling. Temp fix a bug in FC. Revert "Disable Clang tests in Jenkins." This reverts commit b4efa8f. 
Rebase and Refactor deconv (apache#20) * rebase to Da,Zheng refactor branch Jan.14, add signature for mkldnn Deconv and modify classMKLDNNDeconvForward * fix make lint complains A simple way of caching BN inference. cache BN forward for both training and inference. Fix some minor problems in BN. Fix a bug in caching BN. force to build with avx2 in Jenkins. Remove the remaining MKLDNNStorageType Some minor updates in NDArray. a lot of updates to address comments. minor changes. * revert modification in test_executor. * Fix a bug in FlattenStorageType. * Remove BN debug. * Remove remaining MXNET_USE_MKL2017 * Remove unused code in pooling. * Fixing bugs in gtests. * Fix lint errors. * a lot of minor updates to address comments. * Fix coding style in MKLDNN Pooling (apache#22) * revert the code change in the previous code refactor. * Fix a bug in pooling. * LRN coding style changes (apache#21) * LRN coding style change * Add const for local variables * Add req for LRN forward * rebase code * align API interface * revert modification in test_executor. * cast storage with MKLDNN properly. * Minor updates to address comments. * some minor updates. * Switch to the master branch of MKLDNN. * Minor updates to address comments. * Update activation.cc * Fix a bug in convert NDArray. * Add gluon model zoo tests. * Update GPU tests on model zoo. * Avoid using mobilenet for GPU tests with gluon models. mobilenet can't pass the test even without MKLDNN. * Update GPU tests on gluon. * change cmake to compile MKLDNN. * update cmake for MKLDNN. * Implement align myself. * Switch to intel/mkl-dnn. * Fix errors in align unittest. * Add unit test for LRN. * fix a compilation error. * use storage_type_assign to determine storage type. * avoid global pooling in mkldnn. There is a bug in global pooling in mkldnn. * compare all MKLDNN ops with native impls. add MXNET_MKLDNN_DEBUG to control the test. * Fix a bug in testing correctness. * print the name of buggy operator. 
* undo some modifications. * Fix a bug on reshaped array. * avoid testing outputs with NullOp. * turn on MKLDNN tests in Jenkins. * print each operator in MKLDNN tests. * rename test_gluon_model_zoo.py * Create hashcode for operator parameters properly.
This reverts commit 2cc2aa2.
* Remove MKL code. * Use NNVM interface. Use NNVM interface for upsampling. Use NNVM interface for convolution. Use NNVM interface for deconvolution. Use NNVM interface for FullyConnected. Move NNVM interface to batch norm. Use NNVM interface for depthwise convolution. Use NNVM interface for softmax activation. Use NNVM interface for pooling. use NNVM interface for dropout. Use NNVM interface for activation. Use NNVM interface for CuDNN batch norm. Use NNVM interface for CuDNN pooling. Use NNVM interface for CuDNN softmax activation. Use NNVM interface for CuDNN activation. Use NNVM interface for CuDNN convolution. Use NNVM interface for CuDNN deconvolution. Move concat to nn/ Use NNVM interface for concat. Fix headers in concat. Move lrn to nn/. Use NNVM interface for LRN. Fix a compilation error in convolution. Fix a compilation error in activation. Fix coding style. Fix coding style for make lint. use enums in batch norm. Use CoreOpRunner for refactored Ops. Make FullyConnected stateless. Make upsampling stateless. Make pooling stateless. Make batchnorm stateless. Make SoftmaxActivation stateless. Fix a code style problem. pass amalgamation test for batch norm. pass amalgamation test for dropout. Get convolution ops from a function. Fix compilation errors for GPU. Fix thread local in diff platforms. Avoid using thread_local for non-CuDNN conv/deconv. Remove TODO in deconv. Fix a bug in batch norm. Fix a bug in fully connected. Don't set #inputs for backward convolution. * Integrate MKLDNN. Update MXNet for MKLDNN. Enable MKLDNN Relu. Fix a compilation error. Change Makefile for MKLDNN. Remove infer storage in convolution. Update MXNet for MKLDNN. Support MKLDNN storage type in python. Update activation. Add MKLDNN base classes. Implement MKLDNN fully connected. Add MKLDNN convolution. Update MKLDNN interface in NDArray. MKLDNN convolution handle CreateMKLDNNData failure. Add another GetMKLDNNData in NDArray. Have mkldnn to define the data format. 
Create output MKLDNN memory explicitly for FC. Fix a bug in NDArray. Fix a bug in GetWeightDesc. Convert data layout if necessary in FC. remove unnecessary print in MKLDNN convolution. Add MKLDNN deconvolution. Add MKLDNNStream to manage primitives and memories. Use MKLDNNStream to register memory in NDArray. Use MKLDNNStream to manage resources in operators. Handle kAddTo in MKLDNN operators. Fix a bug in deconvolution. Fix bugs in NDArray. Revert "Fix bugs in NDArray." This reverts commit f5624a4. Fix a bug in NDArray. Fix a bug in NDArray. Reorder MKLDNN memory to default format in SetTBlob. Disable MKLDNN correctly. Fix a bug in activation. Reshape of NDArray supports MKLDNN. Fix a memory ref bug in NDArray. Reshape NDArray in MKLDNN FullyConnected. Fix data format conversion. Create MKLDNN NDArray in python. Support Slice for MKLDNN NDArray. Reduce the overhead of summing the result to the output array. Avoid unnecessary memory copy in NDArray. Fix a bug in data reordering. Fix a bug in NDArray. Don't hard code MKLDNN type. Support dilation in MKLDNN convolution. Fix a bug in sum results. Rewrite GetMKLDNNData. Add prepare_mkldnn.sh Enable MKLDNN activation. Fix a bug on FullyConnected. Handle 3 dims for MKLDNN NDArray. Fix a bug in MKLDNN FC. Support MKLDNN storage in KV store. Fix a bug in executor for non-default NDArray. Fix a link error in cast_storage.cc. Remove unnecessary function def Fall back to def storage if the type isn't supported by MKLDNN. Use NDArray for MKLDNN in python. Reshape output of MKLDNN convolution. Fix a bug in NDArray. Support more operations in MKLDNN NDArray. Fix a bug in deconvolution. Fix bugs in MKLDNN deconvolution. We still need to compute bias correctly. Have elemwise binary ops to fall to default for MKLDNN. Limit the cases that MKLDNN operations are called. Force the layout of mkldnn::memory from NDArray. Add MKLDNN softmax. Fix output storage type of MKLDNN softmax. Add MKLDNN sum. Fix a bug in elemwise sum. 
Fix a bug in MKLDNN softmax. Fix a bug in imperative. Clean up dispatch modes. Remove redundant code. MKLDNN Pooling Op integration MKLDNN Pooling Op integration add missing file fix mkldnn pooling op workspace issue handle workspace in MKLDNN pooling correctly. Use a non-MKLDNN op for testing. Allow to share arguments and their gradients between executors. Avoid using MKLDNN pooling when it's not supported. Support MKLDNN properly. Choose MKLDNN softmax more carefully. Fix a bug in MKLDNN pooling. Fall back if MKLDNN pooling isn't supported. Fix a bug in Slice of NDArray. Use int32 for workspace memory. Exclude MKLDNN act with tanh. Have two Reshape functions in NDArray. Copy data for NDArray with diff shapes. Add MKLDNN copy. Add MKLDNN version of elemwise_add. Add MKLDNN version of Flatten. add mkldnn surport for concat simplify MKLDNN Flatten. Enalbe MKLDNN deconvolution with bias. Fix a bug in CuDNN deconvolution. avoid using MKLDNNStorage when it's not defined. Remove ./cudnn_lrn-inl.h Fix for make lint. add mkldnn surport for concat fix the coding style for pr of mkldnn concat Only add input data for MKLDNN concat backward Remove unnecessary TODO. remove unnecessary __repr__ in MKLNDArray. better condition check for readability. Use macro when including mkldnn.hpp. Revert "Use CoreOpRunner for refactored Ops." This reverts commit a28586f. Fix a bug in test core. Limit MKLDNN ops being used. Fix complains from "make pylint" Move ContainStorage to common/utils.h Limit MKLDNN concat being used. Add license. Fix amalgamation Fix compilation error in mkldnn_ops-inl.h Fix a bug in deconvolution. Fix a bug in pooling. MKLDNN ops allocates temp mem. Fix a bug in pooling. Allocate align memory from temp space. Have parameter gradients stored in the default storage. Handle all cases in CopyFrom. Ensure NDArray returns memory with right memory descriptors. use auto to define memory in the operator. Use raw pointer for mkldnn memory. 
Move more code to mkldnn_base.cc Fix a compilation error. Address review comments. fix a bug in activation backward. Miss a macro in mkldnn_base.cc Fix a bug in data iterator in examples. Avoid memory allocation in ReshapeMKLDNN. Avoid memory allocation in storage cast. Fix a bug in cast storage. Handle sliced MKLDNN NDArray. Use memcpy if NDArray uses default format. Revert "Limit MKLDNN ops being used." This reverts commit 75e2ae5. Enable mkldnn act backward has the same input layout. Fix a bug in mkldnn activation. Use MKLDNN sum in more cases. Improve perf of reorder. Avoid memory reorder in conv and deconv. Avoid unnecessary storage cast in fallback path. Revert "Use MKLDNN sum in more cases." This reverts commit 7a21ebc. Handle sliced ndarray in more cases. Fix a complain from make lint. Update Jenkins to test MKLDNN. debug compiling mkldnn. Use MKLDNN sum in more cases. Add mkldnn as a submodule. Compile with mkldnn in 3rdparty. Fix some coding styles. write the path to mkldnn lib in libmxnet.so. use rpath with $ORIGIN. Pack all lib files in Jenkins. pack and unpack mxnet with MKLDNN. Update Jenkinsfile Update Jenkinsfile Add mkldnn batch normalization Fix bugs in BN. Avoid memory allocation in MKLDNNCopy. only use MKLDNN BatchNorm for special cases. MKLDNN BatchNorm doesn't work well on the default layout. Add MKL-DNN based LRN Code Style Changes Fix a bug in BN. Fix a bug in LRN. Handle non-default storage in memory plan. Fix coding style. Fix a compilation error without mkldnn. Fix some coding styles for batch norm Improve forward of convolution. Add openmp and simd support to BN operator Retrieve MKLDNN Conv primitive based on signature. Retrieve Act primitive based on its signature. Fix a bug in pooling. Diable some MKLDNN activation and pooling. Cast MKLDNN storage with diff data type. Check if it's a view of NDArray. Reshaped and sliced arrays share the same chunks. Implement caching MKLDNN Act correctly. Fix a bug in check_consistency. 
* Fix a potential bug when destroying NDArray.
* Fix bugs when allocating mem in NDArray.
* Fix coding style.
* Add macro when using mkldnn in ndarray.
* Fix a compilation error.
* Fix a bug in concat.
* Remove MKLDNNStorage.
* Handle different layouts in CopyFromToDnsImpl.
* Fall back correctly.
* Force weight grad to use the default layout.
* Reorder weight arrays in (de)conv for faster inference.
* Avoid caching TBlob from NDArray. (This commit may add some overhead of managing NDArray for each fallback.)
* Fix a bug in Flatten.
* Handle ndarray with default layout in mkldnn BN correctly.
* Align to page when mkldnn is enabled.
* Use default mem alloc for mkldnn.
* Reuse NDArrays.
* Support WriteInplace for sum.
* Fix complaints from "make lint".
* Avoid reallocation in NDArray.
* Handle weight arrays with special MKLDNN layouts.
* Remove unnecessary GetWeights.
* Fix compilation error without MKLDNN.
* Fix a bug in (de)conv for weight arrays.
* Fix a minor bug in MKLDNN conv.
* Fix a bug in MKLDNNOpSignature.
* Reimplement fallback for MKLDNN ops.
* Fix a bug in FallbackExecutor.
* Add params in hashcode.
* Invalidate data in outputs to accelerate.
* Fix a minor bug.
* Update mkldnn_base-inl.h.
* Add primitive caching for Pooling forward computation.
* Add hashcode in pooling parameters.
* Support NDArray copy with types unsupported by MKLDNN.
* Avoid using MKLDNN concat for negative dimension.
* Fix make lint complaints.
* Disable mkldnn avg pooling for now.
* Fix a compile warning.
* Fix compile error when MKLDNN is disabled.
* OP primitive cache: use memory as signature for MKLDNN storage type.
* Remove MKLDNN array in python.
* Disable Clang tests in Jenkins.
* Use mklml dockers to test mkldnn.
* Update MKLDNN repo to zhengda's mkldnn repo.
* Update MKLDNN repo to ashok's.
* Fix a bug in fallback.
* Change avg pooling algorithm to pooling_avg_include_padding.
* Fix a code style in mkldnn pooling.
* Temp fix a bug in FC.
* Revert "Disable Clang tests in Jenkins." (This reverts commit b4efa8f.)
* Rebase and refactor deconv (#20):
  * Rebase to Da Zheng's refactor branch (Jan. 14); add signature for mkldnn Deconv and modify class MKLDNNDeconvForward.
  * Fix make lint complaints.
* A simple way of caching BN inference.
* Cache BN forward for both training and inference.
* Fix some minor problems in BN.
* Fix a bug in caching BN.
* Force building with avx2 in Jenkins.
* Remove the remaining MKLDNNStorageType.
* Some minor updates in NDArray.
* A lot of updates to address comments.
* Minor changes.
* Revert modification in test_executor.
* Fix a bug in FlattenStorageType.
* Remove BN debug.
* Remove remaining MXNET_USE_MKL2017.
* Remove unused code in pooling.
* Fix bugs in gtests.
* Fix lint errors.
* A lot of minor updates to address comments.
* Fix coding style in MKLDNN Pooling (#22):
  * Revert the code change in the previous code refactor.
  * Fix a bug in pooling.
* LRN coding style changes (#21):
  * LRN coding style change.
  * Add const for local variables.
  * Add req for LRN forward.
  * Rebase code.
  * Align API interface.
* Revert modification in test_executor.
* Cast storage with MKLDNN properly.
* Minor updates to address comments.
* Some minor updates.
* Switch to the master branch of MKLDNN.
* Minor updates to address comments.
* Update activation.cc.
* Fix a bug in convert NDArray.
* Add gluon model zoo tests.
* Update GPU tests on model zoo.
* Avoid using mobilenet for GPU tests with gluon models; mobilenet can't pass the test even without MKLDNN.
* Update GPU tests on gluon.
* Change cmake to compile MKLDNN.
* Update cmake for MKLDNN.
* Implement align myself.
* Switch to intel/mkl-dnn.
* Fix errors in align unittest.
* Add unit test for LRN.
* Fix a compilation error.
* Use storage_type_assign to determine storage type.
* Avoid global pooling in mkldnn; there is a bug in global pooling in mkldnn.
* Compare all MKLDNN ops with native impls; add MXNET_MKLDNN_DEBUG to control the test.
* Fix a bug in testing correctness.
* Print the name of the buggy operator.
* Undo some modifications.
* Fix a bug on reshaped array.
* Avoid testing outputs with NullOp.
* Turn on MKLDNN tests in Jenkins.
* Print each operator in MKLDNN tests.
* Rename test_gluon_model_zoo.py.
* Create hashcode for operator parameters properly.
This reverts commit 2cc2aa2.
Description
Refactor operators to use the NNVM interface.
Clean up the original MKL code.
Integrate MKLDNN.
Checklist
Essentials
Passed code style checking (make lint)
Changes
Comments