
MKLDNN conv2d kernel added #8451

Merged: 4 commits merged into PaddlePaddle:develop on Mar 7, 2018
Conversation

pzelazko-intel (Contributor)

MKLDNN conv2d and pool2d OP kernels can be enabled with the use_mkldnn OP flag, just like the currently present use_cudnn flag. It is set to True by default. The use_cudnn flag has higher priority.

Besides unit tests, we validated these kernels by running training and inference on the MNIST dataset and comparing the results with the Caffe library.
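
For illustration, a minimal C++ sketch of the kernel-selection priority described above; this is a simplification for readers, not the exact PR code:

// Sketch only: use_cudnn takes priority over use_mkldnn; otherwise fall
// back to the plain kernel.
framework::LibraryType library = framework::LibraryType::kPlain;
if (ctx.Attr<bool>("use_cudnn") && CanCUDNNBeUsed(ctx)) {
  library = framework::LibraryType::kCUDNN;   // highest priority
} else if (ctx.Attr<bool>("use_mkldnn") && CanMKLDNNBeUsed(ctx)) {
  library = framework::LibraryType::kMKLDNN;  // CPU-side alternative
}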

@CLAassistant commented Feb 15, 2018

CLA assistant check
All committers have signed the CLA.

@luotao1 (Contributor) commented Feb 26, 2018

Can you divide this PR into three small PRs?

  • typo fix - TransFromNeeded -> TransformNeeded and
    MKLDNNDeviceContext changes: e0531da and 7a358fa
  • MKLDNN conv2d OP kernels and unit test added
  • MKLDNN pool2d OP kernels and unit test added

@@ -236,11 +236,11 @@ class OpKernelRegistrar : public Registrar {

#define USE_CUDA_ONLY_OP(op_type) \
  USE_OP_ITSELF(op_type);         \
- USE_OP_DEVICE_KERNEL(op_type, CUDA)
+ USE_OP_DEVICE_KERNEL(op_type, CUDA);
Member

We should not add ; at the end of the macro definition; we want users to invoke the macro like

SOME_MACRO();
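
For context, the usual C++ convention for this is to wrap the macro body in do { ... } while (0) and leave the trailing semicolon to the call site; a generic sketch (not Paddle's actual macro):

#define SOME_MACRO(arg)  \
  do {                   \
    do_something(arg);   \
  } while (0)  // no trailing ';' here; the call site supplies it

// The call site then reads like a normal statement:
SOME_MACRO(x);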

file(APPEND ${pybind_file} "USE_OP_DEVICE_KERNEL(conv2d, CUDNN);\n")
file(APPEND ${pybind_file} "USE_OP_DEVICE_KERNEL(pool2d, CUDNN);\n")
file(APPEND ${pybind_file} "USE_OP_DEVICE_KERNEL(conv2d_transpose, CUDNN);\n")
op_library(edit_distance_op SRCS edit_distance_op.cc edit_distance_op.cu DEPS math_function)
Contributor

We will pybind USE_OP_DEVICE_KERNEL(XXX, CUDNN) automatically in #8590, in order to make operators/CMakeLists.txt much cleaner.

Then a single line, op_library(pool_op DEPS pooling), will pybind the CPU/CUDA/CUDNN/MKLDNN device kernels.

Contributor Author

@luotao1 Do you want my changes to be merged after #8590 is finished?

Contributor

If #8590 isn't finished before your small PRs, you can merge your changes first.

Contributor

#8590 is finished and merged now.

@tensor-tang (Contributor) left a comment

First of all, as synced with @pzelazko-intel, we will break this PR into some smaller ones.

As for the current code, we also had a discussion.

The most important information is that the current implementation may not be the most efficient one, since the format is fixed as nchw and the transform functions are still under development.

If anything is missing, @pzelazko-intel please point it out.

} else if (CanMKLDNNBeUsed(ctx)) {
  library_ = framework::LibraryType::kMKLDNN;
} else {
  library_ = framework::LibraryType::kPlain;
}

std::string data_format = ctx.Attr<std::string>("data_format");
Contributor

Here we only enable the MKLDNN library.

As synced with @pzelazko-intel, we will enable the MKLDNN layout next time. Then we will consider the transform function as well.

Contributor

Could you add "TODO"s in the code as reminders? Including:

  • enable MKLDNN layout
  • enable groups
  • anything more.

Besides, could you avoid force-pushing over the previous commits next time? Otherwise we cannot find the differences after your update.

Contributor Author

OK, I'm going to add TODOs.
I overwrote the previous commits because I wanted the commit history to be clean.
Will I have an opportunity to squash commits when merging?

Contributor

Yes, we can squash commits when merging your PR.

std::vector<int> dilations = ctx.Attr<std::vector<int>>("dilations");
int groups = ctx.Attr<int>("groups");

PADDLE_ENFORCE(groups == 1, "MKLDNN doesn't support group convolution yet");
Contributor

Will enable groups later.

Contributor Author

done

memory::dims conv_padding = {paddings[0], paddings[1]};

auto conv_src_md = memory::desc({conv_src_tz}, memory::data_type::f32,
                                memory::format::nchw);
Contributor

The format is fixed as nchw; it should support more. We will come back to this later.
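
As an illustration of what supporting more formats could look like, here is a hypothetical helper mapping the operator's data_format attribute to an MKLDNN memory format (ToMKLDNNFormat is an invented name, not part of this PR):

// Hypothetical sketch: derive the MKLDNN memory format from the
// data_format attribute instead of hard-coding nchw.
inline mkldnn::memory::format ToMKLDNNFormat(const std::string& data_format) {
  if (data_format == "NHWC") return mkldnn::memory::format::nhwc;
  return mkldnn::memory::format::nchw;  // Paddle's default layout
}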

template <typename T>
class ConvOpMkldnnKernel : public paddle::framework::OpKernel<T> {
 public:
  void Compute(const paddle::framework::ExecutionContext& ctx) const override {
Contributor

This Compute function is too long. We can think about breaking the code into smaller functions in an mkldnn_helper, like cudnn_helper does.

Contributor Author

Please look at the answer below.

Contributor Author

done


// push the op to the stream and wait until MKLDNN has executed it
std::vector<primitive> pipeline{conv_prim};
stream(stream::kind::eager).submit(pipeline).wait();
Contributor

Only conv_pd is saved to the context; I think we can save more, like the engine, primitives, stream, etc.

Contributor Author

I'm going to refactor it as soon as we know how to handle data transfer between forward and backward in parallel mode (ParallelDo OP).

Contributor Author

refactoring is done
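
For reference, a rough sketch of the caching pattern being discussed, assuming a SetBlob/GetBlob-style blob map on MKLDNNDeviceContext (method names may differ from this PR's version):

// Cache the primitive descriptor in the device context under a per-output
// key so a later call (e.g. the grad kernel) can reuse it.
auto& dev_ctx = ctx.template device_context<platform::MKLDNNDeviceContext>();
const std::string key = ctx.op().Output("Output");
const std::string key_conv_pd = key + "@conv_pd";

dev_ctx.SetBlob(key_conv_pd, conv_pd);  // conv_pd is a std::shared_ptr

// ... later:
auto conv_pd_cached =
    std::static_pointer_cast<mkldnn::convolution_forward::primitive_desc>(
        dev_ctx.GetBlob(key_conv_pd));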


// Get a unique name from the "argument" name of the "Output" variable.
// This name will be used as the key when saving info into the device context.
const std::string key = ctx.op().Output("Output");
Contributor

I am not so sure whether this output name is unique, especially under the scope field.
I had a discussion with @QiJune before and did not reach a formal conclusion at that time.
Maybe Baidu friends can give a better answer. This is just my concern and a reminder.

Contributor

I see that conv_cudnn_op.cu.cc also uses Output.

class CUDNNConvOpKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
    PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
                   "It must use CUDAPlace.");
    auto* input = ctx.Input<Tensor>("Input");
    auto* filter = ctx.Input<Tensor>("Filter");
    auto* output = ctx.Output<Tensor>("Output");

So why shouldn't conv_mkldnn_op.cc use the same output name?

'use_cudnn': self.use_cudnn,
'data_format': 'AnyLayout' # TODO(dzhwinter) : should be fix latter
Contributor

From my perspective, it's not appropriate to remove this TODO.
I suggest we keep it and let the owner fix it, since we may be missing some background information.

Contributor Author

I did not remove this line; it has been moved up.

@pzelazko-intel force-pushed the fluid-mkldnn branch 2 times, most recently from 7882699 to edcf89a, on March 1, 2018 12:21
@pzelazko-intel changed the title from "MKLDNN conv2d and pool2d OP kernels added" to "MKLDNN conv2d kernel added" on Mar 1, 2018
@pzelazko-intel (Contributor Author)

Now in this PR I'm introducing only the conv2d OP MKLDNN kernel.
After this PR is accepted, I'll create a new one for the pool2d OP.

@luotao1 (Contributor) commented Mar 2, 2018

LGTM on following files:

  • paddle/fluid/operators/CMakeLists.txt
  • python/paddle/fluid/layers/nn.py
  • python/paddle/fluid/nets.py
  • python/paddle/fluid/tests/unittests/test_conv2d_op.py

@jacquesqiao Can you help review following files:

  • paddle/fluid/framework/operator.cc
  • paddle/fluid/framework/operator.h

@QiJune Can you help review following files:

  • paddle/fluid/platform/device_context.cc
  • paddle/fluid/platform/device_context.h

@tensor-tang Can you help review following files:

  • paddle/fluid/operators/conv_mkldnn_op.cc
  • paddle/fluid/operators/conv_op.cc

@pzelazko-intel force-pushed the fluid-mkldnn branch 2 times, most recently from 9a3ecb8 to a4ab82d, on March 4, 2018 10:23
@tensor-tang (Contributor) left a comment

My ARs for conv_mkldnn_op.cc and conv_op.cc

Just a reminder: as discussed before, the Compute functions are too large. Please do not forget this, since it's pretty important.

// TODO(pzelazko-intel) enable group convolution
PADDLE_ENFORCE(groups == 1, "MKLDNN doesn't support group convolution yet");
PADDLE_ENFORCE(
    dilations.size() == 2 && dilations[0] == 1 && dilations[1] == 1,
Contributor

Dilation is also supported by MKLDNN conv; you can add one more TODO later.

Contributor Author

Done

int groups = ctx.Attr<int>("groups");

// TODO(pzelazko-intel) enable group convolution
PADDLE_ENFORCE(groups == 1, "MKLDNN doesn't support group convolution yet");
@tensor-tang (Contributor) commented Mar 5, 2018

"MKLDNN doesn't support dilation in convolution yet"

I think this error message is not clear enough, as we know MKL-DNN itself supports groups. This message would mislead the Paddle team and users: it is Paddle that has not enabled the feature yet.

The same applies to the dilation message below.

Contributor Author

done

@@ -600,6 +600,23 @@ proto::VarType::Type OperatorWithKernel::IndicateDataType(
return static_cast<proto::VarType::Type>(data_type);
}

bool OperatorWithKernel::CanCUDNNBeUsed(const ExecutionContext& ctx) const {
Member

I think it's better to make CanCUDNNBeUsed and CanMKLDNNBeUsed two global functions.

Contributor Author

@jacquesqiao where would you propose to place these functions?

Member

@pzelazko-intel According to this document, https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/mkl/mkldnn_fluid.md#mkldnn_helper, we can put the interface there and add a mkldnn_helper.cc.

Contributor Author

done
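
For readers following along, the resulting free function might look roughly like this (a sketch consistent with the mkldnn_helper placement discussed above, not necessarily the merged code):

// CanMKLDNNBeUsed as a free function: the op may use MKLDNN only if the
// attribute requests it and the kernel runs on a CPU place.
namespace paddle {
namespace platform {

inline bool CanMKLDNNBeUsed(const framework::ExecutionContext& ctx) {
  bool use_mkldnn = ctx.Attr<bool>("use_mkldnn");
  return use_mkldnn && is_cpu_place(ctx.GetPlace());
}

}  // namespace platform
}  // namespace paddle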

@luotao1 (Contributor) commented Mar 6, 2018

Please answer every reviewer comment. If you will follow the comment, please write "Done"; otherwise, please give a reason. See code-review.

@pzelazko-intel (Contributor Author)

@luotao1 refactoring has been completed.
Also, I've added "done" comments where appropriate.

@QiJune (Member) commented Mar 6, 2018

@luotao1 DeviceContext part looks good to me.

@jacquesqiao (Member)

framework part looks good to me, thanks! @pzelazko-intel

device_contexts_.emplace(places[i],
                         new platform::CPUDeviceContext(
                             boost::get<platform::CPUPlace>(places[i])));
#endif
Contributor

Honestly, I have a small question here.
When PADDLE_WITH_MKLDNN is enabled, we will not have a platform::CPUDeviceContext at all.
Is that fine with you, @jacquesqiao @QiJune?

Member

@tensor-tang It seems that if PADDLE_WITH_GPU is enabled, there will be both a CPUDeviceContext and a CUDADeviceContext. So I think MKLDNNDeviceContext and CPUDeviceContext should coexist.

Could you provide an example of a network with two FC layers, where one FC runs on CPU and the other on MKLDNN? Just an mnist demo would be fine.

Then we can see whether "MKLDNN" is compatible with "CPU".

Contributor

Firstly, MKLDNNDeviceContext now inherits from CPUDeviceContext:
https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/platform/device_context.h#L113. So functionally, this version can pass the CI.

But from my perspective, CPUDeviceContext should always be available no matter which third-party library is added, not only MKLDNN. Since we cannot guarantee that every op Paddle supports is also supported by the third-party library, it would be a problem if the third-party context did not inherit from the CPU context.

So I just want to hear your voice; I am not sure whether that is proper.

Member

Yes, I think so.
Since MKLDNNDeviceContext inherits from CPUDeviceContext, there will be no problem.
I have not thought of a better choice; we can move on first.

Contributor

OK, Thx
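
To make the inheritance being discussed concrete, the class relationship (per the device_context.h link above) is roughly:

// Because MKLDNNDeviceContext is-a CPUDeviceContext, any op that only knows
// about CPUDeviceContext keeps working when MKLDNN support is compiled in.
class CPUDeviceContext : public DeviceContext {
  // Eigen device, Place, Wait(), ...
};

class MKLDNNDeviceContext : public CPUDeviceContext {
  // adds the MKLDNN engine and storage for cached MKLDNN objects
};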

@tensor-tang (Contributor) left a comment

LGTM for MKLDNN part

@luotao1 (Contributor) left a comment

Thanks to @pzelazko-intel for the work, and thanks to @jacquesqiao, @QiJune, and @tensor-tang for the review.

@luotao1 merged commit 8c71ada into PaddlePaddle:develop on Mar 7, 2018
@pzelazko-intel deleted the fluid-mkldnn branch on March 8, 2018 09:43