New tutorial of implementing operators in MXNet backend #7828
Put this in the tutorial under a new section called "extend". Also let's move the Python custom op guide to the tutorial under "extend".
```python
>>> c = mx.sym.Variable('c', shape=(0, 3))
>>> d = a * b + b * c
>>> print d.infer_shape()
([(2L, 3L), (2L, 3L), (2L, 3L)], [(2L, 3L)], [])
```
A beginner may not understand what the returned tuple of lists means. Add a note explaining that?
Done.
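For readers new to `infer_shape`, a short editorial note: the returned tuple holds three lists, namely the argument shapes, the output shapes, and the auxiliary-state shapes, in that order. A minimal sketch, assuming `a` and `b` were declared earlier with partially known shapes:

```python
import mxnet as mx

# assumed declarations; 0 marks an unknown dimension
a = mx.sym.Variable('a', shape=(2, 0))
b = mx.sym.Variable('b', shape=(0, 3))
c = mx.sym.Variable('c', shape=(0, 3))
d = a * b + b * c

# infer_shape returns three lists, in order:
# argument shapes (a, b, c), output shapes (d), auxiliary-state shapes (none here)
arg_shapes, out_shapes, aux_shapes = d.infer_shape()
print arg_shapes   # [(2L, 3L), (2L, 3L), (2L, 3L)]
print out_shapes   # [(2L, 3L)]
print aux_shapes   # []
```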
docs/how_to/add_op_in_backend.md
and then go through it line by line.
```cpp
template<typename xpu> // 1
void QuadraticOpForward(const nnvm::NodeAttrs& attrs, // 2
```
Should we first mention the operator interface the developer is going to use, since it's a fixed one and disallows customization?
```cpp
void (const nnvm::NodeAttrs& attrs,        // 2
      const OpContext& ctx,                // 3
      const std::vector<TBlob>& inputs,    // 4
      const std::vector<OpReqType>& req,   // 5
      const std::vector<TBlob>& outputs) { // 6
```
?
Done.
```python
# check backward using finite difference
data = mx.sym.Variable('data')
quad_sym = mx.sym.quadratic(data=data, a=a, b=b, c=c)
check_numeric_gradient(quad_sym, [data_np])
```
Should we also mention `check_symbolic_forward` and `check_symbolic_backward`, since they're usually used to test, too?
Done.
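For completeness, here is a rough sketch of how `check_symbolic_forward` and `check_symbolic_backward` from `mxnet.test_utils` are typically used, assuming `quadratic` computes `a*x^2 + b*x + c` and is already registered; the test values here are hypothetical:

```python
import numpy as np
import mxnet as mx
from mxnet.test_utils import check_symbolic_forward, check_symbolic_backward

a, b, c = 1.0, 2.0, 3.0
data_np = np.random.uniform(-1.0, 1.0, (2, 3))

data = mx.sym.Variable('data')
quad_sym = mx.sym.quadratic(data=data, a=a, b=b, c=c)

# forward: compare the operator's output against a NumPy reference
expected_out = a * data_np**2 + b * data_np + c
check_symbolic_forward(quad_sym, [data_np], [expected_out])

# backward: dL/dx = dL/dy * (2*a*x + b)
out_grad = np.ones_like(data_np)
expected_grad = out_grad * (2 * a * data_np + b)
check_symbolic_backward(quad_sym, [data_np], [out_grad], [expected_grad])
```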
As someone new to creating operators, I had a few questions while I was reading the tutorial. They may or may not need to be clarified in the tutorial.
docs/how_to/add_op_in_backend.md
dimension of two shapes, such as (2, 3) and (3, 3), the macro would throw an
exception with an error message for shape inference.
5. At the end of the function body, we checked whether the output shape
is completely known by testing whether its size is greater than 0. If not,
How does this variable having size >0 mean that its shape is completely known? Can't the shape be multi-dimensional, where we know some dimensions? For example, going by the previous python example, can't the shape be something like (2,0)?
`size=0` means that at least one dimension is 0, which means the shape is undefined and must be inferred before running the forward/backward functions. I will make the point clear here.
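To make that concrete, a small illustration of my own: a shape's size is the product of its dimensions, so a single unknown dimension (recorded as 0) drives the whole size to 0, even if the other dimensions are known.

```python
import numpy as np

# 0 marks an unknown dimension in this shape convention
print np.prod((2, 3))   # 6 -> fully known shape
print np.prod((2, 0))   # 0 -> at least one dimension is still unknown
```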
docs/how_to/add_op_in_backend.md
the interface `get_with_shape`.
- Line 13: Get user input parameters from the node attribute. Here the node
means a placeholder for the operator in the whole computational graph for
the neural network.
Might be better to move the description of node to line 1, with the description of `attrs`, because that is when we first see `NodeAttrs`.
Good point. Will do that.
in the computational graph. MXNet would
add the missing argument with name `quadratic0_data`, where the prefix
`quadratic0` is the operator name appended with an index and the postfix
`data` comes from the return value of the user defined `FListInputName` function.
In the case of mx.sym.quadratic(), I understand that the computation graph can identify that it takes a variable, and creates a node in the graph with quadratic0_data. But such a function can't actually run because there's no argument, right? Can that variable be assigned somewhere later?
The argument is assigned values during the symbol binding stage. That could be another long tutorial. I will add a few sentences to make it clear here.
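A minimal sketch of how that looks from the frontend, assuming `quadratic` is registered; the auto-generated argument name and its index may differ in practice:

```python
import mxnet as mx

quad = mx.sym.quadratic(a=1.0, b=2.0, c=3.0)
print quad.list_arguments()   # e.g. ['quadratic0_data']

# the placeholder only receives a value at symbol-binding time
exe = quad.simple_bind(mx.cpu(), quadratic0_data=(2, 3))
exe.arg_dict['quadratic0_data'][:] = mx.nd.ones((2, 3))

out = exe.forward()[0]
print out.asnumpy()   # every entry equals a*1 + b*1 + c = 6.0
```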
```python
# check backward using finite difference
data = mx.sym.Variable('data')
quad_sym = mx.sym.quadratic(data=data, a=a, b=b, c=c)
```
Could you also talk about where the operator is defined in the Python API? Are all operators defined under `mx.nd`? Is there a typo here where it says `mx.sym.quadratic`?
Once an op is implemented in the backend, it's registered in both `mx.nd` and `mx.sym` in the frontend when you type `import mxnet`. I will mention it in the tutorial.
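A quick sketch of the two flavors, with hypothetical values, assuming the operator has been registered as described:

```python
import mxnet as mx

# imperative (NDArray) flavor
x = mx.nd.ones((2, 3))
y = mx.nd.quadratic(data=x, a=1.0, b=2.0, c=3.0)

# symbolic flavor
data = mx.sym.Variable('data')
quad_sym = mx.sym.quadratic(data=data, a=1.0, b=2.0, c=3.0)
```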
Could you please also explain the use of auxiliary states in the tutorial? (initialization, update, and state sharing if possible)
```cpp
CHECK_EQ(out_attrs->size(), 1U);

SHAPE_ASSIGN_CHECK(*out_attrs, 0, in_attrs->at(0));
SHAPE_ASSIGN_CHECK(*in_attrs, 0, out_attrs->at(0));
```
When I ran the code, out_attrs doesn't contain the shape information of the output array before this line. Could you explain in what case we need to use the shape information of the output array to infer the shape of the input array?
Please consider the following example. `q` is the output of the operator `quadratic` and is inferred from `data2`. Then `data1` is inferred from `q` in the backward inference pass. I will add this to the tutorial as well.
```python
import mxnet as mx

data1 = mx.sym.var('data1')
q = mx.sym.quadratic(data=data1)
data2 = mx.sym.var('data2', shape=(2, 3))
s = q + data2
print s.infer_shape()
```
@x10000year The operators with auxiliary states are currently implemented using the legacy operator framework (inheriting from the class
@reminisce Is the legacy operator framework to be deprecated in the future? This makes me a bit worried because in my projects many custom operators have complex states which are arbitrary C++ data structures stored as class members. Different operators can have different types of states. Those states are computed in the forward pass and may be accessed in the backward pass. The new nnvm operator framework seems to only support stateless operators.
@x10000year Yes, we plan to deprecate the legacy op interface (@piiswrong correct me if I am mistaken). For operators with states, there is a
Thanks a lot for creating this tutorial!
2. Define type and shape inference functions in `quadratic_op-inl.h`.
3. Define forward and backward functions in `quadratic_op-inl.h`.
4. Register the operator using [nnvm](https://github.com/dmlc/nnvm)
in `quadratic_op.cc` and `quadratic_op.cu` for
Is there a guideline on where to place these operators in `src/operator`?
There isn't one. I will add some in this tutorial.
a backward pass. Note that we used a convenience functor struct `ElemwiseGradUseIn`.
As you can tell from the name, the registered functor creates the node for gradient computation
with dependencies on the output gradient node and input node. Similarly, there are
three other functors defined as `ElemwiseGradUseOut`, `ElemwiseGradUseInOut`,
For nodes created using any of these functors, is the output gradient node always a dependency, in addition to something else?
with dependencies on the output gradient node and input node. Similarly, there are
three other functors defined as `ElemwiseGradUseOut`, `ElemwiseGradUseInOut`,
and `ElemwiseGradUseNone` for developers' convenience. In order to add
this attribute, we also need to register a backward operator for `quadratic` with
Maybe mentioning the name of the backward operator, `_backward_quadratic`, will help here.
Oh, right, I forgot. Will add it.
Thanks for writing such a detailed tutorial. It is very well written. A few comments.
docs/how_to/add_op_in_backend.md
To implement this, we first create three files: `quadratic_op-inl.h`,
`quadratic_op.cc`, and `quadratic_op.cu`. Then we are going to
1. Define the parameter struct
for registering `a`, `b`, and `c` in `quadratic_op-inl.h`.
A brief note on file naming conventions would be useful.
Done.
docs/how_to/add_op_in_backend.md
One important thing to note is that inference functions should be capable of
performing **mutual inference**, i.e.
inferring input shape from output shape, inferring one argument's shape
Maybe qualify it as the input argument.
Done.
docs/how_to/add_op_in_backend.md
One important thing to note is that inference functions should be capable of
performing **mutual inference**, i.e.
inferring input shape from output shape, inferring one argument's shape
from another argument, etc. This is very useful in building the computational graphs
Maybe for a different article, but it would be very helpful to explain how computational graphs are constructed.
Done.
docs/how_to/add_op_in_backend.md
```cpp
}); // 21
} // 22
```
- Line 1: `attrs` contains the user input parameters `a`, `b`, and `c`.
I thought Line 1 was `template`.
Good catch. I corrected it and the following lines.
```cpp
template<int req>
struct quadratic_forward {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, DType* out_data, const DType* in_data,
```
Just like you have done for the above code snippet, it may be helpful to explain what the `MSHADOW_XINLINE` and `KERNEL_ASSIGN` macros do.
Done.
docs/how_to/add_op_in_backend.md
```
dL/dx = dL/dy * dy/dx = dL/dy * (2*a*x + b).
```
The above equation indicates that `dL/dx` depends on the gradient
of the output tensor and the input tensor.
gradient of the output tensor and value of the input tensor
of the input tensor will not be overwritten by the output.
- Line 20: Define the input argument name as `data` for the operator.
- Line 21: Add user input parameters `a`, `b`, and `c` as the attributes of the operator.
- Line 22: Register an operator named `_backward_quadratic` for backward pass
A note on the naming convention would be helpful. Is naming the backward operator `_backward_foo` a suggestion or a rule?
Done.
docs/how_to/add_op_in_backend.md
## Summary
In this tutorial, we practiced implementing the operator `quadratic` in MXNet backend
and unit testing the implementation in frontend. More specifically, we added parameter
struct for user-input parameters, walked through shape and type inference work flow,
workflow
Thanks everyone for reviewing the tutorial. I have addressed all the comments.
So far, we have acquired an operator working on CPU in frontend.
In order to register the operator working on GPUs, we just need to add the following
code to `quadratic_op.cu`. Note that forward and backward functions
are registered with attribute key `FCompute<gpu>`, rather than `FCompute<cpu>`.
What's the difference between FCompute and FComputeEx? Where can I find any documentation about that?
FComputeEx takes NDArray instead of TBlob items and is generally called when the tensor is of sparse storage type (kCSRStorage or kRowSparseStorage instead of kDefaultStorage)
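From the frontend, the storage type of the array is what selects between the two paths. A small sketch of my own, assuming an MXNet build with the sparse NDArray API:

```python
import mxnet as mx

dense = mx.nd.ones((4, 3))
sparse = mx.nd.sparse.zeros('row_sparse', (4, 3))

print dense.stype    # 'default'    -> dispatched to FCompute (TBlob inputs)
print sparse.stype   # 'row_sparse' -> dispatched to FComputeEx (NDArray inputs)
```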
Thanks for your kind reply, @cjolivier01. One more question: I notice that there are ForwardResource and BackwardResource methods to register additional memory resources for computation in the previous document for creating an op here.
So with NNVM interfaces, how can I register these resources for forward and backward computation? And how can I share some states/memory between forward and backward computation? It seems the two functions are defined separately in this tutorial.
- For registering temp resources using the NNVM interface, you can take a look at this example: https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/matrix_op.cc#L519. You need to register resources separately for forward and backward if they both need temp resources.
- For sharing states between forward and backward, you can take a look at the BatchNorm op, which has an `aux_states` argument for both the forward and backward functions. Please note that this kind of interface is going to be deprecated. The new way of sharing states would be to use the NNVM interface to register shared states.
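For reference, a short illustration of how auxiliary states surface in the frontend, using `BatchNorm` (the exact names come from that operator's registration):

```python
import mxnet as mx

data = mx.sym.Variable('data')
bn = mx.sym.BatchNorm(data=data, name='bn')

# gamma/beta are learnable arguments; the moving statistics are auxiliary states
print bn.list_arguments()          # ['data', 'bn_gamma', 'bn_beta']
print bn.list_auxiliary_states()   # ['bn_moving_mean', 'bn_moving_var']
```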
Many thanks, @reminisce. I will look into the aux_states in BN and the new NNVM interfaces. So what's your suggestion about stateless ops vs. stateful ops for MXNet?
I suggest focusing on the nnvm interface for both stateful and stateless ops from now on, as the legacy interface is going to be deprecated. You can follow this PR for more details on using the nnvm interface for stateful ops: #8302
I see. Thanks for your help. :)
A new tutorial for implementing operators in the MXNet backend, tailored for users interested in learning about and contributing to the MXNet C++ code base.
See this link for a friendly view.
@piiswrong @eric-haibin-lin @anirudh2290 @cjolivier01 @rahul003 @madjam @bhavinthaker