Base structure for Native Operators #16

vchuravy · 2015-11-05T04:51:09Z

Thanks to the help in #15 I managed to get past the initialization stage of _Native.

With the test_enty functions defined in src/nativeops.jl

info = mx.NativeOpInfo(fb_entry, fb_entry, infer_entry, list_entry, list_entry)
pstring = bytestring("0x", hex(reinterpret(UInt, pointer_from_objref(info))))
mx._Native(name = :test, info = pstring)

Works without causing a seqfault 🎉

Todo:

Defining API exposed to Users
Defining (gc-safe) entry points (maybe also thread safe?)

API

I was thinking of something along the lines of

immutable Softmax <: mx.NativeOp end

function forward(::Type{Softmax}, in_data, out_data)
  ...
end

function forward(::Type{Softmax}, out_grad, in_data, out_data, in_grad)
  ...
end

function infer_shape(::Type{Softmax},  in_shape)
  ...
end

function list_output(::Type{Softmax})
  ...
end

function list_output(::Type{Softmax})
  ...
end

pluskid · 2015-11-05T04:58:34Z

Great! Thanks! You might want to use an actual object instead of mere type in the API. e.g.

function list_output(op::Softmax)
  ...
end

As my impression is that an operator could actually have some properties, like name, etc.

codecov-io · 2015-11-05T05:03:34Z

Current coverage is `67.44%`

Merging #16 into master will decrease coverage by -6.30% as of 42dd45c

@@            master     #16   diff @@
======================================
  Files           20      21     +1
  Stmts         1531    1674   +143
  Branches         0       0       
  Methods          0       0       
======================================
  Hit           1129    1129       
  Partial          0       0       
- Missed         402     545   +143

Review entire Coverage Diff as of 42dd45c

Powered by Codecov. Updated on successful CI builds.

vchuravy · 2015-11-05T05:56:28Z

I want to formulate it threadsafe from the get go and I don't know how to do that (yet) -> https://groups.google.com/forum/#!topic/julia-dev/9J1GYfCyVpE

pluskid · 2015-11-08T15:52:40Z

I dig a bit into the discussions on mailing list. Just to summarize based on my understanding:

For the kvstore callback (rewrite kvstore callback to be threadsafe #18), since @tqchen mentioned, that callback is guaranteed to be called in the main thread, there is no issue of the current implementation. We will probably go with the current way since it is somehow more efficient for some unknown reason.
For the operator case, since the callback is executed from a different thread, and Julia GC is not thread safe (yet), the only safe thing to do in the callback is ccall(:uv_async_send, ...). So we will have to make this a shallow wrapper that does nothing but notify an actual async worker in the main thread to do the job. This shallow callback can not do waiting as context switching is definitely going to cause issue. So we will need to modify libmxnet to add ability to get notified when the Julia async worker finished the actual computation of the operator.

vchuravy · 2015-11-08T16:16:25Z

@pluskid Exactly. also relevant #18 (comment)

I am currently trying to implement async safe callbacks in OpenCL.jl with the ability to pass data along, but I am running into issues. I will keep you updated as it goes. JuliaGPU/OpenCL.jl#87

tqchen · 2015-11-21T00:59:50Z

Sorry to keep you wait for a while, this PR apache/mxnet#667 enables asynchronize OP.

I would recommend make an update to NativeOp to enable async mode, that exposes a callback to C API, which does some finalization stuffs(e.g. copy data back to TBlob), and call async_on_compelete.

pluskid · 2015-11-21T04:52:32Z

@tqchen Thanks for adding this! I had a closer look at the nativeop API. It's a bit strange to expose the user some raw pointer to operate with. I think it might be better to use NDArray directly since it has nice async execution scheduling. The user could choose to use NDArray operations to implement the operator (which is potentially faster and runs on GPU), or if he want to work with numpy array or Julia array, copying the data from and to an NDArray is super easy in the script side (we could also provide a simple default wrapper for this case). What do you think?

piiswrong · 2015-11-21T05:11:57Z

@pluskid That was what I was going to do originally, but due to some reason I went this way instead.
You are welcome to refactor it.

vchuravy · 2015-11-21T05:21:50Z

@pluskid @piiswrong I will look into the refactoring today, for both the python side and the julia side.
@tqchen Thanks for loonking into async ops.

piiswrong · 2015-11-21T05:25:20Z

@vchuravy remember to set the ctx of the ndarraies to OpContext.ctx

pluskid · 2015-11-21T06:11:47Z

@piiswrong Thanks! @vchuravy Great! I'll leave it to you now then.

tqchen · 2015-11-21T06:29:24Z

There was a bit interfacing issue that we did not yet expose ndarray as operator.

The callback Julia operations are meant to be ran by another worker thread, while itself execute in kinda of synchronized manner.
NDArray operations are usually executed by main thread, and itself runs in async manner.

So if synchronize copy is introduced in the operator, the operator itself blocks in the memcpy call (of course this could also be ran by another worker thread), which is OK. Current operator was kinda of simplified to assume it actually does the heavy lifting works, copy to CPU and calls kinda of synchronize callback in native language.

To support NDArray tracking, we will need to introduce operator interface with NDArray, while most wrapper should still go with the synchronized real action. This complicates a bit with c++11 issues, as c++11 support by nvcc was introduced, but seems not yet perfect, and current cuda related code are not yet c++11. But I am sure we will find a way.

For short term, making native julia and numpy op work would be a good first step

piiswrong · 2015-11-21T06:50:43Z

@vchuravy BTW, you probably will be interested in apache/mxnet#668
This is a runtime compile engine for CUDA. You can write kernels in python or julia

piiswrong · 2015-11-21T07:14:38Z

@vchuravy when do you expect to finish C++ side refactor? I also want that feature, maybe we can setup an API standard and divide labor

vchuravy · 2015-11-21T07:52:29Z

@piiswrong I was thinking of later tonight, but if you want go ahead and I focus on the Julia side.

vchuravy · 2015-12-13T08:24:55Z

@pluskid I made some progress, but this is primarily focused on the conceptual side of how things should work. I am now working on getting it to a state where it actually works.

pluskid · 2015-12-13T14:59:36Z

@vchuravy Thanks! With the end of the semester settled and coming back from conference, I think I will probably finally be able to resume a relatively stable commitment to the project soon. What is the current status and current blocking item? Let me know if you need any help on this side.

vchuravy · 2015-12-13T15:28:20Z

I still have to write the entire points and the rtc interface. After that
it is mostly going to be weeding out bugs and making sure everything is
thread safe.

On Sun, 13 Dec 2015, 23:59 Chiyuan Zhang [email protected] wrote:

@vchuravy https://github.com/vchuravy Thanks! With the end of the
semester settled and coming back from conference, I think I will probably
finally be able to resume a relatively stable commitment to the project
soon. What is the current status and current blocking item? Let me know if
you need any help on this side.

—
Reply to this email directly or view it on GitHub
#16 (comment).

vchuravy · 2016-01-07T07:21:37Z

@pluskid Conceptually everything is in place, but currently the following still seqfaults.

using MXNet
immutable Softmax <: mx.Native.Operator end
op = Softmax()
info = mx.Native.NDArrayOpInfo(op)
pstring = bytestring("0x", hex(reinterpret(UInt, pointer_from_objref(info))))
mx._NDArray(name = :test, info = pstring)

So I will have to do some more testing and debugging. But if you can find time and look over the basic design?

vchuravy · 2016-01-07T08:37:54Z

Initializing a NativeOperator now works without seqfaulting.One major issue where I am unsure how to best handle that is the lifetime of Julia objects that we return via pointers to mxnet. One of those examples would be list_arguments. The Julia function returns an array of strings, that then gets converted to an array of pointers and we then store the pointer of that array to mxnet. When the GC kicks in the strings will get collected and the memory addresses will be void.

We will either keep track of all the pointers we hand off to mxnet or get MXNet to allocate memory for us.
http://docs.julialang.org/en/release-0.4/manual/calling-c-and-fortran-code/?highlight=cconvert#garbage-collection-safety

vchuravy · 2016-01-07T09:09:23Z

@pluskid

I would like to define

function Base.call(op :: Operator; kwargs...)
  info = NDArrayOpInfo(op)
  pstring = bytestring("0x", hex(reinterpret(UInt, pointer_from_objref(info))))
  mx._NDArray(info = pstring, kwargs...)
end

So that the following would work

immutable SoftmaxOp <: mx.Native.Operator end; 
Softmax = SoftmaxOP(); 
Softmax(name = :Softmax)

But the function gets precompiled and it doesn't find _Nativeany ideas how circumvent that, short of using @generated function?

… immutables

We need to make sure that `uv_async_send` is only send once at a time and that we block the entry function.

vchuravy · 2016-08-17T21:51:48Z

So I think this is actually getting into the realm of the possible. The trick is to use put a lock around :uv_async_send and potentially making the entire call blocking.

Right now I am running into the issue that if I make the entire call blocking, the execution engine seems to stall and because operations on NDArray are blocking.

@piiswrong do I see it right that I should build this on top of the Custom operator and that NativeOp and NDArrayOP are deprecated? And am I also correct in assuming that https://github.com/dmlc/mxnet/blob/916dc9670abe2b64666c076bfff9f8feac23b8d6/src/operator/custom.cc#L75 is supposed to be blocking?

TODO:

Use condition and not barrier
Move to Custom
Decide between Julia arrays or NDArrays for the API
Allow for push! and GPU codegen

piiswrong · 2016-08-18T00:21:06Z

@vchuravy Yes use custom op. The others are deprecated.

It's subtle. It can be either blocking or non blocking. However, if you do anything blocking in forward, you need to set the env variables such that there are more than 2 cpu/gpu worker threads

vchuravy · 2016-08-18T00:27:51Z

Non blocking I would have to push a new sync point? Similar to how NVRTC is
implemented in python?

On Wed, 17 Aug 2016, 20:21 Eric Junyuan Xie, [email protected]
wrote:

@vchuravy https://github.com/vchuravy Yes use custom op. The others are
deprecated.

It's subtle. It can be either blocking or non blocking. However, if you do
anything blocking in forward, you need to set the env variables such that
there are more than 2 cpu/gpu worker threads

—
You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub
#16 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAI3anJWHixnCoKgdRU_w9A6NpUbhSomks5qg6VygaJpZM4GcVIK
.

piiswrong · 2016-08-18T00:36:04Z

No syncing is needed from the frontend side

vchuravy force-pushed the vc/nativeops branch from 2457a29 to 00e24c6 Compare November 6, 2015 01:40

vchuravy mentioned this pull request Nov 6, 2015

rewrite kvstore callback to be threadsafe #18

Closed

pluskid mentioned this pull request Nov 8, 2015

HOWTO pages for key use cases apache/mxnet#513

Closed

vchuravy mentioned this pull request Nov 12, 2015

Fixed the trailing L problem of hex(...) apache/mxnet#555

Merged

This was referenced Nov 21, 2015

[WIP/RFC] Asynchronous native operators apache/mxnet#672

Closed

Julia-based definition of new layer #39

Open

vchuravy force-pushed the vc/nativeops branch from efd0a3f to b729dd4 Compare December 13, 2015 08:22

vchuravy force-pushed the vc/nativeops branch 2 times, most recently from e492b57 to f223c96 Compare January 7, 2016 07:02

vchuravy added 22 commits August 13, 2016 02:51

handle forward and backward entry

510de9e

handle declare backward dependecy

fea55b1

weed out surface bugs

f059ff9

properly store outputs and arguments

340a3f9

initial test example for softmax

a20241c

terminate list of strings with empty string to prevent seqfaults

91010a2

Make initialization work

d26178a

use unsafe_load instead of creating an array for forward/backward

f2612ce

reverse shapes

e2f26c3

throw method error instead of predefining functions

8ab553c

use method signatures throughout

ca93d8d

use reverse!

1f394cb

fix reversing shapes

deac772

use convert instead of typeassert on return values

09b9d5e

fixup: import ..mx

d872f02

define call foe native operators

e4906bb

introduce WeakKeyDict for storage management and use types instead of…

0aad560

… immutables

ensure that received tags are correct

61fd0c8

fixup backward

651a9e9

use RawMutex.jl for backwards and forwards

68ccb4b

We need to make sure that `uv_async_send` is only send once at a time and that we block the entry function.

Move mutex to later point so that we are not racy for the barrier

209c478

get closer to correct interaction between c++ threads and julia

d3a2496

vchuravy force-pushed the vc/nativeops branch from 060cbf0 to d3a2496 Compare August 17, 2016 21:54

vchuravy mentioned this pull request Sep 5, 2016

[Preview] Custom ops #130

Closed

vchuravy closed this Sep 5, 2016

Arkoniak mentioned this pull request Jan 11, 2017

Is creation of custom operators supported? #166

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Base structure for Native Operators #16

Base structure for Native Operators #16

vchuravy commented Nov 5, 2015

pluskid commented Nov 5, 2015

codecov-io commented Nov 5, 2015

vchuravy commented Nov 5, 2015

pluskid commented Nov 8, 2015

vchuravy commented Nov 8, 2015

tqchen commented Nov 21, 2015

pluskid commented Nov 21, 2015

piiswrong commented Nov 21, 2015

vchuravy commented Nov 21, 2015

piiswrong commented Nov 21, 2015

pluskid commented Nov 21, 2015

tqchen commented Nov 21, 2015

piiswrong commented Nov 21, 2015

piiswrong commented Nov 21, 2015

vchuravy commented Nov 21, 2015

vchuravy commented Dec 13, 2015

pluskid commented Dec 13, 2015

vchuravy commented Dec 13, 2015

vchuravy commented Jan 7, 2016

vchuravy commented Jan 7, 2016

vchuravy commented Jan 7, 2016

vchuravy commented Aug 17, 2016 •

edited

Loading

piiswrong commented Aug 18, 2016

vchuravy commented Aug 18, 2016

piiswrong commented Aug 18, 2016

Base structure for Native Operators #16

Base structure for Native Operators #16

Conversation

vchuravy commented Nov 5, 2015

API

pluskid commented Nov 5, 2015

codecov-io commented Nov 5, 2015

Current coverage is 67.44%

vchuravy commented Nov 5, 2015

pluskid commented Nov 8, 2015

vchuravy commented Nov 8, 2015

tqchen commented Nov 21, 2015

pluskid commented Nov 21, 2015

piiswrong commented Nov 21, 2015

vchuravy commented Nov 21, 2015

piiswrong commented Nov 21, 2015

pluskid commented Nov 21, 2015

tqchen commented Nov 21, 2015

piiswrong commented Nov 21, 2015

piiswrong commented Nov 21, 2015

vchuravy commented Nov 21, 2015

vchuravy commented Dec 13, 2015

pluskid commented Dec 13, 2015

vchuravy commented Dec 13, 2015

vchuravy commented Jan 7, 2016

vchuravy commented Jan 7, 2016

vchuravy commented Jan 7, 2016

vchuravy commented Aug 17, 2016 • edited Loading

TODO:

piiswrong commented Aug 18, 2016

vchuravy commented Aug 18, 2016

piiswrong commented Aug 18, 2016

Current coverage is `67.44%`

vchuravy commented Aug 17, 2016 •

edited

Loading