Roadmap #22
Comments
Even though restricted Boltzmann machines (and DBMs/DBNs) and autoencoders (DAE, CAE, stacked autoencoders) have a different principle as they are unsupervised, having an implementation that follows the Mocha architecture could be useful. We started discussing this for DBNs here, as we have a simple implementation for RBMs and DBNs and would like to make it compatible with Mocha. |
@jfsantos Thanks! I think autoencoders, although unsupervised, are still trained with SGD: we just specify the label to be the same as the input data, so in principle we could already do this in Mocha. We might need to add some special layers to support variants of autoencoders. But I might be wrong, as I haven't worked on autoencoders at all. Do you know the details? As for Bayesian networks, yes, I agree they are a very different paradigm. Especially since we already have a package for that (dfdx/Boltzmann.jl#3), I think it is better to keep them in two different packages. But making them compatible should definitely be a goal, and maybe some collaboration too. As for using DBNs/DBMs to initialize the weights of DNNs, I think this might already be quite easy. If you can export the weights to an HDF5 file with compatible naming, then Mocha should be able to load them, just like loading Caffe's exported models, and then start supervised training from there. We could make Mocha's loading interface richer by, for example, allowing the user to control in fine detail which layer loads from which file and from which named dataset. We could also discuss a common data format that suits both needs. |
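A minimal sketch of the "label is the same as the input" idea from the comment above, in plain Python rather than Mocha's API. The function name `train_autoencoder`, the toy `DATA`, and the hyperparameters are all made up for illustration; a real model would add nonlinearities and use a proper array library.

```python
import random

# Toy data: 3-D points that lie on a 2-D subspace (third coordinate = a + b),
# so a 2-unit linear bottleneck can in principle reconstruct them.
DATA = [[a, b, a + b] for a in (-1.0, 0.0, 1.0) for b in (-0.5, 0.5)]


def train_autoencoder(data, hidden=2, lr=0.02, epochs=300, seed=0):
    """Train a linear autoencoder with plain SGD; returns (enc, dec, losses)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Encoder is hidden x dim, decoder is dim x hidden, both small random.
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(hidden)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(dim)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            target = x  # the "label" is simply the input itself
            h = [sum(w * xi for w, xi in zip(row, x)) for row in enc]
            y = [sum(w * hj for w, hj in zip(row, h)) for row in dec]
            err = [yi - ti for yi, ti in zip(y, target)]
            total += 0.5 * sum(e * e for e in err)
            # Gradient w.r.t. the hidden code, computed before dec is updated.
            grad_h = [sum(err[i] * dec[i][j] for i in range(dim))
                      for j in range(hidden)]
            for i in range(dim):
                for j in range(hidden):
                    dec[i][j] -= lr * err[i] * h[j]
            for j in range(hidden):
                for k in range(dim):
                    enc[j][k] -= lr * grad_h[j] * x[k]
        losses.append(total / len(data))  # mean reconstruction loss per epoch
    return enc, dec, losses
```

Calling `train_autoencoder(DATA)` returns the per-epoch mean reconstruction loss as the third value, which should fall as training proceeds. Nothing here differs from supervised SGD except the choice of target.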
You are right about autoencoders being trained with SGD as MLPs. There are some "special" things, though:
I'll work on a draft implementation for initializing a DNN with a DBN from Boltzmann.jl and let you know as soon as I have something (hopefully, by submitting a pull request!). |
@jfsantos Thanks for the details! I see, it is kind of do-able but not trivial. I need to think about this further. |
Just wondering what the ETA for recurrence support might be? |
@philtomson That is definitely a plan/goal, but maybe after the autoencoders. The reason is that I do not know RNNs well enough to start implementing them right away. But I think many of the building blocks are already there. In particular, if you want a simple explicit unfolding over a fixed-length history, I think one could already build such a model using the shared-parameter mechanism in Mocha. For variable-length RNN support, I need to think more, especially about how the interface should be organized. That being said, suggestions are very welcome from people who already know RNNs. For example, what is the simplest representative and reproducible example for RNNs (like MNIST for CNNs)? Are there any nice existing libraries for RNNs whose way of organizing the user interface we could learn from? |
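To make "explicit unfolding of fixed-length history" with shared parameters concrete, here is a rough sketch in plain Python. It is not Mocha's shared-parameter API: `SharedParams`, `RecurrentStep`, `build_unrolled_net`, and `run_unrolled` are hypothetical names, and the recurrence is a scalar tanh unit for brevity.

```python
import math


class SharedParams:
    """One set of weights, referenced (not copied) by every time step."""

    def __init__(self, w_in, w_rec, bias):
        self.w_in = w_in
        self.w_rec = w_rec
        self.bias = bias


class RecurrentStep:
    """A single unrolled time step; behaves like one layer of a deep net."""

    def __init__(self, params):
        self.params = params  # same object for all steps

    def forward(self, x, h_prev):
        p = self.params
        return math.tanh(p.w_in * x + p.w_rec * h_prev + p.bias)


def build_unrolled_net(params, length):
    """Unfold the recurrence into `length` layers that share `params`."""
    return [RecurrentStep(params) for _ in range(length)]


def run_unrolled(net, inputs, h0=0.0):
    """Feed a fixed-length sequence through the unrolled layers."""
    if len(inputs) != len(net):
        raise ValueError("sequence length must match the unrolled length")
    h = h0
    states = []
    for step, x in zip(net, inputs):
        h = step.forward(x, h)
        states.append(h)
    return states
```

Because every step holds a reference to the same `SharedParams` object, a gradient update to it affects all time steps at once. The price is that the sequence length is fixed when the net is built, which is why variable-length support needs a different interface.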
@pluskid Maybe the following are helpful: |
@zhongwen Thanks for the links! |
I'm planning to add time-delay neural networks. I have a working implementation ( https://github.com/the-moliver/NeuralNets.jl ) that I want to port to Mocha. |
It would be nice to have a Caffe file -> Mocha converter. Maybe I'll work on something like that. Should be doable, right? Or are there Caffe features that are not yet in Mocha? |
We already have the ability to load Caffe models, but you still need to translate the model definition manually. Automatic translation of the architecture is theoretically possible, but I guess it might be quite tedious to implement. (I think there should perhaps be some universal DNN architecture specification language coming out.) Most of the core functionality in Caffe has a counterpart in Mocha. But Caffe also has many unofficial forks that implement specific layers, and those are more difficult to convert. |
|
@nikolaypavlov Thanks for the suggestions
|
@pluskid Great, I'll try to play with PoolingLayer. |
Is this project meant to be the Theano/Torch of Julia? Is there ever going to be OpenCL support? |
@outlace, this is more like Torch than Theano in that sense. There is no planned OpenCL support unless Julia gets better native support for GPU targets. |
I would be very interested in OpenCL support as well. In fact, I have half a mind to take a stab at it myself. If I can leverage an OpenCL BLAS library (say, CLBLAS.jl), then I basically just have to write If I did this, in the interest of clarity would you be OK with renaming GPUBackend -> CUDABackend (adding |
@nstiurca Thanks! This could be cool! Yes, I'm OK with the renaming if we have a working OpenCL backend! |
OK, I will get started this weekend. Should we open an issue for the sake of tracking? Development-wise, it will be simplest for me to create an |
I would suggest doing it in your branch, but opening a pull request here with "[WIP]" in the title and a description of the goal and current progress in the text (which you could update periodically). I will not merge the PR until you have something reasonably stable, but people will see the PR and could probably jump in to help. |
That works for me. Look for it later today. |
I think this is great. I currently have to use Torch because it's the only mature package that has an OpenCL backend. Being able to run models on my Macbook is fantastic. Really looking forward to this getting OpenCL support. |
Any plans to implement batch normalization (http://jmlr.org/proceedings/papers/v37/ioffe15.pdf )? Looks like it's a great step forward in terms of training time! |
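For readers unfamiliar with the technique, a rough sketch of the training-time forward pass of batch normalization in plain Python: each feature is normalized with the mini-batch mean and variance, then scaled and shifted by learned parameters. The function name `batch_norm_forward` and the list-of-lists layout are illustrative assumptions; the running statistics used at inference time and the backward pass are left out.

```python
import math


def batch_norm_forward(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    batch: list of samples, each a list of feature values
    gamma, beta: per-feature scale and shift (learned in a real network)
    """
    n = len(batch)
    dim = len(batch[0])
    out = [[0.0] * dim for _ in range(n)]
    for j in range(dim):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        # Biased (divide-by-n) variance of the mini-batch.
        var = sum((v - mean) ** 2 for v in col) / n
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = gamma[j] * (col[i] - mean) * inv_std + beta[j]
    return out
```

With `gamma` set to ones and `beta` to zeros, every output feature has roughly zero mean and unit variance over the batch, regardless of the input scale.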
@lqh20 I recently joined a new project, MXNet. We are building a Julia interface called MXNet.jl. It is still at a relatively early stage, but some features are already working. For example, batch normalization and multi-GPU training in the CIFAR-10 example are already working quite nicely. |
Is MXNet.jl complementary to Mocha.jl or meant to replace it?
|
@philtomson It depends. Mocha.jl still has the advantages of simplicity and portability. But in terms of computational efficiency and feature richness, I think MXNet.jl should replace Mocha.jl, because it is built on top of libmxnet, a language-agnostic general deep learning library designed to have, for example, multi-GPU support. Moreover, the core of libmxnet is being actively developed by a team, so in terms of features it is much better than Mocha.jl, which is currently developed primarily by me in my very limited free time. libmxnet itself is actually a joint effort of authors from several different deep learning libraries. |
I wonder if MXNet could be an alternate backend for Mocha.jl? It seems like
|
@philtomson That could be one possible option. I will wait and see if that is feasible, since using MXNet.jl introduces an external dependency on libmxnet. If that dependency itself is not a problem, then using MXNet.jl directly might be a more viable option, though some things still need to be improved, especially the documentation. |
BTW: is the GPU backend of Mocha.jl not as performant as libmxnet? I guess I can see where training with multiple GPUs can be an advantage, but some Also: Does the mxnet project have any plans for supporting OpenCL?
Mocha.jl's documents are actually pretty good at this point so this is a I suppose another idea would be to translate the CPP backend for Mocha.jl
|
I just got around to installing MXNet.jl and playing with it some. So far
|
@philtomson Glad to hear that it works out nicely for you. The single-GPU performance of Mocha.jl might be similar to that of MXNet.jl. MXNet.jl has a more flexible symbolic API for defining network architectures, and internally it uses optimizations to avoid unnecessary memory allocation and computation. But multi-GPU is definitely a win on the MXNet.jl side. I agree that many users with small-scale applications do not use GPUs. In that case, the default CPU-only libmxnet.so should still be quite straightforward to compile (at least on Linux and OS X). And since libmxnet is actually a relatively low-level backend, much of the logic will still be built in Julia, and the interface is flexible and convenient enough to use. One of the main goals of joining forces under dmlc/libmxnet is to avoid duplicated work, especially in the computationally heavy backend: a layer implemented once will automatically be available in the Python, Julia, and R frontends. Currently I will be maintaining both Mocha.jl and MXNet.jl. In the future, when MXNet.jl becomes more mature, I will try to advocate MXNet.jl as a successor of Mocha.jl. |
For those who are interested in RNN/LSTM in Julia: there is now a char-rnn LSTM implementation in MXNet.jl. It uses explicit unrolling, so everything fits in the current |
Discussions and/or suggestions are welcome!