This module is a prototype of a complete implementation of the XNOR kernel on CUDA, with a TensorFlow interface.
Heavily inspired by the original Theano implementation by Matthieu Courbariaux.
Major features:
- Supports arbitrary size matrices.
- Comes with TensorFlow bindings.
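
For readers new to the idea, here is a minimal pure-Python sketch of the arithmetic behind an XNOR matrix multiply. It is not the CUDA code in this repo: the function names (`pack_bits`, `xnor_dot`, `xnor_matmul`) are made up for illustration, and it assumes all entries are +1 or -1. A GPU kernel would typically pack bits into fixed-width words and use a hardware popcount instead of Python integers. The mask step shows one way to handle vector lengths that are not a multiple of the word size.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into one int (+1 -> bit 1, -1 -> bit 0)."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def xnor_dot(a, b):
    """Dot product of two +1/-1 vectors of equal length using XNOR and popcount."""
    n = len(a)
    mask = (1 << n) - 1  # keep only the n valid bits, drop any padding
    agree = ~(pack_bits(a) ^ pack_bits(b)) & mask  # XNOR: 1 where signs match
    matches = bin(agree).count("1")  # popcount
    # Each match contributes +1 and each mismatch -1: matches - (n - matches).
    return 2 * matches - n


def xnor_matmul(A, B):
    """Multiply an m x k matrix by a k x n matrix, both with +1/-1 entries."""
    cols = list(zip(*B))  # columns of B, so each one can be packed like a row
    return [[xnor_dot(row, col) for col in cols] for row in A]
```

Because the result is `2 * matches - n`, it agrees with an ordinary integer matrix product on the same +1/-1 inputs, which is a convenient way to check a kernel against a reference implementation.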
Generated with the IPython notebook that is also in this repo. Benchmarks were run with CUDA 7.5 and cuDNN v4 on a Titan Black GPU with an Intel Core i7-5820K CPU.
Note: This code is probably not the most optimized, since it's my first CUDA program. Suggestions are welcome.