This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

[Proposal] demo compressor #1402

Merged: 33 commits merged on Aug 28, 2019
Changes shown from 24 commits (33 commits in total)

Commits
a3ea2cb
fix negative time number in local mode when trial time is short
LeonardoWang Jun 20, 2019
01c60bd
fix bug of duration<0
Jun 21, 2019
ace5132
fix windows version and readme
LeonardoWang Jun 21, 2019
48d3e4d
change tab
LeonardoWang Jun 21, 2019
61dbdca
change line
Jun 21, 2019
38e683a
Merge branch 'fix-bug-ui' of github.com:LeonardoWang/nni
Jun 21, 2019
f2e6d16
add compressor
LeonardoWang Aug 1, 2019
c6e6b75
add __init__
LeonardoWang Aug 1, 2019
d9897c2
Merge branch 'master' of github.com:microsoft/nni
Aug 2, 2019
2979de3
compressor
Aug 2, 2019
365585f
new framework
Aug 13, 2019
1f92408
change import
LeonardoWang Aug 13, 2019
f925075
change import error
Aug 13, 2019
a1e36c3
Merge branch 'master' of https://github.com/microsoft/nni
Aug 14, 2019
7672887
add doc and change files
Aug 14, 2019
17a20fc
add test function in pynni/test
Aug 15, 2019
9ab12f3
fix bug F
Aug 15, 2019
491ef33
test
Aug 15, 2019
02877ed
add requirment
Aug 15, 2019
fe46e9f
change setup
Aug 15, 2019
2b5114b
change dependencies to zaure-piplines.yml
Aug 15, 2019
a68d911
change
Aug 15, 2019
555cbb8
add mac
Aug 15, 2019
6badf9e
add macos
Aug 15, 2019
6f3f1ff
change compressor with __call__(), add frame method
Aug 19, 2019
41751de
add configure without doc
Aug 20, 2019
baffb8d
update doc and test unit
Aug 21, 2019
2213dfa
add configure parser change doc and test
Aug 23, 2019
0538d30
test commit
Aug 23, 2019
3b02ef6
move example
Aug 26, 2019
3a94357
add user hint
Aug 26, 2019
8eff37e
add doc and change example name
Aug 27, 2019
b8c674d
change for PR
Aug 27, 2019
10 changes: 10 additions & 0 deletions azure-pipelines.yml
@@ -10,6 +10,11 @@ jobs:
steps:
- script: python3 -m pip install --upgrade pip setuptools --user
displayName: 'Install python tools'
- script: |
python3 -m pip install torch==0.4.1 --user
python3 -m pip install torchvision==0.2.1 --user
python3 -m pip install tensorflow==1.12.0 --user
displayName: 'Install dependencies for integration'
- script: |
source install.sh
displayName: 'Install nni toolkit via source code'
@@ -50,6 +55,11 @@ jobs:
steps:
- script: python3 -m pip install --upgrade pip setuptools
displayName: 'Install python tools'
- script: |
python3 -m pip install torch==0.4.1 --user
python3 -m pip install torchvision==0.2.1 --user
python3 -m pip install tensorflow --user
displayName: 'Install dependencies for integration'
- script: |
source install.sh
displayName: 'Install nni toolkit via source code'
39 changes: 39 additions & 0 deletions docs/en_US/Compressor/Overview.md
@@ -0,0 +1,39 @@
# Compressor
NNI provides an easy-to-use toolkit to help users design and apply model compression algorithms.

## Framework
We use an instrumentation method to insert a node or function after the corresponding position in the model.
<br>
When a compression algorithm designer implements a pruning algorithm, they only need to care about how the mask is generated, not about how the mask is applied to the graph.
Review comment (Contributor): typo: garph -> graph

Review comment (@chicm-ms, Aug 21, 2019): 'he' : the designer maybe female.
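
To make this separation of concerns concrete, below is a minimal, framework-agnostic sketch of the two roles. This is illustrative only, not the NNI API: the algorithm designer produces a mask, and the compressor applies that mask to the weights in the graph.

```python
import numpy as np

def calc_mask(weight, sparsity):
    # Algorithm designer's job: decide which entries to keep.
    # This simple rule keeps the largest-magnitude weights; real pruners
    # may use any policy (schedules, per-layer sensitivity, ...).
    k = int(weight.size * sparsity)          # number of entries to zero out
    if k == 0:
        return np.ones_like(weight)
    threshold = np.sort(np.abs(weight), axis=None)[k]
    return (np.abs(weight) >= threshold).astype(weight.dtype)

def apply_mask(weight, mask):
    # Compressor's job: instrument the graph so the mask is applied
    # to the weight before it is used in the forward pass.
    return weight * mask

w = np.random.randn(4, 4)
masked_w = apply_mask(w, calc_mask(w, sparsity=0.8))
```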

## Algorithm
Review comment (Contributor): Algorithm

We now provide a naive compression algorithm and four popular compression algorithms for users: two pruning algorithms and two quantization algorithms.
Below is a list of the model compression algorithms supported in our compressor.

Review comment (Contributor): this table cannot be correctly rendered

| Name | Paper |
| ---------- | ----------|
| AGPruner | [To prune, or not to prune: exploring the efficacy of pruning for model compression](https://arxiv.org/abs/1710.01878)|
| SensitivityPruner |[Learning both Weights and Connections for Efficient Neural Networks](https://arxiv.org/abs/1506.02626)|
| QATquantizer |[Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)|
| DoReFaQuantizer |[DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients](https://arxiv.org/abs/1606.06160)|

## Usage

Take the naive level pruner as an example.

If you want to prune all weights to 80% sparsity, add the code below before your training code.

TensorFlow code
```
nni.compressors.tfCompressor.LevelPruner(sparsity=0.8).compress(model_graph)
```

Review comment (Contributor): format (camel case) of package names 'tfCompressor' and 'torchCompressor' are inconsistent with current nni package names.

PyTorch code
```
nni.compressors.torchCompressor.LevelPruner(sparsity=0.8).compress(model)
```

Our compressor will automatically insert masks into your model, and you can train your model with the masks without changing your training code. You will get a compressed model when training finishes.

You can find more information in the algorithm details.


Review comment (Contributor): add another section to show how to customize a new pruning or quantize algorithm


73 changes: 73 additions & 0 deletions docs/en_US/Compressor/Pruner.md
@@ -0,0 +1,73 @@
Pruner on NNI Compressor
===

## AGPruner
In [To prune, or not to prune: exploring the efficacy of pruning for model compression](https://arxiv.org/abs/1710.01878), authors Michael Zhu and Suyog Gupta provide an algorithm to prune the weights gradually.

>We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value si (usually 0) to a final sparsity value sf over a span of n pruning steps, starting at training step t0 and with pruning frequency ∆t:
![](../../img/AGPruner.PNG)
>The binary weight masks are updated every ∆t steps as the network is trained to gradually increase the sparsity of the network while allowing the network training steps to recover from any pruning-induced loss in accuracy. In our experience, varying the pruning frequency ∆t between 100 and 1000 training steps had a negligible impact on the final model quality. Once the model achieves the target sparsity sf , the weight masks are no longer updated. The intuition behind this sparsity function in equation
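
For reference, the gradual sparsity schedule quoted above (the formula rendered in the image) can be written as a small Python function; the parameter names mirror the quotation, not the NNI API.

```python
def agp_sparsity(t, s_i=0.0, s_f=0.8, t0=0, n=10, delta_t=1):
    # Sparsity at pruning step t:
    #   s_t = s_f + (s_i - s_f) * (1 - (t - t0) / (n * delta_t)) ** 3
    # clamped so it starts at s_i at step t0 and stays at s_f after t0 + n * delta_t.
    t = min(max(t, t0), t0 + n * delta_t)
    return s_f + (s_i - s_f) * (1.0 - (t - t0) / (n * delta_t)) ** 3

# the schedule prunes aggressively at first, then levels off at the final sparsity
print([round(agp_sparsity(t), 3) for t in range(0, 11)])
```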

### Usage
You can prune all weights from 0% to 80% sparsity over 10 epochs with the code below.

First, import the pruner and add masks to the model.

TensorFlow code
```
from nni.compressors.tfCompressor import AGPruner
pruner = AGPruner(initial_sparsity=0, final_sparsity=0.8, start_epoch=1, end_epoch=10, frequency=1)
pruner.compress(tf.get_default_graph())
```
PyTorch code
```
from nni.compressors.torchCompressor import AGPruner
pruner = AGPruner(initial_sparsity=0, final_sparsity=0.8, start_epoch=1, end_epoch=10, frequency=1)
pruner.compress(model)
```

Second, add the code below to update the epoch number at the end of each epoch in your training loop.

TensorFlow code
```
pruner.update_epoch(epoch, sess)
```
PyTorch code
```
pruner.update_epoch(epoch)
```
You can view the example for more information.
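
Putting the two steps together, here is a minimal PyTorch-style sketch. The toy Linear model and random tensors are stand-ins for your real model and data; the pruner calls follow the snippets above.

```python
import torch
import torch.nn.functional as F
from nni.compressors.torchCompressor import AGPruner

# toy stand-ins so the sketch is self-contained; replace with your real model and data
model = torch.nn.Linear(784, 10)
data = torch.randn(64, 784)
target = torch.randint(0, 10, (64,))

pruner = AGPruner(initial_sparsity=0, final_sparsity=0.8, start_epoch=1, end_epoch=10, frequency=1)
pruner.compress(model)   # instrument the model with masks

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
for epoch in range(1, 11):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(data), target)
    loss.backward()
    optimizer.step()
    pruner.update_epoch(epoch)   # advance the sparsity schedule once per epoch
```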
***

## SensitivityPruner
In [Learning both Weights and Connections for Efficient Neural Networks](https://arxiv.org/abs/1506.02626), authors Song Han et al. provide an algorithm to find the sensitivity of each layer and set a per-layer pruning threshold.

>We used the sensitivity results to find each layer’s threshold: for example, the smallest threshold was applied to the most sensitive layer, which is the first convolutional layer... The pruning threshold is chosen as a quality parameter multiplied by the standard deviation of a layer’s weights
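
The quoted rule is straightforward to express in code. Below is a small sketch of the idea, not the NNI implementation: the threshold is a quality parameter times the standard deviation of the layer's weights, and weights below it are masked out. The quality parameter value here is purely illustrative.

```python
import numpy as np

def sensitivity_mask(weight, quality_parameter=0.5):
    # threshold = quality parameter * std of this layer's weights;
    # weights whose magnitude falls below the threshold are pruned
    threshold = quality_parameter * np.std(weight)
    return (np.abs(weight) > threshold).astype(weight.dtype)

w = np.random.randn(64, 64)
mask = sensitivity_mask(w)
print('kept fraction:', mask.mean())
```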

### Usage
You can prune weights step by step and reach a target sparsity with SensitivityPruner using the code below.

TensorFlow code
```
from nni.compressors.tfCompressor import SensitivityPruner

pruner = SensitivityPruner(sparsity = 0.8)
pruner.compress(tf.get_default_graph())
```
PyTorch code
```
from nni.compressors.torchCompressor import SensitivityPruner

pruner = SensitivityPruner(sparsity = 0.8)
pruner.compress(model)
```
As with AGPruner, you should update the mask information every epoch by adding the code below.

TensorFlow code
```
pruner.update_epoch(epoch, sess)
```
PyTorch code
```
pruner.update_epoch(epoch)
```
You can view the example for more information.
***
46 changes: 46 additions & 0 deletions docs/en_US/Compressor/Quantizer.md
@@ -0,0 +1,46 @@
Quantizer on NNI Compressor
===
## QATquantizer
In [Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf), authors Benoit Jacob and Skirmantas Kligys provide an algorithm to quantize the model during training.

>We propose an approach that simulates quantization effects in the forward pass of training. Backpropagation still happens as usual, and all weights and biases are stored in floating point so that they can be easily nudged by small amounts. The forward propagation pass however simulates quantized inference as it will happen in the inference engine, by implementing in floating-point arithmetic the rounding behavior of the quantization scheme
>* Weights are quantized before they are convolved with the input. If batch normalization (see [17]) is used for the layer, the batch normalization parameters are “folded into” the weights before quantization.
>* Activations are quantized at points where they would be during inference, e.g. after the activation function is applied to a convolutional or fully connected layer’s output, or after a bypass connection adds or concatenates the outputs of several layers together such as in ResNets.
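
The "simulated quantization" idea in the quotation can be sketched as a quantize-then-dequantize step carried out in floating point. The affine rounding scheme below is a generic illustration, not necessarily the exact scheme NNI implements.

```python
import numpy as np

def fake_quantize(x, q_bits=8):
    # round x onto a uniform grid with 2**q_bits levels spanning [x.min(), x.max()],
    # then map back to floats so training can keep using float arithmetic
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (2 ** q_bits - 1) or 1.0   # avoid zero scale for constant tensors
    q = np.round((x - lo) / scale)
    return q * scale + lo

w = np.random.randn(3, 3).astype(np.float32)
print(fake_quantize(w, q_bits=8))
```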



### Usage
You can quantize your model to 8 bits by adding the code below before your training code.

TensorFlow code
```
from nni.compressors.tfCompressor import QATquantizer
QATquantizer(q_bits = 8).compress(tf.get_default_graph())
```
PyTorch code
```
from nni.compressors.torchCompressor import QATquantizer
QATquantizer(q_bits = 8).compress(model)
```

You can view the example for more information.

***
## DoReFaQuantizer
In [DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients](https://arxiv.org/abs/1606.06160), authors Shuchang Zhou and Yuxin Wu provide an algorithm named DoReFa to quantize the weights, activations, and gradients during training.
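
For reference, the paper's k-bit weight quantization can be sketched as below (a reading of the DoReFa formula, not the NNI code): weights are squashed with tanh, scaled into [0, 1], rounded onto a uniform grid, and mapped back to [-1, 1].

```python
import numpy as np

def quantize_k(x, k):
    # round x in [0, 1] onto a uniform grid with 2**k levels
    n = 2 ** k - 1
    return np.round(x * n) / n

def dorefa_quantize_weight(w, k=8):
    w = np.tanh(w)
    w = w / (2 * np.max(np.abs(w))) + 0.5   # map into [0, 1]
    return 2 * quantize_k(w, k) - 1         # map back to [-1, 1]

print(dorefa_quantize_weight(np.random.randn(3, 3), k=8))
```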

### Usage
To use DoReFaQuantizer, add the code below before your training code.

TensorFlow code
```
from nni.compressors.tfCompressor import DoReFaQuantizer
DoReFaQuantizer(q_bits = 8).compress(tf.get_default_graph())
```
PyTorch code
```
from nni.compressors.torchCompressor import DoReFaQuantizer
DoReFaQuantizer(q_bits = 8).compress(model)
```

You can view the example for more information.
Binary file added docs/img/AGPruner.PNG
2 changes: 1 addition & 1 deletion setup.py
@@ -27,7 +27,7 @@ def read(fname):

setup(
name = 'nni',
version = '999.0.0-developing',
version = 'v0.8-263-g1f92408',
Review comment (Contributor): this is a temporary change by NNI installation script. Please remove this commit.

Review comment (@chicm-ms, Aug 21, 2019): It seems this change is made by makefile, do not check in this change.

author = 'Microsoft NNI Team',
author_email = '[email protected]',
description = 'Neural Network Intelligence project',
Empty file.
Empty file.
115 changes: 115 additions & 0 deletions src/sdk/pynni/nni/compressors/example/main_tf_pruner.py
@@ -0,0 +1,115 @@
from nni.compressors.tfCompressor import AGPruner
Review comment (Contributor): where's nni.get_next_parameter() and nni.report_final_results()?

Review comment (Contributor): suggest change 'main' in file names to 'mnist'

Review reply (Contributor, author): it maybe confusing with the class Mnist
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data


def weight_variable(shape):
return tf.Variable(tf.truncated_normal(shape, stddev = 0.1))

def bias_variable(shape):
return tf.Variable(tf.constant(0.1, shape = shape))

def conv2d(x_input, w_matrix):
return tf.nn.conv2d(x_input, w_matrix, strides = [ 1, 1, 1, 1 ], padding = 'SAME')

def max_pool(x_input, pool_size):
size = [ 1, pool_size, pool_size, 1 ]
return tf.nn.max_pool(x_input, ksize = size, strides = size, padding = 'SAME')


class Mnist:
def __init__(self):
images = tf.placeholder(tf.float32, [ None, 784 ], name = 'input_x')
labels = tf.placeholder(tf.float32, [ None, 10 ], name = 'input_y')
keep_prob = tf.placeholder(tf.float32, name='keep_prob')

self.images = images
self.labels = labels
self.keep_prob = keep_prob

self.train_step = None
self.accuracy = None

self.w1 = None
self.b1 = None
self.fcw1 = None
self.cross = None
with tf.name_scope('reshape'):
x_image = tf.reshape(images, [ -1, 28, 28, 1 ])
with tf.name_scope('conv1'):
w_conv1 = weight_variable([ 5, 5, 1, 32 ])
self.w1 = w_conv1
b_conv1 = bias_variable([ 32 ])
self.b1 = b_conv1
h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)
with tf.name_scope('pool1'):
h_pool1 = max_pool(h_conv1, 2)
with tf.name_scope('conv2'):
w_conv2 = weight_variable([ 5, 5, 32, 64 ])
b_conv2 = bias_variable([ 64 ])
h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)
with tf.name_scope('pool2'):
h_pool2 = max_pool(h_conv2, 2)
with tf.name_scope('fc1'):
w_fc1 = weight_variable([ 7 * 7 * 64, 1024 ])
self.fcw1 = w_fc1
b_fc1 = bias_variable([ 1024 ])
h_pool2_flat = tf.reshape(h_pool2, [ -1, 7 * 7 * 64 ])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)
with tf.name_scope('dropout'):
h_fc1_drop = tf.nn.dropout(h_fc1, 0.5)
with tf.name_scope('fc2'):
w_fc2 = weight_variable([ 1024, 10 ])
b_fc2 = bias_variable([ 10 ])
y_conv = tf.matmul(h_fc1_drop, w_fc2) + b_fc2
with tf.name_scope('loss'):
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels = labels, logits = y_conv))
self.cross = cross_entropy
with tf.name_scope('adam_optimizer'):
self.train_step = tf.train.AdamOptimizer(0.0001).minimize(cross_entropy)
with tf.name_scope('accuracy'):
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(labels, 1))
self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


def main():
tf.set_random_seed(0)

data = input_data.read_data_sets('data', one_hot = True)

model = Mnist()

'''You can switch to SensitivityPruner instead (remember to import it as well):
pruner = SensitivityPruner(sparsity = 0.8)
'''
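# Build the pruner and instrument the default graph: masks are inserted after the
# corresponding weights, so the training loop below stays unchanged.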
pruner = AGPruner(initial_sparsity=0, final_sparsity=0.8, start_epoch=1, end_epoch=10, frequency=1)
pruner.compress(tf.get_default_graph())


with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for batch_idx in range(2000):
batch = data.train.next_batch(2000)
model.train_step.run(feed_dict = {
model.images: batch[0],
model.labels: batch[1],
model.keep_prob: 0.5
})
if batch_idx % 10 == 0:
test_acc = model.accuracy.eval(feed_dict = {
model.images: data.test.images,
model.labels: data.test.labels,
model.keep_prob: 1.0
})
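# Advance the pruner's epoch counter so the pruning schedule moves forward
# (this example treats every 10 batches as one 'epoch').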
pruner.update_epoch(batch_idx // 10, sess)
print('test accuracy', test_acc)


test_acc = model.accuracy.eval(feed_dict = {
model.images: data.test.images,
model.labels: data.test.labels,
model.keep_prob: 1.0
})
print('final result is', test_acc)

main()