This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

add cifar10 example for NAS #1476

Merged 36 commits on Aug 19, 2019
36 commits
1adc820
ppo tuner
zhangql08hit Jul 22, 2019
b28fd20
pass simplified mnist-nas example
zhangql08hit Jul 23, 2019
be20b10
general ppo_tuner, is debugging
zhangql08hit Jul 27, 2019
8513c69
pass mnist-nas example, converge
zhangql08hit Jul 28, 2019
9806a50
remove unused files
zhangql08hit Jul 28, 2019
36f96f2
move the specified logic from tuner.py to ppo_tuner.py
zhangql08hit Jul 28, 2019
0da4d60
fix bug
zhangql08hit Jul 28, 2019
78796f3
remove unused function
zhangql08hit Jul 28, 2019
a51130f
remove useless comments and print
zhangql08hit Jul 28, 2019
7e3253e
fix python syntax error
zhangql08hit Jul 29, 2019
1b07b2c
add comments
zhangql08hit Jul 31, 2019
d6bdb10
add optional arguments
zhangql08hit Jul 31, 2019
f823d3f
add requirements
zhangql08hit Jul 31, 2019
a5fd738
support package install
zhangql08hit Jul 31, 2019
4584e54
update doc
zhangql08hit Jul 31, 2019
d8a40c6
support unified search space
zhangql08hit Aug 1, 2019
86dc53d
Merge branch 'master' of github.com:Microsoft/nni into dev-ppo-tuner
zhangql08hit Aug 1, 2019
ddc54e8
fix bug
zhangql08hit Aug 1, 2019
7131458
fix pylint in ppo_tuner.py
zhangql08hit Aug 2, 2019
98d234a
fix pylint in policy.py
zhangql08hit Aug 2, 2019
1410a39
fix pylint in util.py
zhangql08hit Aug 2, 2019
1bed65f
fix pylint in distri.py
zhangql08hit Aug 2, 2019
99de362
fix pylint in model.py
zhangql08hit Aug 2, 2019
bbcfef7
remove newlines
zhangql08hit Aug 2, 2019
c719f96
update doc
zhangql08hit Aug 2, 2019
7f72174
update doc
zhangql08hit Aug 2, 2019
65cc38c
fix bug
zhangql08hit Aug 5, 2019
5df3195
fix bug
zhangql08hit Aug 5, 2019
0770849
add one arg for ppotuner, add callback in msg_dispatcher
zhangql08hit Aug 5, 2019
b209e04
add fault tolerance to tolerate trial failure
zhangql08hit Aug 5, 2019
8d9d44a
fix bug
zhangql08hit Aug 5, 2019
5e78027
trivial change
zhangql08hit Aug 6, 2019
d3ebbb9
Merge branch 'master' of github.com:Microsoft/nni into dev-ppo-tuner
zhangql08hit Aug 6, 2019
1293c09
add nas cifar10 example
zhangql08hit Aug 19, 2019
c68a108
Merge branch 'dev-nas-tuner' of github.com:Microsoft/nni into dev-ppo…
zhangql08hit Aug 19, 2019
6821d9c
update doc
zhangql08hit Aug 19, 2019
9 changes: 8 additions & 1 deletion examples/trials/nas_cifar10/README.md
@@ -2,7 +2,14 @@
===

We now have a NAS example, [NNI-NAS-Example](https://github.com/Crysple/NNI-NAS-Example), contributed by our community, that runs in NNI through the NAS interface.

Its trial code is included in this folder, along with example config files that show how to tune it with the PPO tuner.

> Download data

- `cd data && . download.sh`
- `tar xzf cifar-10-python.tar.gz && mv cifar-10-batches-py cifar10`

Thanks to our lovely contributors.

And we welcome more and more people to join us!
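
For readers who prefer to script the download step, the following is a minimal Python sketch of the same two README commands. It assumes the upstream archive unpacks to a `cifar-10-batches-py` directory, and the helper names `download` and `extract_and_rename` are hypothetical (they are not part of this PR).

```python
import os
import tarfile
import urllib.request

# URL taken from data/download.sh in this PR.
CIFAR_URL = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"


def download(url, dest):
    """Fetch url to dest unless the file is already present."""
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest


def extract_and_rename(archive, data_dir,
                       extracted="cifar-10-batches-py", target="cifar10"):
    """Unpack the archive into data_dir and rename the unpacked folder.

    `extracted` is the directory name assumed to be inside the tarball;
    `target` matches the --data_path="data/cifar10" flag in the launch scripts.
    """
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(data_dir)
    src = os.path.join(data_dir, extracted)
    dst = os.path.join(data_dir, target)
    os.rename(src, dst)
    return dst


# Typical use, run from examples/trials/nas_cifar10 (needs network access):
#   archive = download(CIFAR_URL, "data/cifar-10-python.tar.gz")
#   extract_and_rename(archive, "data")
```

The rename matters because the trial scripts look for the batches under `data/cifar10`, not under the name the archive ships with.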
31 changes: 31 additions & 0 deletions examples/trials/nas_cifar10/config_pai_ppo.yml
@@ -0,0 +1,31 @@
authorName: Unknown
experimentName: enas_macro
trialConcurrency: 20
maxExecDuration: 2400h
maxTrialNum: 20000
#choice: local, remote, pai
trainingServicePlatform: pai
#choice: true, false
useAnnotation: true
multiPhase: false
versionCheck: false
nniManagerIp: 0.0.0.0
tuner:
  builtinTunerName: PPOTuner
  classArgs:
    optimize_mode: maximize
    trials_per_update: 60
    epochs_per_update: 20
    minibatch_size: 6
trial:
  command: sh ./macro_cifar10_pai.sh
  codeDir: ./
  gpuNum: 1
  cpuNum: 1
  memoryMB: 8196
  image: msranni/nni:latest
  virtualCluster: nni
paiConfig:
  userName: your_account
  passWord: your_pwd
  host: 0.0.0.0
21 changes: 21 additions & 0 deletions examples/trials/nas_cifar10/config_ppo.yml
@@ -0,0 +1,21 @@
authorName: Unknown
experimentName: enas_macro
trialConcurrency: 4
maxExecDuration: 2400h
maxTrialNum: 20000
#choice: local, remote
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
multiPhase: false
tuner:
  builtinTunerName: PPOTuner
  classArgs:
    optimize_mode: maximize
    trials_per_update: 60
    epochs_per_update: 12
    minibatch_size: 10
trial:
  command: sh ./macro_cifar10.sh
  codeDir: ./
  gpuNum: 1
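
The two configs differ mainly in `trialConcurrency` and in the PPOTuner `classArgs`. As a rough sanity check, the sketch below tests two divisibility properties that both configs happen to satisfy: `trials_per_update` is a multiple of `minibatch_size` and of `trialConcurrency`. Treating these as requirements is an assumption here, not something this diff states, and `check_ppo_args` is a hypothetical helper, not an NNI API.

```python
def check_ppo_args(class_args, trial_concurrency):
    """Return a list of warnings for a PPOTuner classArgs dict.

    Assumed (not confirmed by this PR) constraints:
      * trials_per_update should be divisible by minibatch_size
      * trials_per_update should be a multiple of trialConcurrency
    """
    problems = []
    trials = class_args["trials_per_update"]
    minibatch = class_args["minibatch_size"]
    if trials % minibatch != 0:
        problems.append("trials_per_update is not divisible by minibatch_size")
    if trials % trial_concurrency != 0:
        problems.append("trials_per_update is not a multiple of trialConcurrency")
    return problems


# Values copied from config_ppo.yml (local) and config_pai_ppo.yml (PAI).
local_args = {"trials_per_update": 60, "epochs_per_update": 12, "minibatch_size": 10}
pai_args = {"trials_per_update": 60, "epochs_per_update": 20, "minibatch_size": 6}

print(check_ppo_args(local_args, trial_concurrency=4))   # prints []
print(check_ppo_args(pai_args, trial_concurrency=20))    # prints []
```

A check like this can be run before launching an experiment when the concurrency or batch settings are changed by hand.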
1 change: 1 addition & 0 deletions examples/trials/nas_cifar10/data/download.sh
@@ -0,0 +1 @@
wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
35 changes: 35 additions & 0 deletions examples/trials/nas_cifar10/macro_cifar10.sh
@@ -0,0 +1,35 @@
#!/bin/bash
set -e
export PYTHONPATH="$(pwd)"

python3 src/cifar10/nni_child_cifar10.py \
    --data_format="NCHW" \
    --search_for="macro" \
    --reset_output_dir \
    --data_path="data/cifar10" \
    --output_dir="outputs" \
    --train_data_size=45000 \
    --batch_size=100 \
    --num_epochs=8 \
    --log_every=50 \
    --eval_every_epochs=1 \
    --child_use_aux_heads \
    --child_num_layers=12 \
    --child_out_filters=36 \
    --child_l2_reg=0.0002 \
    --child_num_branches=6 \
    --child_num_cell_layers=5 \
    --child_keep_prob=0.50 \
    --child_drop_path_keep_prob=0.60 \
    --child_lr_cosine \
    --child_lr_max=0.05 \
    --child_lr_min=0.001 \
    --child_lr_T_0=10 \
    --child_lr_T_mul=2 \
    --controller_search_whole_channels \
    --controller_train_every=1 \
    --controller_num_aggregate=20 \
    --controller_train_steps=50 \
    --child_mode="subgraph" \
    "$@"

35 changes: 35 additions & 0 deletions examples/trials/nas_cifar10/macro_cifar10_pai.sh
@@ -0,0 +1,35 @@
#!/bin/bash
set -e
export PYTHONPATH="$(pwd)"

python3 src/cifar10/nni_child_cifar10.py \
    --data_format="NCHW" \
    --search_for="macro" \
    --reset_output_dir \
    --data_path="data/cifar10" \
    --output_dir="outputs" \
    --train_data_size=45000 \
    --batch_size=100 \
    --num_epochs=30 \
    --log_every=50 \
    --eval_every_epochs=1 \
    --child_use_aux_heads \
    --child_num_layers=12 \
    --child_out_filters=36 \
    --child_l2_reg=0.0002 \
    --child_num_branches=6 \
    --child_num_cell_layers=5 \
    --child_keep_prob=0.50 \
    --child_drop_path_keep_prob=0.60 \
    --child_lr_cosine \
    --child_lr_max=0.05 \
    --child_lr_min=0.001 \
    --child_lr_T_0=10 \
    --child_lr_T_mul=2 \
    --controller_search_whole_channels \
    --controller_train_every=1 \
    --controller_num_aggregate=20 \
    --controller_train_steps=50 \
    --child_mode="subgraph" \
    "$@"

Empty file.
Empty file.
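
Both launch scripts above pass `--child_lr_cosine` together with `--child_lr_max`, `--child_lr_min`, `--child_lr_T_0` and `--child_lr_T_mul`. The trial code that consumes these flags is not shown in this part of the diff, so the following is only a sketch of what such flags commonly mean: a cosine schedule with warm restarts, where the first cycle lasts `T_0` epochs and each later cycle is `T_mul` times longer. The function name `cosine_restart_lr` is hypothetical.

```python
import math


def cosine_restart_lr(epoch, lr_max=0.05, lr_min=0.001, t_0=10, t_mul=2):
    """Learning rate at a given epoch under cosine annealing with restarts.

    Assumption: cycle lengths are t_0, t_0 * t_mul, t_0 * t_mul**2, ...
    and the rate falls from lr_max towards lr_min within each cycle.
    Defaults mirror the flag values in the launch scripts.
    """
    cycle_len, cycle_start = t_0, 0
    # Advance to the cycle that contains this epoch.
    while epoch >= cycle_start + cycle_len:
        cycle_start += cycle_len
        cycle_len *= t_mul
    progress = (epoch - cycle_start) / cycle_len
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 5, 9, 10, 20):
    print(epoch, round(cosine_restart_lr(epoch), 4))
```

Under this reading, the local script's `--num_epochs=8` stays inside the first 10-epoch cycle, while the PAI script's `--num_epochs=30` would span the first cycle and the whole of the second, 20-epoch one.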
74 changes: 74 additions & 0 deletions examples/trials/nas_cifar10/src/cifar10/data_utils.py
@@ -0,0 +1,74 @@
import os
import sys
import pickle
import numpy as np
import tensorflow as tf


def _read_data(data_path, train_files):
    """Reads CIFAR-10 format data. Always returns NHWC format.

    Returns:
      images: np tensor of size [N, H, W, C]
      labels: np tensor of size [N]
    """
    images, labels = [], []
    for file_name in train_files:
        print(file_name)
        full_name = os.path.join(data_path, file_name)
        with open(full_name, "rb") as finp:
            # The batches were pickled under Python 2, hence the latin1 encoding.
            data = pickle.load(finp, encoding='latin1')
            batch_images = data["data"].astype(np.float32) / 255.0
            batch_labels = np.array(data["labels"], dtype=np.int32)
            images.append(batch_images)
            labels.append(batch_labels)
    images = np.concatenate(images, axis=0)
    labels = np.concatenate(labels, axis=0)
    # Each row is stored channel-major: reshape to NCHW, then move channels last.
    images = np.reshape(images, [-1, 3, 32, 32])
    images = np.transpose(images, [0, 2, 3, 1])

    return images, labels


def read_data(data_path, num_valids=5000):
    print("-" * 80)
    print("Reading data")

    images, labels = {}, {}

    train_files = [
        "data_batch_1",
        "data_batch_2",
        "data_batch_3",
        "data_batch_4",
        "data_batch_5",
    ]
    test_file = [
        "test_batch",
    ]
    images["train"], labels["train"] = _read_data(data_path, train_files)

    if num_valids:
        # Hold out the last num_valids training examples for validation.
        images["valid"] = images["train"][-num_valids:]
        labels["valid"] = labels["train"][-num_valids:]

        images["train"] = images["train"][:-num_valids]
        labels["train"] = labels["train"][:-num_valids]
    else:
        images["valid"], labels["valid"] = None, None

    images["test"], labels["test"] = _read_data(data_path, test_file)

    print("Preprocess: [subtract mean], [divide std]")
    # Per-channel statistics, computed on the training split only.
    mean = np.mean(images["train"], axis=(0, 1, 2), keepdims=True)
    std = np.std(images["train"], axis=(0, 1, 2), keepdims=True)

    print("mean: {}".format(np.reshape(mean * 255.0, [-1])))
    print("std: {}".format(np.reshape(std * 255.0, [-1])))

    images["train"] = (images["train"] - mean) / std
    if num_valids:
        images["valid"] = (images["valid"] - mean) / std
    images["test"] = (images["test"] - mean) / std

    return images, labels
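
To make the layout handling in `_read_data` and the hold-out split in `read_data` easier to follow, here is a small NumPy-free sketch of the same index arithmetic. The helper names are hypothetical and the tiny 2x2 image is only for illustration; the real code operates on 32x32 images with NumPy.

```python
def chw_flat_index(h, w, c, height=32, width=32):
    """Position of pixel (h, w), channel c in a flat channel-major row.

    This is the layout implied by reshaping to [-1, 3, 32, 32].
    """
    return c * height * width + h * width + w


def flat_to_hwc(row, height=32, width=32, channels=3):
    """Rearrange one flat channel-major row into nested [H][W][C] lists,
    the per-image equivalent of the reshape followed by transpose."""
    return [[[row[chw_flat_index(h, w, c, height, width)]
              for c in range(channels)]
             for w in range(width)]
            for h in range(height)]


def split_train_valid(items, num_valids):
    """Hold out the last num_valids items, mirroring read_data's slicing."""
    if not num_valids:
        return items, None
    return items[:-num_valids], items[-num_valids:]


# A 2x2 RGB "image" stored as 12 flat values: 4 red, then 4 green, then 4 blue.
tiny = list(range(12))
print(flat_to_hwc(tiny, height=2, width=2)[0][1])   # prints [1, 5, 9]

train, valid = split_train_valid(list(range(10)), 3)
print(train, valid)   # prints [0, 1, 2, 3, 4, 5, 6] [7, 8, 9]
```

With the default `num_valids=5000`, the 50,000 training images are split into 45,000 for training and 5,000 for validation, which is consistent with the `--train_data_size=45000` flag in the launch scripts.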