
Nano: light weight hpo support in nano tensorflow #3712

Closed
shane-huang opened this issue Dec 13, 2021 · 12 comments
shane-huang commented Dec 13, 2021

Overview

Refer to the HPO design for nano PyTorch in issue #3925.

Enable hyperparameter tuning in nano TensorFlow and make it as transparent as possible to users. General principles include:

  • provide best-known default settings if no search space is explicitly provided by the user (e.g. learning rate, batch size, etc.)
  • frequently used metrics don't need extra configuration for HPO, e.g. mse -> minimize the objective.

API Design

model.search

model = nano.tf.keras.Sequential([...])
...
model.search(sub_x, sub_y, val_data=(subx_val, suby_val), n_trials=100, epochs=20, ...) # <--- add 1 line of code
#model.search(..., resume=True, ...)  # <---- search can be called several times to resume from where it left off

model.fit(train_x, train_y, epochs=100, use_tune_id=trial_id, ...) 
  • search does not return or save any tuned model; it only collects the statistics of all the trials, so fit is still needed after search. (Note: we could provide an option to save checkpoints so that fit can simply load the best checkpoint without retraining.)
  • search automatically searches for training-related hparams, e.g. batch size, learning rate, etc. The search space is by default inferred and adapted from the input and environment. It can also be explicitly specified in arguments, as in the example shown below.
model.search(..., n_trials=100, batch_size=automl.space.Categorical([32,64]), learning_rate=automl.space.Real(0.001,0.1), ...)
  • search can be called several times in order to resume the tuning.
  • when tuning is finished, the user can explicitly call end_search and specify which trial's parameters will be used for the following fit (see the sketch after this list). If a trial id is not specified in fit, the best trial is selected by default. If end_search is not explicitly called, it will be called in fit.
  • We assume that it is common to use a subset of data in search, while in fit the user can use the full dataset (train+val used in search), a larger dataset, or more epochs.
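
A minimal sketch of the intended search/fit flow (end_search returning the selected trial id is an assumption, not a confirmed API):

# Illustrative flow; end_search returning a trial id is an assumption, not final API.
model.search(sub_x, sub_y, n_trials=50, epochs=20)                  # search on a data subset
model.search(sub_x, sub_y, n_trials=50, epochs=20, resume=True)     # resume with more trials
best_trial_id = model.end_search()                                  # explicitly finish tuning (optional)
model.fit(train_x, train_y, epochs=100, use_tune_id=best_trial_id)  # fit on the full dataset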

model.search_summary

model = nano.tf.keras.Sequential([...])
...
model.search(...) 
hpo_summary_df = model.search_summary()
print(hpo_summary_df)
  • search_summary returns a data frame containing the hparams used in each trial and the value of the target metric (and possibly speed or other attributes). It can serve as a leaderboard.
  • search summaries can also be visualized using the tools provided by Optuna; see the sketch below.
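
A minimal usage sketch, assuming the summary data frame has a "value" column and the underlying Optuna study is exposed as model.study (both are assumptions, not confirmed API):

# Inspect trial statistics as a leaderboard (column names are illustrative).
hpo_summary_df = model.search_summary()
print(hpo_summary_df.sort_values(by="value").head())

# Reuse Optuna's built-in visualizations if the study object is exposed (assumed attribute).
import optuna.visualization as vis
fig = vis.plot_optimization_history(model.study)
fig.show()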

Usage

Basic usage W/O search space configurations:

Just add one line of code (search) before fit, without modifying the original code. Learning rate and batch size will be searched automatically. If the search space is not explicitly set, it will be inferred automatically.

model = bigdl.nano.tf.keras.Sequential([...])
... add layers to model
model.compile(...)
model.search(sub_x, sub_y, n_trials=100, epochs=20, ...) # <--- add 1 line of code
model.fit(x_train, y_train, epochs=100, use_tune=trial_id, ...) 

Advanced Usage:

Case 1: Use Sequential to define a Searchable Model

# Define the model as usual (just change the namespace to nano.tf.keras instead of tf.keras for layers and Sequential). You can now use search space specifications for Dense arguments.

import bigdl.nano.automl.space as space
model = nano.tf.keras.Sequential() \
        .add(nano.tf.keras.layers.Flatten(input_shape=(28, 28))) \
        .add(nano.tf.keras.layers.Dense(units=space.Int(64, 256),
                                        activation=space.Categorical(['relu', 'linear']))) \
        .add(nano.tf.keras.layers.Dense(units=space.Int(10, 20), activation='softmax'))
model.compile()
model.search()
model.fit()

Case 2: Use Functional API to define a Searchable Model

import bigdl.nano.automl.hpo.space as space

inputs = nano.tf.keras.Input(shape=(28, 28))
x = nano.tf.keras.layers.Flatten()(inputs)
x = nano.tf.keras.layers.Dense(units=space.Int(64, 256), activation=space.Categorical(['relu', 'linear']))(x)
outputs = nano.tf.keras.layers.Dense(units=space.Int(10, 20), activation='softmax')(x)
model = nano.tf.keras.Model(inputs, outputs) 
model.compile()
model.search()
model.fit()

Case 3: Reuse a pre-defined model via a customized model

import bigdl.nano.automl as automl
import bigdl.nano.automl.hpo.space as space

#layers = [1, 1, 1]
#channels = [16, 16, 32, 64]
#net = CIFARResNetV1(CIFARBasicBlockV1, layers, channels)

@automl.model()
class MyCifarResNet(CIFARResNetV1):
    def __init__(self, nstage1, nstage2):
        nstage3 = 9 - nstage1 - nstage2
        layers = [nstage1, nstage2, nstage3]
        channels = [16, 16, 32, 64]
        super().__init__(CIFARBasicBlockV1, layers=layers, channels=channels)

model = MyCifarResNet(nstage1=space.Int(2, 4), nstage2=space.Int(2, 4))

model.compile()
model.search()
model.fit()

Additional Notes

  • PyTorch Lightning seems to have a tune method, so we might need to rename our tune method.
  • supporting only a local DB (SQLite) will be easier.
  • we can allow visualization of hyperparameter tuning in Jupyter notebooks; that is useful and straightforward to support.
  • the optimization direction needs to be inferred for frequently used Keras metrics such as accuracy and mse; see the sketch after this list.
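
As a hedged sketch of the last point, direction inference could be a simple lookup over common metric names (the mapping below is illustrative):

# Illustrative helper: infer the optimization direction from a Keras metric name.
_MAXIMIZE = {"accuracy", "acc", "auc", "precision", "recall"}
_MINIMIZE = {"loss", "mse", "mae", "rmse", "mape", "msle"}

def infer_direction(metric_name):
    name = metric_name.lower().replace("val_", "")
    if name in _MAXIMIZE:
        return "maximize"   # passed to optuna.create_study(direction=...)
    if name in _MINIMIZE:
        return "minimize"
    raise ValueError(f"cannot infer optimization direction for metric '{metric_name}'")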

@jason-dai

@shane-huang

Nano PyTorch HPO support will be described in another issue.


shane-huang commented Dec 13, 2021

Implementation Notes.

The Objective to optimize

class Objective(object):
    def __init__(self, keras_model, model_creator, model_compile, **fit_kwargs):
        # make a copy of the original model so that the next trial can start fresh
        ...

    def __call__(self, trial):
        # the objective function for each trial
        if self.keras_model is None:
            self.keras_model = self.model_creator(trial)
        if self.model_compile is None:
            self.keras_model.compile(...)
        else:
            self.model_compile(trial, self.keras_model)
        new_fit_args = ...     # replace hparam args with trial.suggest_xxx (see the sketch below)
        target_metric = ...    # validate the metric settings (e.g. use the first metric if more than one is specified)
        hist = self.keras_model.fit(**new_fit_args)
        score = max(hist.history[target_metric])
        return score
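
A hedged sketch of the "replace hparam args with trial.suggest_xxx" step, assuming the space classes expose their bounds/choices as attributes (the attribute names below are assumptions):

# Illustrative: turn search-space placeholders in fit kwargs into concrete values for one trial.
import bigdl.nano.automl.hpo.space as space

def sample_fit_kwargs(trial, fit_kwargs):
    sampled = {}
    for name, v in fit_kwargs.items():
        if isinstance(v, space.Real):              # assumed .low / .high attributes
            sampled[name] = trial.suggest_float(name, v.low, v.high)
        elif isinstance(v, space.Int):
            sampled[name] = trial.suggest_int(name, v.low, v.high)
        elif isinstance(v, space.Categorical):     # assumed .choices attribute
            sampled[name] = trial.suggest_categorical(name, v.choices)
        else:
            sampled[name] = v                      # not a search space; pass through unchanged
    return sampled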

Changes in bigdl.nano.tf.keras.Sequential and bigdl.nano.tf.keras.Model

class Sequential:   # bigdl.nano.tf.keras.Sequential
    ...
    def tune(self, ..., **fit_kwargs):
        # determine the direction based on the common metrics supported
        objective = Objective(keras_model, model_creator, model_compile, **fit_kwargs)
        self.study = optuna.create_study(direction=direction)
        self.study.optimize(objective, n_trials=100)
        trial = self.study.best_trial
        print_trial(trial)

    def fit(self, ..., use_tune=...):
        if use_tune:
            # fit with the params of the specified trial id
            ...
        else:
            # original fit routine
            ...

shane-huang self-assigned this Dec 14, 2021
shane-huang changed the title "light weight hpo support in nano tensorflow" to "Nano: light weight hpo support in nano tensorflow" on Dec 28, 2021

shane-huang commented Jan 11, 2022

Implementation Notes 2

@automl.obj() will return a lazily-initializable object with the configuration space as a member variable; see the sketch after the notes below.

  • For keras.Sequential, the Keras layers are decorated, and the layers added in model.add() will be stored in a list in our Sequential object. The search space is collected from the decorated layers and converted to an Optuna search space.
  • For keras.Model, the Keras Model object is decorated as a whole, then our Model instance will be created using a utility function from the Keras Model. The search space will be extracted from the Model and converted to an Optuna search space.
    Reference: https://auto.gluon.ai/stable/tutorials/course/object.html
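
A hedged sketch of what such a lazily-initializable object and its decorator could look like (all names below are illustrative, not the actual implementation):

# Illustrative sketch of a lazily-initializable auto object and its decorator.
class AutoObject:
    def __init__(self, cls, *args, **kwargs):
        self.cls = cls
        self.args = args
        self.kwargs = kwargs                 # may contain space.Int / space.Categorical placeholders

    def sample(self, config):
        # Replace search-space placeholders with concrete values from `config`,
        # then perform the actual (delayed) instantiation of the wrapped class.
        kwargs = {k: config.get(k, v) for k, v in self.kwargs.items()}
        return self.cls(*self.args, **kwargs)

def obj():
    # Decorating a class makes its "instantiation" return an AutoObject instead of
    # constructing the real object immediately.
    def decorator(cls):
        def wrapper(*args, **kwargs):
            return AutoObject(cls, *args, **kwargs)
        return wrapper
    return decorator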

@shane-huang

@jason-dai revised design

@jason-dai

Is it possible to support the following API?

model = nano.keras.Sequential()
        .add(nano.keras.layers.Dense(…))
        .add(…)
        .add(nano.keras.layers.Softmax(…))
model.compile()
model.tune()
model.fit()

input = nano.keras.layers.Input(…)
dense = nano.keras.layers.Dense(…)
…
output = nano.keras.layers.Softmax(…)
model = nano.keras.Model(input, output, …) 
model.compile()
model.tune()
model.fit()

@nano.automl
class MyCifarResNet(CIFARResNetV1):
    def __init__(self, nstage1, nstage2):
        nstage3 = 9 - nstage1 - nstage2
        layers = [nstage1, nstage2, nstage3]
        channels = [16, 16, 32, 64]
        super().__init__(CIFARBasicBlockV1, layers=layers, channels=channels)

model= MyCifarResNet(nstage1=space.Int(2, 4),nstage2=space.Int(2,4))
model.compile()
model.tune()
model.fit()


shane-huang commented Jan 20, 2022

I think we can support this API.
Essentially,

  • nano.keras layers and optimizers are Keras objects that are automatically decorated. On instantiation they are replaced by an AutoObject, which has a sample() method that takes a configuration and does the actual instantiation.
  • A user-customized model (e.g. MyCifarResNet) is decorated using @nano.automodel. On instantiation it is replaced by an AutoModel (or a subclass of it).
  • AutoModel implements search and search_summary. search runs many trials; in each trial one configuration is applied and a model is instantiated, compiled, fitted, and evaluated. When the search finishes, a best configuration is obtained. AutoModel holds a Keras model internally, which is actually built and compiled with the best config at model.fit.
  • Sequential and Model are both (subclasses of) AutoModel. To instantiate the actual Keras model, Model/Sequential need to traverse the AutoObject graph and sample each AutoObject with some hparam configuration; see the sketch below. Customized models are AutoModels that contain just a single AutoObject, so their instantiation is straightforward.
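
A hedged sketch of how a Sequential AutoModel could build the concrete Keras model from its stored AutoObjects for one configuration (illustrative only; it builds on the illustrative AutoObject sketched earlier):

# Illustrative: instantiate the real Keras model for one hparam configuration.
import tensorflow as tf

def build_keras_model(auto_layers, config):
    model = tf.keras.Sequential()
    for auto_layer in auto_layers:              # layers collected by the nano Sequential.add()
        model.add(auto_layer.sample(config))    # sample() does the delayed instantiation
    return model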

@jason-dai

Use search instead of tune?


shane-huang commented Jan 22, 2022

Use search instead of tune?

Updated the above design according to your comments.

In addition, we need to consider the case where people don't want to use AutoML at all. Enabling AutoML will automatically decorate all the nano.tf.keras layers and optimizers; there might be some overhead (construction of AutoObjects, graph traversal) or potential issues (e.g. delayed construction of the Keras model inside AutoModel breaks eager execution, etc.).

  • To give people the option of whether to enable AutoML at all, there are two ways:

    • Option 1: a global option for nano. This option controls whether we get the original Keras layers or decorated layers when the user creates nano.tf.keras layers; later we can also extend it to support optimized implementations of Keras layers, etc. (see the sketch at the end of this comment)
     import bigdl.nano
     bigdl.nano.enable_automl=True 
    • Option 2: use another namespace,
     # w/ automl
     nano.tf.automl.keras.Sequential 
     # w/o automl
     nano.tf.keras.Sequential 

The first option looks more natural to me.
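
A hedged sketch of how Option 1's global flag could dispatch layer creation (other than bigdl.nano.enable_automl, the names below, including AutoObject, are illustrative placeholders):

# Illustrative dispatch on a global flag.
import tensorflow as tf
import bigdl.nano

def Dense(*args, **kwargs):
    if getattr(bigdl.nano, "enable_automl", False):
        return AutoObject(tf.keras.layers.Dense, *args, **kwargs)   # decorated, lazily built
    return tf.keras.layers.Dense(*args, **kwargs)                   # plain Keras layer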

@jason-dai

(Quoting shane-huang's previous comment: "The first option looks more natural to me.")

Maybe we can check the parameters when the user constructs the layer/model; if no search space is specified, we may directly create a Keras layer/model?

@shane-huang

Maybe we can check the parameters when the user constructs the layer/model; if no search space is specified, we may directly create a Keras layer/model?

The short answer is yes, we can, though the code may not be very clean, and there are still some extra operations to detect whether a search space is used in the arguments.

Detailed answer:
There are essentially two options to implement nano.tf.keras.Sequential/Model.

  • option 1: inheritance, shown below. All methods of keras.Model can be called on nano.tf.keras.Model as well without extra work.
 class Model(tf.keras.Model):   # nano.tf.keras.Model
     def __init__(self, ...):
         super().__init__(...)
  • option 2: composition, shown below; methods of tf.keras.Model have to be explicitly exposed from the internal model.
 class Model(object):   # nano.tf.keras.Model
     def __init__(self, ...):
         self._internal_m: tf.keras.Model = ...
     def fit(self, ...):
         ...
         return self._internal_m.fit(...)
     def compile(self, ...):
         ...

In the case where users don't use AutoML at all, inheritance seems the proper option. But for AutoML, the composition option is more convenient and less error-prone (we should assume that the AutoML Model's behavior is not exactly the same as the original Keras model; for example, a user should be able to inspect the model right after Model.compile or build, before fitting, whereas an AutoModel cannot be inspected before fit).

A global option for disabling AutoML can switch between the two implementations easily. Alternatively, if we don't want the global option, we can use a mixture of the two, i.e. super() keeps the same behavior as Keras and the internal model is used for AutoML if a search space is found; see the sketch below.
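
A hedged sketch of the "mixture of the two" idea, checking arguments at construction time and taking the AutoML path only when a search space is found (illustrative only, not the final implementation):

# Illustrative: detect search spaces at construction time and dispatch accordingly.
import tensorflow as tf
import bigdl.nano.automl.hpo.space as space

def contains_search_space(*args, **kwargs):
    values = list(args) + list(kwargs.values())
    return any(isinstance(v, (space.Real, space.Int, space.Categorical)) for v in values)

class Model(tf.keras.Model):                    # stands in for nano.tf.keras.Model
    def __init__(self, *args, **kwargs):
        use_automl = contains_search_space(*args, **kwargs)
        if use_automl:
            super().__init__()                  # real layers are built later, at search/fit time
            self._auto_spec = (args, kwargs)    # remember the search-space arguments
        else:
            super().__init__(*args, **kwargs)   # plain Keras behavior, no AutoML overhead
        self._use_automl = use_automl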

@yangw1234

Looks like keras also provides a tuner: https://www.tensorflow.org/tutorials/keras/keras_tuner

@shane-huang

The implementation has been merged, so I'm closing this issue. It may be reopened later if new features are added.
