
Nano: light weight hpo support in nano tensorflow #3712

Closed
shane-huang opened this issue Dec 13, 2021 · 12 comments
shane-huang commented Dec 13, 2021

Overview

Refer to the HPO design for nano PyTorch in issue #3925.

Enable hyperparameter tuning in nano TensorFlow and make it as transparent as possible to users. General principles include:

  • provide best-known default settings if no search space is explicitly provided by the user (e.g. learning rate, batch size, etc.)
  • frequently used metrics don't need extra configuration for HPO, e.g. mse -> minimize the objective.

API Design

model.search

model = nano.tf.keras.Sequential([...])
...
model.search(sub_x, sub_y, val_data=(subx_val, suby_val), n_trials=100, epochs=20, ...) # <--- add 1 line of code
#model.search(..., resume=True, ...)  # <---- search can be called several times to resume from where it left off

model.fit(train_x, train_y, epochs=100, use_tune_id=trial_id, ...) 
  • search does not return or save any tuned model; it only collects the statistics of all the trials, so fit is still needed after search. (Note: we could provide an option to save checkpoints so that fit can simply load the best checkpoint without retraining.)
  • search automatically searches for training-related hparams, e.g. batch size, learning rate, etc. The search space is by default inferred and adapted from the input and environment. It can also be explicitly specified in arguments, as in the example shown below.
model.search(..., n_trials=100, batch_size=automl.space.Categorical([32,64]), learning_rate=automl.space.Real(0.001,0.1), ...)
  • search can be called several times in order to resume the tuning.
  • when tuning is finished, the user can explicitly call end_search and specify which trial's parameters will be used for the following fit (see the sketch after this list). If a trial id is not specified in fit, the best trial is selected by default. If end_search is not explicitly called, it will be called in fit.
  • We assume that it is common to use a subset of data in search, while in fit the user can use the full dataset (train+val used in search), a larger dataset, or more epochs.
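
A minimal sketch of the intended search/fit flow (end_search returning the selected trial id is an assumption, not a confirmed API):

# Illustrative flow; end_search returning a trial id is an assumption, not final API.
model.search(sub_x, sub_y, n_trials=50, epochs=20)                  # search on a data subset
model.search(sub_x, sub_y, n_trials=50, epochs=20, resume=True)     # resume with more trials
best_trial_id = model.end_search()                                  # explicitly finish tuning (optional)
model.fit(train_x, train_y, epochs=100, use_tune_id=best_trial_id)  # fit on the full dataset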

model.search_summary

model = nano.tf.keras.Sequential([...])
...
model.search(...) 
hpo_summary_df = model.search_summary()
print(hpo_summary_df)
  • search_summary returns a data frame containing the hparams used in each trial and the value of the target metric (and possibly speed or other attributes). It can serve as a leaderboard.
  • search summaries can also be visualized using the tools provided by Optuna; see the sketch below.
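
A minimal usage sketch, assuming the summary data frame has a "value" column and the underlying Optuna study is exposed as model.study (both are assumptions, not confirmed API):

# Inspect trial statistics as a leaderboard (column names are illustrative).
hpo_summary_df = model.search_summary()
print(hpo_summary_df.sort_values(by="value").head())

# Reuse Optuna's built-in visualizations if the study object is exposed (assumed attribute).
import optuna.visualization as vis
fig = vis.plot_optimization_history(model.study)
fig.show()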

Usage

Basic usage W/O search space configurations:

Just add one line of code (search) before fit, without modifying the original code. Learning rate and batch size will be searched automatically. If the search space is not explicitly set, it will be inferred automatically.

model = bigdl.nano.tf.keras.Sequential([...])
... add layers to model
model.compile(...)
model.search(sub_x, sub_y, n_trials=100, epochs=20, ...) # <--- add 1 line of code
model.fit(x_train, y_train, epochs=100, use_tune=trial_id, ...) 

Advanced Usage:

Case 1: Use Sequential to define a Searchable Model

# Define the model as usual (just change the namespace to nano.tf.keras instead of tf.keras for layers and Sequential). You can now use search space specifications for Dense arguments.

import bigdl.nano.automl.space as space
model = nano.tf.keras.Sequential() \
        .add(nano.tf.keras.layers.Flatten(input_shape=(28, 28))) \
        .add(nano.tf.keras.layers.Dense(units=space.Int(64, 256),
                                        activation=space.Categorical(['relu', 'linear']))) \
        .add(nano.tf.keras.layers.Dense(units=space.Int(10, 20), activation='softmax'))
model.compile()
model.search()
model.fit()

Case 2: Use Functional API to define a Searchable Model

import bigdl.nano.automl.hpo.space as space

inputs = nano.tf.keras.Input(shape=(28, 28))
x = nano.tf.keras.layers.Flatten()(inputs)
x = nano.tf.keras.layers.Dense(units=space.Int(64, 256), activation=space.Categorical(['relu', 'linear']))(x)
outputs = nano.tf.keras.layers.Dense(units=space.Int(10, 20), activation='softmax')(x)
model = nano.tf.keras.Model(inputs, outputs) 
model.compile()
model.search()
model.fit()

Case 3: Reuse a pre-defined model via a customized model

import bigdl.nano.automl as automl
import bigdl.nano.automl.hpo.space as space

#layers = [1, 1, 1]
#channels = [16, 16, 32, 64]
#net = CIFARResNetV1(CIFARBasicBlockV1, layers, channels)

@automl.model()
class MyCifarResNet(CIFARResNetV1):
    def __init__(self, nstage1, nstage2):
        nstage3 = 9 - nstage1 - nstage2
        layers = [nstage1, nstage2, nstage3]
        channels = [16, 16, 32, 64]
        super().__init__(CIFARBasicBlockV1, layers=layers, channels=channels)

model = MyCifarResNet(nstage1=space.Int(2, 4), nstage2=space.Int(2, 4))

model.compile()
model.search()
model.fit()

Additional Notes

  • PyTorch Lightning seems to have a tune method, so we might need to rename our tune method.
  • supporting only a local DB (SQLite) will be easier.
  • we can allow visualization of hyperparameter tuning in Jupyter notebooks; that is useful and straightforward to support.
  • the optimization direction needs to be inferred for frequently used Keras metrics such as accuracy and mse; see the sketch after this list.
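
As a hedged sketch of the last point, direction inference could be a simple lookup over common metric names (the mapping below is illustrative):

# Illustrative helper: infer the optimization direction from a Keras metric name.
_MAXIMIZE = {"accuracy", "acc", "auc", "precision", "recall"}
_MINIMIZE = {"loss", "mse", "mae", "rmse", "mape", "msle"}

def infer_direction(metric_name):
    name = metric_name.lower().replace("val_", "")
    if name in _MAXIMIZE:
        return "maximize"   # passed to optuna.create_study(direction=...)
    if name in _MINIMIZE:
        return "minimize"
    raise ValueError(f"cannot infer optimization direction for metric '{metric_name}'")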

@jason-dai

@shane-huang

Nano PyTorch HPO support will be described in another issue.


shane-huang commented Dec 13, 2021

Implementation Notes.

The Objective to optimize

class Objective(object):
    def __init__(self, keras_model, model_creator, model_compile, **fit_kwargs):
        # make a copy of the original model so that the next trial can start fresh
        ...

    def __call__(self, trial):
        # the objective function for each trial
        if self.keras_model is None:
            self.keras_model = self.model_creator(trial)
        if self.model_compile is None:
            self.keras_model.compile(...)
        else:
            self.model_compile(trial, self.keras_model)
        new_fit_args = ...     # replace hparam args with trial.suggest_xxx (see the sketch below)
        target_metric = ...    # validate the metric settings (e.g. use the first metric if more than one is specified)
        hist = self.keras_model.fit(**new_fit_args)
        score = max(hist.history[target_metric])
        return score
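
A hedged sketch of the "replace hparam args with trial.suggest_xxx" step, assuming the space classes expose their bounds/choices as attributes (the attribute names below are assumptions):

# Illustrative: turn search-space placeholders in fit kwargs into concrete values for one trial.
import bigdl.nano.automl.hpo.space as space

def sample_fit_kwargs(trial, fit_kwargs):
    sampled = {}
    for name, v in fit_kwargs.items():
        if isinstance(v, space.Real):              # assumed .low / .high attributes
            sampled[name] = trial.suggest_float(name, v.low, v.high)
        elif isinstance(v, space.Int):
            sampled[name] = trial.suggest_int(name, v.low, v.high)
        elif isinstance(v, space.Categorical):     # assumed .choices attribute
            sampled[name] = trial.suggest_categorical(name, v.choices)
        else:
            sampled[name] = v                      # not a search space; pass through unchanged
    return sampled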

Changes in bigdl.nano.tf.keras.Sequential and bigdl.nano.tf.keras.Model

class Sequential:   # bigdl.nano.tf.keras.Sequential
    ...
    def tune(self, ..., **fit_kwargs):
        # determine the direction based on the common metrics supported
        objective = Objective(keras_model, model_creator, model_compile, **fit_kwargs)
        self.study = optuna.create_study(direction=direction)
        self.study.optimize(objective, n_trials=100)
        trial = self.study.best_trial
        print_trial(trial)

    def fit(self, ..., use_tune=...):
        if use_tune:
            # fit with the params of the specified trial id
            ...
        else:
            # original fit routine
            ...

shane-huang self-assigned this Dec 14, 2021
shane-huang changed the title "light weight hpo support in nano tensorflow" to "Nano: light weight hpo support in nano tensorflow" on Dec 28, 2021

shane-huang commented Jan 11, 2022

Implementation Notes 2

@automl.obj() will return a lazily-initializable object with the configuration space as a member variable; see the sketch after the notes below.

  • For keras.Sequential, the Keras layers are decorated, and the layers added in model.add() will be stored in a list in our Sequential object. The search space is collected from the decorated layers and converted to an Optuna search space.
  • For keras.Model, the Keras Model object is decorated as a whole, then our Model instance will be created using a utility function from the Keras Model. The search space will be extracted from the Model and converted to an Optuna search space.
    Reference: https://auto.gluon.ai/stable/tutorials/course/object.html
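
A hedged sketch of what such a lazily-initializable object and its decorator could look like (all names below are illustrative, not the actual implementation):

# Illustrative sketch of a lazily-initializable auto object and its decorator.
class AutoObject:
    def __init__(self, cls, *args, **kwargs):
        self.cls = cls
        self.args = args
        self.kwargs = kwargs                 # may contain space.Int / space.Categorical placeholders

    def sample(self, config):
        # Replace search-space placeholders with concrete values from `config`,
        # then perform the actual (delayed) instantiation of the wrapped class.
        kwargs = {k: config.get(k, v) for k, v in self.kwargs.items()}
        return self.cls(*self.args, **kwargs)

def obj():
    # Decorating a class makes its "instantiation" return an AutoObject instead of
    # constructing the real object immediately.
    def decorator(cls):
        def wrapper(*args, **kwargs):
            return AutoObject(cls, *args, **kwargs)
        return wrapper
    return decorator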

@shane-huang

@jason-dai revised design

@jason-dai

Is it possible to support the following API?

model = nano.keras.Sequential()
        .add(nano.keras.layers.Dense(…))
        .add(…)
        .add(nano.keras.layers.Softmax(…))
model.compile()
model.tune()
model.fit()

input = nano.keras.layers.Input(…)
dense = nano.keras.layers.Dense(…)
…
output = nano.keras.layers.Softmax(…)
model = nano.keras.Model(input, output, …) 
model.compile()
model.tune()
model.fit()

@nano.automl
class MyCifarResNet(CIFARResNetV1):
    def __init__(self, nstage1, nstage2):
        nstage3 = 9 - nstage1 - nstage2
        layers = [nstage1, nstage2, nstage3]
        channels = [16, 16, 32, 64]
        super().__init__(CIFARBasicBlockV1, layers=layers, channels=channels)

model= MyCifarResNet(nstage1=space.Int(2, 4),nstage2=space.Int(2,4))
model.compile()
model.tune()
model.fit()


shane-huang commented Jan 20, 2022

I think we can support this API.
Essentially,

  • nano.keras layers and optimizers are Keras objects that are automatically decorated. On instantiation they are replaced by an AutoObject, which has a sample() method that takes a configuration and does the actual instantiation.
  • A user-customized model (e.g. MyCifarResNet) is decorated using @nano.automodel. On instantiation it is replaced by an AutoModel (or a subclass of it).
  • AutoModel implements search and search_summary. search runs many trials; in each trial one configuration is applied and a model is instantiated, compiled, fitted, and evaluated. When the search finishes, a best configuration is obtained. AutoModel holds a Keras model internally, which is actually built and compiled with the best config at model.fit.
  • Sequential and Model are both (subclasses of) AutoModel. To instantiate the actual Keras model, Model/Sequential need to traverse the AutoObject graph and sample each AutoObject with some hparam configuration; see the sketch below. Customized models are AutoModels that contain just a single AutoObject, so their instantiation is straightforward.
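
A hedged sketch of how a Sequential AutoModel could build the concrete Keras model from its stored AutoObjects for one configuration (illustrative only; it builds on the illustrative AutoObject sketched earlier):

# Illustrative: instantiate the real Keras model for one hparam configuration.
import tensorflow as tf

def build_keras_model(auto_layers, config):
    model = tf.keras.Sequential()
    for auto_layer in auto_layers:              # layers collected by the nano Sequential.add()
        model.add(auto_layer.sample(config))    # sample() does the delayed instantiation
    return model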

@jason-dai

Use search instead of tune?


shane-huang commented Jan 22, 2022

Use search instead of tune?

Updated the above design according to your comments.

In addition, we need to consider the case where people don't want to use AutoML at all. Enabling AutoML will automatically decorate all the nano.tf.keras layers and optimizers; there might be some overhead (construction of AutoObjects, graph traversal) or potential issues (e.g. delayed construction of the Keras model inside AutoModel breaks eager execution, etc.).

  • To give people the option of whether to enable AutoML at all, there are two ways:

    • Option 1: a global option for nano. This option controls whether we get the original Keras layers or decorated layers when the user creates nano.tf.keras layers; later we can also extend it to support optimized implementations of Keras layers, etc. (see the sketch at the end of this comment)
     import bigdl.nano
     bigdl.nano.enable_automl=True 
    • Option 2: use another namespace,
     # w/ automl
     nano.tf.automl.keras.Sequential 
     # w/o automl
     nano.tf.keras.Sequential 

The first option looks more natural to me.
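
A hedged sketch of how Option 1's global flag could dispatch layer creation (other than bigdl.nano.enable_automl, the names below, including AutoObject, are illustrative placeholders):

# Illustrative dispatch on a global flag.
import tensorflow as tf
import bigdl.nano

def Dense(*args, **kwargs):
    if getattr(bigdl.nano, "enable_automl", False):
        return AutoObject(tf.keras.layers.Dense, *args, **kwargs)   # decorated, lazily built
    return tf.keras.layers.Dense(*args, **kwargs)                   # plain Keras layer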

@jason-dai

(Quoting shane-huang's previous comment: "The first option looks more natural to me.")

Maybe we can check the parameters when the user constructs the layer/model; if no search space is specified, we may directly create a Keras layer/model?

@shane-huang

Maybe we can check the parameters when the user constructs the layer/model; if no search space is specified, we may directly create a Keras layer/model?

The short answer is yes, we can, though the code may not be very clean, and there are still some extra operations to detect whether a search space is used in the arguments.

Detailed answer:
There are essentially two options to implement nano.tf.keras.Sequential/Model.

  • option 1: inheritance, shown below. All methods of keras.Model can be called on nano.tf.keras.Model as well without extra work.
 class Model(tf.keras.Model):   # nano.tf.keras.Model
     def __init__(self, ...):
         super().__init__(...)
  • option 2: composition, shown below; methods of tf.keras.Model have to be explicitly exposed from the internal model.
 class Model(object):   # nano.tf.keras.Model
     def __init__(self, ...):
         self._internal_m: tf.keras.Model = ...
     def fit(self, ...):
         ...
         return self._internal_m.fit(...)
     def compile(self, ...):
         ...

In the case where users don't use AutoML at all, inheritance seems the proper option. But for AutoML, the composition option is more convenient and less error-prone (we should assume that the AutoML Model's behavior is not exactly the same as the original Keras model; for example, a user should be able to inspect the model right after Model.compile or build, before fitting, whereas an AutoModel cannot be inspected before fit).

A global option for disabling AutoML can switch between the two implementations easily. Alternatively, if we don't want the global option, we can use a mixture of the two, i.e. super() keeps the same behavior as Keras and the internal model is used for AutoML if a search space is found; see the sketch below.
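
A hedged sketch of the "mixture of the two" idea, checking arguments at construction time and taking the AutoML path only when a search space is found (illustrative only, not the final implementation):

# Illustrative: detect search spaces at construction time and dispatch accordingly.
import tensorflow as tf
import bigdl.nano.automl.hpo.space as space

def contains_search_space(*args, **kwargs):
    values = list(args) + list(kwargs.values())
    return any(isinstance(v, (space.Real, space.Int, space.Categorical)) for v in values)

class Model(tf.keras.Model):                    # stands in for nano.tf.keras.Model
    def __init__(self, *args, **kwargs):
        use_automl = contains_search_space(*args, **kwargs)
        if use_automl:
            super().__init__()                  # real layers are built later, at search/fit time
            self._auto_spec = (args, kwargs)    # remember the search-space arguments
        else:
            super().__init__(*args, **kwargs)   # plain Keras behavior, no AutoML overhead
        self._use_automl = use_automl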

@yangw1234

Looks like keras also provides a tuner: https://www.tensorflow.org/tutorials/keras/keras_tuner

@shane-huang

The implementation has been merged, so I'm closing this issue. It may be reopened later if new features are added.
