Parameter routing #50

Closed
stsievert opened this issue Aug 13, 2020 · 29 comments · Fixed by #67

@stsievert (Collaborator) commented Aug 13, 2020

As per #37 (comment), it might be nice to have some parameter routing and/or to rename build_fn. I'd support this interface:

def build_keras_model(hidden_dim=10, activation="sigmoid"):
    ...
    return model

est = KerasClassifier(
    model=build_keras_model,
    model__hidden_dim=20,
    model__activation="relu",
    batch_size=256,
    validation_frac=0.2,
)

This mirrors the interface that Skorch has. They support overriding some keywords like optimizer__lr with lr, or iterator_valid__batch_size and iterator_train__batch_size with batch_size: https://skorch.readthedocs.io/en/stable/user/neuralnet.html#batch-size
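
As a concrete illustration of that aliasing, here's a hedged sketch (MyModule stands in for any torch.nn.Module; it is not defined in this thread):

from skorch import NeuralNetClassifier

net = NeuralNetClassifier(
    MyModule,  # placeholder: any torch.nn.Module
    batch_size=128,  # low-priority default shared by the train and valid iterators
    iterator_train__batch_size=256,  # the routed parameter wins for the train iterator
)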

@stsievert (Collaborator, Author) commented Aug 13, 2020

It'd be nice if the model.compile didn't happen in build_fn, or if an uncompiled model was passed to build_fn (which I think should be renamed to model). Then I think this interface would be possible:

def build_keras_model(hidden_dim=10, activation="sigmoid"):
    ...
    return model  # uncompiled model; compile happens inside BaseWrapper?

est = KerasClassifier(
    model=build_keras_model,
    model__hidden_dim=20,
    model__activation="relu",
    optimizer="sgd",
    optimizer__lr=0.01,
    optimizer__momentum=0.9,
    loss="categorical_crossentropy",
    batch_size=256,
    validation_frac=0.2,
)

This almost exactly mirrors the Skorch API.

@adriangb (Owner)

renaming build_fn

I agree that build_fn is not the best name, but renaming it would break the old API. I don't think the tradeoff of a nicer name is worth breaking the old API, at least not without a deprecation period.

They support overriding some keywords like optimizer__lr with lr, or iterator_valid__batch_size and iterator_train__batch_size with batch_size: https://skorch.readthedocs.io/en/stable/user/neuralnet.html#batch-size

To be clear (if I understood their docs correctly), iterator_train__batch_size will overwrite batch_size, right?

@adriangb (Owner) commented Aug 13, 2020

It'd be nice if the model.compile didn't happen in build_fn, or if an uncompiled-model was passed to build_fn

I think it would be nice to have that flexibility. But I also want to support the existing interface (i.e. compile inside build_fn). I think it's possible to compile a model multiple times, so maybe we can add a way to compile (if build_fn returned an uncompiled model) or re-compile (if build_fn returned a compiled model, or if build_fn was a compiled keras.Model instance)? Maybe we can leave this for another effort (separate from an initial parameter routing implementation, unless you think this might somehow change how parameter routing is going to work?)

@adriangb (Owner)

Copied from #47:

A quite common pattern in scikit-learn estimators is to parse/modify an __init__ param in fit, ex:

def fit(self, X, y, ...):
    self._random_state = check_random_state(self.random_state)

(copied from https://github.com/scikit-learn/scikit-learn/blob/0fb307bf39bbdacd6ed713c00724f8f871d60370/sklearn/neural_network/_multilayer_perceptron.py#L339)

Where downstream the estimator will only use the self._random_state param and not self.random_state.

I think it would be nice to allow users to do something like:

class MyClassifier(KerasClassifier):

    def hook_params(self):   # in base class this would just be a pass
         if self.batch_size is None:
              self.batch_size_ = 32
         else:
              self.batch_size_ = self.batch_size

But the trick will then be routing that parameter so that batch_size_ always overrides batch_size. Maybe if two parameters match except for _'s at the end, the one with the most _ takes precedence? So that:

def build_fn(batch_size=32):
    print(f"batch_size={batch_size}")
    ...

class MyClassifier(KerasClassifier):

    def hook_params(self):   # in base class this would just be a pass
         if self.batch_size is None:
              self.batch_size_ = 32
         else:
              self.batch_size_ = self.batch_size
         self.batch_size__ = 64

estimator = MyClassifier()
estimator.fit(X, y)  # always prints `batch_size=64`
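
A minimal sketch of how that trailing-underscore precedence rule could be resolved (resolve_overrides is a hypothetical helper, not part of any proposal above):

def resolve_overrides(params):
    # Among names that differ only by trailing underscores, the one with
    # the most trailing underscores wins, so it must be written last.
    resolved = {}
    for name in sorted(params, key=lambda n: len(n) - len(n.rstrip("_"))):
        resolved[name.rstrip("_")] = params[name]
    return resolved

resolve_overrides({"batch_size": None, "batch_size_": 32, "batch_size__": 64})
# -> {"batch_size": 64}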

@stsievert (Collaborator, Author)

at least not without a deprecation period [for build_fn being renamed to model]

👍 that sounds like a good solution. I expect SciKeras's API to break; it's still a pretty young library.

maybe we can add a way to compile (if build_fn returned an uncompiled model) or re-compile (if build_fn returned a compiled model, or if build_fn was a compiled keras.Model instance)? Maybe we can leave this for another effort (separate from an initial parameter routing implementation, unless you think this might somehow change how parameter routing is going to work?)

Is there a way to check if a Keras model is compiled? I think it should be separate from this PR too.

To be clear (if I understood their docs correctly), iterator_train__batch_size will overwrite batch_size, right?

That's what it looks like. It looks like batch_size has the lowest priority.

@adriangb (Owner)

Is there a way to check if a Keras model is compiled? I think it should be separate from this PR too.

Just to wrap up this discussion: if there isn't a way to check this directly, I'm sure we can build something to do that check pretty easily. I think all that compile does is set the loss, metrics and optimizer.
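
For what it's worth, a minimal sketch of such a check, assuming an uncompiled tf.keras model has no optimizer attached yet:

from tensorflow import keras

def model_is_compiled(model: keras.Model) -> bool:
    # compile() attaches the optimizer; before compile it is None
    return model.optimizer is not None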

@adriangb (Owner)

With all of the huge improvements within the last couple of weeks, I think we are in a place to tackle parameter routing. I would like to make this the next priority for this library.

@stsievert (Collaborator, Author) commented Aug 19, 2020

These parameters are accepted by BaseWrapper but not used:

  • loss
  • optimizer
  • verbose
  • callbacks
  • validation_split
  • shuffle
  • run_eagerly
  • batch_size

Which of these parameters can be set after model compilation? It appears run_eagerly can be set after model.compile (source). Right now, the user will have to manually pass 6 of these parameters to model.compile inside their build_fn.
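
To illustrate the run_eagerly point, a small sketch (run_eagerly is a settable property on tf.keras models, so no re-compile is needed):

from tensorflow import keras

model = keras.Sequential([keras.layers.Dense(1, input_shape=(8,))])
model.compile(loss="mse", optimizer="sgd")
model.run_eagerly = True  # takes effect on the next fit/evaluate call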

It might make more sense to have model/build_fn return an un-compiled model. Then SciKeras could handle the model compilation instead of the user. I think I prefer this solution so the user doesn't have to worry about passing the correct parameters.

I'd still like to see the API in #50 (comment).

@adriangb (Owner) commented Aug 19, 2020

Of those, the following correspond to Model.compile:

  • loss
  • optimizer
  • callbacks
  • run_eagerly (although, as you say, that can be set independently)

I think the only issue with what you are proposing is that it will not be backwards compatible. I think what we can do is have a separate function that calls Model.compile with these arguments only if build_fn does not return a compiled model. As an example: #66
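
A rough sketch of what such a "compile only if needed" helper might look like (hypothetical; not the actual code in #66, and the optimizer-based check is an assumption):

def _ensure_compiled(self, model):
    # Leave a model that build_fn already compiled untouched; otherwise
    # compile it with the wrapper's own parameters.
    if model.optimizer is None:
        model.compile(loss=self.loss, optimizer=self.optimizer)
    return model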

@stsievert (Collaborator, Author)

I think the only issue with what you are proposing is that it will not be backwards compatible.

Can models be compiled twice? As a user, I would expect that changing BaseWrapper's parameters influences the behavior.

I think there should be a deprecation period. At the very least, there should be a warning if an uncompiled model is returned.

@adriangb (Owner) commented Aug 19, 2020 via email

@adriangb (Owner)

The only use case that my proposal of "compile only if uncompiled" does not cover is where the user is using a build_fn that they did not write, or a prebuilt & precompiled model. I think this could be easily resolved by adding a force_compile=False keyword to __init__ that forces the recompile using the __init__ parameters (including defaults).

@stsievert (Collaborator, Author)

I think I'd rather have one clear and obvious way to use the library: "There should be one-- and preferably only one --obvious way to do it" (PEP 20). I'm not seeing the need to support both returning compiled and uncompiled models (at least not after any deprecation period; see below).

Returning an uncompiled model

I think basic usage would be covered, certainly for my needs and customization of loss/optimizer without any inheritance.

The advanced usage in #50 (comment) would require inheritance and overwriting compile:

class CustomLoss(KerasClassifier):
    def compile(self):
        loss = custom_fn(**self.meta_, **self.get_params())
        return self.model_.compile(loss=loss, optimizer=self.optimizer_, callbacks=self.callbacks)

I'd show at least one example with a use case in mind.

Returning a compiled model

In this case, all compile parameters are passed to build_fn. It's pretty easy for the user to pass the relevant parameters on by including **kwargs in the signature of build_fn and forwarding it to model.compile.

I think most of the documentation should mention this example:

def build_fn(n_hidden=10, loss="mse", **kwargs):
    model = ...
    model.compile(loss=loss, **kwargs)
    return model

I'd prominently show passing **kwargs to model.compile, and I think the optimizer should be an instantiated version of Keras's SGD optimizer (even if BaseWrapper.loss == "mse").


I prefer the first method, returning an uncompiled model. It'd easily work without subclassing for the simple use cases, and better fits with the object-oriented approach. For advanced use cases, it allows accessing other attributes of BaseWrapper, not only the parameters that are passed to build_fn.

I think a deprecation period should include the "compile if uncompiled" as mentioned, with a warning raised if the model is compiled.
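
A sketch of what that deprecation path could look like (a hypothetical helper; the optimizer-based "already compiled" check is an assumption):

import warnings

def _compile_with_deprecation(self, model):
    if model.optimizer is not None:  # build_fn returned a compiled model
        warnings.warn(
            "Returning a compiled model from build_fn is deprecated; "
            "return an uncompiled model and let the wrapper compile it.",
            DeprecationWarning,
        )
    else:
        model.compile(loss=self.loss, optimizer=self.optimizer)
    return model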

@adriangb (Owner)

Thank you for the detailed overview!

I think that I need to mull over this for a bit, but I would like to share that my use case is a bit different.

My original goal for even trying to use KerasClassifier and KerasRegressor was to try to make self-contained estimators with identical functionality to sklearn.neural_network.MLPClassifier and sklearn.neural_network.MLPRegressor, but using Keras as the backend. This requires the loss selection logic that we currently have in dynamic_classifier and dynamic_regressor in the tests. I feel that since that is my use case, I am biased towards opposing anything that complicates it. Currently this use case is satisfied by writing <100 lines of code in a single function (dynamic_classifier and dynamic_regressor), and this proposal would mean subclassing and duplicating some logic across two functions.

I hope the use case helps understand where my hesitation comes from.

@stsievert (Collaborator, Author)

I'm pretty ambivalent on the choices in #50 (comment), and can see the argument for returning a compiled model. When an uncompiled model is returned and the user wants to override BaseWrapper.compile, the user has to figure out the BaseWrapper interface (self.loss_, etc).

The complete interface I'm envisioning is this:

def model_build_fn(n_hidden=10, meta=None, params=None, optimizer="sgd", **kwargs):
    model = ...
    model.compile(optimizer=optimizer, **kwargs)
    return model

I think meta should have information on the number of features/outputs/etc, params should have BaseWrapper's parameters and kwargs should include the valid parameters for model.compile (#50 (comment)). I think an error should be raised if there's a mismatch between the compiled model parameters and BaseWrapper's parameters.

@adriangb (Owner) commented Aug 24, 2020

Thank you for the concrete proposal @stsievert

I like that idea of passing dictionaries instead of parameters directly, but I'm hesitant to give special treatment to model.compile parameters, especially if they are all that is being passed within **kwargs.

Just to confirm, in your example n_hidden is a new parameter that the user is adding via build_fn (and should be tunable)?

How about an interface like this?

class BaseWrapper(BaseEstimator):
    def __init__(
        self,
        build_fn=None,
        *,
        random_state=None,
        optimizer="rmsprop",
        ...,
        **kwargs,
    ):
        self.optimizer = optimizer
        ...
        vars(self).update(**kwargs)

    def route_params(self, destination_prefix: str, params: Dict[str, Any]):
        # here, use `destination_prefix` to override `optimizer` with
        # `compile_optimizer` if the destination_prefix is `compile_`,
        # and apply any other parameter routing/overriding logic we want
        ...


def model_build_fn(meta=None, params=None):
    print("build got " + params["optimizer"])
    print("build got " + str(params.get("compile_optimizer", None)))
    print("build got " + params["buildparam"])
    print("build got " + str(meta["n_features_in_"]))
    ...
    return model


estimator = KerasRegressor(
    model_build_fn,
    compile__optimizer="sgd",  # override the optimizer param for `compile`
    __optimizer="adam",  # override the global optimizer param by prepending __
    build__buildparam="test",  # route a new param to `build_fn`
)

estimator.fit(...)
# prints "build got adam"
# prints "build got None"
# prints "build got test"
# prints "build got 2" (or whatever estimator.n_features_in_ is)
estimator.model_.optimizer == "sgd"  # True

This way we also don't have to introspect into build_fn's signature to figure out what parameters the user's build_fn needs, since they define them when they call the wrapper's __init__.

If the user tries to access a parameter that they did not pass to the wrapper's initializer (e.g., they forgot the build__buildparam parameter but access it in build_fn), the failure would be a KeyError within their user-defined function, and I think it would be pretty easy to figure out what went wrong. IMO that's better than what we have now, where things happen somewhat "magically" through the use of inspect.

@stsievert (Collaborator, Author)

I like that idea of passing dictionaries instead of parameters directly, but I'm hesitant to give special treatment to model.compile parameters

Then let's fix that. Here's a revised interface with some documentation:

def model_build_fn(n_hidden=10, meta=None, params=None, compile=None):
    """
    n_hidden : int
        User-defined parameter for number of hidden units.
    meta : dict
        Describes the number of input features, outputs, classifier type, etc.
    params : dict
        The parameters of `BaseWrapper`, also available through `BaseWrapper.get_params`.
    compile : dict
        Valid parameters for `model.compile`. This means `model.compile(**compile)` can be used.
        This dictionary includes keys for `loss` and `optimizer`, which are instantiated Keras loss functions
        and optimizers only if `optimizer__foo` is passed (e.g., `optimizer__momentum` for `optimizer="sgd"`).
        Otherwise, the `loss` and `optimizer` values are strings.
    """
    model = build_model(n_hidden)
    model.compile(**compile)
    return model

I think this interface is simple and easy-to-use, and it also supports advanced usage:

  • I think this interface is clear because the documentation for compile only takes 4 lines.
  • I think this interface is easy to use because it doesn't require any custom logic for simple use.
  • It supports the advanced usage mentioned in Parameter routing #50 (comment): the user can use params for their processing, and can overwrite the loss key in compile if they choose to use it. If they felt like it, they could also manually pass all the parameters to model.compile.

I think I'd be okay reverting #22 to make the documentation simpler. n_hidden is a valid parameter of BaseWrapper through inspection, but parameters like optimizer__momentum aren't. I don't like that disconnect. I think we could minimize most of #18 with good error handling.

@adriangb (Owner)

This does look better! Would this replace #66, or would it be in-addition to?

I think I'd be okay reverting #22 to make the documentation simpler. n_hidden is a valid parameter of BaseWrapper through inspection, but parameters like optimizer__momentum aren't. I don't like that disconnect.

Do you mean that you would be okay reverting #22 so that there is only one way to declare tunable parameters for build_fn (passing them to the initializer, like I do in #50 (comment))?

And since you brought up optimizer__momentum, I wanted to mention that I would like to keep the layers of routing to a single level. That is, if the user wants to set momentum (which is an argument to Keras optimizers), they would pass compile__momentum (in which case it would end up as momentum in the compile argument to build_fn) or compile__optimizer__momentum (in which case it would end up as optimizer__momentum in the compile argument to build_fn). They are allowed to follow the parameter routing name structure deeper or not; that is up to them.
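
A small sketch of that single-level routing (route_params here is a hypothetical helper; only one destination__ prefix is stripped, so deeper names like optimizer__momentum survive intact):

def route_params(params, destination):
    # Strip exactly one level of `destination__` prefix; keep the rest.
    prefix = destination + "__"
    return {k[len(prefix):]: v for k, v in params.items() if k.startswith(prefix)}

route_params({"compile__momentum": 0.9}, "compile")
# -> {"momentum": 0.9}
route_params({"compile__optimizer__momentum": 0.9}, "compile")
# -> {"optimizer__momentum": 0.9}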

@stsievert (Collaborator, Author)

Would this replace #66, or would it be in-addition to?

Maybe "compile if uncompiled" is the best choice; it allows the user the most flexibility and provides a good default option (returning an uncompiled model and relying on BaseWrapper's parameters). It has pretty clear documentation.

Do you mean that you would be okay reverting #22 so that there is only one way to declare tunable parameters for build_fn (passing them to the initializer, like I do in #50 (comment))?

No. I think tunable parameters should still be able to be used without passing them to the initializer. But that might require some customization of set_params. Here's an example of the interface I'm thinking of:

from skorch import NeuralNetClassifier
import torch
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

class Mod(torch.nn.Module):
    def __init__(self, features=80, hidden=3):
        super().__init__()
        self.l1 = torch.nn.Linear(features, hidden)
        self.l2 = torch.nn.Linear(hidden, 1)

    def forward(self, x):
        return torch.sign(self.l2(self.l1(x)))

model = NeuralNetClassifier(Mod, max_epochs=1, criterion=torch.nn.MSELoss)

# `hidden` or `module__hidden` is not a model parameter
assert all("hidden" not in param for param in model.get_params())

params = {"module__hidden": [3, 4, 5]}
search = GridSearchCV(model, params)
X, y = make_classification(n_features=80)
X, y = X.astype("float32"), y.astype("float32")
search.fit(X, y)

@adriangb (Owner) commented Aug 25, 2020

Can you help me understand what use case is not satisfied by having to declare the parameters in the initializer? I know we discussed this back in #22 and #18, so I am sorry if this feels like going backwards. My understanding is that the issue (#18) came from a combination of convenience and lack of documentation.

Is the interface in #67 more inconvenient? I think #67 is certainly easier to document and debug.

For your example above, #67 would look like:

from scikeras import KerasClassifier
from tensorflow import keras
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

def build_fn(meta_params, build_params, compile_params):
    features = meta_params["features"]
    hidden = build_params["hidden"]
    ...

model = KerasClassifier(build_fn, max_epochs=1, hidden=(100, ))

# `hidden` is a model parameter but `build__hidden` is not

params = {"build__hidden": [3, 4, 5]}
search = GridSearchCV(model, params)
X, y = make_classification(n_features=80)
X, y = X.astype("float32"), y.astype("float32")
search.fit(X, y)

hidden is a model parameter but build__hidden is not

Is this the "disconnect" you mentioned above?

All of this said, it would not be impossible to mix #67 with introspection into the arguments of build_fn (i.e. #22). But it does seem a bit unnecessary to have those arguments when we also always have a build_params argument.

@stsievert (Collaborator, Author)

Can you help me understand what use case is not satisfied by having to declare the parameters in the initializer?

I'll walk through one use case.

Let's say I want to fine-tune a SciKeras model that a colleague has sent me. They relied on defaults for their activation function, so they didn't set model__activation/activation/etc as a parameter for their SciKeras model. When I load their model from the email attachment, my workflow will look like this:

import pickle
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

with open("email-attachment.pkl", "rb") as f:
    model = pickle.load(f)

params = {"model__activation": ["relu", "prelu", "leaky_relu", "relu6", "selu", "celu"]}
search = GridSearchCV(model, params)
X, y = make_classification(n_features=80)
X, y = X.astype("float32"), y.astype("float32")
search.fit(X, y)  # will fail because `model__activation` is not a valid parameter.

Skorch can handle this case.

@adriangb (Owner)

Yes I understand that, that is for the "routed" parameters. I agree that that is a good use case and we can work around that. It does not require introspection, because from the prefix model__/build__ we know that this is a parameter to be passed to build_fn within build_params. At least I think that's how it should work. Happy to try it out in #67 to see.

What I was more specifically asking about was the requirement to be able to do:

def build_fn(meta_params, build_params, compile_params, hidden=(100,)):
    features = meta_params["features"]
    ...

model = KerasClassifier(build_fn, max_epochs=1)

vs.

def build_fn(meta_params, build_params, compile_params):
    features = meta_params["features"]
    hidden = build_params["hidden"]
    ...

model = KerasClassifier(build_fn, max_epochs=1, hidden=(100, ))

@stsievert (Collaborator, Author)

Yes I understand that, that is for the "routed" parameters. I agree that that is a good use case and we can work around that.

👍

It does not require introspection

I think the introspection in #22 should be removed. I do not think it's necessary, and is also confusing for more advanced usage like optimizer__momentum.

What I was more specifically asking about was the requirement to be able to do:

I think you're asking about forcing the signature of build_fn to have a particular structure (def build_fn(meta, build, compile)), and making sure that the user specifies all defaults in BaseWrapper.__init__. Is that correct?

@adriangb (Owner)

Yes, exactly. I would like to force that signature and require users to declare all non-routed defaults in BaseWrapper.__init__.

@stsievert (Collaborator, Author)

I don't see why forcing the user to conform to a particular signature is necessary, or how it makes SciKeras easier to use.

I think the model building function should be able to be used independently outside of SciKeras. I think build_model() should still return a valid Keras model even if used outside of SciKeras.

I think I would support forcing a particular signature only if inheritance/subclassing were used. But then it'd be possible to access all of BaseWrapper's parameters, so most of the signature would be meaningless.

But I think most of this is beside the point.

def build_model(hidden=10, activation="relu"):
    print(hidden, activation)
    ...
    return model

m1 = SciKeras(model=build_model, model__hidden=20)
X, y = ...
m1.fit(X, y)  # prints "20 relu"

m2 = m1.set_params(model__activation="prelu")
m2.fit(X, y)  # prints "20 prelu"

This would work with something like this implementation:

def fit(self, X, y):
    params = self.get_params()
    model_kwargs = {k[len("model__"):]: v for k, v in params.items() if k.startswith("model__")}
    compile_kwargs = {k[len("compile__"):]: v for k, v in params.items() if k.startswith("compile__")}
    meta_kwargs = {k[len("meta__"):]: v for k, v in params.items() if k.startswith("meta__")}

    # only send compile/meta args if build_fn's signature accepts them
    model_ = self.build_fn(compile=compile_kwargs, meta=meta_kwargs, **model_kwargs)
    ...

@adriangb (Owner)

I see. So instead of forcing build_params into the signature of build_fn, we would unpack those parameters, expecting build_fn to accept them? Would we force build_fn to accept **kwargs so that any model_kwargs that were not accepted are ignored, along with compile and meta, or would we introspect to see that build_fn accepts meta and/or compile before sending those?

@stsievert (Collaborator, Author)

So instead of forcing build_params into the signature of build_fn, we would unpack those parameters, expecting build_fn to accept them?

Exactly. If the user typo'd a parameter name, Python will raise an exception: "TypeError: build_model() got an unexpected keyword argument ...".

Would we force build_fn to accept **kwargs so that any model_kwargs that were not accepted are ignored, along with compile and meta, or would we introspect to see that build_fn accepts meta and/or compile before sending those?

I think we should introspect. That's what Dask does – for example, map_blocks will send a block_info argument if accepted.
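
A minimal sketch of that kind of signature introspection (optional_kwargs is a hypothetical helper, in the spirit of Dask's block_info handling):

import inspect

def optional_kwargs(func, candidates):
    # Pass meta/compile only when build_fn's signature declares them.
    sig = inspect.signature(func)
    return {name: value for name, value in candidates.items() if name in sig.parameters}

# e.g. build_fn(**optional_kwargs(build_fn, {"meta": meta_kwargs, "compile": compile_kwargs}), **model_kwargs)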

@adriangb (Owner)

This sounds reasonable. We would introspect/check for two specific parameters, which I think is easy to understand if anyone has to debug. I'll implement this in #67.

@adriangb (Owner)

Implemented in #67. No problems. I'm marking that PR as closing this one (if/when it gets merged).
