
RFC: time to simplify APIs? #37

Closed
adriangb opened this issue Aug 6, 2020 · 16 comments · Fixed by #67

Comments

@adriangb
Owner

adriangb commented Aug 6, 2020

Background

Currently, this package has many ways to:

  • Pass arguments to build_fn/_keras_build_fn.
  • Pass arguments to Keras Models' fit or predict.

This comes from a combination of supporting the original tf.keras.wrappers.scikit_learn interface along with the introduction of new ways to do things to improve Scikit-Learn compatibility.

Important principles

Since this aims to be a small, pure-Python package, I think it is important to keep in mind some of Python's guiding principles (cherry-picked from PEP 20):

  • Explicit is better than implicit.
  • Simple is better than complex.
  • There should be one-- and preferably only one --obvious way to do it.

For the most part, the APIs mentioned above rely on dynamic parsing of function signatures and filtering of kwargs or attributes by name. This is somewhat "complex" and "implicit" in my opinion.

Next steps

I feel (and here is where I would appreciate some feedback from others) that it would be good to fully document the requirements that these wrappers have and then narrow down the API to be as simple as possible while still meeting all of those requirements. Off the top of my head:

  • Full compatibility with the Scikit-Learn API (this includes hyperparameter optimization type stuff).
  • Compatible with Sequential, Functional and Model subclass Keras APIs.
  • Ability to use pre-built Keras Models.
  • Ability to access wrapper parameters and attributes during building of a Keras Model.
  • Ability to pass arguments to the Keras Model's fit and predict methods.
    • Is there a use case to pass parameters during a call to fit, or can we require that they be set from __init__? As far as I can tell, only the latter makes sense as far as Scikit-Learn is concerned.

Based on the above reqs (which admittedly could be shortsighted), it seems reasonable to me to take the following action:

  1. Remove **kwargs from fit and predict and require that these parameters be set via __init__.
  2. Remove **kwargs from __init__ and instead hardcode Keras Model fit and predict parameters (maybe also compile parameters?) as proposed in [feature request] add keras.fit parameters as attributes to BaseWrapper #30.

A more extreme step would be to remove the build_fn argument and force users to always use the subclassing interface since it is technically even possible to return a pre-built model from _keras_build_fn via a closure or other methods. This would greatly simplify the API but I worry that it would be an inconvenience for users (even if it is just a couple more lines of code).

All in all I hope to reduce codebase complexity and simplify documentation. Any comments are welcome.

@adriangb pinned this issue Aug 6, 2020
@adriangb
Owner Author

adriangb commented Aug 6, 2020

Tagging @stsievert and @sim-san because your feedback has been very valuable lately.

@stsievert
Collaborator

Thanks for the notification @adriangb. I have a couple thoughts. Currently, there are 4 ways to create a SciKeras model:

  1. Pass a model building function to build_fn.
  2. Subclass and overwrite _keras_build_fn.
  3. Pass an instantiated Keras Model.
  4. Pass a class that has a __call__ method and builds the model when called.

Why are all these methods necessary? What features does each one provide that the other options don't? For example, (3) is necessary because it allows the use of an already-trained model. I'm not seeing the additional features that (2) and (4) provide. If they don't provide any additional features, I think they should be removed, or not mentioned in the documentation.
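For reference, a rough sketch of the four options side by side (build_model, MyRegressor, and ModelFactory are hypothetical stand-ins; the semantics are as I understand the current wrappers):

def build_model(hidden_layer_sizes=(100,)):
    ...  # construct, compile, and return a tf.keras.Model

# (1) Pass a model-building function:
est1 = KerasRegressor(build_fn=build_model)

# (2) Subclass and overwrite _keras_build_fn:
class MyRegressor(KerasRegressor):
    def _keras_build_fn(self, hidden_layer_sizes=(100,)):
        return build_model(hidden_layer_sizes)

est2 = MyRegressor()

# (3) Pass an instantiated (possibly already trained) Keras Model:
est3 = KerasRegressor(build_fn=build_model())

# (4) Pass a class whose __call__ builds the model when called:
class ModelFactory:
    def __call__(self, hidden_layer_sizes=(100,)):
        return build_model(hidden_layer_sizes)

est4 = KerasRegressor(build_fn=ModelFactory())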

it seems reasonable to me to take the following action:

  1. Remove **kwargs from fit and predict and require that these parameters be set via __init__.
  2. Remove **kwargs from __init__ and instead hardcode Keras Model fit and predict parameters (maybe also compile parameters?) as proposed in [feature request] add keras.fit parameters as attributes to BaseWrapper #30.

👍 I would encourage those steps. I might use this interface:

class BaseWrapper(BaseEstimator):
    def __init__(self, build_fn=None, batch_size=32, epochs=100):
        self.build_fn = build_fn
        self.batch_size = batch_size
        self.epochs = epochs

Some of these are repeated for fit, score and predict, so I would allow the user to override when calling that function:

    def fit(self, X, y=None, batch_size=None):
        kwargs = {"batch_size": batch_size or self.batch_size, ...}
        ...

would be to remove the build_fn argument and force users to always use the subclassing interface

👎 I am not a fan of the subclassing interface, at least when parameters are defined in __init__ as above (which is where they should be defined). Why? Because valid parameters to BaseWrapper will not be valid parameters to the subclass even though they're accepted in __init__. For example:

from sklearn.base import BaseEstimator  # sklearn.__version__ == "0.23.2"

class Foo(BaseEstimator):
    def __init__(self, x=1):
        self.x = x
        super().__init__()

class Bar(Foo):
    def __init__(self, y=2, **kwargs):
        # BaseEstimator._get_param_names only inspects Bar.__init__'s
        # signature and ignores **kwargs, so `x` is not a "valid" parameter.
        self.y = y
        super().__init__(**kwargs)

if __name__ == "__main__":
    bar = Bar(x=2)
    bar.set_params(x=3)  # raises ValueError: "Invalid parameter x for estimator Bar() ..."

I think the subclassing interface should still exist in case the user wants to overwrite another function (e.g., _validate_data or _post_process_y), but I don't think the user should be forced to use it. I would remove it from the documentation completely.

@adriangb
Owner Author

adriangb commented Aug 10, 2020

Thank you for the feedback @stsievert

I'm not seeing the additional features that (2) and (4) provide.

(1), (2) and (4) came from the TF/Keras team (I added (3)). Unfortunately, we do not have any design docs that explain the rationale for all of these features.

(4) I think it has something to do with Keras' internal use of __call__ (ex: Model.__call__). I don't know any more than that. I have not used it much, and I do not see that it has much utility. I agree that we should get rid of this.

(2) This method is what I initially used to get full compatibility with the scikit-learn estimator checks (FullyCompliantRegressor and FullyCompliantClassifier in the tests), hence my preference for this method. But perhaps the **kwargs/**skparams machinery is now good enough that it can be used instead and still pass all of these checks.

Because valid parameters to BaseWrapper will not be valid parameters to the subclass even though they're accepted in __init__

I think that could be easily solved by explicitly passing the parameters, but I do see how that could be tedious if you are trying to just add a single parameter. We could also add a small hack to BaseWrapper._get_param_names to add BaseWrapper.__init__'s parameters as long as super (i.e. BaseWrapper.__init__) was called. This would be similar to what is already being done for **skparams. But the complexity vs. convenience would have to be justified.
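As a minimal sketch of that hack (illustrative only, not actual SciKeras code), we could walk the MRO and collect the named parameters of each __init__:

import inspect

from sklearn.base import BaseEstimator

class BaseWrapper(BaseEstimator):
    @classmethod
    def _get_param_names(cls):
        # Collect the named parameters of every __init__ in the MRO so that
        # BaseWrapper.__init__'s parameters remain valid on subclasses that
        # redefine __init__ without re-declaring them.
        names = set()
        for klass in cls.__mro__:
            init = vars(klass).get("__init__")
            if init is None:
                continue
            for name, param in inspect.signature(init).parameters.items():
                if name != "self" and param.kind not in (
                    param.VAR_KEYWORD,
                    param.VAR_POSITIONAL,
                ):
                    names.add(name)
        return sorted(names)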

I think the following is a good argument in favor of keeping subclassing. Please correct me if you see an alternative that I am not seeing.
Use case: a user wants to use hyperparameter tuning to select an activation function (or another arbitrary layer property) for certain layers. If we remove the **kwargs/**skparams from __init__, they would not be able to do this without subclassing.
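For concreteness, here is roughly what that use case looks like with the current **kwargs interface (build_fn and its activation parameter are made up for illustration):

from sklearn.model_selection import GridSearchCV

def build_fn(activation="relu"):
    ...  # build and compile a Keras Model whose layers use `activation`

# `activation` is injected via **kwargs and becomes a tunable parameter.
estimator = KerasRegressor(build_fn=build_fn, activation="relu")

search = GridSearchCV(estimator, param_grid={"activation": ["relu", "tanh"]})
# search.fit(X, y) then clones the estimator once per candidate activation.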

On the other hand, we could eliminate the build_fn argument (I'm not saying we should, I'm just pointing out that the subclassing interface covers all of its use cases) by just instructing users to use:

def _keras_build_fn(self, ...):
    return build_fn(...)  # where build_fn is a function that you would have previously passed to `__init__`

But eliminating build_fn means that users always have to subclass (which I do admit is more inconvenient) and it breaks the old API for sure. Also, Skorch has a similar parameter, so I think that keeping build_fn is at least somewhat justified and I would keep it for now.

I think maybe a reasonable first step would be to proceed with the steps we do agree on and then loop back here as we come up with more comments regarding subclassing/build_fn?

@sim-san
Contributor

sim-san commented Aug 10, 2020

Thanks for the mention, @adriangb.
I would like to explain my use of the library and state my point of view. I am using the subclassing interface [2] in order to store additional information and to overwrite the _pre_process_y function. It would be a big loss for me if the subclassing interface were dropped.

(4) I think it has something to do with Keras' internal use of __call__ (ex: Model.__call__). I don't know any more than that. I have not used it much, and I do not see that it has much utility. I agree that we should get rid of this.

I completely agree with you here.

But eliminating build_fn means that users always have to subclass (which I do admit is more inconvenient) and it breaks the old API for sure. Also, Skorch has a similar parameter, so I think that keeping build_fn is at least somewhat justified and I would keep it for now.

I also like the possibility of first testing ideas with the build_fn interface [1] and then converting to a class (subclassing interface) [2].

@adriangb
Owner Author

@sim-san how would you feel if **kwargs were removed from __init__? In practice this means that:

  1. It would no longer be possible to inject arguments when using the build_fn interface.
  2. When you subclass, you'd have to redeclare all of the parameters you want to pass.

@sim-san
Contributor

sim-san commented Aug 10, 2020

@adriangb

  1. It would no longer be possible to inject arguments when using the build_fn interface.

I don't understand what you mean exactly.

  2. When you subclass, you'd have to redeclare all of the parameters you want to pass.

estimator = MLPRegressor(hidden_layer_sizes=[200], a_kwarg="saveme")
estimator.a_kwarg == "saveme"  # True

You mean this behavior? I never used it.
So it's fine for me to redeclare all of the parameters I want to pass.

@adriangb
Owner Author

Yes, that behavior is what would not work anymore.

@sim-san
Contributor

sim-san commented Aug 10, 2020

This is fine for me

@adriangb
Owner Author

To be clear on what removing **kwargs from __init__ would mean, the following two uses (which currently work) would be broken.

def build_fn(hidden_layer_sizes=(100, )):
    pass  # stub: would build, compile, and return a Keras Model

Case 1: build_fn

estimator = KerasRegressor(
    build_fn=build_fn,
    hidden_layer_sizes=(200, ),   # parameter would not be recognized since it is not in `KerasRegressor.__init__`
    ...
)

Case 2: Subclassing

class MLP(KerasRegressor):

    def __init__(self, hidden_layer_sizes=(100, )):
        self.hidden_layer_sizes = hidden_layer_sizes

    def _keras_build_fn(self, hidden_layer_sizes):
        return build_fn(hidden_layer_sizes=hidden_layer_sizes)

estimator = MLP(
    hidden_layer_sizes=(200, ),
    batch_size=64,  # parameter would not be recognized since it is not part of `MLP.__init__`
)

The only remaining way to do this would be:

class MLP(KerasRegressor):

    def __init__(
        self,
        hidden_layer_sizes=(100, ),
        batch_size=None,
    ):
        self.hidden_layer_sizes = hidden_layer_sizes
        self.batch_size = batch_size

    def _keras_build_fn(self, hidden_layer_sizes):
        return build_fn(hidden_layer_sizes=hidden_layer_sizes)

estimator = MLP(
    hidden_layer_sizes=(200, ),
    batch_size=64,  # would work
)

Or maybe we could introduce a small hack into KerasRegressor.__init__ to register its parameters with _get_param_names by parsing the call stack and picking up the __init__s that were used along the way. This would allow Case 2 to work, but not Case 1. But I fear that this may not be much better than just keeping **kwargs in __init__ even after we hardcode parameters.

@stsievert
Collaborator

I think the subclassing interface should stay. It might be removed from the documentation, but at the least, there should still be a note that "subclassing SciKeras classes will allow overwriting of _pre_process_y/etc."

To be clear on what removing **kwargs from __init__ would mean

I think SciKeras needs to roll its own implementation of {get, set}_params, which might include overriding _get_param_names. Skorch also rolls their own {set,get}_params in net.py#L1559-L1628 and net.py#L1490-L1496.

a user wants to use hyperparameter tuning

I think that's a key use-case. I think SciKeras should support these features without subclassing:

  • Having build_fn's defaults be valid parameters for BaseWrapper.
  • Having Keras.fit's default parameters be valid parameters for BaseWrapper.
  • Allowing the user to pass new keyword arguments to override build_fn's defaults.
  • Having a fully compatible Scikit-Learn API (or very close to it).

I think this is possible, but I think it will require a custom implementation of get_params and set_params. Let me rework #22 to provide a base implementation. Then, the following will be possible:

  • The behavior described in RFC: time to simplify APIs? #37 (comment) would still be possible.
  • BaseWrapper would be fully Scikit-Learn compatible. It won't be necessary to subclass to get a fully compliant Scikit-Learn estimator (which is even in the tests).
  • This behavior would work:
estimator = KerasRegressor(
    build_fn=build_fn, 
    hidden_layer_sizes=(200, ),
    batch_size=64,
)

build_fn/etc are defined in #37 (comment).
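For illustration, such a custom implementation could look roughly like this (a hedged sketch only, not the actual #22 implementation; it builds on the __init__ sketched earlier):

import inspect

from sklearn.base import BaseEstimator

class BaseWrapper(BaseEstimator):
    def __init__(self, build_fn=None, batch_size=32, epochs=100):
        self.build_fn = build_fn
        self.batch_size = batch_size
        self.epochs = epochs

    def get_params(self, deep=True):
        # Start from the hardcoded __init__ parameters, then expose
        # build_fn's keyword arguments as additional tunable parameters.
        params = super().get_params(deep=deep)
        if callable(self.build_fn):
            for name, p in inspect.signature(self.build_fn).parameters.items():
                if p.default is not p.empty:
                    params[name] = getattr(self, name, p.default)
        return params

    def set_params(self, **params):
        # Accept anything reported by get_params, setting build_fn's
        # parameters as attributes instead of raising like BaseEstimator.
        valid = self.get_params(deep=False)
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name} for estimator {self}")
            setattr(self, name, value)
        return self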

@adriangb
Owner Author

adriangb commented Aug 10, 2020

We already roll our own _get_param_names (that's how **skparams works). It should be easy to add build_fn's parameters to it; however, it may require calling _check_build_fn, so it would be smart to decorate _check_build_fn with functools.cached_property or something like that.

def _get_param_names(self):
    parameters = super()._get_param_names()
    # expose the model-building function's parameters alongside the
    # hardcoded __init__ parameters
    parameters.extend(
        inspect.signature(self._check_build_fn).parameters.keys()
    )
    return parameters

That said, I may be missing something, but I don't see how this would be fully Scikit-Learn compliant. The tests essentially call inspect.signature(estimator.__init__) and compare that to estimator.get_params. This would populate build_fn's parameters into get_params (so most things would work), but I don't see how it would get them into inspect.signature(estimator.__init__), so the scikit-learn tests would not pass. Please correct me if I am wrong @stsievert. If we can show that it is possible to pass all of these tests without subclassing, I think I would be much more open to implementing something like the above and reducing the use of subclassing throughout the docs and tests.

The alternative that avoids overriding _get_param_names and other complexities is to define two interfaces:

  1. Quick, simple interface using build_fn. Only the hardcoded parameters of BaseWrapper and attributes created by fit are available to build_fn.
  2. Advanced interface. Anything more complex (including passing custom parameters to build_fn/_keras_build_fn) will require subclassing and overriding of __init__, possibly re-defining all of the parameters that need to be tunable.

I think the main tradeoff here is complexity of implementation on our side and complexity of the docs vs. complexity for users. I feel that if the docs are clear and the implementation is clear for anyone who wants to inspect it, it is not too much to ask more users to use the "advanced" interface than before.

@adriangb
Owner Author

While we continue with this discussion, I want to say again that I really appreciate all of the great feedback.

Another proposal would be to do something similar to what skorch does: instead of dynamically inspecting the signature of build_fn for parameters, use a parameter-routing type mechanism where model_hidden_layer_sizes gets routed to build_fn/_keras_build_fn, model_fit_batch_size goes to model_.fit, model_predict_batch_size goes to model_.predict, etc.

I think this is nice in the sense that it keeps the flexibility offered by the current **skparams interface but makes it more structured and reduces the "magic"/introspection required behind the scenes.

@stsievert
Collaborator

stsievert commented Aug 10, 2020

I see a couple issues around API simplification:

  1. Allowing parameters to build_fn to be passed to BaseWrapper.__init__.
  2. Setting default parameters of build_fn in BaseWrapper.__init__ only when the user does not provide a value (Default parameters of build_fn are not returned by get_params #18).
  3. Setting Keras.fit/predict/etc defaults in BaseWrapper.__init__ ([feature request] add keras.fit parameters as attributes to BaseWrapper #30).
  4. Parameter routing; e.g, routing model__activation to build_fn and allowing build_fn to handle any errors (RFC: time to simplify APIs? #37 (comment))

The check marks represent the state of SciKeras the last time I used it. I think all of these points should be implemented.

On the 2nd point: it's a fairly simple implementation, and then the user doesn't have to look at the documentation when their hyperparameter search fails. When that happened for me, I had to look at the source code before filing #18.

I'm not seeing why it's necessary to write custom code via subclassing to obtain a Scikit-Learn compatible Keras model. I think it should be used if needed to overwrite _pre_process_y/etc.

use a parameter-routing type mechanism where model_hidden_layer_sizes gets routed to build_fn

I think that'd be the simplest solution. Note: model__hidden_layer_sizes should be used per Scikit-Learn's double-underscore notation. As with Skorch, it's likely reasonable to provide some unification of parameters; in Skorch, batch_size can stand in for valid__batch_size and train__batch_size (source).
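A rough sketch of the routing itself, using the double-underscore convention (the prefix names and helper are illustrative only):

def route_params(params):
    """Split {"model__units": 64, "fit__batch_size": 32, "verbose": 1} into
    {"model": {"units": 64}, "fit": {"batch_size": 32}, "": {"verbose": 1}}."""
    routed = {}
    for key, value in params.items():
        prefix, sep, name = key.partition("__")
        if sep:  # prefixed parameter, e.g. fit__batch_size -> fit
            routed.setdefault(prefix, {})[name] = value
        else:  # un-prefixed; could stand in for several destinations
            routed.setdefault("", {})[key] = value
    return routed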

Two other points that came up while writing this comment:

  • Maybe build_fn should be renamed to model. build_fn builds a model, and model__activation is a lot more clear than build_fn__activation.
  • Maybe have the signature of the pre/post processing functions be preprocess(X, y) and postprocess(X, y)? That'd be a little simpler than _pre_process_{X,y} and _post_process_y (see the sketch below). Maybe make them public functions too, without a leading _?
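As a sketch of that last suggestion (the names and signatures are hypothetical, not an existing API):

from sklearn.base import BaseEstimator

class BaseWrapper(BaseEstimator):
    def preprocess(self, X, y):
        # Override to transform inputs before they are passed to Keras' fit.
        return X, y

    def postprocess(self, X, y):
        # Override to transform Keras' outputs back to Scikit-Learn's format.
        return X, y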

@adriangb
Owner Author

adriangb commented Aug 10, 2020

Going from the bottom up on this one:

  • Unifying and renaming the pre/post processing functions. I am on board with making them public, but I think it might still be good to have postprocess_y and postprocess_X so that overriding the handling of X doesn't require copying a ton of code just to keep the handling of y maintained.
  • Renaming build_fn to model. I am also on board with this, it makes it similar to Skorch's module param. That said this would be a big API break (that's okay, we just need to be aware).
  • Parameter routing. Thank you for pointing out the use of __ vs _. I think this is a very neat way to handle things, especially since (I think) it reduces introspection into function signatures, which is generally something I am against.

Would you be willing to continue work on #18 and get a fully working example that passes all of the tests without subclassing? I will start tackling (1) cleaning up the pre/post processing functions, (2) implementing the parameter routing, and (3) hardcoding the __init__ parameters.

@stsievert
Collaborator

Would you be willing to continue work on #18 and get a fully working example that passes all of the tests without subclassing

Does "passes all tests" mean "all Scikit-Learn estimator checks pass except check_no_attributes_set_in_init"? I don't see how that test can pass: I think parameters need to be set in __init__. I'm more than okay with that: I'm not seeing any benefit to having this test pass because clone/copy/deepcopy still work as expected.

@adriangb
Owner Author

adriangb commented Aug 11, 2020

Does "passes all tests" mean "all Scikit-Learn estimator checks pass except check_no_attributes_set_in_init"? I don't see how that test can pass: I think parameters need to be set in __init__. I'm more than okay with that: I'm not seeing any benefit to having this test pass because clone/copy/deepcopy still work as expected.

Yes, I didn't remember exactly which ones would not pass, but check_no_attributes_set_in_init is definitely one that won't. I agree with you that it's okay if that one does not pass.

Thank you for helping!
