Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return new model when calling fit #69

Closed
lars-reimann opened this issue Mar 24, 2023 · 2 comments · Fixed by #91
Closed

Return new model when calling fit #69

lars-reimann opened this issue Mar 24, 2023 · 2 comments · Fixed by #91
Assignees
Labels
enhancement 💡 New feature or request released Included in a release

Comments

@lars-reimann
Copy link
Member

lars-reimann commented Mar 24, 2023

Is your feature request related to a problem?

Methods of data containers like Table always return a new data container. The fit method of models, however, mutates a model in place.

Desired solution

For the sake of consistency, we should consider whether fit should also return a new model. If we implement that, we should also think about differentiating between trained and untrained models. Only the latter should have a predict methods or methods for metrics (#64).

Possible names (likewise for regressors):

  • ClassifierDraft
  • ClassifierTemplate
  • ClassifierBlueprint
  • ClassifierPrototype - It's not quite a prototype though, since we don't just clone an instance when we fit. We also initialize weights etc. So that's not my favorite
  • ClassifierBuilder - We don't use the builder pattern though, values can only be set in the constructor. So that is not my favorite either.

Possible alternatives (optional)

No response

Screenshots (optional)

No response

Additional Context (optional)

No response

@lars-reimann lars-reimann added the cleanup 🧹 Refactorings and other tasks that improve the code label Mar 24, 2023
@lars-reimann lars-reimann self-assigned this Mar 27, 2023
@lars-reimann
Copy link
Member Author

lars-reimann commented Mar 27, 2023

I've decided against differentiating trained and untrained models in Python for several reasons:

  • The trained models must be part of the public interface because they should be usable as a type. However, we cannot prevent in this case that a user can also instantiate them directly, so it's still possible to call predict without calling fit. Both classes also show up in the documentation and it is confusing for users whether they need to work with a Classifier or a ClassiferDraft. Many would pick the wrong thing (Classifier) due to the shorter name.
  • Even with one class we can still differentiate between trained and untrained models in the DSL. There we can omit the constructor of untrained models.

Because of this, I'll only change that fit returns a new model.

@lars-reimann lars-reimann linked a pull request Mar 27, 2023 that will close this issue
@lars-reimann lars-reimann added enhancement 💡 New feature or request and removed cleanup 🧹 Refactorings and other tasks that improve the code labels Mar 27, 2023
lars-reimann added a commit that referenced this issue Mar 27, 2023
Closes #69.

### Summary of Changes

The `fit` method of classifiers/regressors now returns a new (fitted)
classifier regressor. The receiver of the method call is not changed
anymore. This is consistent with the methods on the `Table` class and
other data containers. Furthermore, `fit` is now a pure function, which
works better in notebooks and our own [execution
strategy](https://arxiv.org/abs/2302.14556).

---------

Co-authored-by: lars-reimann <[email protected]>
lars-reimann pushed a commit that referenced this issue Mar 27, 2023
## [0.6.0](v0.5.0...v0.6.0) (2023-03-27)

### Features

* allow calling `correlation_heatmap` with non-numerical columns ([#92](#92)) ([b960214](b960214)), closes [#89](#89)
* function to drop columns with non-numerical values from `Table` ([#96](#96)) ([8f14d65](8f14d65)), closes [#13](#13)
* function to drop columns/rows with missing values ([#97](#97)) ([05d771c](05d771c)), closes [#10](#10)
* remove `list_columns_with_XY` methods from `Table` ([#100](#100)) ([a0c56ad](a0c56ad)), closes [#94](#94)
* rename `keep_columns` to `keep_only_columns` ([#99](#99)) ([de42169](de42169))
* rename `remove_outliers` to `drop_rows_with_outliers` ([#95](#95)) ([7bad2e3](7bad2e3)), closes [#93](#93)
* return new model when calling `fit` ([#91](#91)) ([165c97c](165c97c)), closes [#69](#69)

### Bug Fixes

* handling of missing values when dropping rows with outliers ([#101](#101)) ([0a5e853](0a5e853)), closes [#7](#7)
@lars-reimann
Copy link
Member Author

🎉 This issue has been resolved in version 0.6.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@lars-reimann lars-reimann added the released Included in a release label Mar 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement 💡 New feature or request released Included in a release
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant