[FIX] Fixes for Tabular Regression #235
Conversation
@@ -74,8 +80,10 @@ def fit(self, X: Dict[str, Any], y: Any = None) -> autoPyTorchSetupComponent:

         # instantiate model
         self.model = self.build_model(input_shape=input_shape,
-                                       logger_port=X['logger_port'],
+                                       logger_port=X['logger_port'] if 'logger_port' in X else None,
                                        output_shape=output_shape)
I think we should always have the logger port here.
I think in one of the tests we didn't have it
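A minimal sketch of the "always have a logger port" idea, assuming X is the fit dictionary from the snippet above; the helper name and the use of logging.handlers.DEFAULT_TCP_LOGGING_PORT as the fallback are illustrative assumptions, not necessarily what the code base settles on:

```python
import logging.handlers
from typing import Any, Dict


def resolve_logger_port(X: Dict[str, Any]) -> int:
    """Hypothetical helper: return the logger port from the fit dictionary.

    Falls back to the standard TCP logging port when a caller (e.g. a test)
    did not put 'logger_port' into X.
    """
    return X.get('logger_port', logging.handlers.DEFAULT_TCP_LOGGING_PORT)


# Usage inside fit() would then look roughly like:
# self.model = self.build_model(input_shape=input_shape,
#                               logger_port=resolve_logger_port(X),
#                               output_shape=output_shape)
```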
assert 'val_preds' in model.fit_output.keys()
assert isinstance(model.fit_output['val_preds'], list)
assert len(model.fit_output['val_preds']) == len(fit_dictionary_tabular['val_indices'])
if model.model.is_classification: |
Can you please add a unit test that makes sure that is_classification is set properly? I was not able to find where in the code we make sure that it is properly set up...
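A hedged sketch of what such a test could look like. The fixture fit_dictionary_tabular and the model.model.is_classification attribute are taken from the snippets in this thread; the component fixture and the 'task_type' key are assumptions for illustration:

```python
# Sketch only: fixture names and the dataset-property key are assumptions,
# not the repository's actual API.
def test_is_classification_is_set(fit_dictionary_tabular, traditional_model):
    traditional_model.fit(fit_dictionary_tabular)

    task_type = fit_dictionary_tabular['dataset_properties']['task_type']
    expected = 'classification' in task_type

    # The flag that steers fit/predict should match the task that was fitted.
    assert traditional_model.model.is_classification == expected
```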
assert y_pred.shape[0] == len(fit_dictionary_tabular['val_indices'])
# Test if classifier can score and
# the result is same as in results
score = model.score(fit_dictionary_tabular['X_train'][fit_dictionary_tabular['val_indices']], |
Can you check the value of the score? I think this traditional classifier should achieve a pretty good score
Unfortunately, some of the classifiers fail to get a good score on some datasets; sometimes it is really low as well. In a later PR we can try to optimize the hyperparameters of the traditional classifiers to get a good score in all scenarios, but for the purpose of this PR I feel it is fine.
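If a hard threshold is too brittle across datasets, one hedged alternative (in the same test context as the snippet above) is to compare the re-computed score against the value recorded during fit; the 'val_score' and 'y_train' keys are assumptions for illustration:

```python
import numpy as np

# Sketch: rather than asserting a fixed "good" threshold that some learners
# miss on some datasets, check that score() reproduces the value stored in
# fit_output during training. Key names here are assumptions.
val_idx = fit_dictionary_tabular['val_indices']
X_val = fit_dictionary_tabular['X_train'][val_idx]
y_val = fit_dictionary_tabular['y_train'][val_idx]

score = model.score(X_val, y_val)
assert np.isclose(score, model.fit_output['val_score'], atol=1e-6)
```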
Thanks a lot for the PR; with this we will be able to compare to other AutoML systems on regression.
Some minor questions/changes on this PR:
I just started running with this, but the first fix we need (so that I do not forget) is to update this file: https://github.com/automl/Auto-PyTorch/blob/refactor_development/MANIFEST.in with the new json files. Also, I think we have to add the greedy portfolio here?
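For reference, an entry along these lines is the kind of change MANIFEST.in would need; the directory is an assumption and should point at wherever the new json files actually live:

```
# MANIFEST.in (sketch): ship the newly added json configuration files with the
# source distribution. Adjust the path to the real location of the files.
recursive-include autoPyTorch/configs *.json
```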
The only other question I have is: what should we do with the greedy portfolio for regression? One possibility is to put the default configuration per neural network in there (the default per MLP, per shaped network, and so on). I see very good performance with the default configuration on Boston, but not so good with other configurations, because the BO model has not yet learned what to do with them. The other option is to generate the portfolio for regression. What do you think?
I think for now we can continue using the greedy portfolio json configs, and when we set up the scripts to build the portfolio ourselves, we can build one for regression as well. However, since you are saying that the default configs give good results, we can compare them with the portfolio we have right now and use whichever gives the better performance boost.
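A rough sketch of the "defaults as a portfolio" option discussed above, assuming ConfigSpace-backed configuration spaces (one per network family); the json layout is an illustrative assumption, not the exact greedy-portfolio format:

```python
import json
from typing import List

from ConfigSpace.configuration_space import ConfigurationSpace


def defaults_as_portfolio(spaces: List[ConfigurationSpace], path: str) -> None:
    """Write one default configuration per given space as a portfolio entry."""
    entries = [cs.get_default_configuration().get_dictionary() for cs in spaces]
    with open(path, 'w') as f:
        json.dump(entries, f, indent=2)
```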
        results in a MemoryError.
    y (np.ndarray):
        Ground Truth labels
metric_name (str, default = 'r2'): |
It is from sklearn, right? Can you add it?
No, these are our own names.
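Since the names are autoPyTorch's own rather than sklearn scorer strings, clarifying that in the docstring would help; conceptually the metric_name argument implies a lookup roughly like this (illustrative mapping only, not the project's metric registry):

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Illustrative only: the real registry lives inside autoPyTorch; this just
# shows the kind of name-to-function lookup that `metric_name` refers to.
REGRESSION_METRICS = {
    'r2': r2_score,
    'mean_absolute_error': mean_absolute_error,
    'mean_squared_error': mean_squared_error,
}


def score(y_true, y_pred, metric_name: str = 'r2') -> float:
    return REGRESSION_METRICS[metric_name](y_true, y_pred)
```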
@@ -300,6 +299,7 @@ def test_pipeline_score(fit_dictionary_tabular_dummy):

    pipeline = TabularRegressionPipeline(
        dataset_properties=fit_dictionary_tabular_dummy['dataset_properties'],
        random_state=1
Why integer?
Because we convert the seed into a random state instance with this line.
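For context, the conversion referred to is the usual scikit-learn pattern of turning an integer seed into a RandomState instance; whether the pipeline calls check_random_state exactly like this is an assumption:

```python
from sklearn.utils import check_random_state

# An int (e.g. random_state=1) is turned into a reusable RandomState object;
# an existing RandomState instance is returned unchanged.
rng = check_random_state(1)
print(rng.randint(0, 10, size=3))  # deterministic for a fixed seed
```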
@@ -0,0 +1,70 @@
from typing import Any, Dict, Optional, Tuple, Type |
I will check later
@@ -0,0 +1,266 @@
import json |
I will check later
@@ -0,0 +1,366 @@
import logging.handlers |
I will check later
@@ -0,0 +1,185 @@
import warnings |
I will check later
@@ -0,0 +1,134 @@
import copy |
I will check later
Force-pushed from b0e444f to a931763.
Codecov Report
@@            Coverage Diff             @@
##           development     #235     +/-   ##
===============================================
+ Coverage        80.73%   81.14%   +0.41%
===============================================
  Files              148      150       +2
  Lines             8563     8559       -4
  Branches          1323     1331       +8
===============================================
+ Hits              6913     6945      +32
+ Misses            1173     1131      -42
- Partials           477      483       +6
Continue to review full report at Codecov.
…eed in regression
Force-pushed from 23210ad to 6853a13.
This PR allows reproducibility in Tabular Regression, enables traditional methods for tabular regression, and adds tests for these.
Specifically, it adds the following:
- BaseClassifier in 'classifier_models' to BaseTraditionalLearner
- TraditionalLearners to remove duplicate code
- score to TabularClassificationPipeline, and improved documentation for TabularRegressionPipeline
- TraditionalTabularRegressionPipeline
- test_tabular_classification and test_tabular_regression
P.S. I couldn't think of better names, so please feel free to suggest.