
ValueError: not enough values to unpack (expected 2, got 0) #13

Open
sydneyhenrard opened this issue Sep 1, 2019 · 17 comments

@sydneyhenrard

I tried to use your package with RandomForestRegressor, but I get an error

from sklearn.ensemble import RandomForestRegressor
from hypopt import GridSearch

X_train.shape, y_train.shape, X_valid.shape, y_valid.shape

# Create the parameter grid
param_grid = {
    'bootstrap': [True],
    'max_features': [.5, 'sqrt', 'log2', .33],
    'min_samples_leaf': [1, 3, 5, 10, 25, 100],
    'n_estimators': [40]
}
# Create a base model
rf = RandomForestRegressor()
# Instantiate the grid search model
opt = GridSearch(model=rf, param_grid=param_grid)
# Fit the grid search to the data with the validation set
opt.fit(X_train, y_train, X_valid, y_valid)

The output:

((389125, 173), (389125,), (12000, 173), (12000,))
100%|██████████████████████████████████████████████████████████████████████████████████| 24/24 [00:01<00:00, 12.77it/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-44-659f3a61bf94> in <module>
     14 # Fit the grid search to the data with validation set
     15 
---> 16 opt.fit(X_train, y_train, X_valid, y_valid)
     17 #opt.get_best_params
     18 

C:\soft\Anaconda3\envs\sps\lib\site-packages\hypopt\model_selection.py in fit(self, X_train, y_train, X_val, y_val, scoring, scoring_params, verbose)
    361             else:
    362                 results = [_run_thread_job(job) for job in params]
--> 363             models, scores = list(zip(*results))
    364             self.model = models[np.argmax(scores)]
    365         else:

ValueError: not enough values to unpack (expected 2, got 0)
@BramVanroy
Contributor

This seems to indicate that the results of the jobs are empty or None. Are you sure your data set is not empty?
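
For reference, that unpacking error is exactly what an empty result list produces; a minimal sketch:

results = []  # what fit() ends up with if every parallel job yields nothing
models, scores = list(zip(*results))
# ValueError: not enough values to unpack (expected 2, got 0)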

@sydneyhenrard
Author

I tried with parallelize=False and the error goes away.

However, reporting the best parameters seems not to work: it appears to show the last parameters tried rather than the best ones.

opt.get_best_params
<bound method GridSearch.get_best_params of GridSearch(cv_folds=3,
           model=RandomForestRegressor(bootstrap=True, criterion='mse',
                                       max_depth=None, max_features=0.33,
                                       max_leaf_nodes=None,
                                       min_impurity_decrease=0.0,
                                       min_impurity_split=None,
                                       min_samples_leaf=100,
                                       min_samples_split=2,
                                       min_weight_fraction_leaf=0.0,
                                       n_estimators=40, n_jobs=None,
                                       oob_score=False, random_state=0,
                                       verbose=0, warm_start=False),
           num_threads=8, parallelize=False,
           param_grid={'bootstrap': [True],
                       'max_features': [0.5, 'sqrt', 'log2', 0.33],
                       'min_samples_leaf': [1, 3, 5, 10, 25, 100],
                       'n_estimators': [40]},
           seed=0)>
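
Note that the output above is the bound method itself, because get_best_params was referenced without parentheses. Calling it (what it returns is not confirmed by the repr) would be:

opt.get_best_params()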

To verify, I ran the model with the reported best parameters and with another set of parameters:

m = RandomForestRegressor(n_estimators=40, 
                          min_samples_leaf=100,
                          max_features=0.33,
                          n_jobs=-1, oob_score=True)
%time m.fit(X_train, y_train)
print_score(m, X_train, y_train, X_valid, y_valid)

m = RandomForestRegressor(n_estimators=40, 
                          min_samples_leaf=3,
                          max_features=0.5,
                          n_jobs=-1, oob_score=True)
%time m.fit(X_train, y_train)
print_score(m, X_train, y_train, X_valid, y_valid)

As you can see, the second model is better (the OOB score is the last number), yet its parameters are not the ones reported as best.

Wall time: 6.61 s
[0.2593896702497389, 0.2768686574114764, 0.8593822384678886, 0.8631024854156082, 0.852173401851975]
Wall time: 12.3 s
[0.12585560380471036, 0.2279723348842033, 0.9668960405592746, 0.9071862610046989, 0.9084671021618435]

Maybe I am missing something about how to use the library.
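
print_score above is a user-defined helper. A plausible reconstruction, assuming the five printed numbers are [train RMSE, validation RMSE, train R², validation R², OOB score] in that order (names and metric order are assumptions):

import numpy as np

def print_score(m, X_train, y_train, X_valid, y_valid):
    # Hypothetical helper: RMSE and R^2 on train/validation, plus OOB score if present
    def rmse(x, y):
        return np.sqrt(((m.predict(x) - y) ** 2).mean())
    res = [rmse(X_train, y_train), rmse(X_valid, y_valid),
           m.score(X_train, y_train), m.score(X_valid, y_valid)]
    if hasattr(m, 'oob_score_'):
        res.append(m.oob_score_)  # OOB R^2, the last number in the output above
    print(res)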

@BramVanroy
Contributor

Hm, that's interesting. If that's correct (and parallelize=False returns the last model tried, not the best one), then I'd say that's a bug. Pinging @cgnorthcutt

@cgnorthcutt
Owner

Hey folks, I appreciate the insights here. I'm fully booked with the upcoming fall ML paper deadlines. If you can take a stab at a PR, I'll take a look; figuring it out myself may take some time, just a heads up.

@cgnorthcutt
Owner

A couple of things to check (see the sketch after this list):

  1. Random seeding
  2. If you run it a hundred times, are the best params always the same?
  3. If you change the order of the param grid, does the best result change?
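
A sketch for checks 2 and 3, assuming the data and param_grid from the original report, and assuming get_best_params() returns the winning parameter dict (the repr above confirms the method exists but not its return value):

best_params_seen = set()
for run in range(100):
    opt = GridSearch(model=RandomForestRegressor(), param_grid=param_grid)
    opt.fit(X_train, y_train, X_valid, y_valid)
    best_params_seen.add(str(opt.get_best_params()))
print(best_params_seen)  # more than one entry means the result is not stable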

@mouadriyad

Same here.

@cgnorthcutt
Owner

cgnorthcutt commented Nov 28, 2019

Hi @mouadriyad, can you please post complete code that reproduces this as simply as possible?

@statcom

statcom commented May 13, 2020

I had the exact same problem on my first try with hypopt.

    from sklearn import svm
    from hypopt import GridSearch

    svm_cls_rbf = svm.SVC(kernel='rbf')
    param_grid = {'C': [10, 100, 200, 500, 750, 1000, 10000], 'gamma': [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4]}
    grid_search = GridSearch(model=svm_cls_rbf, param_grid=param_grid, num_threads=8, parallelize=True)
    grid_search.fit(feature_train, y_train, feature_validation, y_validation)
    best_parameters = grid_search.best_estimator_.get_params()

@ghost

ghost commented Jun 10, 2020

The problem is that every job reuses the same underlying estimator object, so each fit overwrites the models from the previous jobs. The scores are computed correctly, but all the stored models end up identical (the last one fitted).

What worked for me is:

def _run_thread_job(model_params):
    ...
    return model, score  # return the fitted model together with its score

and then exchange:

def fit(...):
    ...
    if self.parallelize:
        results = _parallel_param_opt(params, self.num_threads)
        models, scores = list(zip(*results))
    else:
        # old: results = [_run_thread_job(job) for job in params]
        # old: models, scores = list(zip(*results))
        models = []
        scores = []
        for i, job in enumerate(params):
            if i % 50 == 0:
                print(f'Model number: {i}')
            model, score = _run_thread_job(job)
            # clone() yields an unfitted copy with the same hyperparameters,
            # so later fits cannot overwrite the models collected so far
            models.append(sklearn.base.clone(model, safe=True))
            scores.append(score)
    self.model = models[np.argmax(scores)]
    self.model.fit(X_train, y_train)  # the clone is unfitted, so the winner has to be fitted again

There is probably a better way to do this, but it worked for me :)
https://scikit-learn.org/stable/modules/generated/sklearn.base.clone.html
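
For illustration, clone returns an unfitted copy that shares no state with the original, which is why the winner has to be refitted at the end; a small sketch:

from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=40)
rf_copy = clone(rf, safe=True)  # unfitted copy with identical hyperparameters
assert rf_copy is not rf        # a separate object: fitting rf later cannot touch rf_copy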

@cgnorthcutt
Owner

@phibil Great! Please submit a pull request?

@dariopasquali

Hi,
I just noticed the same bug. Any news on this?
Moreover, if I disable parallelization, the method returns an array of None.

@Rajjat

Rajjat commented Feb 9, 2021

I am still getting the same problem. Is there a fix for it?

@AlessandroMiola
Contributor

Which OS are you working on? Also, are you using the default parallelize=True in the call to GridSearch()?
As far as I understood some months ago, parallelization issues on Windows systems still persist. The workaround might be to use parallelize=False.
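
As a sketch, reusing the grid from the original report (though the reports above suggest the non-parallel path has its own bug):

opt = GridSearch(model=RandomForestRegressor(), param_grid=param_grid, parallelize=False)
opt.fit(X_train, y_train, X_valid, y_valid)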

@Rajjat

Rajjat commented Feb 10, 2021

I am using Linux. I tried with both parallelize=True and parallelize=False.

@Rajjat

Rajjat commented Feb 10, 2021

With parallelize=True I get the error above. After setting parallelize=False, I got another error, "zip argument #1 must support iteration", which is also one of the issues reported in this repository.
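
That second error is consistent with the jobs returning None, as dariopasquali noted above; a minimal sketch:

results = [None]  # e.g. each job returned None instead of a (model, score) pair
models, scores = list(zip(*results))
# TypeError: zip argument #1 must support iteration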

@anyutakorytnik

Same problem as Rajjat.

@HerrGeorg

Has this been resolved?
