cca_zoo.model_selection.GridSearchCV does not work when estimator has more than one latent dimension and scorer function is not provided #150

JohannesWiesner · 2022-10-07T15:42:51Z

I noticed that cca_zoo.model_selection.GridSearchCV throws a ValueError when the estimator has more than one latent dimension and the user leaves scoring=None unchanged. I am not sure, but if I remember the behavior of sklearn.model_selection.GridSearchCV correctly (and I guess you would like cca_zoo.model_selection.GridSearchCV to behave the same way), the idea is that, unless a scorer is provided, GridSearchCV just uses the .score() method of the provided estimator? According to the docstrings of a couple of your estimators, this should be:

the average pairwise correlation between the views

Example:

from cca_zoo.models import SCCA_PMD
import numpy as np
from cca_zoo.model_selection import GridSearchCV

# create data
rng = np.random.RandomState(0)
X1 = rng.random((100,5))
X2 = rng.random((100,5))

# set latent dims (the ValueError described above is reported for values > 1)
latent_dims=1

# run cross validation
estimator = SCCA_PMD(latent_dims=latent_dims,random_state=rng,c=[1,1])
param_grid = {'c':[[0.1,0.2],[0.1,0.2]]}
grid = GridSearchCV(estimator,
                    param_grid=param_grid,
                    cv=2)
grid.fit([X1,X2])

# run score
estimator.fit([X1,X2])
print(estimator.score([X1,X2]))

Of course, the whole problem is easy to solve by simply providing the scorer function yourself (adapted from your docs):

def scorer(estimator, views):
    scores = estimator.score(views)
    return np.mean(scores)

But I guess it would still be nice if this happened automatically? :)
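
To illustrate why averaging helps, here is a minimal sketch. It assumes that .score() returns one value per latent dimension, which is what the np.mean call in the scorer above suggests; a search routine needs a single number per parameter setting to rank candidates. ToyEstimator and mean_scorer are hypothetical names used only for this illustration and are not part of cca_zoo.

```python
from statistics import fmean


class ToyEstimator:
    """Hypothetical stand-in for a CCA model; not part of cca_zoo."""

    def __init__(self, per_dim_scores):
        self.per_dim_scores = list(per_dim_scores)

    def score(self, views):
        # Assumption: one score per latent dimension is returned.
        return list(self.per_dim_scores)


def mean_scorer(estimator, views):
    """Reduce per-dimension scores to the single number a grid search can rank."""
    scores = estimator.score(views)
    # Accept either a bare number or a sequence of per-dimension values.
    if isinstance(scores, (int, float)):
        return float(scores)
    return fmean(scores)


one_dim = ToyEstimator([0.5])
two_dim = ToyEstimator([0.5, 0.25])

print(mean_scorer(one_dim, views=None))  # single latent dimension
print(mean_scorer(two_dim, views=None))  # two latent dimensions, averaged
```

With one latent dimension the average equals the single score, so the reduction only changes anything once there are several dimensions.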


JohannesWiesner · 2023-01-24T12:15:34Z

@jameschapman19: I think you can close this issue for now. I don't know why (perhaps it is related to the latest commits in cca_zoo or scikit-learn), but it seems to work now. Here's some working code:

import numpy as np
from cca_zoo.models import GRCCA
from cca_zoo.model_selection import GridSearchCV

# create two random matrices and pretend that each of them has two feature
# groups
rng = np.random.RandomState(0)
X1 = rng.random((100,4))
X2 = rng.random((100,4))
feature_groups = [np.array([0,0,1,1]),np.array([0,0,1,1])]
latent_dims=2
estimator = GRCCA(latent_dims=latent_dims,random_state=rng)

# define a search space (optimize left and right penalty parameters)
c1 = [0,0.5,1]
c2 = [0,0.5,1]
mu1 = [0,0.5,1]
mu2 = [0,0.5,1]
param_grid = {'c':[c1,c2],'mu':[mu1,mu2]}

# FIXME: See issue #150: Defining this scorer function should actually 
# not be necessary, because this should be the default scoring function
# for all CCA-base classes
def scorer(estimator, views):
    scores = estimator.score(views)
    return np.mean(scores)

grid = GridSearchCV(estimator,param_grid,scoring=scorer)
grid.fit([X1,X2],estimator__feature_groups=feature_groups)
estimator_best = grid.best_estimator_
scores = grid.cv_results_

The magic lies in passing estimator__feature_groups=feature_groups to GridSearchCV's .fit() method.
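
The double-underscore prefix follows the naming convention that scikit-learn pipelines use to route fit parameters to a named step. Whether cca_zoo's GridSearchCV routes the argument in exactly this way internally is an assumption here; the sketch below only shows how such a prefixed keyword can be split into a step name and a parameter name. split_fit_params is a hypothetical helper, not a function from either library.

```python
def split_fit_params(fit_params, sep="__"):
    """Group keyword arguments by the step name in front of the separator."""
    routed = {}
    for key, value in fit_params.items():
        step, found, param = key.partition(sep)
        if not found:
            # No separator: the keyword does not name a target step.
            raise ValueError(f"{key!r} has no step prefix")
        routed.setdefault(step, {})[param] = value
    return routed


feature_groups = [[0, 0, 1, 1], [0, 0, 1, 1]]
routed = split_fit_params({"estimator__feature_groups": feature_groups})

# The step named "estimator" would receive feature_groups=... in its fit call.
print(routed)
```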
