I noticed that cca_zoo.model_selection.GridSearchCV throws a ValueError when the estimator has more than one latent dimension and the user leaves cca_zoo.model_selection.GridSearchCV(scoring=None) as is. I am not sure, but if I remember the behavior of sklearn.model_selection.GridSearchCV correctly (and I guess you would like cca_zoo.model_selection.GridSearchCV to behave in the same way), the idea is that, unless a scorer is provided, GridSearchCV just uses the .score() method of the provided estimator? Judging from a couple of your estimators' docstrings, this should be:
the average pairwise correlation between the views
Example:
from cca_zoo.models import SCCA_PMD
import numpy as np
from cca_zoo.model_selection import GridSearchCV

# create data
rng = np.random.RandomState(0)
X1 = rng.random((100, 5))
X2 = rng.random((100, 5))

# set latent dims
latent_dims = 1

# run cross validation
estimator = SCCA_PMD(latent_dims=latent_dims, random_state=rng, c=[1, 1])
param_grid = {'c': [[0.1, 0.2], [0.1, 0.2]]}
grid = GridSearchCV(estimator,
                    param_grid=param_grid,
                    cv=2)
grid.fit([X1, X2])

# run score
estimator.fit([X1, X2])
print(estimator.score([X1, X2]))
Of course, the whole problem is easy to solve by simply providing the scorer function yourself (adapted from your docs):
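To make the workaround concrete, here is a minimal sketch of such a scorer. It assumes that .score() returns one value per latent dimension, which the search cannot rank directly, so the scorer averages them into a single number. StubCCA is a hypothetical stand-in written only for this illustration (it is not part of cca_zoo) so that the snippet runs with NumPy alone; the scorer(estimator, views) signature follows the one used elsewhere in this thread.

```python
import numpy as np


class StubCCA:
    """Hypothetical stand-in for a CCA-style estimator (not part of cca_zoo)."""

    def __init__(self, latent_dims=2):
        self.latent_dims = latent_dims

    def fit(self, views):
        # Nothing is learned here; the stub only mimics the interface.
        return self

    def score(self, views):
        # Assumed behaviour: one correlation value per latent dimension.
        a, b = views
        return np.array([
            np.corrcoef(a[:, i], b[:, i])[0, 1]
            for i in range(self.latent_dims)
        ])


def scorer(estimator, views):
    """Reduce the per-dimension scores to a single float."""
    return float(np.mean(estimator.score(views)))


rng = np.random.RandomState(0)
X1 = rng.random((100, 5))
X2 = rng.random((100, 5))

estimator = StubCCA(latent_dims=2).fit([X1, X2])
per_dim = estimator.score([X1, X2])   # array with one entry per latent dimension
single = scorer(estimator, [X1, X2])  # plain float, usable for ranking
```

With a real estimator, the same scorer would be passed as GridSearchCV(estimator, param_grid, scoring=scorer).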
@jameschapman19: I think you can close this issue for now. I don't know why (perhaps it is related to the latest commits in cca_zoo or scikit-learn), but it seems to work now. Here's some working code:
import numpy as np
from cca_zoo.models import GRCCA
from cca_zoo.model_selection import GridSearchCV

# create two random matrices and pretend both of them would have two feature
# groups
rng = np.random.RandomState(0)
X1 = rng.random((100, 4))
X2 = rng.random((100, 4))
feature_groups = [np.array([0, 0, 1, 1]), np.array([0, 0, 1, 1])]

latent_dims = 2
estimator = GRCCA(latent_dims=latent_dims, random_state=rng)

# define a search space (optimize left and right penalty parameters)
c1 = [0, 0.5, 1]
c2 = [0, 0.5, 1]
mu1 = [0, 0.5, 1]
mu2 = [0, 0.5, 1]
param_grid = {'c': [c1, c2], 'mu': [mu1, mu2]}

# FIXME: See issue #150: Defining this scorer function should actually
# not be necessary, because this should be the default scoring function
# for all CCA-base classes
def scorer(estimator, views):
    scores = estimator.score(views)
    return np.mean(scores)

grid = GridSearchCV(estimator, param_grid, scoring=scorer)
grid.fit([X1, X2], estimator__feature_groups=feature_groups)

estimator_best = grid.best_estimator_
scores = grid.cv_results_
The magic lies in passing estimator__feature_groups=feature_groups to GridSearchCV's .fit() method.
But I guess, it would still be nice to do this in an automatic fashion? :)