【Posted】: 2021-11-18 22:37:09
【Question】:
I have written my own CustomClassifier that discretizes (bins) the dependent variable. Here is the code:
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import KBinsDiscretizer

class OwnClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, estimator=None):
        self.yt = None
        if estimator is None:
            estimator = LogisticRegression(solver='liblinear')
        self.estimator = estimator
        # Bins the continuous target into 4 ordinal classes
        self.discr = KBinsDiscretizer(n_bins=4, encode='ordinal')

    def fit(self, X, y):
        self.yt = y.copy()
        self.yt = self.discr.fit_transform(self.yt.reshape(-1, 1)).astype(int)
        self.estimator.fit(X, self.yt.ravel())
        return self

    def predict(self, X):
        return self.estimator.predict(X)

    def predict_proba(self, X):
        return self.estimator.predict_proba(X)

    def score(self, X, y=None):
        return accuracy_score(self.yt, self.predict(X))
Running GridSearchCV on it raises an error:
from sklearn.model_selection import GridSearchCV

grid = [{'estimator__C': [1, 10, 100, 1000]}]
myLogi = OwnClassifier()
gridCv = GridSearchCV(myLogi, grid)
gridCv.fit(X, y)
How can the classifier be made compatible with GridSearchCV?
I am using the Boston housing data:
from sklearn.datasets import load_boston

boston_data = load_boston()
X = boston_data['data']
y = boston_data['target']
Error:
ValueError: Found input variables with inconsistent numbers of samples: [404, 102]
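The two numbers in the message match the sizes of a single cross-validation split, which suggests that score is comparing the 404 labels stored in self.yt during fit with predictions for the 102 held-out rows. A minimal sketch of where those sizes come from, assuming GridSearchCV's default of 5 folds and plain KFold splitting for a continuous y (X_dummy is a placeholder, only its row count matters):

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples = 506  # number of rows in the Boston housing data
X_dummy = np.zeros((n_samples, 1))  # placeholder features (assumption)

# Take the first of the 5 train/test splits and look at its sizes.
train_idx, test_idx = next(KFold(n_splits=5).split(X_dummy))
print(len(train_idx), len(test_idx))  # 404 102
```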
【Comments】:
-
What is the error?
-
The error is: ValueError: Found input variables with inconsistent numbers of samples: [404, 102]
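One possible way to make the wrapper work with GridSearchCV is to have score bin the y it receives, rather than reuse the labels kept from fit. The sketch below is not a definitive implementation. BinnedTargetClassifier, the _bin helper and the n_bins parameter are hypothetical names, the data is a synthetic stand-in with the same shape as the Boston set, and the default lbfgs solver with max_iter=1000 replaces liblinear as a choice for this sketch. The mixin is listed before BaseEstimator, which is the order the scikit-learn developer guide recommends.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import KBinsDiscretizer


class BinnedTargetClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical rework of OwnClassifier: bin a continuous y, then classify."""

    def __init__(self, estimator=None, n_bins=4):
        # Only store the arguments, so get_params/set_params/clone see them unchanged.
        self.estimator = estimator
        self.n_bins = n_bins

    def _bin(self, y):
        # Map continuous targets to ordinal bin labels using the edges learned in fit.
        y = np.asarray(y, dtype=float).reshape(-1, 1)
        return self.discr_.transform(y).astype(int).ravel()

    def fit(self, X, y):
        base = self.estimator
        if base is None:
            base = LogisticRegression(max_iter=1000)
        self.discr_ = KBinsDiscretizer(n_bins=self.n_bins, encode="ordinal")
        self.discr_.fit(np.asarray(y, dtype=float).reshape(-1, 1))
        # Fit a fresh copy so the estimator passed to __init__ stays untouched.
        self.estimator_ = clone(base).fit(X, self._bin(y))
        self.classes_ = self.estimator_.classes_
        return self

    def predict(self, X):
        return self.estimator_.predict(X)

    def predict_proba(self, X):
        return self.estimator_.predict_proba(X)

    def score(self, X, y):
        # Bin the y passed in (the held-out fold), not the labels kept from fit.
        return accuracy_score(self._bin(y), self.predict(X))


# Synthetic stand-in data (assumption): 506 rows and 13 features, like the Boston set.
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))
y = X @ rng.normal(size=13) + rng.normal(scale=0.5, size=506)

grid = [{"estimator__C": [1, 10, 100, 1000]}]
# The inner estimator is passed explicitly so that estimator__C has a target.
myLogi = BinnedTargetClassifier(estimator=LogisticRegression(max_iter=1000))
gridCv = GridSearchCV(myLogi, grid)
gridCv.fit(X, y)
print(gridCv.best_params_, gridCv.best_score_)
```

With this layout the bin edges are learned on the training fold only and then applied to the held-out targets, so both arguments to accuracy_score have the same length.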
Tags: python scikit-learn gridsearchcv