【Posted】: 2021-07-20 16:24:58
【Question】:
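I'm trying to do feature selection using RFECV with LogisticRegression. To do that I need to scale the data, because otherwise the regression does not converge.
However, I think that if I scale the full dataset first, the result will be biased (essentially, information from the test set leaks into the scaling).
This is my current code: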
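The leakage concern above is the usual reason to put the scaler inside a Pipeline: when the pipeline is cross-validated, the whole pipeline (scaler included) is refit on each training fold, so the held-out rows never influence the min/max the scaler learns. A minimal sketch, assuming synthetic data from make_classification as a hypothetical stand-in for the question's real data:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical synthetic data standing in for the question's dataset.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
cv = StratifiedKFold(n_splits=5)

pipe = Pipeline(steps=[("scale", MinMaxScaler()),
                       ("lr", LogisticRegression(max_iter=1000))])

# cross_val_score clones and refits the whole pipeline on each training fold,
# so the scaler's min/max are computed without the held-out rows.
scores = cross_val_score(pipe, X, y, cv=cv, scoring="f1_weighted")

# Reproduce the first fold by hand to inspect what the scaler learned.
train_idx, test_idx = next(cv.split(X, y))
fold_pipe = clone(pipe).fit(X[train_idx], y[train_idx])
learned_min = fold_pipe.named_steps["scale"].data_min_

print(np.allclose(learned_min, X[train_idx].min(axis=0)))  # True: training rows only
print(scores)
```

Scaling the full X once with MinMaxScaler().fit_transform(X) before cross-validation would instead let every test fold contribute to the learned min/max, which is the bias described in the question.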
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline

cv = StratifiedKFold(5)
scaler = MinMaxScaler()
reg = LogisticRegression(max_iter=1000, solver="newton-cg")
# Scale inside the pipeline so the scaler is refit on each training fold
pipeline = Pipeline(steps=[("scale", scaler), ("lr", reg)])
visualizer = RFECV(pipeline, cv=cv, scoring='f1_weighted')
But it gives me this error:
Traceback (most recent call last):
  File "<ipython-input-267-0073ead26d52>", line 1, in <module>
    visualizer.fit(x_6, y_6) # Fit the data to the visualizer
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\feature_selection\_rfe.py", line 550, in fit
    scores = parallel(
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\feature_selection\_rfe.py", line 551, in <genexpr>
    func(rfe, self.estimator, X, y, train, test, scorer)
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\feature_selection\_rfe.py", line 33, in _rfe_single_fit
    return rfe._fit(
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\feature_selection\_rfe.py", line 204, in _fit
    raise RuntimeError('The classifier does not expose '
RuntimeError: The classifier does not expose "coef_" or "feature_importances_" attributes
I haven't even managed to fit it to the data yet.
I've searched but couldn't find anything useful. Any ideas why this might be failing?
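As the error message says, RFE/RFECV by default look for a coef_ or feature_importances_ attribute on the estimator they are given, and a Pipeline object does not expose those attributes itself; only its final LogisticRegression step does. The sketch below, assuming synthetic data and scikit-learn 0.24 or later, checks that and then uses RFECV's importance_getter parameter to point at the final step's coefficients. The RuntimeError wording in the traceback appears to come from an older release, where importance_getter is not available, so this assumes an upgrade.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical synthetic data standing in for the question's x_6 / y_6.
X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           random_state=0)

pipeline = Pipeline(steps=[("scale", MinMaxScaler()),
                           ("lr", LogisticRegression(max_iter=1000,
                                                     solver="newton-cg"))])

# The fitted Pipeline has no coef_ of its own; its last step does.
pipeline.fit(X, y)
print(hasattr(pipeline, "coef_"))              # False
print(pipeline.named_steps["lr"].coef_.shape)  # (1, 8) for a binary problem

# importance_getter is resolved with attrgetter, so a dotted path works.
selector = RFECV(pipeline, cv=StratifiedKFold(5), scoring="f1_weighted",
                 importance_getter="named_steps.lr.coef_")
selector.fit(X, y)
print(selector.n_features_)
print(selector.support_)
```

Because the scaler stays inside the pipeline, each RFECV fold still refits it on that fold's training rows only, on whichever subset of features is currently being evaluated.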
【Discussion】:
Tags: python scikit-learn pipeline feature-selection rfe