【发布时间】:2015-12-20 02:21:12
【问题描述】:
在 Sklearn 中使用 GridsearchCV 时,我反复收到以下警告
“DataConversionWarning:复制输入数据帧以进行切片。”
我尝试在 Gridsearch 之外单独运行一些模型,但没有收到任何警告。它也没有阻止 Gridsearch 找到模型。
我有 2 个问题: 1)这个错误是什么意思? 2) 对我的输出有何影响(如果有)?
相关部分代码如下:
df = pd.read_csv(os.path.join(filepath, "Modeling_Set.csv")) #loads main data
keep_vars = pd.read_csv(os.path.join(filepath, "keep_vars.csv")) #loads a list of variables to keep from a CSV list
model_vars = keep_vars[keep_vars['keep']==1]['name'] #creates a list of vars to keep
modeling_df = df[model_vars] #creates the df with only keep vars
model_feature_vars = model_vars[:-1]
#Splits test and train data
X_train, X_test, y_train, y_test = train_test_split(modeling_df[model_feature_vars], modeling_df['Segment'], test_size=0.30, random_state=42)
#sets up models
#Range of parameters for gridsearch with decision trees
max_depth = range(2,20,2)
min_samples_split = range(2,10,2)
features = range(2, len(X_train.columns))
#set up for decision trees with gridsearch
parametersDT ={'feature_selection__k':features,
'feature_selection__score_func':(chi2, f_classif),
'classification__criterion':('gini','entropy'),
'classification__max_depth':max_depth,
'classification__min_samples_split':min_samples_split}
DT_with_K_Best = Pipeline([
('feature_selection', SelectKBest()),
('classification', DecisionTreeClassifier())
])
clf_DT = GridSearchCV(DT_with_K_Best, parametersDT, cv=10, verbose=2, scoring='f1_weighted', n_jobs = -2)
clf_DT.fit(X_train,y_train)
【问题讨论】: