【Posted】:2015-09-18 15:05:57
【Question】:
Can grid-search cross-validation be used to extract the best parameters for a decision tree classifier? http://scikit-learn.org/stable/tutorial/statistical_inference/model_selection.html
【Discussion】:
Tags: scikit-learn
Why not?
I invite you to take a look at the documentation for GridSearchCV.
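import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# xtrain, ytrain, xtest, ytest are assumed to be an existing train/test split
param_grid = {'max_depth': np.arange(3, 10)}
tree = GridSearchCV(DecisionTreeClassifier(), param_grid)
tree.fit(xtrain, ytrain)
# probability of the positive class (binary classification)
tree_preds = tree.predict_proba(xtest)[:, 1]
tree_performance = roc_auc_score(ytest, tree_preds)
print('DecisionTree: Area under the ROC curve = {}'.format(tree_performance))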
And to extract the best parameters:
tree.best_params_
Out[1]: {'max_depth': 5}
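Beyond best_params_, the fitted search object exposes a few related attributes. The following is a minimal sketch, assuming a synthetic dataset from make_classification (not part of the original question) and a fixed random_state so repeated runs agree:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for the asker's dataset (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

param_grid = {'max_depth': np.arange(3, 10)}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_    # dict with one key per searched parameter
best_score = search.best_score_      # mean cross-validated score of the best setting
best_tree = search.best_estimator_   # tree refit on all of X, y (refit=True is the default)

print(best_params, best_score, best_tree.get_depth())
```

The full per-setting results are also available in search.cv_results_ if you want to see how close the runner-up settings were.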
【Comments】:
tree.best_params_ returns a dictionary like {'params1': bestparam}. In your case you would get, for example, {'a': 1, 'b': 0.4, ...}
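With GridSearchCV, wouldn't you want to fit on your full dataset rather than a training subset? Since GridSearchCV already performs some variant of k-fold CV, using only your training data seems like a waste of data.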
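One way to look at the trade-off raised in this comment: the cross-validation in GridSearchCV happens inside whatever data is passed to fit, so a separate test split only matters if you want a performance estimate that played no part in choosing the parameters. A rough sketch of both options, assuming synthetic data and an arbitrary 75/25 split (neither is from the thread):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data and split sizes are illustrative assumptions.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

param_grid = {'max_depth': np.arange(3, 10)}

# Option A: search on the training split, keep the test split for a final check.
search_a = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search_a.fit(X_train, y_train)
held_out_accuracy = search_a.score(X_test, y_test)

# Option B: search on all the data; only the internal CV score is available.
search_b = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search_b.fit(X, y)
cv_accuracy = search_b.best_score_

print(held_out_accuracy, cv_accuracy)
```

Option B uses every sample for the search, but its best_score_ was also used to pick the winning setting, so it tends to be somewhat optimistic as an estimate of performance on new data.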
Here is the code for a decision tree grid search:
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

def dtree_grid_search(X, y, nfolds):
    # create a dictionary of all values we want to test
    param_grid = {'criterion': ['gini', 'entropy'], 'max_depth': np.arange(3, 15)}
    # decision tree model
    dtree_model = DecisionTreeClassifier()
    # use grid search to test all values
    dtree_gscv = GridSearchCV(dtree_model, param_grid, cv=nfolds)
    # fit model to data
    dtree_gscv.fit(X, y)
    return dtree_gscv.best_params_
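A usage sketch for this function, repeated here so the block runs on its own. The iris dataset and nfolds=5 are illustrative choices, not part of the answer:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

def dtree_grid_search(X, y, nfolds):
    # grid of 2 criteria x 12 depths = 24 candidate settings
    param_grid = {'criterion': ['gini', 'entropy'], 'max_depth': np.arange(3, 15)}
    dtree_gscv = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=nfolds)
    dtree_gscv.fit(X, y)
    return dtree_gscv.best_params_

# Example data (assumption): the iris dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)
best = dtree_grid_search(X, y, nfolds=5)
print(best)
```

Because DecisionTreeClassifier() is created without a random_state, the selected values may differ slightly between runs when several settings score about the same.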
【Comments】: