【Posted】: 2015-09-29 08:49:39
【Question】:
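I have been running the "mean decrease accuracy" measure shown on this website:
In the example the author uses a regressor, RandomForestRegressor, but I am using a classifier, RandomForestClassifier. So my question is: should I also use r2_score to measure accuracy, or should I switch to classic accuracy, accuracy_score, or the Matthews correlation coefficient, matthews_corrcoef?
If I should switch, can anyone here explain why?
Thanks for your help!
Here is the code from the website, in case you can't be bothered to click :)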
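One way to see why the choice of metric matters for a classifier is a toy comparison. The sketch below uses hypothetical labels for a 3-class problem encoded as 0, 1, 2. Each prediction vector contains exactly one mistake. r2_score treats the labels as numbers, so the same single error scores differently depending on which arbitrary integer the wrong class happens to carry. accuracy_score and matthews_corrcoef depend only on which labels match, so they score both vectors identically.

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef, r2_score

# Hypothetical toy labels for a 3-class problem (classes encoded as 0, 1, 2)
y_true = [0, 0, 1, 1, 2, 2]
pred_near = [0, 0, 1, 1, 2, 1]  # one mistake: class 2 predicted as class 1
pred_far = [0, 0, 1, 1, 2, 0]   # one mistake: class 2 predicted as class 0

for name, y_pred in [("near", pred_near), ("far", pred_far)]:
    # accuracy and MCC only look at matches; r2 looks at numeric distance
    print(name,
          "accuracy=%.4f" % accuracy_score(y_true, y_pred),
          "mcc=%.4f" % matthews_corrcoef(y_true, y_pred),
          "r2=%.4f" % r2_score(y_true, y_pred))
```

Here R² comes out as 0.75 for the "near" mistake and 0.0 for the "far" one, even though both are equally wrong from a classification point of view.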
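import numpy as np
from collections import defaultdict
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import ShuffleSplit

# The website used the Boston housing data (load_boston), which was removed
# in scikit-learn 1.2; the bundled diabetes regression data stands in here.
data = load_diabetes()
X, Y, names = data["data"], data["target"], data["feature_names"]

rf = RandomForestRegressor()
scores = defaultdict(list)

# Cross-validate the scores on a number of different random splits of the data
for train_idx, test_idx in ShuffleSplit(n_splits=100, test_size=0.3).split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    Y_train, Y_test = Y[train_idx], Y[test_idx]
    rf.fit(X_train, Y_train)
    acc = r2_score(Y_test, rf.predict(X_test))
    for i in range(X.shape[1]):
        X_t = X_test.copy()
        # Shuffle one feature column in place to break its link with the target
        np.random.shuffle(X_t[:, i])
        shuff_acc = r2_score(Y_test, rf.predict(X_t))
        # Relative drop in R^2 caused by shuffling feature i
        scores[names[i]].append((acc - shuff_acc) / acc)

print("Features sorted by their score:")
print(sorted([(round(float(np.mean(score)), 4), feat)
              for feat, score in scores.items()], reverse=True))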
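For a classifier, the same permutation loop can be kept while swapping the scoring function. The sketch below is one possible adaptation, not the website's code. The function name mean_decrease_accuracy, the iris dataset, n_splits=10 and the fixed seeds are all illustrative assumptions. The metric defaults to accuracy_score, and any scikit-learn classification metric with the signature metric(y_true, y_pred), such as matthews_corrcoef, can be passed instead.

```python
from collections import defaultdict

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import ShuffleSplit


def mean_decrease_accuracy(X, y, names, metric=accuracy_score, n_splits=10, seed=0):
    """Hypothetical helper: permutation-based mean decrease in `metric`."""
    rng = np.random.default_rng(seed)
    scores = defaultdict(list)
    splitter = ShuffleSplit(n_splits=n_splits, test_size=0.3, random_state=seed)
    for train_idx, test_idx in splitter.split(X):
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(X[train_idx], y[train_idx])
        X_test, y_test = X[test_idx], y[test_idx]
        acc = metric(y_test, clf.predict(X_test))
        for i in range(X.shape[1]):
            X_t = X_test.copy()
            # Permute feature i to break its link with the class labels
            X_t[:, i] = rng.permutation(X_t[:, i])
            shuff_acc = metric(y_test, clf.predict(X_t))
            # Relative drop, mirroring the regression example
            scores[names[i]].append((acc - shuff_acc) / acc)
    return sorted(((round(float(np.mean(s)), 4), f) for f, s in scores.items()),
                  reverse=True)


iris = load_iris()
ranking = mean_decrease_accuracy(iris["data"], iris["target"], iris["feature_names"])
print("Features sorted by their score:")
print(ranking)
```

If matthews_corrcoef is passed as the metric, the baseline score can be zero or negative, so the raw difference acc - shuff_acc may be a safer quantity to average than the relative drop.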
【Discussion】:
Tags: python machine-learning statistics scikit-learn classification