Why does the voting classifier have lower accuracy than one of the individual predictors that make it up?

【Question title】: Why the voting classifier has less accuracy than one of the individual predictors that made it
【Posted】: 2021-04-10 11:17:15
【Question】:

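I have a simple question about the voting classifier. As I understand it, a voting classifier should be more accurate than any of the individual predictors it is built from (the wisdom of the crowd). Here is the code: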

from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC


# generate a toy two-moons dataset
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

# split the dataset into train/test sets
X_train, X_test, y_train, y_test = train_test_split(X, y)


rnd_clf = RandomForestClassifier(n_estimators=10, random_state=42)  
log_clf = LogisticRegression(solver='liblinear', random_state=42) 
svm_clf = SVC(gamma='auto', random_state=42)   


voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')

voting_clf = voting_clf.fit(X_train, y_train)

predictors_list= [log_clf, rnd_clf, svm_clf, voting_clf]

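for clf in predictors_list:
    # fit each model on the training set and score it on the held-out test set
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)  # signature is (y_true, y_pred)

    print(clf.__class__.__name__, accuracy)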

The accuracies I get are as follows:

LogisticRegression 0.776
RandomForestClassifier 0.88
SVC 0.864
VotingClassifier 0.864

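As you can see, in this run the random forest predictor has slightly higher accuracy than the VotingClassifier!

What is the explanation for this?

Thanks a lot in advance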

费提

【Comments】:

Tags: python classification voting

【Solution 1】:

Let's look at the voting parameter you passed, 'hard'. The documentation says:

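If 'hard', uses predicted class labels for majority rule voting. Else if 'soft', predicts the class label based on the argmax of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers.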
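To make the difference concrete, here is a small sketch with made-up probabilities for a single sample (the numbers are purely illustrative and do not come from your models). Two lukewarm classifiers outvote one confident classifier under hard voting, while soft voting lets the confident one win:

```python
import numpy as np

# Hypothetical predicted probabilities for ONE sample, two classes (0 and 1).
# Rows are three classifiers; these values are invented for illustration.
probs = np.array([
    [0.45, 0.55],  # classifier A: weakly prefers class 1
    [0.45, 0.55],  # classifier B: weakly prefers class 1
    [0.90, 0.10],  # classifier C: strongly prefers class 0
])

# Hard voting: each classifier casts one vote for its most likely class.
votes = probs.argmax(axis=1)               # one label per classifier
hard_label = np.bincount(votes).argmax()   # majority label

# Soft voting: sum the probabilities per class, then take the argmax.
soft_label = probs.sum(axis=0).argmax()

print("hard vote:", hard_label)
print("soft vote:", soft_label)
```

With these numbers the two schemes disagree, which is exactly the kind of sample where a hard-voting ensemble can end up worse than its best member.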

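So it may be that LogisticRegression and your SVC (SVM) make the same prediction on some samples and are both wrong there, which makes the majority vote wrong on those samples even when the random forest is right.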
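One way to check this hypothesis is to count the test samples where the random forest is right but gets outvoted by the other two. The sketch below rebuilds your setup; the `random_state=42` on `train_test_split` is my addition so the run is repeatable, so the numbers will not necessarily match yours. The names `outvoted` and `rescued` are just illustrative.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
# random_state on the split is an assumption added for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "lr": LogisticRegression(solver="liblinear", random_state=42),
    "rf": RandomForestClassifier(n_estimators=10, random_state=42),
    "svc": SVC(gamma="auto", random_state=42),
}
preds = {name: m.fit(X_train, y_train).predict(X_test)
         for name, m in models.items()}

# Hard vote by hand: labels are 0/1, so the majority is "at least 2 of 3".
votes = np.stack([preds["lr"], preds["rf"], preds["svc"]])
majority = (votes.sum(axis=0) >= 2).astype(int)

rf_right = preds["rf"] == y_test
lr_right = preds["lr"] == y_test
svc_right = preds["svc"] == y_test

# Samples where the forest is right but both other models are wrong ...
outvoted = rf_right & ~lr_right & ~svc_right
# ... and samples where the forest is wrong but the other two fix it.
rescued = ~rf_right & lr_right & svc_right

print("rf correct:      ", rf_right.sum())
print("majority correct:", (majority == y_test).sum())
print("rf outvoted:     ", outvoted.sum())
print("rf rescued:      ", rescued.sum())
```

For three binary classifiers, the ensemble beats the forest only if `rescued` is larger than `outvoted`; if the two weaker models tend to fail together, the vote drags the result down.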

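You can use voting='soft', or assign a weight to each model's predictions as a prior, so that the ensemble is somewhat immune to the wrong predictions of the weaker models and relies more on your best model.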
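A minimal sketch of that suggestion follows. Soft voting needs `predict_proba` from every estimator, so `SVC` has to be created with `probability=True`. The weights `[1, 2, 2]` are an arbitrary illustration, not tuned values, and the `random_state` on the split is again my assumption for repeatability. There is no guarantee the result beats the random forest on any particular split.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
# random_state on the split is an assumption added for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

soft_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(solver="liblinear", random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=10, random_state=42)),
        # probability=True enables predict_proba, which soft voting requires
        ("svc", SVC(gamma="auto", probability=True, random_state=42)),
    ],
    voting="soft",
    weights=[1, 2, 2],  # illustrative only: trust the weaker linear model less
)

soft_clf.fit(X_train, y_train)
soft_acc = accuracy_score(y_test, soft_clf.predict(X_test))
print("soft voting accuracy:", soft_acc)
```

In practice you would pick the weights with cross-validation on the training data rather than by looking at test accuracy.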

【Comments】: