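Using sklearn voting ensemble with partial fit: answers

【Question title】: Using sklearn voting ensemble with partial fit
【Posted】: 2017-08-12 16:59:23
【Question】:

Can someone explain how to use an ensemble in sklearn with partial fit? I don't want to retrain my models. Alternatively, can we pass pre-trained models to the ensemble? I have seen that VotingClassifier, for example, does not support training with partial fit.

【Question comments】:

Tags: python machine-learning scikit-learn ensemble-learning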

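【Solution 1】:

The mlxtend library has a voting ensemble implementation, EnsembleVoteClassifier, that lets you pass in pre-fitted models. For example, if you have three pre-trained models clf1, clf2 and clf3, the following code works.

from mlxtend.classifier import EnsembleVoteClassifier

# clf1, clf2, clf3 are classifiers that have already been fitted
eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], weights=[1, 1, 1], fit_base_estimators=False)

When set to False, the fit_base_estimators parameter of EnsembleVoteClassifier ensures that the classifiers are not refitted.

In general, when you need more advanced features that scikit-learn does not provide, mlxtend is a good first place to look.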

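【Comments】:

When I do this with fitted models, I get: NotFittedError: Estimator not fitted, call 'fit' before exploiting the model.
I found that you have to run eclf.fit regardless. But if the classifiers were trained on different training data, what should be passed to eclf.fit?
This answer actually solved the problem, thanks! To make the code runnable, though, you should add a from before the import. The mlxtend documentation has more details for anyone interested.
Apparently refit has now been renamed to fit_base_estimators.
@RilwanAdewoyin Please also update the text below the code snippet.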

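【Solution 2】:

Workaround:

VotingClassifier checks whether estimators_ is set to decide whether it has been fitted, and it predicts with the estimators in the estimators_ list. If you have pre-trained classifiers, you can put them directly into estimators_, as in the code below.

However, it also uses a LabelEncoder, so it assumes the labels are encoded as 0, 1, 2, ...; you therefore also need to set le_ and classes_ (see below).

from sklearn.ensemble import VotingClassifier
from sklearn.preprocessing import LabelEncoder

# clf1, clf2, clf3 are already fitted; y holds every label they were trained on
clf_list = [clf1, clf2, clf3]

eclf = VotingClassifier(estimators=[('1', clf1), ('2', clf2), ('3', clf3)], voting='soft')

# Set the fitted attributes by hand instead of calling fit
eclf.estimators_ = clf_list
eclf.le_ = LabelEncoder().fit(y)
eclf.classes_ = eclf.le_.classes_

# Now it will work without calling fit
eclf.predict(X)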
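
To make the role of le_ and classes_ more concrete, here is a minimal sketch of what a LabelEncoder does with the labels. The toy X and y and the use of GaussianNB as the stand-in pre-trained model are assumptions for illustration only. The point is that the encoder maps the original labels to indices 0..n_classes-1 and back, and that a fitted scikit-learn classifier stores its classes in the same sorted order, which is what lets averaged predict_proba columns be mapped back to labels.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder

# Toy data (assumed for illustration): two classes labelled 1 and 2
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

le = LabelEncoder().fit(y)
clf = GaussianNB().fit(X, y)  # stand-in for any pre-trained classifier

encoded = le.transform(y)                 # original labels -> indices 0..n_classes-1
decoded = le.inverse_transform(encoded)   # indices -> original labels

# Both objects keep the sorted unique labels, so column k of predict_proba
# corresponds to le.classes_[k]
same_order = np.array_equal(le.classes_, clf.classes_)
print(encoded, decoded, same_order)
```

If the pre-trained classifiers were fitted on different label sets, this ordering no longer lines up, and the workaround above should not be expected to give meaningful soft votes.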

【Comments】:

【Solution 3】:

Unfortunately, this is currently not possible with the scikit-learn VotingClassifier.

But you can implement your own voting classifier that accepts pre-fitted models, based on http://sebastianraschka.com/Articles/2014_ensemble_classifier.html (the article from which VotingClassifier was implemented).

We can also look at the scikit-learn source code and adapt it to our needs:

from sklearn.preprocessing import LabelEncoder
import numpy as np

le_ = LabelEncoder()

# With partial_fit, the first call on any classifier requires all available
# labels (output classes); supply that same full set of labels here in y.
le_.fit(y)

# Fill this list with fitted or partially fitted estimators
clf_list = [clf1, clf2, clf3]  # ... and any further estimators

# weights -> array-like, shape = [n_classifiers], or None for equal weights
# weights = [clf1_wgt, clf2_wgt, ...]
weights = None

# Use ONE of the two voting blocks below.

# For hard voting: encode each prediction so np.bincount sees indices 0..n_classes-1
pred = np.asarray([le_.transform(clf.predict(X)) for clf in clf_list]).T
pred = np.apply_along_axis(lambda x:
                           np.argmax(np.bincount(x, weights=weights)),
                           axis=1,
                           arr=pred)

# For soft voting: average the class probabilities
# (columns follow clf.classes_, which must match le_.classes_)
pred = np.asarray([clf.predict_proba(X) for clf in clf_list])
pred = np.average(pred, axis=0, weights=weights)
pred = np.argmax(pred, axis=1)

# Finally, map the encoded indices back to the original labels:
pred = le_.inverse_transform(pred)
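
The remark about partial_fit needing every label up front can be sketched as follows. This is an illustrative example, not part of the original answer: the toy data, the choice of GaussianNB, and the way rows are split into mini-batches are all assumptions. Each model is trained incrementally on its own batches, with the full class list passed through the classes argument, and the models are then combined by soft voting.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data (assumed for illustration)
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
all_classes = np.unique(y)

# Two models, each trained incrementally on its own mini-batches (row indices)
batches_per_model = [
    [[0, 3], [1, 4]],  # model A sees rows 0, 3 and then rows 1, 4
    [[2, 5], [1, 4]],  # model B sees rows 2, 5 and then rows 1, 4
]

models = []
for batches in batches_per_model:
    clf = GaussianNB()
    for rows in batches:
        # classes must be given on the first partial_fit call;
        # passing the same list again on later calls is allowed
        clf.partial_fit(X[rows], y[rows], classes=all_classes)
    models.append(clf)

# Soft voting: average the probabilities; columns follow all_classes
proba = np.mean([clf.predict_proba(X) for clf in models], axis=0)
pred = all_classes[np.argmax(proba, axis=1)]
print(pred)
```

Because every model was told about the same full class list, their predict_proba outputs have the same columns and can be averaged directly, even though no single model saw all of the data in one call.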

【Comments】:

【Solution 4】:

Implementing voting is not hard. Here is my implementation:

import numpy as np

class VotingClassifier(object):
    """Implements a hard-voting classifier for pre-trained classifiers."""

    def __init__(self, estimators):
        self.estimators = estimators

    def predict(self, X):
        # Collect each estimator's predictions; labels must be non-negative integers
        Y = np.zeros([X.shape[0], len(self.estimators)], dtype=int)
        for i, clf in enumerate(self.estimators):
            Y[:, i] = clf.predict(X)
        # Majority vote per sample (ties go to the smallest label)
        y = np.zeros(X.shape[0], dtype=int)
        for i in range(X.shape[0]):
            y[i] = np.argmax(np.bincount(Y[i, :]))
        return y
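
Since np.bincount only counts non-negative integers, the class above is limited to integer labels. A possible variant, sketched here under my own assumptions, encodes whatever labels the estimators return before counting, and optionally weights the votes. The names PrefitVotingClassifier and ConstantClassifier are hypothetical; the latter is only a stand-in for real pre-trained models so that the example runs on its own.

```python
import numpy as np


class PrefitVotingClassifier:
    """Hard-voting wrapper around already fitted classifiers (hypothetical name)."""

    def __init__(self, estimators, weights=None):
        self.estimators = estimators
        self.weights = weights

    def predict(self, X):
        # One column of predicted labels per estimator: (n_samples, n_estimators)
        votes = np.column_stack([np.asarray(clf.predict(X)) for clf in self.estimators])
        # Map arbitrary labels to indices 0..n_classes-1 so np.bincount can count them
        classes, encoded = np.unique(votes, return_inverse=True)
        encoded = encoded.reshape(votes.shape)
        winners = [
            np.argmax(np.bincount(row, weights=self.weights, minlength=len(classes)))
            for row in encoded
        ]
        return classes[np.array(winners)]


class ConstantClassifier:
    """Stand-in for a pre-trained model; always predicts the same label."""

    def __init__(self, label):
        self.label = label

    def predict(self, X):
        return np.full(len(X), self.label)


X = np.zeros((4, 2))
estimators = [ConstantClassifier("cat"), ConstantClassifier("dog"), ConstantClassifier("dog")]

majority = PrefitVotingClassifier(estimators).predict(X)
weighted = PrefitVotingClassifier(estimators, weights=[3, 1, 1]).predict(X)
print(majority, weighted)
```

With equal weights the two "dog" votes win; with weights [3, 1, 1] the single "cat" vote outweighs them. Ties are resolved in favor of the label that sorts first, because np.argmax returns the first maximum.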

【Comments】:

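【Solution 5】:

The mlxtend implementation works, but you still need to call the fit function of EnsembleVoteClassifier. It seems that fit does not really modify any parameters; it only checks the possible label values. In the example below, you have to pass eclf2.fit an array containing every value that can appear in the original y (here 1 and 2); what is passed for X does not matter.

import numpy as np
from mlxtend.classifier import EnsembleVoteClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = GaussianNB()
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

# Fit the base classifiers beforehand
for clf in (clf1, clf2, clf3):
    clf.fit(X, y)

# fit_base_estimators was called refit in older mlxtend releases
eclf2 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], voting="soft", fit_base_estimators=False)
# Per the note above, X is not used here; y only has to list every possible label
eclf2.fit(None, np.array([1, 2]))
print(eclf2.predict(X))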

【Comments】: