ValueError: Number of features of the model must match the input. Model n_features > input n_features

【Question title】: ValueError: Number of features of the model must match the input. Model n_features > input n_features
【Posted】: 2017-09-29 11:01:09
【Question】:

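I am trying to implement an Isolation Forest for 9 input features, using the example from http://scikit-learn.org/stable/auto_examples/ensemble/plot_isolation_forest.html#sphx-glr-auto-examples-ensemble-plot-isolation-forest-py

My training and test sets have 9 features, so I am creating X_train and X_test with the same number of features: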

>>> X.shape
(100, 9)
>>> X_train.shape
(200, 9)

My code:

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)

# Generate train data
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]
# Generate some regular novel observations
X = 0.3 * rng.randn(20, 9)
X_test = np.r_[X + 2, X - 2]
# Generate some abnormal novel observations
X_outliers = rng.uniform(low=-4, high=4, size=(20, 9))

# fit the model
clf = IsolationForest(max_samples=100, random_state=rng)
clf.fit(X_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)

# plot the line, the samples, and the nearest vectors to the plane
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.title("IsolationForest")
plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)

b1 = plt.scatter(X_train[:, 0], X_train[:, 1], c='white')
b2 = plt.scatter(X_test[:, 0], X_test[:, 1], c='green')
c = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='red')
plt.axis('tight')
plt.xlim((-5, 5))
plt.ylim((-5, 5))
plt.legend([b1, b2, c],
           ["training observations",
            "new regular observations", "new abnormal observations"],
           loc="upper left")
plt.show()

But I get this error:

---------------------------------------------------------------------------

ValueError: Number of features of the model must match the input. Model n_features is 9 and input n_features is 2

In my case the error says: Model n_features is 9 and input n_features is 2.

Any input on what I am missing here?

【Question comments】:

Tags: python numpy scikit-learn

【Solution 1】:

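Even though you have fitted the model with 9 features, the plotting part of the code still assumes only two dimensions, as in the example you are working from: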

# plot the line, the samples, and the nearest vectors to the plane
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])

Look at the shape of the np.c_[...] array that is passed to clf.decision_function():

np.c_[xx.ravel(), yy.ravel()].shape
(2500, 2)

You get the error because clf expects input with 9 features (columns), but you are passing it an array with only 2.
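If you still want a contour plot, one possible workaround is to plot a 2-D slice of the 9-D space. This is a minimal sketch and not part of the original example: it assumes it is acceptable to vary features 0 and 1 while holding the other seven at a fixed value. The name fill_value and its value 2.0 (roughly the centre of one training cluster) are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)

# Same training data as in the question: two clusters in 9 dimensions
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]

clf = IsolationForest(max_samples=100, random_state=rng)
clf.fit(X_train)

# 2-D grid over features 0 and 1, as in the original plotting code
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))

# Assumption: hold the remaining 7 features at a fixed value so that
# the array passed to the model has 9 columns.
fill_value = 2.0
grid = np.full((xx.size, X_train.shape[1]), fill_value)
grid[:, 0] = xx.ravel()
grid[:, 1] = yy.ravel()

# Now the input matches the model: shape (2500, 9)
Z = clf.decision_function(grid).reshape(xx.shape)
print(grid.shape, Z.shape)
```

Z can then be passed to plt.contourf(xx, yy, Z, ...) as before. Keep in mind that the picture only describes that one slice, not the whole feature space.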

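The classifier itself should still be usable without any problems. For example, you can still call the decision_function() and predict() methods, but you will not be able to plot all 9 dimensions with the code you are using, since it is designed for two-dimensional plots only. Even running np.meshgrid() with 9 dimensions would almost certainly throw a MemoryError (50 points per axis gives 50**9, roughly 2e15, grid points); see this discussion for more information.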
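To illustrate the point above, here is a short sketch that calls both methods on 9-feature input and computes how many points a full 9-D grid would contain. It reuses the data generation from the question; the variable names labels, scores, and n_points are only illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]
X_outliers = rng.uniform(low=-4, high=4, size=(20, 9))

clf = IsolationForest(max_samples=100, random_state=rng).fit(X_train)

# Both methods work as long as the input has 9 columns
labels = clf.predict(X_outliers)            # +1 = inlier, -1 = outlier
scores = clf.decision_function(X_outliers)  # lower = more abnormal
print(labels.shape, scores.shape)

# A full 9-D grid with 50 points per axis would have 50**9 points,
# far too many to hold in memory.
n_points = 50 ** 9
print(n_points)
```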

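In any case, trying to plot a 9-D space would not be very helpful here. Instead, you might focus on more standard visual representations of classifier strength, such as ROC curves or even an old-fashioned confusion matrix.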
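As a rough sketch of the ROC suggestion, assuming you treat outliers as the positive class and combine X_test and X_outliers into one evaluation set (both choices are assumptions, not something the question specifies):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]
X = 0.3 * rng.randn(20, 9)
X_test = np.r_[X + 2, X - 2]
X_outliers = rng.uniform(low=-4, high=4, size=(20, 9))

clf = IsolationForest(max_samples=100, random_state=rng).fit(X_train)

# Evaluation set: regular observations followed by outliers
X_eval = np.r_[X_test, X_outliers]
# Assumption: 1 marks an outlier (positive class), 0 a regular point
y_true = np.r_[np.zeros(len(X_test)), np.ones(len(X_outliers))]

# decision_function is lower for abnormal points, so negate it to get
# a score that increases with "outlierness"
anomaly_score = -clf.decision_function(X_eval)

fpr, tpr, thresholds = roc_curve(y_true, anomaly_score)
auc = roc_auc_score(y_true, anomaly_score)
print("AUC:", auc)
```

The curve itself can be drawn with plt.plot(fpr, tpr). On data this cleanly separated the AUC will likely be close to 1, so the plot is more informative on real data.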

【Comments】:

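Thanks for the insight! I was expecting some kind of label prediction result from y_pred_test = clf.predict(X_test) and y_pred_outliers = clf.predict(X_outliers), but it does not give me anything. Is something missing there, or can I generate a confusion matrix from it?
Try running your code up to y_pred_outliers = clf.predict(X_outliers); you should get some output. (I just ran your code without the part that causes the error, and it ran fine up to that line.) You can generate a confusion matrix by passing y_pred_outliers as y_pred to confusion_matrix(), with a vector of -1 values of the same length as y_pred_outliers as y_true. Keep in mind, though, that the example you are working from is meant to show very clear-cut cases of anomalies/outliers, so you will probably get 100% true positives from predict() when given X_outliers as input.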
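The suggestion in the comment above can be sketched roughly as follows. As an extension that the comment does not mention, this version also includes X_test with a true label of +1, so that both classes appear in the matrix; treat that as an assumption about how you want to evaluate.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix

rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]
X = 0.3 * rng.randn(20, 9)
X_test = np.r_[X + 2, X - 2]
X_outliers = rng.uniform(low=-4, high=4, size=(20, 9))

clf = IsolationForest(max_samples=100, random_state=rng).fit(X_train)

y_pred_test = clf.predict(X_test)          # values are +1 or -1
y_pred_outliers = clf.predict(X_outliers)

# True labels: +1 for the regular test points, -1 for the outliers
y_true = np.r_[np.ones(len(X_test), dtype=int),
               -np.ones(len(X_outliers), dtype=int)]
y_pred = np.r_[y_pred_test, y_pred_outliers]

# Rows are true labels, columns are predictions, both ordered [-1, 1]
cm = confusion_matrix(y_true, y_pred, labels=[-1, 1])
print(cm)
```

Passing labels=[-1, 1] fixes the row and column order, which also keeps the matrix 2x2 if you evaluate on X_outliers alone.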