ValueError: Number of features of the model must match the input. Model n_features is 11 and input n_features is 2

【Question title】: ValueError: Number of features of the model must match the input. Model n_features is 11 and input n_features is 2
【Posted】: 2021-01-19 13:42:43
【Question description】:

Running the following code in a Jupyter notebook raises a ValueError:

ValueError: Number of features of the model must match the input. Model n_features is 11 and input n_features is 2

How can I fix this?

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))

I get the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-42-bc13e66e79fe> in <module>
      4 X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
      5                      np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
----> 6 plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
      7              alpha = 0.75, cmap = ListedColormap(('red', 'green')))
      8 plt.xlim(X1.min(), X1.max())

~\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in predict(self, X)
    627             The predicted classes.
    628         """
--> 629         proba = self.predict_proba(X)
    630 
    631         if self.n_outputs_ == 1:

~\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in predict_proba(self, X)
    671         check_is_fitted(self)
    672         # Check data
--> 673         X = self._validate_X_predict(X)
    674 
    675         # Assign chunk of trees to jobs

~\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in _validate_X_predict(self, X)
    419         check_is_fitted(self)
    420 
--> 421         return self.estimators_[0]._validate_X_predict(X, check_input=True)
    422 
    423     @property

~\anaconda3\lib\site-packages\sklearn\tree\_classes.py in _validate_X_predict(self, X, check_input)
    394         n_features = X.shape[1]
    395         if self.n_features_ != n_features:
--> 396             raise ValueError("Number of features of the model must "
    397                              "match the input. Model n_features is %s and "
    398                              "input n_features is %s "

ValueError: Number of features of the model must match the input. Model n_features is 11 and input n_features is 2

Full model code: https://github.com/anandsinha07/Placement-prediction-system-using-ML-algos/blob/master/PREDICTION-Random%20Forest%20Classification/random_forest_classification.py

【Comments】:

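Your model (classifier) was trained with 11 numbers in each X input, but you are feeding it 2. That is, your prediction array np.array([X1.ravel(), X2.ravel()]).T has only two columns when it should have 11.
If you share your model code, we can look into the problem.
Alternatively, you can build 11 columns the same way as above but with 11 arrays, X1, X2, X3...X11, preferably kept together in one array.
@Arty Please see the full model code here --> github.com/anandsinha07/…
Yes, in your code you train the model to predict column 12 from columns 1-11. So in the last part of the code, where you do the visualization and prediction (and hit the exception), you supply only two columns, X1 and X2, but 11 are required.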
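
The mismatch described in these comments can be seen by comparing column counts before calling predict. This is only a minimal sketch: the random 11-column array is a made-up stand-in for X_train, and the grid bounds are arbitrary.

```python
import numpy as np

# Hypothetical stand-in for the 11-column training matrix from the question.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 11))

# Build the grid exactly as the question does, from two coordinate arrays only.
X1, X2 = np.meshgrid(np.arange(-1.0, 1.0, 0.5), np.arange(-1.0, 1.0, 0.5))
grid = np.array([X1.ravel(), X2.ravel()]).T

n_expected = X_train.shape[1]  # columns the model was fitted on
n_given = grid.shape[1]        # columns passed to predict
print("model expects", n_expected, "columns, grid has", n_given)
```

A model fitted on X_train would reject grid, because the two counts differ.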

Tags: python numpy machine-learning jupyter-notebook data-science

【Solution 1】:

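I fixed your code the way I understand the problem, adding a few extra lines. The main problem is that you supply only columns 1 and 2 for prediction, while the predictor needs all 11 columns, 1-11. So columns 3-11 have to be filled in somehow. At the very least you could fill them with zeros.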
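
The zero-filling option could look roughly like the sketch below. It assumes synthetic data in place of the placement dataset (the random features and the labelling rule are made up) and treats the features as already standardized, in which case a zero corresponds to the training mean of that column.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the scaled 11-column training data (an assumption).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 11))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

classifier = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
classifier.fit(X_train, y_train)

# Grid over the two plotted features only.
X1, X2 = np.meshgrid(np.arange(-3.0, 3.0, 0.1), np.arange(-3.0, 3.0, 0.1))

# Start from an all-zero matrix with 11 columns, then overwrite columns 0 and 1.
grid = np.zeros((X1.size, X_train.shape[1]))
grid[:, 0] = X1.ravel()
grid[:, 1] = X2.ravel()

# predict now receives the column count it was fitted on.
Z = classifier.predict(grid).reshape(X1.shape)
```

Z can then be passed to plt.contourf(X1, X2, Z, ...) as in the question. The resulting picture is a slice of the decision surface with the other nine features held at zero, not the full 11-dimensional boundary.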

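In my solution I sort the data by its first column and then, when building the meshgrid, approximate the columns 3-11 needed for prediction by looking up the row whose column 1 value is closest to each grid X1 value. In other words, I try to find the best approximation of columns 3-11 given only column 1, instead of just filling columns 3-11 with zeros, which would also work.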
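
The lookup idea can be illustrated on a tiny made-up table. This is a sketch, not the answer's exact code: the 4-row table and the query values are hypothetical. Note that np.searchsorted returns the insertion position, so the row picked is the first one whose key is greater than or equal to the query, which is only an approximation of "nearest".

```python
import numpy as np

# Hypothetical table: column 0 is the plotted feature, column 1 stands in
# for the remaining features that must be filled in.
table = np.array([[3.0, 30.0],
                  [1.0, 10.0],
                  [2.0, 20.0],
                  [4.0, 40.0]])

# Sort rows by column 0 so that searchsorted can be used on it.
order = np.argsort(table[:, 0])
sorted_table = table[order]

queries = np.array([0.5, 1.5, 2.0, 9.0])

# Insertion index of each query in the sorted keys.
idx = np.searchsorted(sorted_table[:, 0], queries)
# Queries beyond the last key would index past the end, so clamp them.
idx = np.minimum(sorted_table.shape[0] - 1, idx)

filled = sorted_table[idx, 1]
print(filled)  # [10. 20. 20. 40.]
```

The rows are taken from the same sorted array that searchsorted was run on. Indexing the unsorted array with these positions would return unrelated rows.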

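I also commented out the line #from sklearn.cross_validation import train_test_split and replaced it with from sklearn.model_selection import train_test_split, because the first line targets an old sklearn release; the submodule was renamed, so only the second line works in newer versions. Pick whichever variant matches your installation.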
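
If the same script has to run on both old and new installations, one possible pattern is to try the current module first and fall back to the old name. This is a sketch; the small array used to exercise the function is made up.

```python
import numpy as np

try:
    # Current location of train_test_split.
    from sklearn.model_selection import train_test_split
except ImportError:
    # Old releases only: the module was later renamed to model_selection.
    from sklearn.cross_validation import train_test_split

# Hypothetical data, just to show that the imported function is usable.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
print(X_tr.shape, X_te.shape)
```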

# Random Forest Classification

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('finalplacementdata3.csv')
X = dataset.iloc[:, range(1, 12)].values
y = dataset.iloc[:, 12].values

siX = np.lexsort((X[:, 1], X[:, 0]))
sX, sy = X[siX], y[siX]

# Splitting the dataset into the Training set and Test set
#from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fitting Random Forest Classification to the Training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
                     
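# sX is sorted but unscaled, while the grid and the classifier use scaled values,
# so scale sX first and take the filler rows from the same sorted array.
sXs = sc.transform(sX)
riX = np.minimum(sXs.shape[0] - 1, np.searchsorted(sXs[:, 0], X1.ravel()))
rX = sXs[riX]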

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()] + list(rX[:, 2:].T)).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Training set)')
plt.xlabel('Quants')
plt.ylabel('CGPA')
plt.legend()
plt.show()

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

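# Same lookup as for the training plot: scale the sorted data, then index it.
sXs = sc.transform(sX)
riX = np.minimum(sXs.shape[0] - 1, np.searchsorted(sXs[:, 0], X1.ravel()))
rX = sXs[riX]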

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()] + list(rX[:, 2:].T)).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Test set)')
plt.xlabel('Quants')
plt.ylabel('CGPA')
plt.legend()
plt.show()

【Comments】: