KeyError while fitting data to DecisionTreeRegressor

[Question title]: KeyError while fitting data to DecisionTreeRegressor
[Posted]: 2020-03-23 06:07:44
[Question]:

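I am working on a model to predict house prices. To build it, I am using sklearn's DecisionTreeRegressor. I split the data into training and validation sets with train_test_split. But when I try to fit the data to the model, I get the following error: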

KeyError                                  Traceback (most recent call last)
<ipython-input-25-f4acd876feae> in <module>
      1 for max_leaf_nodes in [5, 50, 500, 5000]:
----> 2     my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)
      3     print("Max leaf nodes: %d  \t\t Mean Absolute Error:  %d" %(max_leaf_nodes, my_mae))

<ipython-input-21-1a489238552f> in get_mae(max_leaf_nodes, train_inp, val_inp, train_oup, val_oup)
      2 
      3     model = DecisionTreeRegressor(max_leaf_nodes, random_state=0)
----> 4     model.fit(train_inp, train_oup)
      5     predictions = model.predict(val_inp)
      6     mae = mean_absolute_error(val_oup, predictions)

~/anaconda3/lib/python3.7/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
   1140             sample_weight=sample_weight,
   1141             check_input=check_input,
-> 1142             X_idx_sorted=X_idx_sorted)
   1143         return self
   1144 

~/anaconda3/lib/python3.7/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    331                                                          self.n_classes_)
    332             else:
--> 333                 criterion = CRITERIA_REG[self.criterion](self.n_outputs_,
    334                                                          n_samples)
    335 

KeyError: 5

Here is my code.

The get_mae function:

def get_mae(max_leaf_nodes, train_inp, val_inp, train_oup, val_oup):

    model = DecisionTreeRegressor(max_leaf_nodes, random_state=0)
    model.fit(train_inp, train_oup)
    predictions = model.predict(val_inp)
    mae = mean_absolute_error(val_oup, predictions)

    return mae

Reading the dataset:

df = pd.read_csv('../DATASETS/melb_data.csv')

y = df.Price

features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude']
X = df[features]

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

Loop to find the best number of leaf nodes:

for max_leaf_nodes in [5, 50, 500, 5000]:
    my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)
    print("Max leaf nodes: %d  \t\t Mean Absolute Error:  %d" %(max_leaf_nodes, my_mae))

[Comments]:

Tags: python scikit-learn decision-tree

[Solution 1]:

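Since you did not pass max_leaf_nodes as a keyword argument to DecisionTreeRegressor, the integer 5 was bound to the 'criterion' parameter instead.

Unless you use keyword arguments, the first positional argument is bound to criterion, the second to splitter, and so on. However, in this version of scikit-learn, criterion only accepts "mse", "friedman_mse", or "mae", so looking up 5 raises the KeyError.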
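To make the binding concrete, here is a minimal sketch that reproduces the same failure without scikit-learn. ToyTreeRegressor and its CRITERIA_REG values are hypothetical stand-ins written for illustration; only the parameter order (criterion first, splitter second) and the dictionary lookup at fit time are taken from the traceback and the explanation above. It is not scikit-learn's actual implementation.

```python
# Hypothetical stand-in, NOT scikit-learn's real code. It only mirrors the
# parameter order (criterion, splitter, ...) and the lookup seen in the traceback.
CRITERIA_REG = {"mse": "MSE", "friedman_mse": "FriedmanMSE", "mae": "MAE"}


class ToyTreeRegressor:
    def __init__(self, criterion="mse", splitter="best",
                 max_leaf_nodes=None, random_state=None):
        # The constructor just stores the values; nothing is validated here.
        self.criterion = criterion
        self.splitter = splitter
        self.max_leaf_nodes = max_leaf_nodes
        self.random_state = random_state

    def fit(self):
        # The criterion is looked up only at fit time, which is why the
        # constructor call succeeds and the error surfaces later in fit().
        return CRITERIA_REG[self.criterion]


# Positional call: 5 lands in `criterion`, max_leaf_nodes stays None.
positional = ToyTreeRegressor(5, random_state=0)
# Keyword call: 5 lands where it was intended.
keyword = ToyTreeRegressor(max_leaf_nodes=5, random_state=0)

print(positional.criterion, positional.max_leaf_nodes)  # 5 None
print(keyword.criterion, keyword.max_leaf_nodes)        # mse 5

try:
    positional.fit()
except KeyError as err:
    print("KeyError:", err)  # KeyError: 5
```

The point of the sketch is that the mistake is silent at construction time: the object is created without complaint, and the bad value only fails once fit() tries to use it as a dictionary key.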

Try this code instead:

def get_mae(max_leaf_nodes, train_inp, val_inp, train_oup, val_oup):    
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_inp, train_oup)
    predictions = model.predict(val_inp)
    mae = mean_absolute_error(val_oup, predictions)

    return mae

[Comments]: