【Question title】: KeyError while fitting data to DecisionTreeRegressor
【Posted】: 2020-03-23 06:07:44
【Question description】:

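I'm working on a model to predict house prices. To build it, I'm using sklearn's DecisionTreeRegressor. I split the data into training and validation sets with train_test_split. But when I try to fit the data to the model, I get the following error: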

KeyError                                  Traceback (most recent call last)
<ipython-input-25-f4acd876feae> in <module>
      1 for max_leaf_nodes in [5, 50, 500, 5000]:
----> 2     my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)
      3     print("Max leaf nodes: %d  \t\t Mean Absolute Error:  %d" %(max_leaf_nodes, my_mae))

<ipython-input-21-1a489238552f> in get_mae(max_leaf_nodes, train_inp, val_inp, train_oup, val_oup)
      2 
      3     model = DecisionTreeRegressor(max_leaf_nodes, random_state=0)
----> 4     model.fit(train_inp, train_oup)
      5     predictions = model.predict(val_inp)
      6     mae = mean_absolute_error(val_oup, predictions)

~/anaconda3/lib/python3.7/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
   1140             sample_weight=sample_weight,
   1141             check_input=check_input,
-> 1142             X_idx_sorted=X_idx_sorted)
   1143         return self
   1144 

~/anaconda3/lib/python3.7/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    331                                                          self.n_classes_)
    332             else:
--> 333                 criterion = CRITERIA_REG[self.criterion](self.n_outputs_,
    334                                                          n_samples)
    335 

KeyError: 5

Here is my code.

The get_mae function:

from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

def get_mae(max_leaf_nodes, train_inp, val_inp, train_oup, val_oup):

    model = DecisionTreeRegressor(max_leaf_nodes, random_state=0)
    model.fit(train_inp, train_oup)
    predictions = model.predict(val_inp)
    mae = mean_absolute_error(val_oup, predictions)

    return mae

Reading the dataset:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('../DATASETS/melb_data.csv')

y = df.Price

features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude']
X = df[features]

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

Looping to find the best number of leaf nodes:

for max_leaf_nodes in [5, 50, 500, 5000]:
    my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)
    print("Max leaf nodes: %d  \t\t Mean Absolute Error:  %d" %(max_leaf_nodes, my_mae))

【Question discussion】:

    Tags: python scikit-learn decision-tree


    【Solution 1】:

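    Because you passed max_leaf_nodes to DecisionTreeRegressor positionally instead of as a keyword argument, the integer 5 was bound to the 'criterion' parameter.

    Unless you pass keyword arguments, the first positional argument goes to 'criterion', the second to 'splitter', and so on. However, 'criterion' only accepts "mse", "friedman_mse", or "mae", so the lookup CRITERIA_REG[self.criterion] fails with KeyError: 5.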
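To see the failure mode in isolation, here is a minimal pure-Python sketch. ToyRegressor and its CRITERIA_REG table are hypothetical stand-ins that only imitate the parameter order and the criterion lookup described above (the real scikit-learn constructor has many more parameters); they are not scikit-learn code.

```python
# Hypothetical stand-in for the old criterion lookup table; NOT scikit-learn code.
CRITERIA_REG = {"mse": "MSE", "friedman_mse": "FriedmanMSE", "mae": "MAE"}


class ToyRegressor:
    """Hypothetical, trimmed imitation of the old constructor's parameter order."""

    def __init__(self, criterion="mse", splitter="best", max_depth=None,
                 random_state=None, max_leaf_nodes=None):
        # Positional arguments fill these slots in declaration order,
        # so ToyRegressor(5) stores 5 in `criterion`.
        self.criterion = criterion
        self.splitter = splitter
        self.max_depth = max_depth
        self.random_state = random_state
        self.max_leaf_nodes = max_leaf_nodes

    def fit(self):
        # Mirrors the dictionary lookup seen in the traceback.
        return CRITERIA_REG[self.criterion]


bad = ToyRegressor(5, random_state=0)   # 5 lands in `criterion`
print(bad.criterion, bad.max_leaf_nodes)  # 5 None
try:
    bad.fit()
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 5

good = ToyRegressor(max_leaf_nodes=5, random_state=0)  # keyword fixes it
print(good.criterion, good.max_leaf_nodes, good.fit())  # mse 5 MSE
```

Recent scikit-learn releases declare the constructor parameters keyword-only, so the same positional call there raises a TypeError instead of a KeyError; passing max_leaf_nodes=max_leaf_nodes works in both cases.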

    Try this code:

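    from sklearn.metrics import mean_absolute_error
    from sklearn.tree import DecisionTreeRegressor

    def get_mae(max_leaf_nodes, train_inp, val_inp, train_oup, val_oup):
        # Pass max_leaf_nodes by keyword; positionally it would land in 'criterion'.
        model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
        model.fit(train_inp, train_oup)
        predictions = model.predict(val_inp)
        mae = mean_absolute_error(val_oup, predictions)

        return mae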
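The question's loop only prints each MAE. If you also want it to return the best max_leaf_nodes, a small helper along these lines could work. find_best_leaf_count is a hypothetical name, and it assumes a lower score is better, which holds for an error metric such as MAE.

```python
from typing import Callable, Dict, Iterable, Tuple


def find_best_leaf_count(
    candidates: Iterable[int],
    score: Callable[[int], float],
) -> Tuple[int, Dict[int, float]]:
    """Hypothetical helper: score every candidate, return (best, all scores).

    Lower scores are treated as better. On ties, the candidate that appears
    first in `candidates` wins, because min() keeps the first minimum.
    """
    scores = {n: score(n) for n in candidates}
    if not scores:
        raise ValueError("candidates must not be empty")
    best = min(scores, key=lambda n: scores[n])
    return best, scores


# Toy demo with a made-up score function (NOT real MAE values).
best, scores = find_best_leaf_count([5, 50, 500, 5000], lambda n: abs(n - 50))
print(best)  # 50
```

With the corrected get_mae, the score function would be lambda n: get_mae(n, train_X, val_X, train_y, val_y).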
    
