Creating a Custom Estimator: State Mean Estimator (Answers)

【Question title】: Creating a Custom Estimator: State Mean Estimator
【Posted】: 2020-02-04 11:03:50
【Question description】:

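I am trying to develop a very simple initial model that predicts the amount of fines a nursing home might expect to pay based on its location.

Here is my class definition: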

#initial model to predict the amount of fines a nursing home might expect to pay based on its location
from sklearn.base import BaseEstimator, RegressorMixin, TransformerMixin

class GroupMeanEstimator(BaseEstimator, RegressorMixin):
    #defines what a group is by using grouper
    #initialises an empty dictionary for group averages
    def __init__(self, grouper):
        self.grouper = grouper
        self.group_averages = {}

    #Any calculation I require for my predict method goes here
    #Specifically, I want to groupby the group grouper is set by
    #I want to then find out what is the mean penalty by each group
    #X is the data containing the groups
    #Y is fine_totals
    #map each state to its mean fine_tot
    def fit(self, X, y):
        #Use self.group_averages to store the average penalty by group
        Xy = X.join(y) #Joining X&y together
        state_mean_series = Xy.groupby(self.grouper)[y.name].mean() #Creating a series of state:mean penalties
        #populating a dictionary with state:mean key:value pairs
        for row in state_mean_series.iteritems():
            self.group_averages[row[0]] = row[1]
        return self

    #The amount of fine an observation is likely to receive is based on his group mean
    #Want to first populate the list with the number of observations
    #For each observation in the list, what is his group and then set the likely fine to his group mean.
    #Return the list
    def predict(self, X):
        dictionary = self.group_averages
        group = self.grouper
        list_of_predictions = [] #initialising a list to store our return values
        for row in X.itertuples(): #iterating through each row in X
            prediction = dictionary[row.STATE] #Getting the value from group_averages dict using key row.group
            list_of_predictions.append(prediction)
        return list_of_predictions

It works for this: state_model.predict(data.sample(5))

But it crashes when I try this: state_model.predict(pd.DataFrame([{'STATE': 'AS'}]))

My model cannot handle this case, and I would like help fixing it.
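
For reference, the crash can be reproduced without the full dataset. This is a minimal sketch that assumes the failure is the KeyError raised by the plain dictionary lookup in predict when a state (here 'AS') never appeared in the training data. The fine values are made up.

```python
import pandas as pd

# Made-up group means, standing in for what fit() stores in group_averages
group_averages = {"CA": 1200.0, "TX": 800.0}

# A state that was never seen during fit
X_new = pd.DataFrame([{"STATE": "AS"}])

missing_key = None
try:
    # Same lookup style as the predict method in the question
    predictions = [group_averages[row.STATE] for row in X_new.itertuples()]
except KeyError as err:
    missing_key = err.args[0]
    print(f"KeyError for unseen state: {missing_key!r}")
```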

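【Question discussion】:

There could be a couple of problems here: you may be indexing group_averages incorrectly, or the AS state may simply not be defined in group_averages. What does row[0] look like in your fit function?
How do I define the AS state in group_averages? To be honest, I'm not quite sure what state_model.predict(pd.DataFrame([{'STATE': 'AS'}])) is trying to do. row[0] in fit is the state name.
If you can show me the contents of self.group_averages, maybe I can help.
snipboard.io/dNiqrY.jpg group_averages will contain a dictionary mapping each state to its mean fine total.
What kind of error do you get when you run state_model.predict(pd.DataFrame([{'STATE': 'AS'}]))?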

Tags: python pandas machine-learning scikit-learn

【Solution 1】:

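The problem I see is in your fit method: on a DataFrame, iteritems iterates over columns rather than rows. You should use itertuples, which gives you the row data. Just change the loop in your fit method to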

for row in pd.DataFrame(state_mean_series).itertuples(): #row format is [STATE, mean_value]
    self.group_averages[row[0]] = row[1]
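
To make the row format concrete, here is a small sketch with made-up numbers that compares three ways of turning the Series of group means into a dictionary. The column name FINE_TOT is an assumption for illustration. Series.items() is the current spelling of iteritems, which was removed in pandas 2.0, and Series.to_dict() builds the same mapping without an explicit loop.

```python
import pandas as pd

# Made-up data standing in for the nursing home fines
df = pd.DataFrame({"STATE": ["CA", "CA", "TX"], "FINE_TOT": [100.0, 300.0, 50.0]})
state_mean_series = df.groupby("STATE")["FINE_TOT"].mean()

# itertuples on the one-column DataFrame: row[0] is the index (state), row[1] the mean
via_itertuples = {row[0]: row[1] for row in pd.DataFrame(state_mean_series).itertuples()}

# Series.items() yields (index, value) pairs directly
via_items = {state: mean for state, mean in state_mean_series.items()}

# to_dict() produces the same mapping in a single call
via_to_dict = state_mean_series.to_dict()

print(via_itertuples)
```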

Then, in your predict method, add a fail-safe lookup like this

prediction = dictionary.get(row.STATE, None) # None is the default here in case a key such as 'AS' doesn't exist; replace it with whatever fallback you want
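
Putting both changes together, the estimator might look like the sketch below. It is one possible arrangement, not the asker's final code: the `default` parameter is a hypothetical addition, the column name FINE_TOT and the data are made up, the group means are stored in a trailing-underscore attribute set in fit (the usual scikit-learn convention), and the mixin is listed before BaseEstimator as the scikit-learn documentation recommends.

```python
import pandas as pd
from sklearn.base import BaseEstimator, RegressorMixin


class GroupMeanEstimator(RegressorMixin, BaseEstimator):
    """Predicts, for each row, the training mean of y within that row's group."""

    def __init__(self, grouper, default=None):
        self.grouper = grouper  # name of the grouping column, e.g. "STATE"
        self.default = default  # hypothetical parameter: value returned for unseen groups

    def fit(self, X, y):
        groups = X[self.grouper]
        # Align y with X by position, then average y within each group
        targets = pd.Series(list(y), index=groups.index)
        self.group_averages_ = targets.groupby(groups).mean().to_dict()
        return self

    def predict(self, X):
        # dict.get falls back to self.default instead of raising KeyError
        return [self.group_averages_.get(group, self.default) for group in X[self.grouper]]


# Usage with made-up data
train = pd.DataFrame({"STATE": ["CA", "CA", "TX"], "FINE_TOT": [100.0, 300.0, 50.0]})
model = GroupMeanEstimator(grouper="STATE").fit(train[["STATE"]], train["FINE_TOT"])
preds = model.predict(pd.DataFrame([{"STATE": "CA"}, {"STATE": "AS"}]))
print(preds)
```

Returning None for an unseen state keeps the "no information" case explicit. If downstream code needs a number, a numeric fallback such as the overall training mean could be passed as `default` instead.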

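【Discussion】:

Hi, I tried your approach, but it raises AttributeError: 'Series' object has no attribute 'itertuples' at the pipeline step instead. I've added a link to the Jupyter file: drive.google.com/file/d/1NHf7DenPGvhXbMbW7MwadZoVDwH1jLTO/…
@Dumbchimp I went through your file and have updated the answer.
Got it, thank you so much! You've been a huge help! Not only was nothing being returned, the grader was also supplying a list of dictionaries!
Help, I get the error AttributeError: 'list' object has no attribute 'itertuples' when submitting to the grader.
@JansenSimanullang Do you have the line pd.DataFrame(state_mean_series).itertuples() in your for loop?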
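
The 'list' object error reported above is consistent with the earlier comment that the grader supplies a list of dictionaries rather than a DataFrame. One way to tolerate both input types, sketched here as an assumption about that input format, is to coerce X with pd.DataFrame before iterating. The helper name predict_group_means and the numbers are hypothetical.

```python
import pandas as pd


def predict_group_means(X, group_averages, grouper="STATE", default=None):
    """Look up the group mean per record; X may be a DataFrame or a list of dicts."""
    # pd.DataFrame accepts a list of dicts as well as an existing DataFrame
    X = pd.DataFrame(X)
    return [group_averages.get(group, default) for group in X[grouper]]


group_averages = {"CA": 200.0, "TX": 50.0}  # made-up means

# Input as a list of dicts, including a state missing from group_averages
from_list = predict_group_means([{"STATE": "TX"}, {"STATE": "AS"}], group_averages)

# Input as a DataFrame
from_frame = predict_group_means(pd.DataFrame({"STATE": ["CA"]}), group_averages)

print(from_list, from_frame)
```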