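【Posted】: 2020-02-04 11:03:50
【Question】:
I am trying to develop a very simple initial model to predict the amount of fines a nursing home might expect to pay based on its location.
Here is my class definition: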
#initial model to predict the amount of fines a nursing home might expect to pay based on its location
from sklearn.base import BaseEstimator, RegressorMixin, TransformerMixin

class GroupMeanEstimator(BaseEstimator, RegressorMixin):
    #defines what a group is by using grouper
    #initialises an empty dictionary for group averages
    def __init__(self, grouper):
        self.grouper = grouper
        self.group_averages = {}

    #Any calculation I require for my predict method goes here
    #Specifically, I want to group by the column that grouper names
    #I want to then find out what is the mean penalty by each group
    #X is the data containing the groups
    #y is fine_totals
    #map each state to its mean fine_tot
    def fit(self, X, y):
        #Use self.group_averages to store the average penalty by group
        Xy = X.join(y) #Joining X&y together
        state_mean_series = Xy.groupby(self.grouper)[y.name].mean() #Creating a series of state:mean penalties
        #populating a dictionary with state:mean key:value pairs
        #(Series.items() replaces Series.iteritems(), which newer pandas versions no longer provide)
        for row in state_mean_series.items():
            self.group_averages[row[0]] = row[1]
        return self

    #The amount of fine an observation is likely to receive is based on its group mean
    #Want to first populate the list with the number of observations
    #For each observation in the list, what is its group and then set the likely fine to its group mean.
    #Return the list
    def predict(self, X):
        dictionary = self.group_averages
        group = self.grouper
        list_of_predictions = [] #initialising a list to store our return values
        for row in X.itertuples(): #iterating through each row in X
            prediction = dictionary[row.STATE] #Getting the value from group_averages dict using key row.STATE
            list_of_predictions.append(prediction)
        return list_of_predictions
It works for this:
state_model.predict(data.sample(5))
But it crashes when I try this:
state_model.predict(pd.DataFrame([{'STATE': 'AS'}]))
My model cannot handle this case, and I would like help correcting it.
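One possible way to handle a group that never appeared during `fit` is to fall back to the overall mean of the target. That fallback is an assumption here, not something the post specifies, and the class name `GroupMeanWithFallback`, the attributes `group_averages_` and `global_mean_`, and the toy data are all hypothetical. A minimal sketch, assuming `y` is a pandas Series that shares its index with `X`:

```python
import pandas as pd
from sklearn.base import BaseEstimator, RegressorMixin


class GroupMeanWithFallback(BaseEstimator, RegressorMixin):
    """Hypothetical variant: predicts the group mean, or the overall mean for unseen groups."""

    def __init__(self, grouper):
        self.grouper = grouper

    def fit(self, X, y):
        # Mean target per group, stored as a plain dict {group: mean}.
        # Assumes y is a pandas Series whose index matches X.
        self.group_averages_ = y.groupby(X[self.grouper]).mean().to_dict()
        # Assumed fallback for groups that never appeared in the training data
        self.global_mean_ = float(y.mean())
        return self

    def predict(self, X):
        # dict.get returns the fallback instead of raising KeyError
        return [self.group_averages_.get(g, self.global_mean_) for g in X[self.grouper]]


# Toy data, invented purely for illustration
train = pd.DataFrame({"STATE": ["AK", "AK", "TX"]})
fines = pd.Series([100.0, 300.0, 50.0], name="fine_tot")

model = GroupMeanWithFallback(grouper="STATE").fit(train, fines)
preds = model.predict(pd.DataFrame([{"STATE": "AK"}, {"STATE": "AS"}]))
print(preds)
```

Reading the column through `X[self.grouper]` rather than `row.STATE` also lets the same class work for a grouper other than `STATE`. Whether the overall mean is a sensible default for an unseen state depends on the data; returning `NaN` or raising a clearer error are other options.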
【Comments】:
-
There may be a few issues here: you may be indexing group_averages incorrectly, or the AS state may not be defined in group_averages. What does row[0] look like in your fit function?
-
How do I define the AS state in group_averages? To be honest, I'm not quite sure what state_model.predict(pd.DataFrame([{'STATE': 'AS'}])) is trying to do. row[0] in fit is the state name.
-
If you can show me the contents of self.group_averages, maybe I can help.
-
snipboard.io/dNiqrY.jpg group_averages will contain a dictionary mapping each state to its total.
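If group_averages is a plain dictionary keyed by state, as described, then looking up a state that is not one of its keys raises KeyError. That is probably what happens for 'AS', although the post does not show the traceback. A small sketch with invented dictionary contents (the real values are only in the screenshot):

```python
# Hypothetical contents, invented for illustration; the real values are not shown in the post
group_averages = {"AK": 200.0, "TX": 50.0}

# States requested at predict time
requested = ["AK", "AS"]

# Keys that were never stored during fit
missing = [state for state in requested if state not in group_averages]
print(missing)

# Indexing with an absent key raises KeyError rather than returning a default
try:
    group_averages["AS"]
    error_name = None
except KeyError as exc:
    error_name = type(exc).__name__
print(error_name)
```

Checking for missing keys like this before calling predict makes it easy to see which states were absent from the training data.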
-
What kind of error are you getting? If you do
state_model.predict(pd.DataFrame([{'STATE': 'AS'}]))
Tags: python pandas machine-learning scikit-learn