[Question title]: Python pandas create new column with groupby with custom agg function
[Posted]: 2018-02-25 13:09:15
[Question]:

My dataframe:

from random import random, randint
from pandas import DataFrame

t = DataFrame({"metasearch":["A","B","A","B","A","B","A","B"],
                   "market":["A","B","A","B","A","B","A","B"],
                   "bid":[random() for i in range(8)],
                   "clicks": [randint(0,10) for i in range(8)],
                   "country_code":["A","A","A","A","A","B","A","B"]})

I want to fit a linear regression for each market, so I:

1) Group the df: groups = t.groupby(by="market")

2) Prepare a function that fits the model on a group:

from sklearn.linear_model import LinearRegression
def group_fitter(group):
    lr = LinearRegression()
    X = group["bid"].fillna(0).values.reshape(-1,1)
    y = group["clicks"].fillna(0)
    lr.fit(X, y)
    return lr.coef_[0] # THIS IS A SCALAR

3) Create a new Series with market as the index and the coefficient as the values:

s = groups.transform(group_fitter) 

But step 3 fails with: KeyError: ('bid_cpc', 'occured at index bid')
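
For a single feature, the slope that LinearRegression fits in step 2 has a closed form: the sum of products of the deviations of bid and clicks from their group means, divided by the sum of squared deviations of bid, i.e. cov(bid, clicks) / var(bid). The sketch below is a hypothetical pandas/NumPy-only stand-in for group_fitter (the names group_slope and demo are made up for illustration), handy for checking what the model should return per market without sklearn.

```python
import numpy as np
import pandas as pd


def group_slope(group: pd.DataFrame) -> float:
    """Hypothetical sklearn-free stand-in for group_fitter: OLS slope of clicks on bid."""
    x = group["bid"].fillna(0).to_numpy(dtype=float)
    y = group["clicks"].fillna(0).to_numpy(dtype=float)
    x_c = x - x.mean()  # deviations of bid from the group mean
    y_c = y - y.mean()  # deviations of clicks from the group mean
    # cov(x, y) / var(x); the 1/n factors cancel, so plain sums suffice.
    # Returns nan (with a RuntimeWarning) if every bid in the group is equal.
    return float((x_c * y_c).sum() / (x_c ** 2).sum())


# Made-up data where the true slopes are exact: 2 for market A, -1 for market B.
demo = pd.DataFrame({
    "market": ["A", "A", "A", "B", "B", "B"],
    "bid": [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
    "clicks": [1, 3, 5, 4, 3, 2],
})
slopes = {market: group_slope(g) for market, g in demo.groupby("market")}
print(slopes)  # {'A': 2.0, 'B': -1.0}
```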

[Comments]:

    Tags: python pandas


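    [Answer 1]:

    I think you need apply instead of transform, because the function uses more than one column. transform tries calling the function on each column separately (as a Series), so group["bid"] inside it raises the KeyError, whereas apply passes the whole group as a DataFrame. Then use join to attach the result as a new column: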

    from sklearn.linear_model import LinearRegression
    def group_fitter(group):
        lr = LinearRegression()
        X = group["bid"].fillna(0).values.reshape(-1,1)
        y = group["clicks"].fillna(0)
        lr.fit(X, y)
        return lr.coef_[0] # THIS IS A SCALAR
    
    groups = t.groupby(by="market")
    df = t.join(groups.apply(group_fitter).rename('new'), on='market')
    print (df) 
            bid  clicks country_code market metasearch       new
    0  0.462734       9            A      A          A -8.632301
    1  0.438869       5            A      B          B  6.690289
    2  0.047160       9            A      A          A -8.632301
    3  0.644263       0            A      B          B  6.690289
    4  0.579040       0            A      A          A -8.632301
    5  0.820389       6            B      B          B  6.690289
    6  0.112341       5            A      A          A -8.632301
    7  0.432502       0            B      B          B  6.690289
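
Because group_fitter returns one number per group, it cannot be handed to transform as is. If you specifically want transform, one option is to express the single-feature slope through group-wise means and sums, which transform does support. The sketch below is hypothetical (the helper add_group_slope is made up, it drops sklearn, and it assumes the same bid/clicks/market columns): it computes cov/var per market and repeats it on every row in the original order, i.e. the same kind of new column the apply + join approach produces.

```python
import pandas as pd


def add_group_slope(df: pd.DataFrame, key: str = "market", x: str = "bid",
                    y: str = "clicks", out: str = "new") -> pd.DataFrame:
    """Hypothetical helper: per-group OLS slope of y on x, repeated on every row."""
    keys = df[key]
    xs = df[x].fillna(0).astype(float)
    ys = df[y].fillna(0).astype(float)
    # transform returns a result aligned with df's index, one value per row.
    xc = xs - xs.groupby(keys).transform("mean")
    yc = ys - ys.groupby(keys).transform("mean")
    num = (xc * yc).groupby(keys).transform("sum")  # group-wise sum of cross-deviations
    den = (xc * xc).groupby(keys).transform("sum")  # group-wise sum of squared deviations
    return df.assign(**{out: num / den})


# Made-up, interleaved rows: the slope is 2 for market A and -1 for market B.
demo = pd.DataFrame({
    "market": ["A", "B", "A", "B", "A", "B"],
    "bid": [0.0, 0.0, 1.0, 1.0, 2.0, 2.0],
    "clicks": [1, 4, 3, 3, 5, 2],
})
print(add_group_slope(demo)["new"].tolist())  # [2.0, -1.0, 2.0, -1.0, 2.0, -1.0]
```

This avoids calling a Python function per group, at the cost of hard-coding the single-feature formula; for a real sklearn model the apply + join route is simpler.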
    

    [Comments]:

      [Answer 2]:

      Just return the group from the function instead of the coefficient.

      # return the group instead of a scalar value
      def group_fitter(group):
          lr = LinearRegression()
          X = group["bid"].fillna(0).values.reshape(-1,1)
          y = group["clicks"].fillna(0)
          lr.fit(X, y)
          group['coefficient'] = lr.coef_[0] # <- This is the changed line
          return group
      
      # the new column gets added to the data 
      s = groups.apply(group_fitter)
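
How apply stitches returned groups back together (whether the group keys are prepended to the index, and whether the grouping column is passed to the function) has been changing across pandas releases, so the shape of s can differ between versions. A sketch of a version-independent variant, assuming the same column names: compute one scalar per market on only the columns the fit needs, then broadcast it with Series.map. It uses np.polyfit in place of LinearRegression purely to keep the example free of sklearn; slope_of and add_coefficient are hypothetical names.

```python
import numpy as np
import pandas as pd


def slope_of(group: pd.DataFrame) -> float:
    """Hypothetical per-group fitter; np.polyfit(x, y, 1) returns [slope, intercept]."""
    x = group["bid"].fillna(0).to_numpy(dtype=float)
    y = group["clicks"].fillna(0).to_numpy(dtype=float)
    return float(np.polyfit(x, y, 1)[0])


def add_coefficient(df: pd.DataFrame) -> pd.DataFrame:
    # Selecting only the needed columns keeps the grouping column out of the groups;
    # returning a scalar per group yields a Series indexed by market.
    coefs = df.groupby("market")[["bid", "clicks"]].apply(slope_of)
    # map looks up each row's market, so row order, index and other columns are untouched.
    return df.assign(coefficient=df["market"].map(coefs))


demo = pd.DataFrame({
    "market": ["A", "B", "A", "B", "A", "B"],
    "bid": [0.0, 0.0, 1.0, 1.0, 2.0, 2.0],
    "clicks": [1, 4, 3, 3, 5, 2],
})
print(add_coefficient(demo))
```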
      

      [Comments]:
