【Question title】: Calculating the total amount and frequency of transactions before a certain date column
【Posted】: 2021-03-27 00:21:45
【Question】:

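I want to calculate:

  1. The number of non-null months in which each customer transacted before subscribing (frequency)
  2. The total transaction amount before a certain date column (monetary)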

Initial DataFrame

import numpy as np
import pandas as pd

ad = {'customer':['Clark','Stones','Fay','Stones','Clark','Clark','Clark','Fay','Stones'],
      'subscribe_date':['2020-11-30','2020-07-01','2020-01-02','2020-07-01','2020-11-30','2020-11-30','2020-11-30',
                        '2020-01-02','2020-07-01'],
      'trx_month':['2020-12-01','2020-07-01','2020-07-01','2021-03-01','2021-02-01','2020-09-01','2020-11-01',
                   '2020-08-01','2018-02-01'],
      'trx_amount':[100,90,50,45,20,30,50,80,200],
      }
ad = pd.DataFrame(ad)
ad = ad.sort_values(by=['customer','trx_month'])

Expected DataFrame (before subscribing)

# Fay's subscribe_date is '2020-01-02' here, matching the initial DataFrame.
ad2 = {'customer':['Clark','Stones','Fay'],
       'subscribe_date':['2020-11-30','2020-07-01','2020-01-02'],
       'frequency':[2,1,np.nan], # number of months the customer transacted before the subscribe_date
       'monetary':[80,200,np.nan]} # sum of trx_amount before the subscribe_date
ad2 = pd.DataFrame(ad2)
ad2

Expected DataFrame (after subscribing)

# Fay's subscribe_date is '2020-01-02' here, matching the initial DataFrame.
ad3 = {'customer':['Clark','Stones','Fay'],
       'subscribe_date':['2020-11-30','2020-07-01','2020-01-02'],
       'frequency':[2,1,2], # number of months the customer transacted after the subscribe_date
       'monetary':[120,45,130]} # sum of trx_amount after the subscribe_date
ad3 = pd.DataFrame(ad3)
ad3

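Explanation: Clark subscribed on 2020-11-30. Before subscribing, he transacted in September and November 2020 (frequency = 2), and those transactions sum to 80. After subscribing, he transacted again in December 2020 and February 2021 (frequency = 2, monetary = 120).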

Ignoring the subscribe date, I can compute frequency and monetary with a pandas groupby, but with this new constraint I am stuck.
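For reference, the unconstrained calculation mentioned above might look like the sketch below. It rebuilds the sample data from the question and ignores `subscribe_date` entirely; the name `baseline` is only illustrative.

```python
import pandas as pd

# Sample transactions from the question (subscribe_date is not needed here).
ad = pd.DataFrame({
    'customer': ['Clark', 'Stones', 'Fay', 'Stones', 'Clark',
                 'Clark', 'Clark', 'Fay', 'Stones'],
    'trx_month': ['2020-12-01', '2020-07-01', '2020-07-01', '2021-03-01',
                  '2021-02-01', '2020-09-01', '2020-11-01', '2020-08-01',
                  '2018-02-01'],
    'trx_amount': [100, 90, 50, 45, 20, 30, 50, 80, 200],
})

# Count and sum every transaction per customer, regardless of date.
baseline = (ad.groupby('customer')['trx_amount']
              .agg(frequency='count', monetary='sum'))
print(baseline)
```

This gives one row per customer over the whole history, which is why it cannot separate the periods before and after subscribing.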

It would be great if the code could also be adapted easily to the period after subscribing, so the two periods can be compared.

【Discussion】:

    Tags: python python-3.x pandas


    【Solution 1】:

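    IIUC, you can assign a conditional column based on the difference between the subscribe date and the transaction date, then group by it: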


    Convert the date columns from strings to datetime (skip this block if they are already datetimes):

    ad['subscribe_date'] = pd.to_datetime(ad['subscribe_date'])
    ad['trx_month'] = pd.to_datetime(ad['trx_month'])
    

    Then use:

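    # Map the aggregation names to the column names asked for in the question.
    d = {'count':'frequency','sum':'monetary'}
    # Days from each transaction to the subscribe date: positive = before, negative = after.
    diff_ = ad['subscribe_date'].sub(ad['trx_month']).dt.days
    
    out = (ad.assign(Before_After=
                     np.select([diff_<0,diff_>0],["After","Before"],"Subscribed_date"))
             .groupby(['customer','Before_After'])['trx_amount'].agg(['count','sum'])
             .rename(columns=d))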
    

    print(out)
                              frequency  monetary
    customer Before_After                        
    Clark    After                    2       120
             Before                   2        80
    Fay      After                    2       130
    Stones   After                    1        45
             Before                   1       200
             Subscribed_date          1        90
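Note that `'count'` counts transaction rows, which equals the number of months only when each customer has at most one row per month, as in the sample data. If several rows can share a month, counting distinct `trx_month` values may be closer to what the question asks for. A minimal sketch of that variant, assuming a hypothetical helper named `summarize` (not part of the answer above) and strict inequalities so that transactions on the subscribe date itself are excluded:

```python
import pandas as pd

ad = pd.DataFrame({
    'customer': ['Clark', 'Stones', 'Fay', 'Stones', 'Clark',
                 'Clark', 'Clark', 'Fay', 'Stones'],
    'subscribe_date': ['2020-11-30', '2020-07-01', '2020-01-02', '2020-07-01',
                       '2020-11-30', '2020-11-30', '2020-11-30', '2020-01-02',
                       '2020-07-01'],
    'trx_month': ['2020-12-01', '2020-07-01', '2020-07-01', '2021-03-01',
                  '2021-02-01', '2020-09-01', '2020-11-01', '2020-08-01',
                  '2018-02-01'],
    'trx_amount': [100, 90, 50, 45, 20, 30, 50, 80, 200],
})
ad['subscribe_date'] = pd.to_datetime(ad['subscribe_date'])
ad['trx_month'] = pd.to_datetime(ad['trx_month'])


def summarize(df, period='before'):
    """Hypothetical helper: per-customer frequency and monetary for one period."""
    if period == 'before':
        mask = df['trx_month'] < df['subscribe_date']
    elif period == 'after':
        mask = df['trx_month'] > df['subscribe_date']
    else:
        raise ValueError("period must be 'before' or 'after'")
    return (df[mask]
            .groupby('customer')
            .agg(frequency=('trx_month', 'nunique'),  # distinct months, not rows
                 monetary=('trx_amount', 'sum')))


before = summarize(ad, 'before')
after = summarize(ad, 'after')
print(before, after, sep='\n\n')
```

Customers with no transactions in the chosen period (Fay, for `'before'`) are simply absent from the result rather than shown as NaN.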
    

    Edit: Based on your edit, you can create a dictionary keyed by the Before_After label, with the corresponding DataFrame as each value:

    d = {'count':'frequency','sum':'monetary'}
    diff_ = ad['subscribe_date'].sub(ad['trx_month']).dt.days
    
    out = (ad.assign(Before_After=
                     np.select([diff_<0,diff_>0],["After","Before"],"Subscribed_date"))
             .groupby(['customer','Before_After'])['trx_amount'].agg(['count','sum'])
             .rename(columns=d)
             # Move Before_After into the columns, then make it the outer column level.
             .unstack().swaplevel(axis=1))
    # One DataFrame per Before_After label.
    final_dict = {i: out.loc[:,i] for i in out.columns.levels[0]}
    
    print(final_dict['Before'],'\n\n',final_dict["After"])
    
              frequency  monetary
    customer                     
    Clark           2.0      80.0
    Fay             NaN       NaN
    Stones          1.0     200.0 
    
               frequency  monetary
    customer                     
    Clark           2.0     120.0
    Fay             2.0     130.0
    Stones          1.0      45.0
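If the goal is a frame shaped like the expected `ad2` and `ad3` in the question (one row per customer, with `subscribe_date` kept and NaN where a customer has no transactions in that period), one option is to join the grouped result back onto the unique customers. The sketch below is one possible way to do that; `subs`, `stats` and `expected_frame` are illustrative names, not part of the answer above.

```python
import numpy as np
import pandas as pd

ad = pd.DataFrame({
    'customer': ['Clark', 'Stones', 'Fay', 'Stones', 'Clark',
                 'Clark', 'Clark', 'Fay', 'Stones'],
    'subscribe_date': ['2020-11-30', '2020-07-01', '2020-01-02', '2020-07-01',
                       '2020-11-30', '2020-11-30', '2020-11-30', '2020-01-02',
                       '2020-07-01'],
    'trx_month': ['2020-12-01', '2020-07-01', '2020-07-01', '2021-03-01',
                  '2021-02-01', '2020-09-01', '2020-11-01', '2020-08-01',
                  '2018-02-01'],
    'trx_amount': [100, 90, 50, 45, 20, 30, 50, 80, 200],
})
ad['subscribe_date'] = pd.to_datetime(ad['subscribe_date'])
ad['trx_month'] = pd.to_datetime(ad['trx_month'])

# Same labelling as in the answer: positive difference = before subscribing.
diff_ = ad['subscribe_date'].sub(ad['trx_month']).dt.days
period = np.select([diff_ < 0, diff_ > 0], ['After', 'Before'], 'Subscribed_date')

stats = (ad.assign(Before_After=period)
           .groupby(['customer', 'Before_After'])['trx_amount']
           .agg(frequency='count', monetary='sum'))

# One row per customer, keeping the subscribe date.
subs = ad.drop_duplicates('customer').set_index('customer')[['subscribe_date']]


def expected_frame(label):
    """Illustrative helper: left-join one period's stats onto all customers."""
    return subs.join(stats.xs(label, level='Before_After')).reset_index()


ad2 = expected_frame('Before')
ad3 = expected_frame('After')
print(ad2, ad3, sep='\n\n')
```

Because the join is a left join on `subs`, Fay stays in `ad2` with NaN in both `frequency` and `monetary`.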
    

    【Comments】:

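    • Would you mind taking a look at a similar problem we have run into? In the question above, I wanted the frequency and monetary values from the earliest available data up to the subscribe date. This time, I want the frequency and monetary values from 7 days before the subscribe date up to the subscribe date. It is described here: stackoverflow.com/questions/67312625/…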
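The linked follow-up question is not reproduced here, so the following is only a rough sketch of how the same day-difference idea might be restricted to a window. It assumes the sample data from this question, a hypothetical helper `window_stats` with a `window_days` parameter, and that both ends of the window (the subscribe date itself and the day `window_days` earlier) are included.

```python
import pandas as pd

ad = pd.DataFrame({
    'customer': ['Clark', 'Stones', 'Fay', 'Stones', 'Clark',
                 'Clark', 'Clark', 'Fay', 'Stones'],
    'subscribe_date': ['2020-11-30', '2020-07-01', '2020-01-02', '2020-07-01',
                       '2020-11-30', '2020-11-30', '2020-11-30', '2020-01-02',
                       '2020-07-01'],
    'trx_month': ['2020-12-01', '2020-07-01', '2020-07-01', '2021-03-01',
                  '2021-02-01', '2020-09-01', '2020-11-01', '2020-08-01',
                  '2018-02-01'],
    'trx_amount': [100, 90, 50, 45, 20, 30, 50, 80, 200],
})
ad['subscribe_date'] = pd.to_datetime(ad['subscribe_date'])
ad['trx_month'] = pd.to_datetime(ad['trx_month'])


def window_stats(df, window_days=7):
    """Hypothetical helper: stats for transactions 0..window_days days before subscribing."""
    days_before = (df['subscribe_date'] - df['trx_month']).dt.days
    in_window = days_before.between(0, window_days)  # inclusive on both ends (assumption)
    return (df[in_window]
            .groupby('customer')['trx_amount']
            .agg(frequency='count', monetary='sum'))


w7 = window_stats(ad, window_days=7)
w30 = window_stats(ad, window_days=30)
print(w7, w30, sep='\n\n')
```

With monthly timestamps like the ones in this sample, a 7-day window catches very little, so the wider 30-day call is included only to show the parameter at work.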