【Question Title】: Pandas - convert cumulative value to actual value
【Posted】: 2017-10-16 20:25:22
【Question Description】:

Suppose my dataframe looks like this:

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048,54.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,521.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0
2017-02-17,website2,AU,1,91,100,4.0,148.0,4.727272,524.0
2017-02-18,website2,AU,1,91,118,6.0,149.0,4.727272,527.0
2017-02-19,website2,AU,1,91,114,4.0,151.0,4.727272,529.0

The final count column is a cumulative count. What I need to do is recover the actual (per-row) count for each specific date+site+country_code+kind+ID tuple, which would result in:

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,0.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,0.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,0.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,1.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,0.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,0.0
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048,0.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,0.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,3.0
2017-02-17,website2,AU,1,91,100,4.0,148.0,4.727272,0.0
2017-02-18,website2,AU,1,91,118,6.0,149.0,4.727272,3.0
2017-02-19,website2,AU,1,91,114,4.0,151.0,4.727272,2.0

I know this will involve a groupby call, but I don't know what to do beyond that. Assume the first instance of a tuple has a count of 0. Any help would be great. Thanks.

【Question Discussion】:

    Tags: python pandas group-by pandas-groupby


    【Solution 1】:

    Use groupby + diff, the inverse of cumsum.

    cols = ['site', 'country_code', 'kind', 'ID']
    # diff() within each group undoes the running total; the first row of each
    # group has no predecessor (NaN), so fill it with 0
    df['count'] = df.groupby(cols)['count'].diff().fillna(0)
    
    print(df['count'])
    0     0.0
    1     0.0
    2     0.0
    3     1.0
    4     0.0
    5     0.0
    6     0.0
    7     0.0
    8     3.0
    9     0.0
    10    3.0
    11    2.0
    Name: count, dtype: float64
    

    Thanks to MaxU for pointing out the error!
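A self-contained sketch of this approach, run on a trimmed copy of the question's data (the rank/votes/sessions/avg_score columns are omitted here for brevity since they don't affect the result):

```python
from io import StringIO

import pandas as pd

# Trimmed version of the sample data from the question.
csv = """date,site,country_code,kind,ID,count
2017-03-20,website1,US,0,84,53.0
2017-03-21,website1,US,0,84,53.0
2017-03-22,website1,US,0,84,53.0
2017-03-23,website1,US,0,84,54.0
2017-03-24,website1,US,0,84,54.0
2017-03-25,website1,US,0,84,54.0
2017-03-26,website1,US,0,84,54.0
2017-02-15,website2,AU,1,91,521.0
2017-02-16,website2,AU,1,91,524.0
2017-02-17,website2,AU,1,91,524.0
2017-02-18,website2,AU,1,91,527.0
2017-02-19,website2,AU,1,91,529.0"""
df = pd.read_csv(StringIO(csv))

# diff() within each group undoes the cumulative sum; the first row of each
# group diffs to NaN, which the question says should be 0.
cols = ['site', 'country_code', 'kind', 'ID']
df['count'] = df.groupby(cols)['count'].diff().fillna(0)

print(df['count'].tolist())
# [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 3.0, 2.0]
```

Grouping by the full key (rather than just 'site', as first suggested in the comments) is what guarantees the first row of every date+site+country_code+kind+ID tuple resets to 0.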

    【Discussion】:

    • Thanks, but this gives the tuple (2017-02-15,website2,AU,1,91) a value of 467 when it should be 0
    • I think what the OP wants is: df.groupby('site')['count'].diff().fillna(0)
    • @Craig With MaxU's help I was able to figure out what you wanted. See my edit.
    • Thanks so much @coldspeed - I ended up having to group by [site,country_code,kind,ID], but this is exactly what I wanted!