【Question title】: Difference between dates in Pandas dataframe
【Posted】: 2018-03-29 14:58:13
【Question】:

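This is related to this question, but now I need to find the difference between dates stored as "YYYY-MM-DD" strings. Essentially, what we need is the difference between consecutive values in the count column, normalized by the number of days between the rows.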
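Written as a formula, one reading of "normalized by the number of days" (an interpretation, not a formula given by the asker) is the change in count divided by the gap in days between consecutive rows i-1 and i of the same group, where c is the count column and d is the date expressed in days:

```latex
r_i = \frac{c_i - c_{i-1}}{d_i - d_{i-1}},
\qquad \text{e.g. website2, 2017-02-16} \to \text{2017-02-20:}\quad
r = \frac{531 - 524}{4} = 1.75
```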

My dataframe is:

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0
2017-03-27,website1,US,0,84,228,0.0,16.0,3.369048,58.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,521.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0
2017-02-20,website2,AU,1,91,100,4.0,148.0,4.727272,531.0
2017-02-21,website2,AU,1,91,118,6.0,149.0,4.727272,533.0
2017-02-22,website2,AU,1,91,114,4.0,151.0,4.727272,534.0
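To reproduce the question, the CSV above can be loaded into a DataFrame as follows. This is a minimal sketch: it assumes the data is pasted into a string and read with io.StringIO, and it uses read_csv's parse_dates option so the date column comes back as datetime64 instead of plain strings.

```python
import io

import pandas as pd

# The sample data from the question, pasted verbatim.
CSV = """date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0
2017-03-27,website1,US,0,84,228,0.0,16.0,3.369048,58.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,521.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0
2017-02-20,website2,AU,1,91,100,4.0,148.0,4.727272,531.0
2017-02-21,website2,AU,1,91,118,6.0,149.0,4.727272,533.0
2017-02-22,website2,AU,1,91,114,4.0,151.0,4.727272,534.0
"""

# parse_dates turns the "YYYY-MM-DD" strings into datetime64 values while reading.
df = pd.read_csv(io.StringIO(CSV), parse_dates=["date"])
print(df.dtypes)
```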

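I want to find the difference between consecutive dates within each group, after grouping by the site+country_code+kind+ID tuple (with the rows of each group in date order). The desired output, with count replaced by its difference from the previous row of the group and a new day_diff column, is: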

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count,day_diff
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,0,0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,0,1
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,0,1
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,1,1
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,0,1
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,0,1
2017-03-27,website1,US,0,84,228,0.0,16.0,3.369048,4,2
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,0,0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,3,1
2017-02-20,website2,AU,1,91,100,4.0,148.0,4.727272,7,4
2017-02-21,website2,AU,1,91,118,6.0,149.0,4.727272,2,1
2017-02-22,website2,AU,1,91,114,4.0,151.0,4.727272,1,1

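One option is to convert the date column to a Pandas datetime with pd.to_datetime() and then use the diff function, but that yields "x days" values of type timedelta64. I want to use this difference to compute the average daily count, so it would be nice if this could be done in one (or at least a less painful) step.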
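The timedelta64 values that diff produces can be turned into plain numbers in two common ways, sketched here on three dates taken from the website2 rows: the .dt.days accessor returns the whole-day component, and dividing by pd.Timedelta(days=1) returns the gap as a float number of days that would also keep any fractional part. The first element is NaT because it has no predecessor, which is why both results are float.

```python
import pandas as pd

# Three dates from the website2 rows, parsed from "YYYY-MM-DD" strings.
dates = pd.to_datetime(pd.Series(["2017-02-15", "2017-02-16", "2017-02-20"]))

delta = dates.diff()                        # timedelta64 values: NaT, 1 days, 4 days
days = delta.dt.days                        # whole days: NaN, 1.0, 4.0
days_exact = delta / pd.Timedelta(days=1)   # float days: NaN, 1.0, 4.0 (keeps fractions)

print(pd.DataFrame({"delta": delta, "days": days, "days_exact": days_exact}))
```

For date-only data like this both approaches agree; the division form only differs when timestamps carry a time of day.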

【Question comments】:

    Tags: python pandas datetime dataframe pandas-groupby


    【Solution 1】:

    You can use the .dt.days accessor:

    In [72]: df['date'] = pd.to_datetime(df['date'])
    
    In [73]: df['day_diff'] = df.groupby(['site','country_code','kind','ID'])['date'] \
                                .diff().dt.days.fillna(0)
    
    In [74]: df
    Out[74]:
             date      site country_code  kind  ID  rank  votes  sessions  avg_score  count  day_diff
    0  2017-03-20  website1           US     0  84   226    0.0      15.0   3.370812   53.0       0.0
    1  2017-03-21  website1           US     0  84   214    0.0      15.0   3.370812   53.0       1.0
    2  2017-03-22  website1           US     0  84   226    0.0      16.0   3.370812   53.0       1.0
    3  2017-03-23  website1           US     0  84   234    0.0      16.0   3.369048   54.0       1.0
    4  2017-03-24  website1           US     0  84   226    0.0      16.0   3.369048   54.0       1.0
    5  2017-03-25  website1           US     0  84   212    0.0      16.0   3.369048   54.0       1.0
    6  2017-03-27  website1           US     0  84   228    0.0      16.0   3.369048   58.0       2.0
    7  2017-02-15  website2           AU     1  91   144    4.0     148.0   4.727272  521.0       0.0
    8  2017-02-16  website2           AU     1  91   144    3.0     147.0   4.727272  524.0       1.0
    9  2017-02-20  website2           AU     1  91   100    4.0     148.0   4.727272  531.0       4.0
    10 2017-02-21  website2           AU     1  91   118    6.0     149.0   4.727272  533.0       1.0
    11 2017-02-22  website2           AU     1  91   114    4.0     151.0   4.727272  534.0       1.0
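The answer stops at day_diff, while the question ultimately wants a count normalized per day. A minimal sketch of that extra step follows, assuming the normalization is the per-group change in count divided by day_diff; count_diff and count_per_day are hypothetical column names, and only the columns needed for this step are loaded. It also sorts by date within each group first, since groupby().diff() compares each row with the previous row in the current order.

```python
import io

import pandas as pd

# The question's data, reduced to the grouping keys, date and count.
CSV = """date,site,country_code,kind,ID,count
2017-03-20,website1,US,0,84,53.0
2017-03-21,website1,US,0,84,53.0
2017-03-22,website1,US,0,84,53.0
2017-03-23,website1,US,0,84,54.0
2017-03-24,website1,US,0,84,54.0
2017-03-25,website1,US,0,84,54.0
2017-03-27,website1,US,0,84,58.0
2017-02-15,website2,AU,1,91,521.0
2017-02-16,website2,AU,1,91,524.0
2017-02-20,website2,AU,1,91,531.0
2017-02-21,website2,AU,1,91,533.0
2017-02-22,website2,AU,1,91,534.0
"""
df = pd.read_csv(io.StringIO(CSV), parse_dates=["date"])

keys = ["site", "country_code", "kind", "ID"]
# diff() works on row order, so put each group's rows in date order first.
df = df.sort_values(keys + ["date"])
grouped = df.groupby(keys)

# Whole days since the previous row of the same group (0 for each group's first row).
df["day_diff"] = grouped["date"].diff().dt.days.fillna(0).astype(int)
# Change in count since the previous row of the same group.
df["count_diff"] = grouped["count"].diff().fillna(0)
# Average change per day; rows with day_diff == 0 (first of each group) become NaN, then 0.
df["count_per_day"] = (df["count_diff"] / df["day_diff"].where(df["day_diff"] > 0)).fillna(0)

print(df[["date", "site", "count", "day_diff", "count_diff", "count_per_day"]])
```

Masking day_diff == 0 with where() avoids a division by zero on each group's first row; whether that row should read 0 or stay NaN is a choice for the reader.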
    

    【Comments】:
