【Question title】: Calculate the difference between successive dates in a column, grouped by another column, in pandas?
【Posted】: 2020-05-07 03:01:36
【Question】:

I have a pandas DataFrame:

import pandas as pd

data = pd.DataFrame([['Car','2019-01-06T21:44:09Z'],
                     ['Train','2019-01-06T19:44:09Z'],
                     ['Train','2019-01-02T19:44:09Z'],
                     ['Car','2019-01-08T06:44:09Z'],
                     ['Car','2019-01-06T18:44:09Z'],
                     ['Train','2019-01-04T19:44:09Z'],
                     ['Car','2019-01-05T16:34:09Z'],
                     ['Train','2019-01-08T19:44:09Z'],
                     ['Car','2019-01-07T14:44:09Z'],
                     ['Car','2019-01-06T11:44:09Z'],
                     ['Train','2019-01-10T19:44:09Z'],
                     ], 
                    columns=['Type', 'Date'])

After sorting by date, I need to find the difference between successive dates for each Type.

The final data should look like this:

data = pd.DataFrame([['Car','2019-01-06T21:44:09Z',1],
                     ['Train','2019-01-06T19:44:09Z',4],
                     ['Train','2019-01-02T19:44:09Z',0],
                     ['Car','2019-01-08T06:44:09Z',3],
                     ['Car','2019-01-06T18:44:09Z',1],
                     ['Train','2019-01-04T19:44:09Z',2],
                     ['Car','2019-01-05T16:34:09Z',0],
                     ['Train','2019-01-08T19:44:09Z',6],
                     ['Car','2019-01-07T14:44:09Z',2],
                     ['Car','2019-01-06T11:44:09Z',1],
                     ['Train','2019-01-10T19:44:09Z',8],
                     ], 
                    columns=['Type', 'Date','diff'])

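Here, for Type Car, min(Date) is 2019-01-05T16:34:09Z, so the difference starts at 0. The next dates are 2019-01-06T18:44:09Z and 2019-01-06T11:44:09Z, so their diff is 1 day (I'm not sure whether the time of day can be taken into account here), and so on. For Type Train, min(Date) is 2019-01-02T19:44:09Z, so its diff is 0; the next date is 2019-01-04T19:44:09Z, a 2-day diff.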
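The expected output counts calendar days, i.e. the time of day is ignored. A minimal sketch contrasting that with an elapsed-time offset (time of day included), assuming the question's data; the column names calendar_days and elapsed_days are hypothetical:

```python
import pandas as pd

data = pd.DataFrame([['Car', '2019-01-06T21:44:09Z'],
                     ['Train', '2019-01-06T19:44:09Z'],
                     ['Train', '2019-01-02T19:44:09Z'],
                     ['Car', '2019-01-08T06:44:09Z'],
                     ['Car', '2019-01-06T18:44:09Z'],
                     ['Train', '2019-01-04T19:44:09Z'],
                     ['Car', '2019-01-05T16:34:09Z'],
                     ['Train', '2019-01-08T19:44:09Z'],
                     ['Car', '2019-01-07T14:44:09Z'],
                     ['Car', '2019-01-06T11:44:09Z'],
                     ['Train', '2019-01-10T19:44:09Z']],
                    columns=['Type', 'Date'])

# Parse the ISO-8601 strings; the trailing 'Z' gives UTC-aware timestamps.
dates = pd.to_datetime(data['Date'])

# Earliest timestamp of each Type, broadcast back to every row of that Type.
first = dates.groupby(data['Type']).transform('min')

# Calendar-day offset: truncate both sides to midnight before subtracting.
data['calendar_days'] = (dates.dt.normalize() - first.dt.normalize()).dt.days

# Elapsed-time offset: keep the time of day, expressed as fractional days.
data['elapsed_days'] = (dates - first) / pd.Timedelta(days=1)

print(data)
```

In the question's data, 2019-01-06T11:44:09Z is one calendar day after Car's first date (2019-01-05T16:34:09Z) but only about 0.8 days of elapsed time, so the two definitions disagree for that row.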

I tried groupby, but I'm not sure how to include the sort by date:

data['diff'] = data.groupby('Type')['Date'].diff() / np.timedelta64(1, 'D')
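For the literal successive difference the title describes, the sort can happen before the groupby: parse the strings, sort by date, then diff within each Type. A minimal sketch under those assumptions; note that it gives the gap to the previous date of the same Type (NaN on each Type's first row), which is not the same as the offset from the group's first date shown in the expected output. The column name gap_days is hypothetical:

```python
import pandas as pd

data = pd.DataFrame([['Car', '2019-01-06T21:44:09Z'],
                     ['Train', '2019-01-06T19:44:09Z'],
                     ['Train', '2019-01-02T19:44:09Z'],
                     ['Car', '2019-01-08T06:44:09Z'],
                     ['Car', '2019-01-06T18:44:09Z'],
                     ['Train', '2019-01-04T19:44:09Z'],
                     ['Car', '2019-01-05T16:34:09Z'],
                     ['Train', '2019-01-08T19:44:09Z'],
                     ['Car', '2019-01-07T14:44:09Z'],
                     ['Car', '2019-01-06T11:44:09Z'],
                     ['Train', '2019-01-10T19:44:09Z']],
                    columns=['Type', 'Date'])

# Convert the strings to UTC-aware datetimes so they can be subtracted.
data['Date'] = pd.to_datetime(data['Date'])

# Sort chronologically, then diff within each Type on the sorted frame.
sorted_data = data.sort_values('Date')
gaps = sorted_data.groupby('Type')['Date'].diff()

# Assignment aligns on the index, so each gap lands back on its original row.
data['gap_days'] = gaps / pd.Timedelta(days=1)

print(data)
```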

【Question discussion】:

    Tags: python pandas


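    【Answer 1】:

    Use pandas.DataFrame.groupby with dt.date. Passing group_keys=False keeps the original row index, so the result can be assigned back as a column (pandas 2.0 and later otherwise prepend the group key to the index):

    df = data.assign(Date=pd.to_datetime(data['Date']))  # parse the ISO-8601 strings as UTC datetimes
    df['diff'] = df.groupby('Type', group_keys=False)['Date'].apply(lambda x: x.dt.date - x.min().date())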
    

    Output:

         Type                      Date   diff
    0     Car 2019-01-06 21:44:09+00:00 1 days
    1   Train 2019-01-06 19:44:09+00:00 4 days
    2   Train 2019-01-02 19:44:09+00:00 0 days
    3     Car 2019-01-08 06:44:09+00:00 3 days
    4     Car 2019-01-06 18:44:09+00:00 1 days
    5   Train 2019-01-04 19:44:09+00:00 2 days
    6     Car 2019-01-05 16:34:09+00:00 0 days
    7   Train 2019-01-08 19:44:09+00:00 6 days
    8     Car 2019-01-07 14:44:09+00:00 2 days
    9     Car 2019-01-06 11:44:09+00:00 1 days
    10  Train 2019-01-10 19:44:09+00:00 8 days
    

    If you want them as integers, add dt.days:

    df['diff'] = df.groupby('Type', group_keys=False)['Date'].apply(lambda x: x.dt.date - x.min().date()).dt.days
    

    Output:

         Type                      Date  diff
    0     Car 2019-01-06 21:44:09+00:00     1
    1   Train 2019-01-06 19:44:09+00:00     4
    2   Train 2019-01-02 19:44:09+00:00     0
    3     Car 2019-01-08 06:44:09+00:00     3
    4     Car 2019-01-06 18:44:09+00:00     1
    5   Train 2019-01-04 19:44:09+00:00     2
    6     Car 2019-01-05 16:34:09+00:00     0
    7   Train 2019-01-08 19:44:09+00:00     6
    8     Car 2019-01-07 14:44:09+00:00     2
    9     Car 2019-01-06 11:44:09+00:00     1
    10  Train 2019-01-10 19:44:09+00:00     8
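A variant of the same idea uses transform instead of apply. transform always returns a result indexed like its input, so it can be assigned straight back as a column. In this sketch I've swapped dt.date for dt.normalize() (midnight of each timestamp), which keeps the values as datetimes; the small frame is a subset of the question's rows, for brevity:

```python
import pandas as pd

df = pd.DataFrame({'Type': ['Car', 'Train', 'Train', 'Car', 'Car'],
                   'Date': ['2019-01-06T21:44:09Z', '2019-01-06T19:44:09Z',
                            '2019-01-02T19:44:09Z', '2019-01-08T06:44:09Z',
                            '2019-01-05T16:34:09Z']})
df['Date'] = pd.to_datetime(df['Date'])

# For each Type: whole calendar days between each row's date and that Type's earliest date.
df['diff'] = df.groupby('Type')['Date'].transform(
    lambda x: (x.dt.normalize() - x.min().normalize()).dt.days
)
print(df)
```

For these rows the values agree with the diff column in the output above (Car: 1, 3, 0; Train: 4, 0).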
    


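      【Answer 2】:
      • First, convert Date to plain calendar dates in a helper column
      • Use a lambda function to subtract each group's minimum date, and dt.days to get the number of days
      • Then drop the helper column
      data['Date_date'] = pd.to_datetime(data['Date']).dt.date  # calendar date only; time of day dropped
      # group_keys=False keeps the original row index so the result aligns on assignment
      data['diff'] = data.groupby(['Type'], group_keys=False)['Date_date'].apply(lambda x: (x - x.min()).dt.days)
      data.drop(['Date_date'], axis=1, inplace=True, errors='ignore')  # remove the helper column
      print(data)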
      
           Type                  Date  diff
      0     Car  2019-01-06T21:44:09Z     1
      1   Train  2019-01-06T19:44:09Z     4
      2   Train  2019-01-02T19:44:09Z     0
      3     Car  2019-01-08T06:44:09Z     3
      4     Car  2019-01-06T18:44:09Z     1
      5   Train  2019-01-04T19:44:09Z     2
      6     Car  2019-01-05T16:34:09Z     0
      7   Train  2019-01-08T19:44:09Z     6
      8     Car  2019-01-07T14:44:09Z     2
      9     Car  2019-01-06T11:44:09Z     1
      10  Train  2019-01-10T19:44:09Z     8
      


        【Answer 3】:

        Use transform and subtract directly:

        s = pd.to_datetime(data['Date']).dt.date
        data['diff'] = (s - s.groupby(data.Type).transform('min')).dt.days
        
        Out[36]:
             Type                  Date  diff
        0     Car  2019-01-06T21:44:09Z     1
        1   Train  2019-01-06T19:44:09Z     4
        2   Train  2019-01-02T19:44:09Z     0
        3     Car  2019-01-08T06:44:09Z     3
        4     Car  2019-01-06T18:44:09Z     1
        5   Train  2019-01-04T19:44:09Z     2
        6     Car  2019-01-05T16:34:09Z     0
        7   Train  2019-01-08T19:44:09Z     6
        8     Car  2019-01-07T14:44:09Z     2
        9     Car  2019-01-06T11:44:09Z     1
        10  Train  2019-01-10T19:44:09Z     8
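The key step in this answer is transform('min'): unlike an aggregation, it broadcasts each group's minimum back to every row of that group, so the result lines up with s for element-wise subtraction. A small sketch of the difference, using parsed timestamps rather than dates and a subset of the question's rows (per_group and per_row are illustrative names):

```python
import pandas as pd

data = pd.DataFrame({'Type': ['Car', 'Train', 'Train', 'Car'],
                     'Date': ['2019-01-06T21:44:09Z', '2019-01-06T19:44:09Z',
                              '2019-01-02T19:44:09Z', '2019-01-05T16:34:09Z']})
dates = pd.to_datetime(data['Date'])

# Aggregation: one value per Type, indexed by Type (Car, Train).
per_group = dates.groupby(data['Type']).min()

# Transform: one value per original row, indexed like dates itself.
per_row = dates.groupby(data['Type']).transform('min')

print(per_group)
print(per_row)
```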
        

