【Posted】: 2020-05-07 03:01:36
【Question】:
I have a pandas DataFrame:
import pandas as pd

data = pd.DataFrame([['Car', '2019-01-06T21:44:09Z'],
                     ['Train', '2019-01-06T19:44:09Z'],
                     ['Train', '2019-01-02T19:44:09Z'],
                     ['Car', '2019-01-08T06:44:09Z'],
                     ['Car', '2019-01-06T18:44:09Z'],
                     ['Train', '2019-01-04T19:44:09Z'],
                     ['Car', '2019-01-05T16:34:09Z'],
                     ['Train', '2019-01-08T19:44:09Z'],
                     ['Car', '2019-01-07T14:44:09Z'],
                     ['Car', '2019-01-06T11:44:09Z'],
                     ['Train', '2019-01-10T19:44:09Z'],
                     ],
                    columns=['Type', 'Date'])
After sorting by date, I need to find the difference between consecutive dates within each Type.
The final data should look like this:
data = pd.DataFrame([['Car', '2019-01-06T21:44:09Z', 1],
                     ['Train', '2019-01-06T19:44:09Z', 4],
                     ['Train', '2019-01-02T19:44:09Z', 0],
                     ['Car', '2019-01-08T06:44:09Z', 3],
                     ['Car', '2019-01-06T18:44:09Z', 1],
                     ['Train', '2019-01-04T19:44:09Z', 2],
                     ['Car', '2019-01-05T16:34:09Z', 0],
                     ['Train', '2019-01-08T19:44:09Z', 6],
                     ['Car', '2019-01-07T14:44:09Z', 2],
                     ['Car', '2019-01-06T11:44:09Z', 1],
                     ['Train', '2019-01-10T19:44:09Z', 8],
                     ],
                    columns=['Type', 'Date', 'diff'])
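Here, for Type Car, min(Date) is 2019-01-05T16:34:09Z, so the diff starts at 0. The next dates are 2019-01-06T18:44:09Z and 2019-01-06T11:44:09Z, so their diff is 1 day (I am not sure whether the time of day can be taken into account), and so on. For Type Train, min(Date) is 2019-01-02T19:44:09Z, so its diff is 0; the next date is 2019-01-04T19:44:09Z, so its diff is 2 days.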
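To make the "whether the time can be included" point concrete, here is a small sketch comparing exact elapsed time with a calendar-day count for two of the Car timestamps. The variable names (`first`, `later`, `elapsed`, `calendar_days`) are only illustrative, and reading the expected `diff` as a calendar-day count is an assumption based on the values listed above.

```python
import pandas as pd

# Earliest Car timestamp and the Car row whose expected diff is 2.
first = pd.Timestamp("2019-01-05T16:34:09Z")
later = pd.Timestamp("2019-01-07T14:44:09Z")

# Exact elapsed time keeps the time of day: 1 day 22:10:00.
elapsed = later - first

# normalize() resets the time to midnight, so the subtraction
# counts calendar days instead of elapsed 24-hour periods.
calendar_days = (later.normalize() - first.normalize()).days

print(elapsed.days)    # whole days of elapsed time
print(calendar_days)   # calendar days between the two dates
```

The elapsed time is just under two days, so its whole-day part is 1, while the calendar-day count is 2. Only the calendar-day count agrees with the expected table.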
I tried groupby, but I am not sure how to include sorting by date:
data['diff'] = data.groupby('Type')['Date'].diff() / np.timedelta64(1, 'D')
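One possible approach, sketched under the assumption that `diff` should be the number of calendar days since the earliest date of each Type (this is what the expected values suggest, rather than row-to-row differences). The intermediate names `dates`, `days` and `first_day` are hypothetical. Note that `Date` holds strings here, so it has to be parsed with `pd.to_datetime` before any subtraction.

```python
import pandas as pd

data = pd.DataFrame(
    [
        ["Car", "2019-01-06T21:44:09Z"],
        ["Train", "2019-01-06T19:44:09Z"],
        ["Train", "2019-01-02T19:44:09Z"],
        ["Car", "2019-01-08T06:44:09Z"],
        ["Car", "2019-01-06T18:44:09Z"],
        ["Train", "2019-01-04T19:44:09Z"],
        ["Car", "2019-01-05T16:34:09Z"],
        ["Train", "2019-01-08T19:44:09Z"],
        ["Car", "2019-01-07T14:44:09Z"],
        ["Car", "2019-01-06T11:44:09Z"],
        ["Train", "2019-01-10T19:44:09Z"],
    ],
    columns=["Type", "Date"],
)

# Parse the ISO 8601 strings into timezone-aware datetimes.
dates = pd.to_datetime(data["Date"], utc=True)

# Drop the time of day so differences count calendar days.
days = dates.dt.normalize()

# Earliest day within each Type, broadcast back onto every row.
first_day = days.groupby(data["Type"]).transform("min")

# Whole calendar days since the first date of the same Type.
data["diff"] = (days - first_day).dt.days

print(data)
```

Because `transform("min")` returns a result aligned to the original index, no explicit sort is needed for this reading of the problem. If true row-to-row differences are wanted instead, the usual route would be to call `sort_values('Date')` on the parsed frame before `groupby('Type')['Date'].diff()`.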
【Comments】: