【Posted】: 2021-07-05 18:15:54
【Question】:
I have a DataFrame of vehicles with a start time and a finish time for each row. It looks like this:
Vehicle Start Finish Time
abc123 2021-07-05 12:17:59.567999 2021-07-06 09:17:59.496001 5.0
abc123 2021-07-06 09:17:59.532000 2021-07-07 06:17:59.460000 5.0
abc123 2021-07-07 06:17:59.496001 2021-07-07 11:17:59.423999 5.0
abc123 2021-07-07 11:17:59.460001 2021-07-08 08:17:59.388000 5.0
abc123 2021-07-08 08:17:59.423999 2021-07-08 13:17:59.352000 5.0
abc123 2021-07-08 13:17:59.387999 2021-07-09 10:17:59.316000 5.0
abc123 2021-07-09 10:17:59.352000 2021-07-10 07:17:59.280000 5.0
abc123 2021-07-10 07:17:59.316000 2021-07-10 12:17:59.244000 5.0
abc123 2021-07-12 06:00:00.035999 2021-08-10 08:47:23.963999 202.79
abc123 2021-08-16 08:47:23.928000 2021-08-17 09:32:23.856000 8.75
abc123 2021-08-14 06:47:23.964000 2021-08-16 08:47:23.892000 10.0
From this DataFrame I want to create the following DataFrame (the expected output):
Vehicle Start Finish Time
abc123 2021-07-05 12:17:59.567999 2021-07-06 09:17:59.496001 5.0
abc123 2021-07-06 09:17:59.532000 2021-07-07 06:17:59.460000 5.0
abc123 2021-07-07 06:17:59.496001 2021-07-07 11:17:59.423999 5.0
abc123 2021-07-07 11:17:59.460001 2021-07-08 08:17:59.388000 5.0
abc123 2021-07-08 08:17:59.423999 2021-07-08 13:17:59.352000 5.0
abc123 2021-07-08 13:17:59.387999 2021-07-09 10:17:59.316000 5.0
abc123 2021-07-09 10:17:59.352000 2021-07-10 07:17:59.280000 5.0
abc123 2021-07-10 07:17:59.316000 2021-07-10 12:17:59.244000 5.0
abc123 2021-07-12 06:00:00.035999 2021-07-31 00:00:23.963999 139
abc123 2021-08-1 06:00:00 2021-08-10 08:47:23.963999 63
abc123 2021-08-16 08:47:23.928000 2021-08-17 09:32:23.856000 8.75
abc123 2021-08-14 06:47:23.964000 2021-08-16 08:47:23.892000 10.0
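The Time values are obtained by splitting 202.79 roughly in proportion to the number of days. A vehicle may also be in continuous use for three months, in which case I want to create three rows, with each finish date falling on the month end (the 30th or 31st). I tried the code below, which is based on "How to split pandas dataframe single row into two rows?".
Code to build df: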
import pandas as pd

data = [['abc123', '2021-07-05 12:17:59.567999', '2021-07-06 09:17:59.496001', 5.0],
        ['abc123', '2021-07-06 09:17:59.532000', '2021-07-07 06:17:59.460000', 5.0],
        ['abc123', '2021-07-07 06:17:59.496001', '2021-07-07 11:17:59.423999', 5.0],
        ['abc123', '2021-07-07 11:17:59.460001', '2021-07-08 08:17:59.388000', 5.0],
        ['abc123', '2021-07-08 08:17:59.423999', '2021-07-08 13:17:59.352000', 5.0],
        ['abc123', '2021-07-08 13:17:59.387999', '2021-07-09 10:17:59.316000', 5.0],
        ['abc123', '2021-07-09 10:17:59.352000', '2021-07-10 07:17:59.280000', 5.0],
        ['abc123', '2021-07-10 07:17:59.316000', '2021-07-10 12:17:59.244000', 5.0],
        ['abc123', '2021-07-12 06:00:00.035999', '2021-08-10 08:47:23.963999', 202.79],
        ['abc123', '2021-08-16 08:47:23.928000', '2021-08-17 09:32:23.856000', 8.75],
        ['abc123', '2021-08-14 06:47:23.964000', '2021-08-16 08:47:23.892000', 10.0]]
# 'datetime64[ns]' spells out the unit; recent pandas versions reject the bare 'datetime64'
df = pd.DataFrame(data, columns=['Vehicle', 'Start', 'Finish', 'Time']) \
    .astype({'Start': 'datetime64[ns]', 'Finish': 'datetime64[ns]'})
This is what I did:
import datetime as dt

import numpy as np
import pandas as pd

def splitMultiDayRows(df):
    # Rows whose Finish month number is greater than their Start month number
    mask = df['Finish'].dt.month > df['Start'].dt.month
    if np.any(mask):
        # Copy so that the helper columns are not written into a view of df
        df_new = df.loc[mask].copy()
        # Month end after Start; the offset keeps Start's time of day
        df_new['last_date'] = df_new['Start'] + pd.offsets.MonthEnd()
        df_new['inter_time'] = df_new['Finish'] - df_new['Start']
        df_new['inter_time1'] = df_new['last_date'] - df_new['Start']
        df_new['inter_time2'] = df_new['Finish'] - df_new['last_date']
        # The second piece starts one day after the first piece ends
        df_new['new_date'] = df_new['last_date'] + dt.timedelta(days=1)
        #df_new.drop(['last_date'], axis = 1, inplace = True)
        df1 = df_new[['Vehicle', 'Start', 'last_date', 'Time', 'inter_time1', 'inter_time']]
        df2 = df_new[['Vehicle', 'new_date', 'Finish', 'Time', 'inter_time2', 'inter_time']]
        df2.columns = ['Vehicle', 'Start', 'last_date', 'Time', 'inter_time1', 'inter_time']
        df_Temp = pd.concat([df1, df2], axis=0)
        # Share Time in proportion to each piece's duration
        df_Temp['Time'] = (df_Temp['inter_time1'] / df_Temp['inter_time']) * df_Temp['Time']
        df_Temp.drop(['inter_time1', 'inter_time'], axis=1, inplace=True)
        df_Temp.columns = ['Vehicle', 'Start', 'Finish', 'Time']
        # df is concatenated unchanged, so the original unsplit rows stay in the result
        return pd.concat([df, splitMultiDayRows(df_Temp)])
    else:
        return df

df4 = splitMultiDayRows(df).sort_values(['Start']).reset_index(drop=True)
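One thing worth checking in the mask above: comparing month numbers only works while Start and Finish fall in the same year. The sketch below contrasts it with a comparison of monthly periods via `dt.to_period('M')`. The December to January row is made up for illustration and does not appear in the question's data.

```python
import pandas as pd

# Three spans: July -> August, December -> January (hypothetical), same month
spans = pd.DataFrame({
    'Start': pd.to_datetime(['2021-07-12 06:00', '2021-12-20 06:00', '2021-08-14 06:47']),
    'Finish': pd.to_datetime(['2021-08-10 08:47', '2022-01-05 06:00', '2021-08-16 08:47']),
})

# Month numbers: 1 > 12 is False, so the span crossing New Year is missed
month_number_mask = spans['Finish'].dt.month > spans['Start'].dt.month

# Monthly periods carry the year, so 2022-01 > 2021-12 is True
period_mask = spans['Finish'].dt.to_period('M') > spans['Start'].dt.to_period('M')

print(month_number_mask.tolist())
print(period_mask.tolist())
```

If the data never crosses a year boundary, both masks select the same rows.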
The output is:
Vehicle Start Finish Time
0 abc123 2021-07-05 12:17:59.567999 2021-07-06 09:17:59.496001 5.0
1 abc123 2021-07-06 09:17:59.532000 2021-07-07 06:17:59.460000 5.0
2 abc123 2021-07-07 06:17:59.496001 2021-07-07 11:17:59.423999 5.0
3 abc123 2021-07-07 11:17:59.460001 2021-07-08 08:17:59.388000 5.0
4 abc123 2021-07-08 08:17:59.423999 2021-07-08 13:17:59.352000 5.0
5 abc123 2021-07-08 13:17:59.387999 2021-07-09 10:17:59.316000 5.0
6 abc123 2021-07-09 10:17:59.352000 2021-07-10 07:17:59.280000 5.0
7 abc123 2021-07-10 07:17:59.316000 2021-07-10 12:17:59.244000 5.0
9 abc123 2021-07-12 06:00:00.035999 2021-07-31 06:00:00.035999 132.33194900705357
10 abc123 2021-08-01 06:00:00.035999 2021-08-10 08:47:23.963999 70.4580509929464
11 abc123 2021-08-14 06:47:23.964000 2021-08-16 08:47:23.892000 10.0
12 abc123 2021-08-16 08:47:23.928000 2021-08-17 09:32:23.856000 8.75
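To see where the two Time values in rows 9 and 10 come from, the proportional split can be reproduced by hand for the 202.79 row. This is only a check of the arithmetic performed by the code in the question; the variable names are chosen here for illustration.

```python
import pandas as pd

start = pd.Timestamp('2021-07-12 06:00:00.035999')
finish = pd.Timestamp('2021-08-10 08:47:23.963999')

# MonthEnd moves the date to the last day of the month and keeps the time of day
cut = start + pd.offsets.MonthEnd()
resume = cut + pd.Timedelta(days=1)

total = finish - start
# Dividing one Timedelta by another gives a plain float ratio
time_july = 202.79 * ((cut - start) / total)
# The August share is measured from `cut`, not from `resume`,
# so the one-day gap between the two rows is counted towards August
time_august = 202.79 * ((finish - cut) / total)

print(cut, resume)
print(time_july, time_august, time_july + time_august)
```

Because both shares are measured against `cut`, they add up to 202.79 even though the two output rows leave a one-day gap between them.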
Is there any other way to solve this?
【Comments】:
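One possible alternative, sketched under assumptions that differ from the expected output above: each row is cut exactly at midnight on the first day of every following month, the pieces are contiguous (no gap), and Time is shared in proportion to duration. The function name `split_by_month` and the sample vehicle `xyz789` are hypothetical, and timestamps are assumed to be timezone-naive. A plain loop is used so that spans covering three or more months need no recursion.

```python
import pandas as pd

def split_by_month(df):
    """Split each row at calendar-month boundaries and share Time by duration."""
    pieces = []
    for row in df.itertuples(index=False):
        total = row.Finish - row.Start
        start = row.Start
        while True:
            # First instant of the month that follows `start`
            next_month = (start.to_period('M') + 1).to_timestamp()
            end = min(row.Finish, next_month)
            # Guard against zero-length (or reversed) spans
            share = (end - start) / total if total > pd.Timedelta(0) else 1.0
            pieces.append((row.Vehicle, start, end, row.Time * share))
            if end >= row.Finish:
                break
            start = end
    return pd.DataFrame(pieces, columns=['Vehicle', 'Start', 'Finish', 'Time'])

# Made-up sample: one span over three months, one inside a single month
sample = pd.DataFrame({
    'Vehicle': ['xyz789', 'xyz789'],
    'Start': pd.to_datetime(['2021-07-12 06:00:00', '2021-09-14 06:00:00']),
    'Finish': pd.to_datetime(['2021-09-10 06:00:00', '2021-09-16 06:00:00']),
    'Time': [60.0, 10.0],
})
out = split_by_month(sample)
print(out)
```

The first sample row lasts 60 days in total, so its three pieces receive Time in proportion to 19.75, 31 and 9.25 days, while the second row is returned unchanged. Looping with `itertuples` is simple rather than fast; for very large frames a vectorised approach may be preferable.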
- I don't understand why (for 139) it is 2021-07-12 06:00:00.035999 -> 2021-07-31 00:00:23.963999 rather than 2021-07-12 06:00:00.035999 -> 2021-07-31 23:59:59.999999. Likewise (for 63), why 2021-08-1 06:00:00 -> 2021-08-10 08:47:23.963999 rather than 2021-08-01 00:00:00.000000 -> 2021-08-10 08:47:23.963999? Where does the time between 2021-07-31 00:00:23.963999 and 2021-08-1 06:00:00 (1 day 05:59:36.036001) go?
- @Corralien Sir, basically I want to split a row whenever its start time and finish time are not in the same month. For 202.79, the finish month is 8 and the start month is 7, which is why I chose df['Finish'].dt.month > df['Start'].dt.month. Any amount of time may pass between 2021-07-31 00:00:23.963999 and 2021-08-1 06:00:00. Since I am only trying to split it roughly, someone could write 2021-07-31 00:00:00 (finish time of the split) and 2021-08-1 06:00:00 (start time).
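The gap discussed in these comments can be checked directly, and pandas periods give two natural candidates for a gap-free boundary. This is a small sketch; which boundary to use is a modelling choice the question leaves open.

```python
import pandas as pd

# Gap between the two expected-output rows mentioned in the comments
gap = pd.Timestamp('2021-08-01 06:00:00') - pd.Timestamp('2021-07-31 00:00:23.963999')
print(gap)

# Gap-free alternatives derived from the month of the start time
july = pd.Timestamp('2021-07-12 06:00:00.035999').to_period('M')
month_end = july.end_time            # last representable instant of July
next_start = (july + 1).start_time   # first instant of August
print(month_end, next_start)
```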
Tags: python pandas dataframe datetime