Posted: 2021-03-06 15:38:49
Question:
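I am building a data frame for survival analysis that starts at 2018-01-01 00:00:00 and ends today. For the events associated with each ID, I only have two columns: the start time and the end time.
However, I also need to add the time between events, when nothing is observed.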
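For a single ID, the rows to add are the complement of its event intervals inside the observation window (2018-01-01 up to today). A minimal pure-Python sketch of that step, assuming each ID's intervals are already sorted by start time; `fill_gaps` is a hypothetical helper (not from any library), and using the current time as "today" is an assumption:

```python
from datetime import datetime


def fill_gaps(intervals, window_start, window_end):
    """Return observed intervals interleaved with the unobserved gaps,
    covering [window_start, window_end] for a single ID.

    Assumes `intervals` is a list of (start, end) pairs sorted by start.
    """
    out = []
    cursor = window_start  # end of the last covered stretch so far
    for start, end in intervals:
        if start > cursor:  # nothing observed between cursor and this event
            out.append((cursor, start))
        out.append((start, end))
        cursor = max(cursor, end)
    if cursor < window_end:  # nothing observed after the last event
        out.append((cursor, window_end))
    return out


# The three events of ID 112 from the input table.
id_112 = [
    (datetime(2018, 9, 19, 8, 0), datetime(2018, 9, 20, 4, 30)),
    (datetime(2018, 9, 25, 16, 30), datetime(2018, 9, 26, 23, 0)),
    (datetime(2018, 9, 27, 1, 30), datetime(2018, 9, 27, 10, 30)),
]
rows = fill_gaps(id_112, datetime(2018, 1, 1), datetime.today())
for start, end in rows:
    print(start, "->", end)
```

For ID 112 this gives seven consecutive intervals (leading gap, three events, two inner gaps, trailing gap), where each interval ends exactly where the next one starts.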
Here is what I have:
+--------+-----+-----+---------------------+---------------------+
| State  | ID1 | ID2 | Start_Time          | End_Time            |
+--------+-----+-----+---------------------+---------------------+
| State1 | 111 | AA1 | 2019-12-04 04:00:00 | 2019-12-04 19:30:00 |
| State1 | 111 | AA1 | 2019-12-08 06:30:00 | 2019-12-20 10:00:00 |
| State1 | 111 | AA1 | 2019-12-22 11:00:00 | 2019-12-22 23:00:00 |
| State1 | 111 | AA1 | 2019-12-26 08:00:00 | 2019-12-29 16:30:00 |
| State2 | 112 | AA2 | 2018-09-19 08:00:00 | 2018-09-20 04:30:00 |
| State2 | 112 | AA2 | 2018-09-25 16:30:00 | 2018-09-26 23:00:00 |
| State2 | 112 | AA2 | 2018-09-27 01:30:00 | 2018-09-27 10:30:00 |
+--------+-----+-----+---------------------+---------------------+

And what I need is:
+--------+-----+-----+---------------------+---------------------+
| State  | ID1 | ID2 | Start_Time          | End_Time            |
+--------+-----+-----+---------------------+---------------------+
| State1 | 111 | AA1 | 2018-01-01 00:00:00 | 2019-12-04 04:00:00 |
| State1 | 111 | AA1 | 2019-12-04 04:00:00 | 2019-12-04 19:30:00 |
| State1 | 111 | AA1 | 2019-12-04 19:30:00 | 2019-12-08 06:30:00 |
| State1 | 111 | AA1 | 2019-12-08 06:30:00 | 2019-12-20 10:00:00 |
| State1 | 111 | AA1 | 2019-12-20 10:00:00 | 2019-12-22 11:00:00 |
| State1 | 111 | AA1 | 2019-12-22 11:00:00 | 2019-12-22 23:00:00 |
| State1 | 111 | AA1 | 2019-12-22 23:00:00 | 2019-12-26 08:00:00 |
| State1 | 111 | AA1 | 2019-12-26 08:00:00 | 2019-12-29 16:30:00 |
| State1 | 111 | AA1 | 2019-12-29 16:30:00 | today               |
| State2 | 112 | AA2 | 2018-01-01 00:00:00 | 2018-09-19 08:00:00 |
| State2 | 112 | AA2 | 2018-09-19 08:00:00 | 2018-09-20 04:30:00 |
| State2 | 112 | AA2 | 2018-09-20 04:30:00 | 2018-09-25 16:30:00 |
| State2 | 112 | AA2 | 2018-09-25 16:30:00 | 2018-09-26 23:00:00 |
| State2 | 112 | AA2 | 2018-09-26 23:00:00 | 2018-09-27 01:30:00 |
| State2 | 112 | AA2 | 2018-09-27 01:30:00 | 2018-09-27 10:30:00 |
| State2 | 112 | AA2 | 2018-09-27 10:30:00 | today               |
+--------+-----+-----+---------------------+---------------------+

I have tried the code from "How to find the start time and end time of an event in python?", but it only gives me the sequence of events, not the rows I need. I have also tried the answer provided by @Fredy Montaño (below):
import pandas as pd

# df: the input table above, with Start_Time/End_Time stored as datetimes
fill_date = []
for item in range(1, df.shape[0]):
    # If the previous row does not end exactly when this row starts, record the gap.
    # Note: nothing here checks that rows item-1 and item belong to the same State/ID1/ID2.
    if df['End_Time'][item-1] != df['Start_Time'][item]:
        fill_date.append([df["State"][item-1], df["ID1"][item-1], df["ID2"][item-1],
                          df['End_Time'][item-1], df['Start_Time'][item]])
df_add = pd.DataFrame(fill_date, columns=["State", "ID1", "ID2", "Start_Time", "End_Time"])
df_output = pd.concat([df[["State", "ID1", "ID2", "Start_Time", "End_Time"]], df_add], axis=0)
df_output = df_output.sort_values(["State", "ID2", "Start_Time"], ascending=True)
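One of the comments asks for something that can be run directly. A minimal reconstruction of the input table as a DataFrame; the values are copied from the first table, and storing the times as datetime64 via `pd.to_datetime` is an assumption about the real data:

```python
import pandas as pd

# Input table from the question; times parsed to datetime64 so they can be
# compared and subtracted.
df = pd.DataFrame({
    "State": ["State1"] * 4 + ["State2"] * 3,
    "ID1": [111] * 4 + [112] * 3,
    "ID2": ["AA1"] * 4 + ["AA2"] * 3,
    "Start_Time": pd.to_datetime([
        "2019-12-04 04:00:00", "2019-12-08 06:30:00", "2019-12-22 11:00:00",
        "2019-12-26 08:00:00", "2018-09-19 08:00:00", "2018-09-25 16:30:00",
        "2018-09-27 01:30:00",
    ]),
    "End_Time": pd.to_datetime([
        "2019-12-04 19:30:00", "2019-12-20 10:00:00", "2019-12-22 23:00:00",
        "2019-12-29 16:30:00", "2018-09-20 04:30:00", "2018-09-26 23:00:00",
        "2018-09-27 10:30:00",
    ]),
})
print(df.dtypes)
```

Run against this `df`, the attempted loop also emits a row from 2019-12-29 16:30 to 2018-09-19 08:00, because it pairs the last State1 event with the first State2 event. It also never adds the leading gap from 2018-01-01 or the trailing gap up to today.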
I think I need to put a condition on the State, ID1, and ID2 variables so that a gap does not take its time from the previous group.
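That condition amounts to computing the gaps inside each (State, ID1, ID2) group. A minimal pandas sketch, assuming `Start_Time`/`End_Time` are datetime64 columns: `groupby(...).shift(1)` gives each event the previous end time within its own group only, so the first event of a group never picks up the previous group's end. `add_unobserved_rows` and `KEYS` are hypothetical names, and treating "today" as midnight of the current day is an assumption:

```python
import pandas as pd

KEYS = ["State", "ID1", "ID2"]  # columns that identify one subject


def add_unobserved_rows(df, window_start, window_end, keys=KEYS):
    """Interleave each group's events with the unobserved gaps between them.

    Gaps are computed inside each group only. The first gap of every group
    starts at window_start and the last one ends at window_end.
    """
    keys = list(keys)
    df = df.sort_values(keys + ["Start_Time"]).reset_index(drop=True)

    # End of the previous event within the same group (NaT for the first event).
    prev_end = df.groupby(keys)["End_Time"].shift(1)

    # Gap before each event: previous end (or window_start) -> this event's start.
    before = df[keys].copy()
    before["Start_Time"] = prev_end.fillna(window_start)
    before["End_Time"] = df["Start_Time"]

    # Gap after the last event of each group: latest end -> window_end.
    after = (df.groupby(keys, as_index=False)["End_Time"].max()
               .rename(columns={"End_Time": "Start_Time"}))
    after["End_Time"] = window_end

    gaps = pd.concat([before, after], ignore_index=True)
    gaps = gaps[gaps["Start_Time"] < gaps["End_Time"]]  # drop empty gaps

    out = pd.concat([df[keys + ["Start_Time", "End_Time"]], gaps], ignore_index=True)
    return out.sort_values(keys + ["Start_Time"]).reset_index(drop=True)


# Small demo: two events per group, taken from the input table.
events = pd.DataFrame({
    "State": ["State1", "State1", "State2", "State2"],
    "ID1": [111, 111, 112, 112],
    "ID2": ["AA1", "AA1", "AA2", "AA2"],
    "Start_Time": pd.to_datetime(["2019-12-04 04:00", "2019-12-08 06:30",
                                  "2018-09-19 08:00", "2018-09-25 16:30"]),
    "End_Time": pd.to_datetime(["2019-12-04 19:30", "2019-12-20 10:00",
                                "2018-09-20 04:30", "2018-09-26 23:00"]),
})
result = add_unobserved_rows(events, pd.Timestamp("2018-01-01"),
                             pd.Timestamp.today().normalize())
print(result)
```

Applied to the full seven-row input table, the same function yields 16 rows, matching the row count of the desired output, and every gap row carries its own group's State, ID1, and ID2.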
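Any suggestions?
Comments:
-
Where is the Python you have written so far?
-
I put the code above!
-
That code doesn't run. Please provide everything we need to test it ourselves.
-
Code updated. I think I need to put a condition on the categorical variables STATE, ID1, and ID2.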