【发布时间】:2018-04-26 16:09:27
【问题描述】:
给定以下数据集DF:
uuid,eventTime,Op.progress,Op.progressPercentage, AnotherAttribute
C0972765-8436-0000-0000-000000000000,2017-08-19T12:52:39,P,3.0,01:57:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:52:49,P,3.0,01:56:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:53:18,P,4.0,01:55:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:53:49,P,5.0,01:55:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:54:27,P,5.0,01:54:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:55:07,P,6.0,01:54:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:55:27,P,6.0,01:53:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:33:46,W,40.0,01:13:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:10,N,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:16,N,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:18,N,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:55,P,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:41:15,P,1.0,01:59:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:41:31,P,3.0,01:57:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:41:51,P,3.0,01:56:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:42:22,P,4.0,01:56:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:42:51,P,4.0,01:55:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:29:22,S,98.0,00:04:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:29:27,S,98.0,00:03:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:30:27,S,99.0,00:02:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:31:27,S,100.0,00:01:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:33:01,F,100.0,00:01:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:33:01,F,100.0,00:01:00
我想一分为二:
df1:
uuid,eventTime,Op.progress,Op.progressPercentage, AnotherAttribute
C0972765-8436-0000-0000-000000000000,2017-08-19T12:52:39,P,3.0,01:57:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:52:49,P,3.0,01:56:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:53:18,P,4.0,01:55:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:53:49,P,5.0,01:55:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:54:27,P,5.0,01:54:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:55:07,P,6.0,01:54:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:55:27,P,6.0,01:53:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:33:46,W,40.0,01:13:00
df2:
uuid,eventTime,Op.progress,Op.progressPercentage, AnotherAttribute
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:10,N,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:16,N,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:18,N,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:55,P,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:41:15,P,1.0,01:59:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:41:31,P,3.0,01:57:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:41:51,P,3.0,01:56:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:42:22,P,4.0,01:56:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:42:51,P,4.0,01:55:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:29:22,S,98.0,00:04:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:29:27,S,98.0,00:03:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:30:27,S,99.0,00:02:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:31:27,S,100.0,00:01:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:33:01,F,100.0,00:01:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:33:01,F,100.0,00:01:00
拆分应该基于 Op.progressPercentage 属性,该属性可以假定值从 1 到 100。
当我尝试应用splitting a pandas Dataframe 提供的解决方案时,如下所示,我没有得到正确的预期结果。
df_dataset = pd.read_csv(filepath) #your input data saved here
wash_list = []
shifted = df_dataset['Op.progressPercentage'].shift()
m = shifted.diff(-1).ne(0) & shifted.eq(100)
a = m.cumsum()
aa = df_dataset.groupby([df_dataset.uuid,a])
for k, gp in aa:
wash_list.append(gp.sort_values(['uuid', 'eventTime'], ascending=[1, 1]))
for wash in wash_list :
print("")
print(wash.to_string())
print("")
请,任何帮助将不胜感激。 非常感谢您, 最好的祝福, 卡罗
【问题讨论】:
-
那么,所有值递增的行都在不同的组中?
-
是的。我们可以认为这是一个普遍有效的规则。但是,也可能发生 Op.progressPercentage 并不总是有序的(例如,3.0 的行也可能在 4.o 的行之后)。
-
那么,拆分的逻辑是什么?
-
在这种情况下,我应该首先认识到异常,纠正它,然后应用我们上面讨论的一般规则。
-
嗯,我真的不认为这是可能的。例如,您何时确定您是否遇到了异常情况或合法序列的结尾?你看?我不认为它可以做到。