Create tidy data in R with time stamps based on time stamp "size" (answers)

【Question title】: Create tidy data in R with time stamps based on time stamp "size"
【Posted】: 2019-03-28 09:37:11
【Question】:

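I am analysing the cycle time variation of each process related to the different products we produce. Our SAP data contains the workers' start and finish log entries, and the goal is to use this information to calculate cycle times.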

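However, SAP exports the start and finish time stamps in a single column, and there is no reference column saying which entry is a start time and which is a finish time. This makes it impossible to use, for example, tidyr's spread().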

Current data

1.6 million rows
150 operations
10,000 orders

A small sample of the data is shown below.

Order <-  rep(c(1059866,1059891),each = 4)
Operation <- rep(c(1510,1550),4)
Timestamp <- c("30-11-2016 07:33:30", "30-11-2016 07:33:42", "30-11-2016 16:00:13", "30-11-2016 16:00:18", "30-11-2016 07:35:21", "30-11-2016 07:35:43", "30-11-2016 16:00:43", "30-11-2016 16:00:39")

df_current <- cbind(Order, Operation, Timestamp)

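This start and finish information is needed for every process step ("Operation"). Logically, the earliest time stamp of an operation is the start log entry and the latest time stamp is the finish log entry.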

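However, I do not know how to tell R to create a new column that correctly indicates, based on the time stamps, which entry is the start and which is the finish.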

Desired data

Order <-  rep(c(1059866,1059891),each = 4)
Operation <- rep(c(1510,1550),4)
Timestamp <- c("30-11-2016 07:33:30", "30-11-2016 07:33:42", "30-11-2016 16:00:13", "30-11-2016 16:00:18", "30-11-2016 07:35:21", "30-11-2016 07:35:43", "30-11-2016 16:00:43", "30-11-2016 16:00:39")
Status <- c("Start" , "Finish", "Start" , "Finish", "Start" , "Finish",  "Finish", "Start")   

df_desired <- cbind(Order, Operation, Timestamp, Status)

Once the data looks like this, I can easily tidy it.

Thanks

【Question discussion】:

Tags: r sap

【Solution 1】:

Assuming you can turn the data into a data.frame instead of a matrix:

df_current <- data.frame(Order, Operation, Timestamp)

df.With.Status <- do.call(rbind, # rbind the per-group data frames into one big data frame
lapply(split(df_current, list(df_current$Order, df_current$Operation), drop = TRUE), # split by the order/operation combinations that actually occur and apply the function to each
       function(df){
         df$Timestamp <- as.POSIXct(as.character(df$Timestamp), format = "%d-%m-%Y %H:%M:%S") # convert to date-time so that it is sortable
         df <- df[order(df$Timestamp),] # sort ascending so the earliest entry comes first
         df$Status <- c("Start","Finish") # add status; assumes exactly two entries per group
         return(df)
       }))
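
Because the code above assigns c("Start","Finish") to every group, it relies on each Order/Operation combination having exactly two log entries. The following is a minimal base R sketch of how that assumption could be checked first. It rebuilds the sample data from the question; the names counts, n_entries and incomplete are made up for illustration.

```r
# Sample data from the question, built as a data.frame
Order <- rep(c(1059866, 1059891), each = 4)
Operation <- rep(c(1510, 1550), 4)
Timestamp <- c("30-11-2016 07:33:30", "30-11-2016 07:33:42",
               "30-11-2016 16:00:13", "30-11-2016 16:00:18",
               "30-11-2016 07:35:21", "30-11-2016 07:35:43",
               "30-11-2016 16:00:43", "30-11-2016 16:00:39")
df_current <- data.frame(Order, Operation, Timestamp)

# Count the log entries for every Order/Operation combination
counts <- aggregate(df_current$Timestamp,
                    by = list(Order = df_current$Order,
                              Operation = df_current$Operation),
                    FUN = length)
names(counts)[3] <- "n_entries"  # hypothetical column name

# Combinations without exactly one start and one finish entry
incomplete <- counts[counts$n_entries != 2, ]
print(incomplete)
```

If incomplete has rows on the full data set, those groups probably need to be inspected or filtered out before the Start/Finish labels are assigned.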

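【Discussion】:

Uni Jena.. nice, I was there too.
Thank you very much for your input, Julian! The code does exactly what I had in mind. Thanks to your code I have already successfully analysed my first batch of data. :) Cheers!

【Solution 2】: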

With dplyr:


library(dplyr)


  df_current %>% as.data.frame() %>%
    group_by(Operation, Order) %>%
    mutate(Timestamp = as.POSIXct(Timestamp, format = "%d-%m-%Y %H:%M:%S"),
           Status = case_when(Timestamp == min(Timestamp) ~ "Start",
                              TRUE ~ "Finish")) %>%
    arrange(Order, Operation)


# A tibble: 8 x 4
# Groups:   Operation, Order [4]
  Order   Operation Timestamp           Status
  <fct>   <fct>     <dttm>              <chr> 
1 1059866 1510      2016-11-30 07:33:30 Start 
2 1059866 1510      2016-11-30 16:00:13 Finish
3 1059866 1550      2016-11-30 07:33:42 Start 
4 1059866 1550      2016-11-30 16:00:18 Finish
5 1059891 1510      2016-11-30 07:35:21 Start 
6 1059891 1510      2016-11-30 16:00:43 Finish
7 1059891 1550      2016-11-30 07:35:43 Start 
8 1059891 1550      2016-11-30 16:00:39 Finish
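
Since the stated goal is the cycle time per operation, the same grouping can also be collapsed straight to one row per Order/Operation. This is only a sketch under some assumptions: the earliest and latest entries of each group are the start and finish, the time stamps are parsed as UTC (the thread does not state a time zone), and the names cycle_times and CycleTime are made up for illustration. The .groups argument needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample data from the question, built as a data.frame
Order <- rep(c(1059866, 1059891), each = 4)
Operation <- rep(c(1510, 1550), 4)
Timestamp <- c("30-11-2016 07:33:30", "30-11-2016 07:33:42",
               "30-11-2016 16:00:13", "30-11-2016 16:00:18",
               "30-11-2016 07:35:21", "30-11-2016 07:35:43",
               "30-11-2016 16:00:43", "30-11-2016 16:00:39")
df <- data.frame(Order, Operation, Timestamp)

cycle_times <- df %>%
  # tz = "UTC" is an assumption; use the time zone of the SAP export
  mutate(Timestamp = as.POSIXct(Timestamp,
                                format = "%d-%m-%Y %H:%M:%S",
                                tz = "UTC")) %>%
  group_by(Order, Operation) %>%
  # earliest entry is treated as the start, latest as the finish
  summarise(Start = min(Timestamp),
            Finish = max(Timestamp),
            .groups = "drop") %>%
  # cycle time in minutes as a plain number
  mutate(CycleTime = as.numeric(difftime(Finish, Start, units = "mins")))

print(cycle_times)
```

This skips the intermediate Status column entirely, which may be convenient when only the duration is of interest.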

Also, since your data is large, with data.table:

library(data.table)

dfc_2 <- as.data.frame(df_current)

dfc_2$Timestamp <- as.POSIXct(as.character(dfc_2$Timestamp), format = "%d-%m-%Y %H:%M:%S")

setDT(dfc_2)[, Status := ifelse(Timestamp == min(Timestamp), "Start", "Finish"),
             keyby = .(Operation, Order)]
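
Note that the ifelse() condition labels every entry that is not the group minimum as "Finish". If some groups in the full data set hold more than two entries, a variant along the following lines might be safer. It is a sketch, not part of the original answer: it assumes data.table 1.13.0 or later for fcase(), parses the time stamps as UTC, and uses a made-up "Other" label for entries that are neither the earliest nor the latest of their group.

```r
library(data.table)

# Sample data from the question
Order <- rep(c(1059866, 1059891), each = 4)
Operation <- rep(c(1510, 1550), 4)
Timestamp <- c("30-11-2016 07:33:30", "30-11-2016 07:33:42",
               "30-11-2016 16:00:13", "30-11-2016 16:00:18",
               "30-11-2016 07:35:21", "30-11-2016 07:35:43",
               "30-11-2016 16:00:43", "30-11-2016 16:00:39")

# tz = "UTC" is an assumption; use the time zone of the SAP export
dt <- data.table(Order, Operation,
                 Timestamp = as.POSIXct(Timestamp,
                                        format = "%d-%m-%Y %H:%M:%S",
                                        tz = "UTC"))

# Earliest entry per group is "Start", latest is "Finish",
# anything in between gets the hypothetical label "Other".
# A group with a single entry is labelled "Start" only.
dt[, Status := fcase(Timestamp == min(Timestamp), "Start",
                     Timestamp == max(Timestamp), "Finish",
                     default = "Other"),
   by = .(Order, Operation)]

print(dt)
```

Using by instead of keyby keeps the original row order, which makes the result easier to compare with the raw export.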

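【Discussion】:

Thank you very much for your amazingly fast response and the code. I like the conditional approach in your code, but for some reason I cannot get it to work... The result I get in the Status column is "Start" for the first row and "Finish" for all the others. I have not yet figured out what I am doing wrong, and I am still puzzled. :)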