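[Posted]: 2020-09-24 13:31:20
[Problem description]:
I am trying to count the rows in one data frame that fall between two timestamps supplied by another data frame. Here are two mock data frames, similar to the ones I am working with:
Data frame 1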
#df1===========================================================
UserId1<-c("9f649f366edf", "9f649f366edf", "9f649f366edf", "9f649f366edf",
"9f649f366edf", "9f649f366edf", "9f649f366edf", "9f649f366edf",
"9f649f366edf", "6bc397bdc516", "6bc397bdc516", "6bc397bdc516",
"6bc397bdc516", "6bc397bdc516", "6bc397bdc516", "6bc397bdc516",
"6bc397bdc516", "f24cbff4c81e", "f24cbff4c81e",
"f24cbff4c81e", "f24cbff4c81e", "f24cbff4c81e")
Status1<-c("Abandoned", "Abandoned", "Answered", "Answered", "Abandoned",
"Abandoned", "Abandoned", "Abandoned", "Abandoned", "Abandoned",
"Abandoned", "Abandoned", "Abandoned", "Abandoned", "Abandoned",
"Abandoned", "Abandoned", "Abandoned", "Abandoned",
"Abandoned", "Answered", "Answered")
DateTime<-structure(c(1548029115, 1548035560, 1548099858, 1548099996, 1548824396,
1548824737, 1548824927, 1548825554, 1548825793, 1576821965, 1576821999,
1576822013, 1576822152, 1576822484, 1576865566, 1576926050, 1577037551,
1560000877, 1560001005, 1560013996, 1560014372, 1560186676
), class = c("POSIXct", "POSIXt"), tzone = "Europe/London")
df1<-data.frame(UserId1,Status1,DateTime)
df1$DateTime<-as.numeric(df1$DateTime)
df1$DateTime<-as.POSIXct(df1$DateTime, origin = "1970-01-01 00:00:00")
colnames(df1)<-c("UserId","Status","DateTime")
View(df1)
Data frame 2
UserId2<-c("9f649f366edf", "9f649f366edf", "9f649f366edf", "9f649f366edf",
"9f649f366edf", "6bc397bdc516", "6bc397bdc516", "6bc397bdc516",
"6bc397bdc516", "f24cbff4c81e", "f24cbff4c81e", "f24cbff4c81e"
)
OrigTime<-structure(c(1548029115, 1548035560, 1548099858, 1548099996, 1548824396,
1576821965, 1576865566, 1576926050, 1577037551, 1560000877, 1560013996,
1560186676), class = c("POSIXct", "POSIXt"), tzone = "Europe/London")
LastTime<-structure(c(1548029115, 1548035560, 1548099858, 1548099996, 1548825793,
1576822484, 1576865566, 1576926050, 1577037551, 1560001005, 1560014372,
1560186676), class = c("POSIXct", "POSIXt"), tzone = "Europe/London")
Status<-c("Abandoned", "Abandoned", "Answered", "Answered", "Abandoned",
"Abandoned", "Abandoned", "Abandoned", "Abandoned", "Abandoned",
"Answered", "Answered")
calls<-c(1, 1, 1, 1, 6, 6, 1, 1, 1, 3, 4, 1)
df2<-data.frame(UserId2,OrigTime,LastTime,Status,calls)
df2$OrigTime<-as.numeric(df2$OrigTime)
df2$OrigTime<-as.POSIXct(df2$OrigTime, origin = "1970-01-01 00:00:00")
df2$LastTime<-as.numeric(df2$LastTime)
df2$LastTime<-as.POSIXct(df2$LastTime, origin = "1970-01-01 00:00:00")
colnames(df2)<-c("UserId","OrigTime","LastTime","Status","calls")
View(df2)
Here is the output of each of the two data frames:
#df1
UserId Status DateTime
1 9f649f366edf Abandoned 2019-01-21 00:05:15
2 9f649f366edf Abandoned 2019-01-21 01:52:40
3 9f649f366edf Answered 2019-01-21 19:44:18
4 9f649f366edf Answered 2019-01-21 19:46:36
5 9f649f366edf Abandoned 2019-01-30 04:59:56
6 9f649f366edf Abandoned 2019-01-30 05:05:37
7 9f649f366edf Abandoned 2019-01-30 05:08:47
8 9f649f366edf Abandoned 2019-01-30 05:19:14
9 9f649f366edf Abandoned 2019-01-30 05:23:13
10 6bc397bdc516 Abandoned 2019-12-20 06:06:05
11 6bc397bdc516 Abandoned 2019-12-20 06:06:39
12 6bc397bdc516 Abandoned 2019-12-20 06:06:53
13 6bc397bdc516 Abandoned 2019-12-20 06:09:12
14 6bc397bdc516 Abandoned 2019-12-20 06:14:44
15 6bc397bdc516 Abandoned 2019-12-20 18:12:46
16 6bc397bdc516 Abandoned 2019-12-21 11:00:50
17 6bc397bdc516 Abandoned 2019-12-22 17:59:11
18 f24cbff4c81e Abandoned 2019-06-08 14:34:37
19 f24cbff4c81e Abandoned 2019-06-08 14:36:45
20 f24cbff4c81e Abandoned 2019-06-08 18:13:16
21 f24cbff4c81e Answered 2019-06-08 18:19:32
22 f24cbff4c81e Answered 2019-06-10 18:11:16
#df2
UserId OrigTime LastTime Status calls
1 9f649f366edf 2019-01-21 00:05:15 2019-01-21 00:05:15 Abandoned 1
2 9f649f366edf 2019-01-21 01:52:40 2019-01-21 01:52:40 Abandoned 1
3 9f649f366edf 2019-01-21 19:44:18 2019-01-21 19:44:18 Answered 1
4 9f649f366edf 2019-01-21 19:46:36 2019-01-21 19:46:36 Answered 1
5 9f649f366edf 2019-01-30 04:59:56 2019-01-30 05:23:13 Abandoned 6
6 6bc397bdc516 2019-12-20 06:06:05 2019-12-20 06:14:44 Abandoned 6
7 6bc397bdc516 2019-12-20 18:12:46 2019-12-20 18:12:46 Abandoned 1
8 6bc397bdc516 2019-12-21 11:00:50 2019-12-21 11:00:50 Abandoned 1
9 6bc397bdc516 2019-12-22 17:59:11 2019-12-22 17:59:11 Abandoned 1
10 f24cbff4c81e 2019-06-08 14:34:37 2019-06-08 14:36:45 Abandoned 3
11 f24cbff4c81e 2019-06-08 18:13:16 2019-06-08 18:19:32 Answered 4
12 f24cbff4c81e 2019-06-10 18:11:16 2019-06-10 18:11:16 Answered 1
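I am trying to count, per UserId, the rows in df1 whose DateTime falls between the OrigTime and LastTime columns of df2, because the calls column in df2 is wrong for some entries (a "call" is one row in df1).
Here is an example of what I would like df2 to look like, starting with its current state:
Take UserId == "f24cbff4c81e" as an example: this user has only 5 rows (calls) in df1, but summing the calls column in df2 gives 8 for this user. See the before and after below:
Before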
df2%>%filter(UserId=="f24cbff4c81e")
UserId OrigTime LastTime Status calls
1 f24cbff4c81e 2019-06-08 14:34:37 2019-06-08 14:36:45 Abandoned 3
2 f24cbff4c81e 2019-06-08 18:13:16 2019-06-08 18:19:32 Answered 4
3 f24cbff4c81e 2019-06-10 18:11:16 2019-06-10 18:11:16 Answered 1
The calls column is wrong, because df1$DateTime does not contain that many rows between OrigTime and LastTime. This is the correct result I want:
Correct result
df2%>%filter(UserId=="f24cbff4c81e")
UserId OrigTime LastTime Status calls
1 f24cbff4c81e 2019-06-08 14:34:37 2019-06-08 14:36:45 Abandoned 2
2 f24cbff4c81e 2019-06-08 18:13:16 2019-06-08 18:19:32 Answered 2
3 f24cbff4c81e 2019-06-10 18:11:16 2019-06-10 18:11:16 Answered 1
For UserId == "f24cbff4c81e", df1 contains 2 calls between 2019-06-08 14:34:37 (OrigTime in df2) and 2019-06-08 14:36:45 (LastTime in df2),
2 calls between 2019-06-08 18:13:16 (OrigTime in df2) and 2019-06-08 18:19:32 (LastTime in df2), and 1 call at 2019-06-10 18:11:16 (which is both OrigTime and LastTime; I would like to keep the timestamps here if possible). I hope you can see the logic I am trying to apply to the other users as well.
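To make the counting rule concrete, here is a minimal sketch for the first interval of this user. It assumes both endpoints are inclusive, which is what the expected counts above imply. The names call_times, orig, last and n_calls are made up for this illustration; the epoch values are the ones from the mock data.

```r
# Illustrative only: the five df1 timestamps of user "f24cbff4c81e"
call_times <- as.POSIXct(
  c(1560000877, 1560001005, 1560013996, 1560014372, 1560186676),
  origin = "1970-01-01", tz = "Europe/London"
)

# First df2 interval for this user (OrigTime and LastTime)
orig <- as.POSIXct(1560000877, origin = "1970-01-01", tz = "Europe/London")
last <- as.POSIXct(1560001005, origin = "1970-01-01", tz = "Europe/London")

# Count the calls inside the closed interval [orig, last]
n_calls <- sum(call_times >= orig & call_times <= last)
n_calls
```

With inclusive bounds this interval contains the 14:34:37 and 14:36:45 calls, so the count is 2 rather than the 3 currently stored in df2.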
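To sum up: if the number of df1 rows between the timestamps given by OrigTime and LastTime in df2 does not match the calls value in the corresponding df2 row, I would like calls to be changed to the correct value. Any help is much appreciated :)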
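For reference, the rule described above could be sketched in base R roughly as follows. This is only one possible approach, not a tested solution for the real data: count_calls, to_time, calls_demo and windows_demo are hypothetical names, the demo data is a small subset of the mock data, and both interval endpoints are assumed to be inclusive.

```r
# Hypothetical helper: for every row of windows_df, count the rows of calls_df
# with the same UserId and a DateTime inside [OrigTime, LastTime].
count_calls <- function(calls_df, windows_df) {
  call_id <- as.character(calls_df$UserId)   # guards against factor columns
  call_t  <- as.numeric(calls_df$DateTime)   # compare as seconds since epoch
  vapply(seq_len(nrow(windows_df)), function(i) {
    sum(call_id == as.character(windows_df$UserId[i]) &
          call_t >= as.numeric(windows_df$OrigTime[i]) &
          call_t <= as.numeric(windows_df$LastTime[i]))
  }, integer(1))
}

to_time <- function(x) as.POSIXct(x, origin = "1970-01-01", tz = "Europe/London")

# Small demo subset: five calls of one user plus one call of another user
calls_demo <- data.frame(
  UserId   = c(rep("f24cbff4c81e", 5), "9f649f366edf"),
  DateTime = to_time(c(1560000877, 1560001005, 1560013996,
                       1560014372, 1560186676, 1548029115))
)
windows_demo <- data.frame(
  UserId   = rep("f24cbff4c81e", 3),
  OrigTime = to_time(c(1560000877, 1560013996, 1560186676)),
  LastTime = to_time(c(1560001005, 1560014372, 1560186676)),
  calls    = c(3, 4, 1)                      # the incorrect values from df2
)

# Overwrite calls with the recomputed counts; the timestamps stay untouched
windows_demo$calls <- count_calls(calls_demo, windows_demo)
windows_demo
```

On the full data the same call would be df2$calls <- count_calls(df1, df2). This loops over the df2 rows, which is fine for small tables; for large ones a non-equi join would likely scale better.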
[Comments]: