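I think the percentages aren't being calculated as expected because, judging by your code, you compute each percentage from two identical values, so the ratio is always 1.0.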
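Your original code isn't reproduced here, so the following is only a hypothetical sketch of this kind of pitfall on made-up data (`obs`, `hour` and `behaviour` are illustrative names, not your columns). When the numerator and the denominator are both counted over the same groups, every ratio collapses to 1. The denominator has to come from a coarser grouping (per hour only).

```r
library(dplyr)

# hypothetical toy data: one row per observation (not the real S06 data)
obs <- tibble(
  hour = c(19, 19, 19, 10),
  behaviour = c("Bait", "Bait", "Boat", "No_OP")
)

# pitfall: numerator and denominator are counted over the same groups,
# so every ratio is 1
same_group <- obs %>%
  group_by(hour, behaviour) %>%
  summarise(ratio = n() / n(), .groups = "drop")

# fix: count per hour and behaviour, then divide by a total taken over
# the coarser per-hour grouping
per_hour <- obs %>%
  count(hour, behaviour, name = "behav_total") %>%
  group_by(hour) %>%
  mutate(percentage = behav_total / sum(behav_total)) %>%
  ungroup()
```

Here `same_group$ratio` is 1 in every row, while `per_hour$percentage` gives 2/3 and 1/3 for the two behaviours at hour 19.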
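I'm not completely sure I understand your question, but if by "mixes up the order of the Time column" you mean that the whole Time column is wrong, you are probably better off building your Time column with the lubridate package.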
library(dplyr)
library(lubridate)

# save the result as S06_a to use in the next step
S06_a <- S06 %>%
  # first, convert the Timestamp column into date-time format
  mutate(Timestamp = ymd_hms(Timestamp)) %>%
  # then, extract the components from the Timestamp
  mutate(
    date = as_date(Timestamp),
    hour = lubridate::hour(Timestamp),
    # round each timestamp down to the start of its hour
    timestamp_hour = floor_date(Timestamp, unit = "hour")
  )
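To see what this hourly binning does without the full `S06` data, here is a minimal sketch on a few made-up `Timestamp` strings (the values are hypothetical, and the `YYYY-MM-DD HH:MM:SS` format that `ymd_hms()` parses is an assumption about your data). It rounds each timestamp down to its hour with `floor_date()`, which should give the same hourly bins as rebuilding them from `date` and `hour`.

```r
library(dplyr)
library(lubridate)

# hypothetical timestamps in the "YYYY-MM-DD HH:MM:SS" format
toy <- tibble(
  Timestamp = c("2020-05-23 19:05:12", "2020-05-23 19:59:59", "2020-05-24 10:00:01")
)

toy_a <- toy %>%
  # parse the strings into POSIXct (UTC by default)
  mutate(Timestamp = ymd_hms(Timestamp)) %>%
  mutate(
    date = as_date(Timestamp),
    hour = hour(Timestamp),
    # every reading within the same hour gets the same timestamp_hour
    timestamp_hour = floor_date(Timestamp, unit = "hour")
  )
```

The first two readings both fall into the `2020-05-23 19:00:00` bin and the third into `2020-05-24 10:00:00`, so the bins sort in true chronological order rather than by text.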
Then, if I understand correctly, you want to find the percentage of observations of each behaviour type per hour.
S06_a %>%
  # first, count the observations per hour, context and behaviour
  group_by(timestamp_hour, Context, PredictedBehaviorFull) %>%
  summarise(
    behav_total = n(),
    .groups = "drop"
  ) %>%
  # then, calculate the total number of observations per hour
  # and each behaviour's share of that total
  group_by(timestamp_hour) %>%
  mutate(
    hour_total = sum(behav_total),
    # a proportion between 0 and 1; multiply by 100 for a percentage
    percentage = behav_total / hour_total
  )
This produces the following output:
# A tibble: 7 x 6
# Groups:   timestamp_hour [3]
  timestamp_hour      Context PredictedBehaviorFull behav_total hour_total percentage
  <dttm>              <chr>   <chr>                       <int>      <int>      <dbl>
1 2020-05-23 19:00:00 Present Bait                         1971       2184    0.902
2 2020-05-23 19:00:00 Present Boat                           96       2184    0.0440
3 2020-05-23 19:00:00 Present No_OP                         117       2184    0.0536
4 2020-05-24 10:00:00 Absent  Bait                            9       1202    0.00749
5 2020-05-24 10:00:00 Absent  No_OP                        1193       1202    0.993
6 2020-05-24 11:00:00 Absent  Bait                            5        129    0.0388
7 2020-05-24 11:00:00 Absent  No_OP                         124        129    0.961
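A quick sanity check is that the proportions within each hour add up to 1 (in the output, 0.902 + 0.0440 + 0.0536 is 1 up to rounding). Here is a minimal sketch of that check on a hypothetical stand-in for `S06` (`S06_toy` and its values are made up). It uses `count()` as a shorthand for the `group_by()` + `summarise(n())` step; that substitution is my choice, not part of the code above.

```r
library(dplyr)
library(lubridate)

# hypothetical stand-in for S06 with the same three columns used above
S06_toy <- tibble(
  Timestamp = c("2020-05-23 19:01:00", "2020-05-23 19:20:00", "2020-05-23 19:45:00",
                "2020-05-24 10:05:00", "2020-05-24 10:30:00"),
  Context = c("Present", "Present", "Present", "Absent", "Absent"),
  PredictedBehaviorFull = c("Bait", "Bait", "Boat", "No_OP", "Bait")
)

result <- S06_toy %>%
  # bin each observation into its hour
  mutate(timestamp_hour = floor_date(ymd_hms(Timestamp), unit = "hour")) %>%
  # observations per hour, context and behaviour
  count(timestamp_hour, Context, PredictedBehaviorFull, name = "behav_total") %>%
  group_by(timestamp_hour) %>%
  mutate(
    hour_total = sum(behav_total),
    percentage = behav_total / hour_total
  ) %>%
  ungroup()

# within each hour the proportions should add up to 1
check <- result %>%
  group_by(timestamp_hour) %>%
  summarise(total = sum(percentage), .groups = "drop")
```

If any `check$total` differs from 1, the denominator is probably being computed over the wrong grouping.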