mutate(percentage = n / sum(n)) - not correctly calculating percentage

【Question title】: mutate(percentage = n / sum(n)) - not correctly calculating percentage
【Posted】: 2021-08-24 03:48:44
【Question description】:

[Image: Mutate output]

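I have been writing the following code to calculate the hourly percentage of each behaviour (the Time column, formatted as d h), but it scrambles the order of the Time column and calculates the percentages incorrectly. I have attached a sample of the output and some data. Any help is much appreciated!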

S06Behav <- S06 %>%
 group_by(Time, PredictedBehaviorFull, Context)%>%
 summarise(count= n())

S06Proportions<-S06Behav %>%
 group_by(Time, PredictedBehaviorFull, Context) %>%
 summarise(n = sum(count)) %>%
 mutate(percentage = n / sum(n))

A sample of my data is at https://pastebin.com/KE0xEzk7

Thanks

【Comments】:

It would be easier to help if you created a small reproducible example along with the expected output. Read how to give a reproducible example.

Tags: r percentage dplyr

【Solution 1】:

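I think the percentages are not calculated as expected because, as the code is written, each percentage is computed from two identical values: within each group, n and sum(n) are the same number, so the ratio is always 1.0.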
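
One likely mechanism: by default summarise() drops only the last grouping variable, so the result of the question's code is still grouped by Time and PredictedBehaviorFull when mutate() runs. If each of those groups holds a single row, n / sum(n) is n / n. A minimal sketch with a made-up tibble (the toy values are assumptions, not the asker's data):

```r
library(dplyr)

# Hypothetical toy data: one Context per Time/behaviour combination
toy <- tibble(
  Time = c("1 19", "1 19", "1 19", "2 10"),
  PredictedBehaviorFull = c("Bait", "Bait", "Boat", "Bait"),
  Context = c("Present", "Present", "Present", "Absent")
)

counts <- toy %>%
  group_by(Time, PredictedBehaviorFull, Context) %>%
  # "drop_last" is the default; it is spelled out here to make it visible
  summarise(n = n(), .groups = "drop_last")

group_vars(counts)  # "Time" "PredictedBehaviorFull"

# Same pattern as the question: every group has one row, so n / sum(n) is 1
wrong <- counts %>%
  mutate(percentage = n / sum(n))

# Regrouping by Time alone makes sum(n) the hourly total
right <- counts %>%
  group_by(Time) %>%
  mutate(percentage = n / sum(n)) %>%
  ungroup()
```

Here wrong$percentage is 1 in every row, while right$percentage gives each behaviour's share of its Time value.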

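I'm not entirely sure what the problem is, but if by "scrambles the order of the Time column" you mean that the whole Time column is wrong, you would probably be better off using the lubridate package to build your Time column.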

library(dplyr)     # %>% and mutate()
library(stringr)   # str_c()
library(lubridate)

# the result is saved as 'S06_a' to use next
S06_a <- S06 %>% 
  
  # first, convert the Timestamp column into datetime format
  mutate(
    Timestamp = ymd_hms(Timestamp)
  ) %>% 
  
  # then, extract the components from the Timestamp
  mutate(
    date = date(Timestamp),
    hour = lubridate::hour(Timestamp), 
    timestamp_hour = ymd_h(str_c(date, ' ', hour))
  )
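
As a possible shortcut, lubridate's floor_date() rounds a datetime down to the start of its hour in one step, which should give an equivalent timestamp_hour without pasting strings together. A minimal sketch, assuming Timestamp values in "YYYY-MM-DD HH:MM:SS" form (the sample values below are invented for illustration):

```r
library(dplyr)
library(lubridate)

# Hypothetical timestamps, not taken from the asker's data
example <- tibble(
  Timestamp = c("2020-05-23 19:05:10", "2020-05-23 19:59:59", "2020-05-24 10:00:00")
)

example_hourly <- example %>%
  mutate(
    Timestamp = ymd_hms(Timestamp),                        # parse text as UTC datetimes
    timestamp_hour = floor_date(Timestamp, unit = "hour")  # truncate to the start of the hour
  )
```

Because timestamp_hour is a real datetime rather than text, rows sort chronologically, which is presumably what went wrong with a character column in "d h" form.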

Then, if I understand correctly, you want to determine the percentage of observations of each behaviour type per hour.

S06_a %>% 
  
  # then, work out the total number of observations per hour, context and behaviour
  group_by(timestamp_hour, Context, PredictedBehaviorFull) %>% 
  summarise(
    behav_total = n()
  ) %>% 
  
  # calculate the total number of observations per hour
  group_by(timestamp_hour) %>% 
  mutate(
    hour_total = sum(behav_total), 
    percentage = behav_total / hour_total
  )

This produces the following output:

# A tibble: 7 x 6
# Groups:   timestamp_hour [3]
  timestamp_hour      Context PredictedBehaviorFull behav_total hour_total percentage
  <dttm>              <chr>   <chr>                       <int>      <int>      <dbl>
1 2020-05-23 19:00:00 Present Bait                         1971       2184    0.902  
2 2020-05-23 19:00:00 Present Boat                           96       2184    0.0440 
3 2020-05-23 19:00:00 Present No_OP                         117       2184    0.0536 
4 2020-05-24 10:00:00 Absent  Bait                            9       1202    0.00749
5 2020-05-24 10:00:00 Absent  No_OP                        1193       1202    0.993  
6 2020-05-24 11:00:00 Absent  Bait                            5        129    0.0388 
7 2020-05-24 11:00:00 Absent  No_OP                         124        129    0.961
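
A quick way to sanity-check a result like this is to confirm that the shares within each hour add up to 1. The sketch below re-enters the counts from the output above by hand; the object names result and check are just illustrative:

```r
library(dplyr)

# Counts copied from the printed output above
result <- tibble(
  timestamp_hour = c(
    rep("2020-05-23 19:00:00", 3),
    rep("2020-05-24 10:00:00", 2),
    rep("2020-05-24 11:00:00", 2)
  ),
  behav_total = c(1971L, 96L, 117L, 9L, 1193L, 5L, 124L)
)

check <- result %>%
  group_by(timestamp_hour) %>%
  mutate(percentage = behav_total / sum(behav_total)) %>%
  summarise(
    hour_total = sum(behav_total),   # should match the hour_total column
    total_share = sum(percentage)    # should be 1 for every hour
  )
```

Note that percentage here is a proportion between 0 and 1; multiply by 100 if a percent value is needed.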

【Comments】:

Thank you, that worked really well!