Create new variable as the first value of another variable, sorted by a third variable

【Question title】: Create new variable as the FIRST value of another variable, sorted by a third variable
【Posted】: 2019-03-10 11:04:59
【Question description】:

I have a dataset that looks like the following:

library(dplyr)  # provides tibble() and the pipe used below

score_df <- tibble(country = c("US", "US", "US", "US", "Mex", "Mex"),
               year = c(2001, 2000, 1997, 2003, 1998, 2006),
               perc = c(5, 6, 8, 8, NA, 10),
               score = c(NA, 400, NA, 423, 12, 18))

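I want to create a new variable, year_1_score, that holds the score from the earliest year with a non-NA score. In other words, year_1_score should be filled in for every row according to the following steps:
- Group by country
- Arrange by year
- For each country, take the first score that is not NA
- Insert this value into every row for that country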

I would like the final df to look like this:

score_df <- tibble(country = c("US", "US", "US", "US", "Mex", "Mex"),
               year = c(2001, 2000, 1997, 2003, 1998, 2006),
               perc = c(5, 6,8, 8, NA, 10),
               score = c(NA, 400, NA, 423, 12, 18),
               year_1_score = c(400, 400, 400, 400, 12, 12))

I made the following two attempts, to no avail.

Attempt #1:

score_df <- score_df %>%
  group_by(country) %>%
  arrange(year) %>%
  mutate(yr_1_score = ifelse(year == min(year) & !is.na(score), score, NA)) %>%
  ungroup()

Attempt #2:

score_df <- score_df %>%
  group_by(country) %>%
  arrange(year) %>%
  filter(!is.na(score)) %>%
  slice(1) %>%
  mutate(yr_1_score = score) %>%
  ungroup()

Can anyone crack this? A solution using dplyr is strongly preferred, but any help would be appreciated!

Thanks in advance!

【Question discussion】:

Tags: r dplyr

【Solution 1】:

We can first arrange the data frame by year, then group_by country and, for each group, select the first non-NA value of score.

library(dplyr)

score_df %>%
  arrange(year) %>%
  group_by(country) %>%
  mutate(year_1_score = score[which.max(!is.na(score))]) %>%
  arrange(country)


#  country  year  perc score year_1_score
#  <chr>   <dbl> <dbl> <dbl>        <dbl>
#1 Mex      1998    NA    12           12
#2 Mex      2006    10    18           12
#3 US       1997     8    NA          400
#4 US       2000     6   400          400
#5 US       2001     5    NA          400
#6 US       2003     8   423          400
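
The pipeline above returns the rows sorted by country and year. For readers who want to keep the original row order, as in the expected output in the question, here is a minimal sketch of one possible variant. The helper name first_non_na_by is hypothetical (it is not part of dplyr). It sorts within each group instead of rearranging the whole data frame.

```r
library(dplyr)

score_df <- tibble(country = c("US", "US", "US", "US", "Mex", "Mex"),
                   year = c(2001, 2000, 1997, 2003, 1998, 2006),
                   perc = c(5, 6, 8, 8, NA, 10),
                   score = c(NA, 400, NA, 423, 12, 18))

# Hypothetical helper: first non-NA element of x when x is ordered by `by`.
# Returns NA if every element of x is NA.
first_non_na_by <- function(x, by) {
  x_sorted <- x[order(by)]
  x_sorted[!is.na(x_sorted)][1]
}

result <- score_df %>%
  group_by(country) %>%
  mutate(year_1_score = first_non_na_by(score, year)) %>%
  ungroup()

result$year_1_score
```

Because mutate() does not reorder rows, result keeps the row order of score_df, and year_1_score is 400 for every US row and 12 for every Mex row.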

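【Discussion】:

This is absolutely perfect! Thank you so much, Ronak! Would you mind explaining why the score[which.max()]... expression picks the score from the first year rather than the maximum score for that country? Thanks again for the super-fast, super-helpful reply.
@wscampbell With !is.na(score) we get a vector of TRUE/FALSE values, where TRUE means the value is not NA. which.max finds the index of the first maximum in that vector. Since TRUE is treated as 1 and FALSE as 0, which.max returns the index of the first TRUE, and we then use that index to extract the corresponding score.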
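
The explanation above can be checked step by step in base R. This is only an illustrative sketch using the US scores from the question, already sorted by year. The variable names not_na, idx and all_na are made up for the example.

```r
# US scores in year order: 1997, 2000, 2001, 2003
score <- c(NA, 400, NA, 423)

not_na <- !is.na(score)   # FALSE TRUE FALSE TRUE
idx <- which.max(not_na)  # index of the first TRUE, here 2
score[idx]                # 400, the first non-NA score (not the maximum, 423)

# Edge case: if every score in a group is NA, there is no TRUE,
# so which.max() returns 1 and the result is simply NA.
all_na <- c(NA_real_, NA_real_)
all_na[which.max(!is.na(all_na))]
```

Note that which.max is applied to the logical vector, not to score itself, which is why the result is the first non-NA score rather than the largest one.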