[Question title]: Replace values in a list-column based on a named vector
[Posted]: 2021-10-04 11:03:48
[Question description]:

Given the following data.frame (or tibble) with a list-column:

set.seed(1)
df <- data.frame(a = sample(letters, 7),
                 b = sample(letters, 7),
                 c = c("yes", "no", "yes", "no", "yes", "no", "no"),
                 list_col = I(list(c(1, 2, 3), "hjhj", c(1, 4), "kkjkj", c(3, 4), "jkj", c(1, 2))))
df
#   a b   c list_col
# 1 y r yes  1, 2, 3
# 2 d s  no     hjhj
# 3 g a yes     1, 4
# 4 a u  no    kkjkj
# 5 b w yes     3, 4
# 6 k j  no      jkj
# 7 n n  no     1, 2

str(df)
# 'data.frame': 7 obs. of  4 variables:
#  $ a       : chr  "y" "d" "g" "a" ...
#  $ b       : chr  "r" "s" "a" "u" ...
#  $ c       : chr  "yes" "no" "yes" "no" ...
#  $ list_col:List of 7
#   ..$ : num  1 2 3
#   ..$ : chr "hjhj"
#   ..$ : num  1 4
#   ..$ : chr "kkjkj"
#   ..$ : num  3 4
#   ..$ : chr "jkj"
#   ..$ : num  1 2
#   ..- attr(*, "class")= chr "AsIs"

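I would like to replace the values in list_col with the names of the matching values in a lookup table, but only for rows where column c == "yes":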

#named lookup
c_yes_column_look_up <- c("number1" = 1,
                          "number2" = 2,
                          "number3" = 3, 
                          "number4" = 4)

So my final df would look like this:

df_final
#   a b   c     list_col
# 1 y r yes number1,....
# 2 d s  no         hjhj
# 3 g a yes number1,....
# 4 a u  no        kkjkj
# 5 b w yes number3,....
# 6 k j  no          jkj
# 7 n n  no         1, 2

str(df_final)
# 'data.frame': 7 obs. of  4 variables:
#  $ a       : chr  "y" "d" "g" "a" ...
#  $ b       : chr  "r" "s" "a" "u" ...
#  $ c       : chr  "yes" "no" "yes" "no" ...
#  $ list_col:List of 7
#   ..$ : chr  "number1" "number2" "number3"
#   ..$ : chr "hjhj"
#   ..$ : chr  "number1" "number4"
#   ..$ : chr "kkjkj"
#   ..$ : chr  "number3" "number4"
#   ..$ : chr "jkj"
#   ..$ : num  1 2
#   ..- attr(*, "class")= chr "AsIs"

I was thinking of something along the lines of this, but can't quite figure it out:

library(tidyverse)
df %>% 
  #rowwise() %>%
  mutate(list_col = case_when(c == "yes" & list_col %in% c_yes_column_look_up ~ names(list_col[list_col %in% c_yes_column_look_up]),
                              TRUE ~ list_col))

Other approaches are also welcome, thanks.
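The core step, turning a value into the name of the lookup entry that holds it, can be sketched in base R with match(): it returns each value's position in the lookup (NA if absent), and that position then indexes names(). This is a minimal sketch for a single element of list_col; lookup_names is a hypothetical helper name, not part of the question or of any package.

```r
# named lookup from the question: names are the labels, values are the codes
c_yes_column_look_up <- c("number1" = 1, "number2" = 2,
                          "number3" = 3, "number4" = 4)

# hypothetical helper: replace each value by the name of the matching lookup entry;
# values with no match are kept as-is (converted to character)
lookup_names <- function(x, lookup) {
  pos <- match(x, lookup)                         # NA where x is not in lookup
  out <- names(lookup)[pos]
  out[is.na(pos)] <- as.character(x)[is.na(pos)]  # fall back to the original value
  out
}

lookup_names(c(1, 4), c_yes_column_look_up)
#> [1] "number1" "number4"
lookup_names(c(3, 7), c_yes_column_look_up)
#> [1] "number3" "7"
```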

[Question discussion]:

    Tags: r list dplyr tibble


    [Solution 1]:

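    The named vector should be inverted (its values become the names and its names become the values). We can use rowwise or map2 (map2 is probably more efficient). Loop over the elements of 'list_col' and 'c' together; when the 'c' value is "yes", index the inverted vector by the list elements and use coalesce to fall back to the original values wherever there is no match (NA); otherwise return the original vector unchanged.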

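    library(dplyr)
    library(purrr)

    # invert the lookup: names become "1".."4", values become "number1".."number4"
    lookup_inv <- setNames(names(c_yes_column_look_up), c_yes_column_look_up)

    df1 <- df %>% 
        mutate(list_col = map2(list_col, c, ~ if (.y %in% "yes")
              # index by name (not position); unmatched values (NA) fall back to the original
              unname(coalesce(lookup_inv[as.character(.x)], as.character(.x))) else .x))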
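One caveat about indexing the inverted vector: subsetting with a number selects by position, not by name. With this lookup the values happen to be 1 to 4, so positions and names coincide; with other values, positional indexing silently returns NA. A small sketch with a hypothetical lookup (other_lookup) shows the difference; converting to character first turns the subset into a lookup by name.

```r
# hypothetical lookup whose values are NOT 1..n
other_lookup <- c("ten" = 10, "twenty" = 20)

# invert it: the values become names ("10", "20"), the names become values
inv <- setNames(names(other_lookup), other_lookup)

x <- c(20, 10)
inv[x]                 # by position: inv has only 2 elements, so both results are NA
inv[as.character(x)]   # by name: "twenty" "ten" (still carrying names "20", "10")
```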
    

    Output:

    > str(df1)
    'data.frame':   7 obs. of  4 variables:
     $ a       : chr  "y" "d" "g" "a" ...
     $ b       : chr  "r" "s" "a" "u" ...
     $ c       : chr  "yes" "no" "yes" "no" ...
     $ list_col:List of 7
      ..$ : chr  "number1" "number2" "number3"
      ..$ : chr "hjhj"
      ..$ : chr  "number1" "number4"
      ..$ : chr "kkjkj"
      ..$ : chr  "number3" "number4"
      ..$ : chr "jkj"
      ..$ : num  1 2
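Since other approaches were welcome, here is a base-R sketch of the same transformation, assuming no dplyr/purrr: Map() plays the role of map2(), and match() replaces the inverted-vector lookup. It assumes the df and c_yes_column_look_up from the question; values without a match are kept as character, mirroring the coalesce fallback.

```r
# data and lookup from the question
set.seed(1)
df <- data.frame(a = sample(letters, 7),
                 b = sample(letters, 7),
                 c = c("yes", "no", "yes", "no", "yes", "no", "no"),
                 list_col = I(list(c(1, 2, 3), "hjhj", c(1, 4), "kkjkj",
                                   c(3, 4), "jkj", c(1, 2))))
c_yes_column_look_up <- c("number1" = 1, "number2" = 2,
                          "number3" = 3, "number4" = 4)

df1 <- df
df1$list_col <- I(Map(function(x, flag) {
  if (!flag %in% "yes") return(x)            # leave non-"yes" rows (and NA) untouched
  pos <- match(x, c_yes_column_look_up)      # position of each value in the lookup
  out <- names(c_yes_column_look_up)[pos]
  ifelse(is.na(pos), as.character(x), out)   # keep unmatched values as text
}, df$list_col, df$c))

str(df1$list_col)
```

Wrapping the result in I() keeps the "AsIs" class of the original column, and %in% is used for the row test so that an NA in c counts as "not yes" instead of raising an error.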
    

    [Discussion]:

    • A follow-up question: how would you handle NA in c, e.g. df <- data.frame(a = sample(letters, 8), b = sample(letters, 8), c = c("yes", "no", "yes", "no", "yes", "no", "no", NA), list_col = I(list(c(1, 2, 3), "hjhj", c(1, 4), "kkjkj", c(3, 4), "jkj", c(1, 2), "hjhjkh")))
    • @user63230 If you want TRUE/FALSE returned, use %In% instead of ==; the latter returns NA wherever there is an NA. Updated.
    • Sorry, there was a typo: I meant %in%, not %In%.
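To illustrate the point in the comments above: == propagates NA, and if (NA) stops with "missing value where TRUE/FALSE needed", whereas %in% always returns TRUE or FALSE, so a row with NA in c simply falls through to the unchanged branch. A minimal sketch; relabel_if_yes is a hypothetical helper mirroring the answer's per-element step.

```r
flags <- c("yes", "no", NA)

flags == "yes"      # TRUE FALSE NA    -> if (NA) would stop with an error
flags %in% "yes"    # TRUE FALSE FALSE -> always usable inside if ()

# hypothetical helper mirroring the answer's per-element step, guarded with %in%
relabel_if_yes <- function(x, flag, lookup) {
  if (!flag %in% "yes") return(x)            # NA or "no": return unchanged
  inv <- setNames(names(lookup), lookup)     # values -> names
  hit <- inv[as.character(x)]                # NA where there is no match
  unname(ifelse(is.na(hit), as.character(x), hit))
}

lookup <- c("number1" = 1, "number2" = 2, "number3" = 3, "number4" = 4)
relabel_if_yes("hjhjkh", NA, lookup)     # NA flag: returned unchanged
relabel_if_yes(c(1, 4), "yes", lookup)   # "number1" "number4"
```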