【Question title】: Conditionally replace values across multiple columns based on string match in a separate column
【Posted】: 2020-10-16 02:25:40
【Question】:

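I am trying to conditionally replace values in multiple columns based on a string match in a different column, and I would like to do it in a single line of code using the across() function, but I keep getting errors that don't make much sense to me. I suspect the fix is simple, so if anyone can point me in the right direction, that would be great!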

library(dplyr)    # mutate(), across(), %>%
library(stringr)  # str_detect()

df <- data.frame("type" = c("Park", "Neighborhood", "Airport", "Park", "Neighborhood", "Neighborhood"),
                 "total" = c(34, 56, 75, 89, 21, 56),
                 "group_a" = c(30, 26, 45, 60, 3, 46),
                 "group_b" = c(4, 30, 30, 29, 18, 10))

# working but not concise
df %>%
  mutate(total = ifelse(str_detect(type, "Park"), NA, total),
         group_a = ifelse(str_detect(type, "Park"), NA, group_a),
         group_b = ifelse(str_detect(type, "Park"), NA, group_b))

  
# concise but not working
df %>% mutate(across(total, group_a, group_b), ifelse(str_detect(type, "Park"), NA, .))

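Update

The suggested solution works on my dummy dataset but not on my real data, so I am sharing a small snippet of my real data frame, with the numbers changed and the organization names hidden. When I run this line of code on that data (df %>% mutate(across(c(Attempts, Canvasses, Completes)), ~ifelse(str_detect(long_name, "park-cemetery"), NA, .))), I get the following error message:

Error: Problem with `mutate()` input `..2`. x Input `..2` must be a vector, not a `formula` object. i Input `..2` is `~ifelse(str_detect(long_name, "park-cemetery"), NA, .)`.

Here is a small sample of the data that produces this error: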

df <- structure(list(Org = c("OrgName", "OrgName", "OrgName", "OrgName", 
"OrgName", "OrgName", "OrgName", "OrgName", "OrgName", "OrgName"
), nCode = c("M34", "R36", "R46", "X29", "M31", "K39", "Q12", 
"Q39", "X41", "K27"), Attempts = c(100, 100, 100, 100, 100, 100, 
100, 100, 100, 100), Canvasses = c(80, 80, 80, 80, 80, 80, 80, 
80, 80, 80), Completes = c(50, 50, 50, 50, 50, 50, 50, 50, 50, 
50), van_nocc_id = c(999, 999, 999, 999, 999, 999, 999, 999, 
999, 999), van_name = c("M-Upper West Side", "SI-Rosebank", "SI-Tottenville", 
"BX-park-cemetery-etc-Bronx", "M-Stuyvesant Town-Cooper Village", 
"BK-Kensington", "Q-Broad Channel", "Q-Lindenwood", "BX-Wakefield", 
"BK-East New York"), boro_short = c("M", "SI", "SI", "BX", "M", 
"BK", "Q", "Q", "BX", "BK"), long_name = c("Upper West Side", 
"Rosebank", "Tottenville", "park-cemetery-etc-Bronx", "Stuyvesant Town-Cooper Village", 
"Kensington", "Broad Channel", "Lindenwood", "Wakefield", "East New York"
)), row.names = c(NA, -10L), class = "data.frame")

Final update

The curse of the misplaced closing parenthesis! Thanks, everyone, for the help. The correct solution is df %>% mutate(across(c(Attempts, Canvasses, Completes), ~ifelse(str_detect(long_name, "park-cemetery"), NA, .)))
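
To make the effect of the misplaced parenthesis concrete, here is a minimal sketch comparing the two calls. It uses a small hypothetical data frame (`toy`, not the real data) and assumes dplyr 1.0.0 or later plus stringr; the exact wording of the error may differ between dplyr versions.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data, for illustration only
toy <- data.frame(long_name = c("Rosebank", "park-cemetery-etc-Bronx"),
                  Attempts  = c(100, 100),
                  Completes = c(50, 50))

# Misplaced parenthesis: across() is closed before the lambda, so the
# formula reaches mutate() as a second, separate argument.
bad <- tryCatch(
  toy %>% mutate(across(c(Attempts, Completes)),
                 ~ifelse(str_detect(long_name, "park-cemetery"), NA, .)),
  error = function(e) e
)
inherits(bad, "error")  # the call fails instead of replacing values

# Correct: the lambda is the second argument of across() itself.
good <- toy %>%
  mutate(across(c(Attempts, Completes),
                ~ifelse(str_detect(long_name, "park-cemetery"), NA, .)))
good
```

In the failing call, across() receives only the column selection, so mutate() sees the formula as its own input `..2`, which matches the error message quoted above.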

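【Comments】:

    Tags: r stringr dplyr across


    【Solution 1】:

    If you use the newly introduced function across (which is the right way to handle this task), you have to specify the function you want to apply inside across itself. In this case, the function ifelse(...) must be a purrr-style lambda (so it starts with ~). Check the across documentation and look at the arguments .cols and .fns.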

    df %>% 
      mutate(across(c(total, group_a, group_b), ~ifelse(str_detect(type, "Park"), NA, .)))
    

    Output

    #           type total group_a group_b
    # 1         Park    NA      NA      NA
    # 2 Neighborhood    56      26      30
    # 3      Airport    75      45      30
    # 4         Park    NA      NA      NA
    # 5 Neighborhood    21       3      18
    # 6 Neighborhood    56      46      10
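
As a variation on the call above, here is a minimal sketch that takes the column names from a character vector and uses replace() instead of ifelse(). The name `cols` is introduced here purely for illustration, and the sketch assumes dplyr 1.0.0 or later, where across() and all_of() are available.

```r
library(dplyr)
library(stringr)

df <- data.frame(type    = c("Park", "Neighborhood", "Airport", "Park", "Neighborhood", "Neighborhood"),
                 total   = c(34, 56, 75, 89, 21, 56),
                 group_a = c(30, 26, 45, 60, 3, 46),
                 group_b = c(4, 30, 30, 29, 18, 10))

# Hypothetical helper vector holding the names of the columns to modify
cols <- c("total", "group_a", "group_b")

# all_of() selects the columns named in `cols`; replace() sets the rows
# where `type` contains "Park" to NA and leaves the other rows untouched.
res <- df %>%
  mutate(across(all_of(cols), ~replace(.x, str_detect(type, "Park"), NA)))
res
```

replace() keeps each column's numeric type, and `.x` is simply the explicit spelling of `.` inside a purrr-style lambda.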
    

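    【Comments】:

    • Thanks for the help! Interestingly, this works on my dummy dataset, but when I apply exactly the same syntax to my larger real dataset it does not work. I keep getting this error message: Error: Problem with `mutate()` input `..2`. x Input `..2` must be a vector, not a `formula` object. i Input `..2` is `~ifelse(str_detect(long_name, "park-cemetery"), NA, .)`. Any idea why it might throw this error on the larger dataset?
    • Are you still passing the variables you want to mutate with c(variable_1, variable_2, ...)?
    • Yes, exactly the same.
    • Could you share a sample of your larger data using the dput function and paste the output into your question? Also, could you paste the code you used to get that error?
    • Ah, the curse of the misplaced closing parenthesis! Thanks for your help!
    【Solution 2】:

    Here is a data.table solution.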

    library(data.table)
    df <- data.frame("type" = c("Park", "Neighborhood", "Airport", "Park", "Neighborhood", "Neighborhood"),
                   "total" = c(34, 56, 75, 89, 21, 56),
                   "group_a" = c(30, 26, 45, 60, 3, 46),
                   "group_b" = c(4, 30, 30, 29, 18, 10))
    
    setDT(df)
    df[type == "Park", c("total", "group_a", "group_b") := NA]
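
Note that `type == "Park"` is an exact match, whereas the question uses str_detect(), which matches substrings. A minimal sketch of a substring-based variant follows; the data and the name `cols` are hypothetical and chosen only to show the difference.

```r
library(data.table)

# Hypothetical data: "Central Park" would be missed by type == "Park"
dt <- data.table(type    = c("Central Park", "Neighborhood", "Airport"),
                 total   = c(34, 56, 75),
                 group_a = c(30, 26, 45))

# Hypothetical helper vector naming the columns to blank out
cols <- c("total", "group_a")

# grepl() performs a substring match, much like str_detect();
# (cols) := NA updates the named columns by reference for the matching rows.
dt[grepl("Park", type, fixed = TRUE), (cols) := NA]
dt
```

Wrapping `cols` in parentheses tells data.table to treat it as a vector of column names rather than as a new column called `cols`.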
    

    【Comments】:

      【Solution 3】:

      Update: figured it out quickly! I just needed to put the columns in a vector:

      # concise AND working!
      df %>% mutate(across(c(total, group_a, group_b)), ifelse(str_detect(type, "Park"), NA, .))
      

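      I had tried that originally, but with the column names in quotes... don't do that :)

      【Comments】:

      • Actually this answer does not work, because you end up with a column named ifelse(str_detect(type, "Park"), NA, .) (at least in my case). Please see my answer above.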