【Question title】: Mutating new column based on groups
【Posted】: 2020-11-10 02:21:12
【Question】:

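Is there a way to group rows together by a shared column value (id), and then mutate a new column whose label depends on whether the values within each group are above and/or below 1000? For example:

  1. < 1000 = "low/low" (all values in the group are below 1000)
  2. < 1000 and > 1000 = "low/high" (some values are below 1000 and some are above)
  3. > 1000 = "high/high" (all values in the group are above 1000)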
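
The three rules above can be written as a small base R function that labels one group's values. This is a minimal sketch: the function name classify_values and its threshold argument are hypothetical, and sending a value of exactly 1000 to "low/high" is an assumption, since the question does not say how 1000 itself should be treated.

```r
# Hypothetical helper: label one group's values using the question's three rules.
# A value of exactly 1000 is neither "< 1000" nor "> 1000", so it lands in "low/high".
classify_values <- function(v, threshold = 1000) {
  if (all(v < threshold)) {
    "low/low"      # every value is below the threshold
  } else if (all(v > threshold)) {
    "high/high"    # every value is above the threshold
  } else {
    "low/high"     # a mix of low and high values
  }
}

classify_values(c(200, 300))          # group a -> "low/low"
classify_values(c(100, 2000, 3000))   # group b -> "low/high"
classify_values(c(4000, 2000, 3000))  # group c -> "high/high"
```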

Data

#Example
  id values
1   a    200
2   a    300
3   b    100
4   b   2000
5   b   3000
6   c   4000
7   c   2000
8   c   3000
9   d   2400
10  d   2000
11  d    400

#dataframe:
df <- structure(list(id = c("a", "a", "b", "b", "b", "c", "c", "c", 
"d", "d", "d"), values = c(200, 300, 100, 2000, 3000, 4000, 2000, 
3000, 2400, 2000, 400)), class = "data.frame", row.names = c(NA, 
-11L))

Desired output

   id values    new.id
1   a    200   low/low
2   a    300   low/low
3   b    100  low/high
4   b   2000  low/high
5   b   3000  low/high
6   c   4000 high/high
7   c   2000 high/high
8   c   3000 high/high
9   d   2400  low/high
10  d   2000  low/high
11  d    400  low/high

A dplyr solution would be great, but I'm open to anything else!

【Question discussion】:

    Tags: r grouping dplyr


    【Solution 1】:
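    library(dplyr)

    df %>% 
      group_by(id) %>%
      mutate(new.id = case_when(
        all(values < 1000) ~ "low/low",    # every value in the group is below 1000
        all(values > 1000) ~ "high/high",  # every value in the group is above 1000
        TRUE ~ "low/high"                  # a mix (a value of exactly 1000 also lands here)
      ))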
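
Since the question is open to non-dplyr answers, here is a base R sketch of the same per-group rule, using split() and vapply() in place of group_by() and case_when(). This swaps in base R and is not part of the answer above; the helper name label_group is hypothetical, and the rules mirror the case_when() branches.

```r
# The question's example data
df <- data.frame(
  id = c("a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"),
  values = c(200, 300, 100, 2000, 3000, 4000, 2000, 3000, 2400, 2000, 400),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the case_when() branches
label_group <- function(v) {
  if (all(v < 1000)) "low/low"
  else if (all(v > 1000)) "high/high"
  else "low/high"
}

# One label per id, returned as a character vector named by id
labels <- vapply(split(df$values, df$id), label_group, character(1))

# Look up each row's id in the named vector to copy the group label onto every row
df$new.id <- unname(labels[df$id])
df
```

Computing one label per group and then indexing by id keeps the rule itself in a single function that only ever sees one group's values.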
    

    【Discussion】:

    • Yes! Thank you so much!
    【Solution 2】:

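    Alternatively, you can use dplyr's recode function on the share of values above 1000 in each group: a share of 0 means no value is above 1000, a share of 1 means every value is, and anything in between is a mix.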

    
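    df %>% group_by(id) %>%
      mutate(
        new.id = dplyr::recode(
          # share of values in the group that are above 1000 (between 0 and 1)
          sum(values > 1000) / length(values),
          `0` = "low/low",        # no value above 1000
          `1` = "high/high",      # every value above 1000
          .default = "low/high"   # anything in between
        )
      )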
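
To see what recode is matching against, here is a base R sketch that prints the per-group share of values above 1000 for the example data. mean(v > 1000) gives the same share as sum(v > 1000) / length(v), because a logical vector is averaged as 0s and 1s. Only shares of exactly 0 or 1 match the named cases, so groups b and d fall through to .default.

```r
# The question's example data
df <- data.frame(
  id = c("a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"),
  values = c(200, 300, 100, 2000, 3000, 4000, 2000, 3000, 2400, 2000, 400),
  stringsAsFactors = FALSE
)

# Share of values above 1000 within each id, as a numeric vector named by id
share_high <- vapply(split(df$values, df$id), function(v) mean(v > 1000), numeric(1))
share_high
# a: 0, b: 2/3, c: 1, d: 2/3
```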
    
    

    If you also want to keep the group count (add_tally() adds it as a column named n):

    
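    df %>% group_by(id) %>%
      add_tally() %>%  # adds n, the number of rows in each id group
      mutate(new.id = dplyr::recode(
        sum(values > 1000) / n,  # n is constant within a group, so this is the group's share
        `0` = "low/low",
        `1` = "high/high",
        .default = "low/high"
      ))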
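
For comparison, a base R sketch of the same idea that also keeps the group count: ave() returns a per-row result computed within each id, so it can stand in for both add_tally() and the grouped share. This swaps in base R for dplyr; the column names n and new.id simply mirror the answer above.

```r
# The question's example data
df <- data.frame(
  id = c("a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"),
  values = c(200, 300, 100, 2000, 3000, 4000, 2000, 3000, 2400, 2000, 400),
  stringsAsFactors = FALSE
)

# Number of rows in each id group, repeated on every row (like add_tally()'s n)
df$n <- ave(df$values, df$id, FUN = length)

# Share of values above 1000 within each id (ave's default FUN is mean)
share_high <- ave(as.numeric(df$values > 1000), df$id)

# Map the share onto the three labels, as the recode() call does
df$new.id <- ifelse(share_high == 0, "low/low",
             ifelse(share_high == 1, "high/high", "low/high"))
df
```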
    
    
