【Question title】: Counting the number of different variables per group over multiple columns? [duplicate]
【Posted】: 2020-11-07 02:32:49
【Question】:

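I have a data frame and I want to count the number of distinct observations per group, without counting NA values.

Here is an example of the data: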

ID <- c("A", "A", "B", "B", "B", "C")
Act1 <- c("Football", "Swim", "Football", "Basketball", "Swim", "Tennis")
Act2 <- c("Swim", "Football", "Tennis", "Swim", "Football", "Swim")
Act3 <- c("NA", "Tennis", "NA", "Football", "Tennis", "NA")
df <- data.frame(ID, Act1, Act2, Act3)

df

   ID       Act1     Act2     Act3
1  A   Football     Swim       NA
2  A       Swim Football   Tennis
3  B   Football   Tennis       NA
4  B Basketball     Swim Football
5  B       Swim Football   Tennis
6  C     Tennis     Swim       NA 

The correct answer should look like this:

  ID  n
1  A  3
2  B  4
3  C  2

This is because A has three distinct activities (Football, Swim, Tennis), B has four (Football, Swim, Tennis, Basketball), and C has two (Tennis and Swim).
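
To make the counting rule concrete, here is a minimal base R sketch for group A only. The vector `a_values` is a hypothetical helper that copies A's six cells by hand from the example data; it assumes the missing entries are the literal string "NA", as in the code above.

```r
# Values for group A, copied from Act1..Act3 of rows 1 and 2.
# Assumption: missing entries are the string "NA", not a real NA.
a_values <- c("Football", "Swim", "NA", "Swim", "Football", "Tennis")

# Drop the placeholder strings, then keep each activity once.
a_distinct <- unique(a_values[a_values != "NA"])

# Number of distinct activities for group A.
a_n <- length(a_distinct)
```

Repeated activities (Football and Swim each appear twice) are counted once, and the "NA" placeholder is not counted at all.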

How can I do this?

【Comments】:

    Tags: r dataframe dplyr tidyr summarize


    【Solution 1】:

    Assuming the empty values are actual NA values rather than the string "NA", you can use the packages dplyr and tidyr to get your expected output:

    library(dplyr)
    library(tidyr)
    
    df %>% 
      pivot_longer(-ID) %>% 
      filter(!is.na(value)) %>%   # if you have strings "NA" use   filter(value != "NA")   
      group_by(ID) %>%
      summarise(n = n_distinct(value))
    
    # A tibble: 3 x 2
    #   ID        n
    #   <chr> <int>
    # 1 A         3
    # 2 B         4
    # 3 C         2
    

    【Comments】:
