【Question title】: Summarizing unique values by group over multiple columns
【Posted】: 2021-09-03 14:29:54
【Question】:

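I have the following problem:

My dataset contains country-year observations of many different weapon systems (stored as levels). I want to know how many different systems (unique values) each group (country) has over the time span of the dataset.

Simplified, the dataset looks like this: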

a <- c("Greece", "Greece", "Belgium", "Belgium", "Germany", "Germany")
b <- c(1980, 1981, 1980, 1981, 1980, 1981)
c1 <- c("Weapon1", "Weapon1", "Weapon5", "Weapon5", "Weapon3", "Weapon2")
d  <- c("Weapon2", "Weapon4", "Weapon2", "Weapon2", "Weapon1", "Weapon3")
e <- c("Weapon3", "Weapon3", "Weapon3", "Weapon4", "Weapon2", NA)

df <- data.frame(a,b,c1,d,e)

        a    b      c1       d       e
1  Greece 1980 Weapon1 Weapon2 Weapon3
2  Greece 1981 Weapon1 Weapon4 Weapon3
3 Belgium 1980 Weapon5 Weapon2 Weapon3
4 Belgium 1981 Weapon5 Weapon2 Weapon4
5 Germany 1980 Weapon3 Weapon1 Weapon2
6 Germany 1981 Weapon2 Weapon3    <NA>

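So in the example above, Germany deployed 3 different weapon systems in total across both years. How can I compute this count for every country?

Thanks in advance, everyone!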

【Question comments】:

    Tags: r unique


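    【Solution 1】:
    library(tidyverse)
    
    df %>%
      # reshape the three weapon columns into one long `value` column
      pivot_longer(cols = c(c1, d, e)) %>%
      group_by(a) %>%
      # drop missing entries so NA is not counted as a system
      filter(!is.na(value)) %>%
      # keep one row per country/weapon pair, then count rows per country
      distinct(value) %>%
      summarize(n=n())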
    

    This gives:

    # # A tibble: 3 x 2
    #   a           n
    #   <chr>   <int>
    # 1 Belgium     4
    # 2 Germany     3
    # 3 Greece      4
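The same count can probably be written a little more compactly. The following is only a sketch, not part of the original answer: it assumes the `dplyr` and `tidyr` packages are installed, uses the `values_drop_na` argument of `pivot_longer()` to discard the `NA` entries during reshaping, and uses `n_distinct()` instead of `distinct()` followed by `n()`. The name `res` is introduced here purely for illustration.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  a  = c("Greece", "Greece", "Belgium", "Belgium", "Germany", "Germany"),
  b  = c(1980, 1981, 1980, 1981, 1980, 1981),
  c1 = c("Weapon1", "Weapon1", "Weapon5", "Weapon5", "Weapon3", "Weapon2"),
  d  = c("Weapon2", "Weapon4", "Weapon2", "Weapon2", "Weapon1", "Weapon3"),
  e  = c("Weapon3", "Weapon3", "Weapon3", "Weapon4", "Weapon2", NA)
)

res <- df %>%
  # long format: one row per country/year/weapon, NA rows dropped
  pivot_longer(cols = c(c1, d, e), values_drop_na = TRUE) %>%
  group_by(a) %>%
  # count the distinct weapon systems per country
  summarize(n = n_distinct(value))

print(res)
```

For the example data this should agree with the output above (Belgium 4, Germany 3, Greece 4).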
    

    【Comments】:

      【Solution 2】:

      In base R, we can do:

       stack(rowSums(table(rep(df$a, 3), unlist(df[3:5])) > 0))[2:1]
            ind values
      1 Belgium      4
      2 Germany      3
      3  Greece      4
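Because the one-liner is dense, it may help to see it broken into steps. The sketch below is an unpacking of the same idea rather than the answerer's own code; the intermediate names `country`, `weapon`, `tab`, `n_systems` and `result` are made up for illustration.

```r
# Example data from the question
df <- data.frame(
  a  = c("Greece", "Greece", "Belgium", "Belgium", "Germany", "Germany"),
  b  = c(1980, 1981, 1980, 1981, 1980, 1981),
  c1 = c("Weapon1", "Weapon1", "Weapon5", "Weapon5", "Weapon3", "Weapon2"),
  d  = c("Weapon2", "Weapon4", "Weapon2", "Weapon2", "Weapon1", "Weapon3"),
  e  = c("Weapon3", "Weapon3", "Weapon3", "Weapon4", "Weapon2", NA)
)

# unlist() concatenates the weapon columns one after another (c1, then d, then e),
# so the country vector is repeated once per weapon column to stay aligned
weapon  <- unlist(df[3:5], use.names = FALSE)
country <- rep(df$a, 3)

# Country-by-weapon contingency table; table() ignores NA by default
tab <- table(country, weapon)

# A cell > 0 means the country used that weapon at least once,
# so the row sums of the logical matrix count distinct weapons per country
n_systems <- rowSums(tab > 0)

# Convert the named vector to a data frame with the country column first
result <- stack(n_systems)[2:1]
print(result)
```

The number of repetitions in `rep()` has to match the number of weapon columns selected, so `length(3:5)` could be used in place of the hard-coded `3` if the column range changes.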
      

      【Comments】:
