【Question title】: Counting the number of unique items in *dplyr*
【Posted】: 2023-03-12 16:55:01
【Question】:

In the starwars dataset I have two variables whose values repeat across rows: gender and sex.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
data(starwars)

A <- starwars %>% select(gender, sex) %>% arrange(gender, sex)
A %>% group_by(gender, sex) %>% count()
# A tibble: 6 x 3
# Groups:   gender, sex [6]
  gender    sex                n
  <chr>     <chr>          <int>
1 feminine  female            16
2 feminine  none               1
3 masculine hermaphroditic     1
4 masculine male              60
5 masculine none               5
6 NA        NA                 4
A <- starwars %>% select(gender, sex) %>% arrange(gender, sex); print(A)
#> # A tibble: 87 x 2
#>    gender   sex   
#>    <chr>    <chr> 
#>  1 feminine female
#>  2 feminine female
#>  3 feminine female
#>  4 feminine female
#>  5 feminine female
#>  6 feminine female
#>  7 feminine female
#>  8 feminine female
#>  9 feminine female
#> 10 feminine female
#> # ... with 77 more rows

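In the table above, I want to number the distinct sex values within each gender. Every feminine-female pair should get 1 and every feminine-none pair should get 2. Every masculine-hermaphroditic pair should get 1, masculine-male 2, and masculine-none 3. The NA-NA pairs should get 1.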
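Here is a minimal sketch of that numbering rule on a small hypothetical tibble `toy`, not the real starwars data. Group by `gender` only, then use base R's `match(sex, unique(sex))`. This gives each `sex` value its position among the distinct values in that gender group. The numbering follows the order of first appearance, so the data must be sorted first (`A` already is). `match()` also matches `NA` to `NA`, so the NA/NA row gets 1.

```r
library(dplyr)

# Hypothetical toy data mimicking a few gender/sex pairs from starwars,
# already sorted by gender and sex
toy <- tibble(
  gender = c("feminine", "feminine", "feminine",
             "masculine", "masculine", "masculine", NA),
  sex    = c("female", "female", "none",
             "hermaphroditic", "male", "none", NA)
)

res <- toy %>%
  group_by(gender) %>%
  # position of each sex value among the distinct sex values in its gender
  mutate(count = match(sex, unique(sex))) %>%
  ungroup()

res
```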

The following is not a solution; it does not give what I want.

A %>% 
  group_by(gender, sex) %>% 
  mutate(n_dupe = seq(n()))
# Groups:   gender, sex [6]
   gender   sex    n_dupe
   <chr>    <chr>   <int>
 1 feminine female      1
 2 feminine female      2
 3 feminine female      3
 4 feminine female      4
 5 feminine female      5
 6 feminine female      6
 7 feminine female      7
 8 feminine female      8
 9 feminine female      9
10 feminine female     10

A %>% 
  group_by(gender, sex) %>% 
  mutate(n_dupe = seq(n())) %>% 
  summarize(min(n_dupe), max(n_dupe))

`summarise()` has grouped output by 'gender'. You can override using the `.groups` argument.
# A tibble: 6 x 4
# Groups:   gender [3]
  gender    sex            `min(n_dupe)` `max(n_dupe)`
  <chr>     <chr>                  <int>         <int>
1 feminine  female                     1            16
2 feminine  none                       1             1
3 masculine hermaphroditic             1             1
4 masculine male                       1            60
5 masculine none                       1             5
6 NA        NA                         1             4

Update

Instead, I want the data to look like this:

   gender    sex            count
   <chr>     <chr>          <int>
 1 feminine  female             1
 2 feminine  female             1
 3 feminine  female             1
 4 feminine  female             1
 5 feminine  female             1
 6 feminine  female             1
 7 feminine  female             1
 8 feminine  female             1
 9 feminine  female             1
10 feminine  female             1
11 feminine  female             1
12 feminine  female             1
13 feminine  female             1
14 feminine  female             1
15 feminine  female             1
16 feminine  female             1
17 feminine  none               2
18 masculine hermaphroditic     1
19 masculine male               2
20 masculine male               2
   ...       ...              ...
76 masculine male               2
77 masculine male               2
78 masculine male               2
79 masculine none               3
80 masculine none               3
81 masculine none               3
82 masculine none               3
83 masculine none               3
84 NA        NA                 1
85 NA        NA                 1
86 NA        NA                 1
87 NA        NA                 1

And the summary of the data should look like this:

# Groups:   gender [3]
  gender    sex            `min(count)` `max(count)`
  <chr>     <chr>                 <int>        <int>
1 feminine  female                    1            1
2 feminine  none                      2            2
3 masculine hermaphroditic            1            1
4 masculine male                      2            2
5 masculine none                      3            3
6 NA        NA                        1            1
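This sketch reproduces both tables above from the real data. It rebuilds the sorted `A` from the question; the object names `numbered` and `smry` are illustrative. The per-row `count` comes from numbering the distinct `sex` values within each `gender`. Collapsing by `gender` and `sex` should then give equal minimum and maximum values in every pair.

```r
library(dplyr)

# Same object as in the question: sorted, so within each gender the
# sex values appear in alphabetical order (NA last)
A <- starwars %>%
  select(gender, sex) %>%
  arrange(gender, sex)

# Per-row table: number the distinct sex values within each gender
numbered <- A %>%
  group_by(gender) %>%
  mutate(count = match(sex, unique(sex))) %>%
  ungroup()

# Summary table: one row per gender/sex pair
smry <- numbered %>%
  group_by(gender, sex) %>%
  summarise(min_count = min(count), max_count = max(count), .groups = "drop")

smry
```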

Partly created on 2021-06-02 by the reprex package (v1.0.0)

【Discussion】:

Tags: r dplyr


【Answer 1】:
A %>% 
  count(gender, sex) %>%  # or distinct(gender, sex)
  group_by(gender) %>%
  mutate(sex_num = row_number()) %>%
  ungroup()
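The answer returns one row per gender/sex pair. To get one row per character, as in the update, you could join the numbered pairs back onto `A`. The names `pair_num` and `A_num` below are hypothetical. `left_join()` treats `NA` keys as equal by default (`na_matches = "na"`), so the four NA/NA rows also receive 1.

```r
library(dplyr)

A <- starwars %>% select(gender, sex) %>% arrange(gender, sex)

# One row per gender/sex pair, numbered within gender (the answer's approach)
pair_num <- A %>%
  distinct(gender, sex) %>%
  group_by(gender) %>%
  mutate(sex_num = row_number()) %>%
  ungroup()

# Attach the pair number to every original row, keeping A's row order
A_num <- A %>% left_join(pair_num, by = c("gender", "sex"))

A_num
```

Both this join and a grouped `mutate()` with `match(sex, unique(sex))` rely on `A` being sorted first. Otherwise the numbers follow the order in which the pairs first appear.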

【Discussion】:
