【Question title】:How to count and calculate percentages for two columns in an R data.frame?
【Posted】:2022-01-14 20:40:32
【Question】:

In R, I have a data.frame like this:

df1 <- data.frame(
  grade = rep(LETTERS[1:5], 4),
  sex = c(rep("male", 5), rep("female", 5), rep("male", 4), rep("female", 6)),
  class = c(rep(1, 10), rep(2, 10))
)

df1

   grade    sex class
1      A   male     1
2      B   male     1
3      C   male     1
4      D   male     1
5      E   male     1
6      A female     1
7      B female     1
8      C female     1
9      D female     1
10     E female     1
11     A   male     2
12     B   male     2
13     C   male     2
14     D   male     2
15     E female     2
16     A female     2
17     B female     2
18     C female     2
19     D female     2
20     E female     2

I want to calculate the percentage of each sex within each class and build another data.frame, for example:

Class Male_percent Female_percentage 
1     50%          50% 
2     40%          60%

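Could someone show me how to do this? This may have been asked before, but I don't know which keywords to search for. My apologies if this is a duplicate.

【Comments】:

    Tags: r dataframe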


    【Solution 1】:

    Try tabyl from the janitor package:

    library(janitor)
    df1 %>%
      tabyl(class, sex) %>%
      adorn_percentages()
    
     class female male
         1    0.5  0.5
         2    0.6  0.4
    

    To format the values as percentages, add adorn_pct_formatting():

    df1 %>%
      tabyl(class, sex) %>%
      adorn_percentages() %>%
      adorn_pct_formatting()
    
     class female  male
         1  50.0% 50.0%
         2  60.0% 40.0%
    

    Disclaimer: I am the author of these functions.

    【Comments】:

      【Solution 2】:

      You can try

       prop.table(table(df1[3:2]),1)*100
       #    sex
       #class female male
       #  1     50   50
       #  2     60   40
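
The table above holds the right numbers but is not yet the data.frame layout the question asks for. A minimal base R sketch of that last step follows; the column names are taken from the question, and rounding to whole percentages is an assumption that happens to suit this data.

```r
# Rebuild the example data from the question.
df1 <- data.frame(
  grade = rep(LETTERS[1:5], 4),
  sex = c(rep("male", 5), rep("female", 5), rep("male", 4), rep("female", 6)),
  class = c(rep(1, 10), rep(2, 10))
)

# Row-wise proportions: each class (row) sums to 1.
tab <- prop.table(table(df1$class, df1$sex), margin = 1)

# Assemble the requested layout; rounding to whole percentages is an assumption.
res <- data.frame(
  Class = as.numeric(rownames(tab)),
  Male_percent = paste0(round(100 * tab[, "male"]), "%"),
  Female_percent = paste0(round(100 * tab[, "female"]), "%")
)
res
```

Passing margin = 1 by name makes it explicit that the percentages are computed within each class rather than over the whole table.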
      

      Or with data.table

       library(data.table)
       setDT(df1)[, .N, by = .(class, sex)
                ][, .(Male_percent = paste0(100 * N[sex == 'male'] / sum(N), '%'), 
                    Female_percent = paste0(100 * N[sex == 'female'] / sum(N), '%')), 
                 by = class] 
       #   class Male_percent   Female_percent
       #1:     1          50%              50%
       #2:     2          40%              60%
      

      Or using dplyr

       library(dplyr)
       df1 %>%
           group_by(class) %>% 
           summarise(Male_Percent= sprintf('%d%%', 100*sum(sex=='male')/n()), 
                   Female_Percent = sprintf('%d%%', 100*sum(sex=='female')/n()))
       #    class Male_Percent Female_Percent
       #1     1          50%            50%
       #2     2          40%            60%
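
One caveat about the '%d' format used here: in R, sprintf accepts '%d' for a double only when the value is a whole number, so the call works for 50 and 40 but would raise an error for a share such as 33.33. A hedged sketch of a more tolerant formatter is shown below; fmt_pct is a hypothetical helper name and the three-element vector is made-up data, neither comes from the original answer.

```r
# Hypothetical helper (not from the original answer): format a proportion
# as a percentage string with a fixed number of decimals.
fmt_pct <- function(p, digits = 1) {
  sprintf(paste0("%.", digits, "f%%"), 100 * p)
}

sex <- c("male", "female", "female")  # made-up data: one male out of three
p_male <- mean(sex == "male")         # proportion of "male" entries

fmt_pct(p_male)      # "33.3%"
fmt_pct(p_male, 0)   # "33%"
```

Inside summarise(), the same helper could be applied to mean(sex == 'male') in place of the '%d' call.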
      

      Or with sqldf

        library(sqldf)
        res1 <- sqldf('select class, 
                  100*sum(sex=="male")/count(sex) as m, 
                  100*sum(sex=="female")/count(sex) as f,
                  "%" as p
                   from df1
                   group by class')
         sqldf("select class,
                 m||p as Male_Percent, 
                 f||p as Female_Percent 
                 from res1")
         #  class Male_Percent Female_Percent
         #1     1          50%            50%
         #2     2          40%            60%
      

      Update

      Based on @G.Grothendieck's comments, the sqldf code can be simplified to

         sqldf("select class,
              (100 * avg(sex = 'male')) || '%' as Male_Percent,
              (100 * avg(sex = 'female')) || '%' as Female_Percent
              from df1 group
               by class")
         #     class Male_Percent Female_Percent
         #1     1        50.0%          50.0%
         #2     2        40.0%          60.0%
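
The simplification works because averaging a 0/1 indicator gives the share of rows where the condition holds. The same idea can be sketched in base R, as an illustration rather than part of the original answer: mean() of a logical vector is the proportion of TRUE values, and tapply() applies it per class.

```r
# Rebuild the example data from the question.
df1 <- data.frame(
  grade = rep(LETTERS[1:5], 4),
  sex = c(rep("male", 5), rep("female", 5), rep("male", 4), rep("female", 6)),
  class = c(rep(1, 10), rep(2, 10))
)

# Mean of a logical vector = proportion of TRUE values, computed per class.
male_share <- tapply(df1$sex == "male", df1$class, mean)
female_share <- tapply(df1$sex == "female", df1$class, mean)

res2 <- data.frame(
  class = as.numeric(names(male_share)),
  Male_Percent = paste0(round(100 * male_share), "%"),
  Female_Percent = paste0(round(100 * female_share), "%")
)
res2
```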
      

      【Comments】:

      • @G.Grothendieck Thanks, that's much better.
      【Solution 3】:

      Using the data.table package, you can do the following:

      library(data.table)

      # df1 is the data.frame from the question; .N is the row count per class
      setDT(df1)[ , .(
                      Male_Percent = paste0(( nrow(.SD[sex == "male"]) / .N ) * 100 , "%")   , 
                      Female_Percent = paste0(( nrow(.SD[sex == "female"]) / .N ) * 100 , "%")
                    )   , 
                 by = class
               ]
      

      Result

      #     class      Male_Percent  Female_Percent
      # 1:     1          50%            50%
      # 2:     2          40%            60%
      

      Another dplyr solution would be

      library(dplyr)

      # count rows per sex and class, then convert counts to shares within each class
      df1 %>%
        group_by(sex , class) %>%
        summarise(n = n()) %>%
        group_by(class) %>%
        summarise(
          Male_Percent = paste0((n[sex == "male"] / sum(n)) * 100 , "%")    , 
          Female_Percent = paste0((n[sex == "female"] / sum(n)) * 100 , "%")   
        )
      
      #  class   Male_Percent     Female_Percent
      #   1          50%            50%
      #   2          40%            60%
      

      【Comments】:

      • I think you meant to comment on another answer :)