【问题标题】:Report frequency for multiple variables in a dataframe in RReport frequency for multiple variables in a dataframe in R
【发布时间】:2022-12-28 03:41:47
【问题描述】:

I have a dataframe with data from a survey. I would like to produce a report in table format with the frequencies of each variable.

So working with the dataset mtcars, having this:

> count(mtcars, cyl)
  cyl  n
1   4 11
2   6  7
3   8 14
> count(mtcars, gear)
  gear  n
1    3 15
2    4 12
3    5  5

I would like to produce a table like this (or something similar):

variable n
cyl
4 11
6 7
8 14
gear
3 15
4 12
5 5

Any idea as to how this may be achievable?

【问题讨论】:

    标签: r dplyr r-markdown reporting flextable


    【解决方案1】:

    We can write a nested pair of functions to map count to multiple variables and row-bind the results, using a little tidy evaluation:

    library(dplyr)
    library(purrr)
    
    count_multi <- function(.data, ...) {
      count_var <- function(var, .data) {
        .data %>% 
          count(Value = factor({{ var }})) %>%  # coerce to factor to allow multiple
          mutate(                               # var types and preserve ordering
            Variable = as.character(ensym(var)),
            .before = everything()
          )
      }
      map_dfr(enquos(...), count_var, .data = .data)
    }
    
    mtcars2 <- mtcars %>% 
      mutate(
        vs = factor(vs, labels = c("V", "S")),
        am = factor(am, labels = c("manual", "automatic"))
      )
    
    mtcars2 %>% 
      count_multi(vs, am, cyl)
    

    Output:

      Variable     Value  n
    1       vs         V 18
    2       vs         S 14
    3       am    manual 19
    4       am automatic 13
    5      cyl         4 11
    6      cyl         6  7
    7      cyl         8 14
    

    I believe you can use kableExtra::pack_rows() to create subheaders for each Variable in markdown.

    【讨论】:

    • Hi, although it works fine for similar variable types, I get an error if the variable type differs (for instance factors and integers). Is there an easy fix for this? Sorry for not specifying it in the question
    • @vagvaf Sure, you can coerce the Value column to character to ensure consistent output types. I've edited my answer to do this.
    • HI @zephryl (deja vu :D) thanks for that, is there a way to sort the variable by factor levels and not alphabetically/numerically? thanks :)
    【解决方案2】:

    The below gets us the output in slightly different format. However, it does allow for subset (using column variable which OP's requirement does not.)

    library(data.table)
    
    df <- setDT(copy(mtcars))
    
    # select columns as grouping by continuous variables is not appropriate
    x <- c('cyl', 'gear')
    
    y <- lapply(x, (i) df[, .N, i])
    
    names(y) <- x
    
    y <- rbindlist(y, idcol=T, use.names=F)
    
    names(y) <- c('variable', 'class', 'count')
    
       variable class count
    1:      cyl     6     7
    2:      cyl     4    11
    3:      cyl     8    14
    4:     gear     4    12
    5:     gear     3    15
    6:     gear     5     5
    

    【讨论】:

      猜你喜欢
      • 2022-12-01
      • 2021-05-01
      • 2022-12-02
      • 2022-12-27
      • 1970-01-01
      • 2022-12-26
      • 2021-05-19
      • 2022-12-02
      • 2022-12-01
      相关资源
      最近更新 更多