Group data by state and get percent with dplyr? (Answers)

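【Question title】: Group data by state and get percent with dplyr?
【Posted】: 2022-12-12 07:02:09
【Question】:

How can I group the customers in this data by state and use dplyr to get the percentage of customers shown below? (The data was posted as an image: customer data.)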

colsolarestates <- ColSolare %>% group_by(state) %>% summarise(state = n()) %>% mutate(percent = (state) / sum(state)*100)

【Comments】:

Percentage of customers below what? Also, please don't post images. Just paste the output of dput(Colsolare) into the question (or dput(head(Colsolare,30))).

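Tags: dplyr

【Solution 1】:

I'm assuming "customers" refers to the customer_type variable. To get the percentage of each customer_type within each state, convert both columns to factors and call group_by with .drop = FALSE, so that combinations with zero customers are kept in the result instead of being silently dropped. You can also use scales::label_percent to append a % sign to each value.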
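Before the full example, here is a minimal sketch of what .drop = FALSE changes. The toy data frame and the names toy, dropped, and kept are hypothetical and only serve to illustrate the behaviour; the columns must be factors for the empty combination to be retained.

```r
library(dplyr)

# Hypothetical toy data: state "B" has no "cafe" rows
toy <- data.frame(
  state = factor(c("A", "A", "B")),
  customer_type = factor(c("bar", "cafe", "bar"))
)

# Default (.drop = TRUE): only observed combinations appear
dropped <- toy %>%
  group_by(state, customer_type) %>%
  summarise(n = n(), .groups = "drop")

# .drop = FALSE: unobserved factor combinations are kept with n = 0
kept <- toy %>%
  group_by(state, customer_type, .drop = FALSE) %>%
  summarise(n = n(), .groups = "drop")

nrow(dropped)  # 3 rows: A/bar, A/cafe, B/bar
nrow(kept)     # 4 rows: B/cafe is included with n = 0
```

Without the zero rows, a state that has no customers of a given type would simply be missing from the output rather than showing 0%.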

library(tidyverse)
library(scales)
set.seed(123)  # make the random sample reproducible

# Simulated data: the 50 state abbreviations stacked 5 times (250 rows)
state <- state.abb %>% as.data.frame() %>% rename(state = 1)
ColSolare <- state %>% 
  rbind(state, state, state, state) %>%
  # Randomly assign a customer type to each row
  mutate(customer_type = sample(
    x = c("bar", "restaurant", "cafe"),
    size = 250,
    replace = TRUE,
    prob = c(.2, .6, .2)
  ))

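colsolarestates <- ColSolare %>% 
  # Factors are needed so that .drop = FALSE can keep empty combinations
  mutate(state = factor(state), customer_type = factor(customer_type)) %>%
  group_by(state, customer_type, .drop = FALSE) %>% 
  summarise(n = n()) %>% 
  # summarise() drops the last grouping level (customer_type), so the data
  # is still grouped by state here and sum(n) is the per-state total
  mutate(percent = n / sum(n) * 100) %>% 
  ungroup() %>%
  # scale = 1 because percent is already on a 0-100 scale
  mutate(percentlabel = scales::label_percent(scale = 1)(percent))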
#> `summarise()` has grouped output by 'state'. You can override using the
#> `.groups` argument.

head(colsolarestates)
#> # A tibble: 6 × 5
#>   state customer_type     n percent percentlabel
#>   <fct> <fct>         <int>   <dbl> <chr>       
#> 1 AK    bar               1      20 20%         
#> 2 AK    cafe              1      20 20%         
#> 3 AK    restaurant        3      60 60%         
#> 4 AL    bar               1      20 20%         
#> 5 AL    cafe              0       0 0%          
#> 6 AL    restaurant        4      80 80%
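If the goal is instead each state's share of all customers, which is how the attempt in the question reads, a sketch along these lines may be closer. It assumes one row per customer; the customers data frame is a hypothetical stand-in for ColSolare. Giving the count its own column name (n) rather than reusing state keeps the grouping column intact.

```r
library(dplyr)

# Hypothetical stand-in for ColSolare: one row per customer
customers <- data.frame(state = c("NY", "NY", "CA", "TX"))

state_share <- customers %>%
  count(state, name = "n") %>%          # number of customers per state
  mutate(percent = n / sum(n) * 100)    # share of all customers, in percent

state_share
```

Because count() returns an ungrouped result here, sum(n) is the total across all states, so the percent column sums to 100.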

【Discussion】: