How to convert a string to a variable and loop through group_by? Answers

【Question title】: How to convert a string to a variable and to loop through group_by?
【Posted】: 2020-05-04 17:42:19
【Question】:

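Suppose I have a dataset with two columns, Location and Product, that records each product sold at each location. I created a contingency table of the number of each product sold at each location: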

library(dplyr)
library(tidyr)

data %>%
  group_by(Location, Product) %>%
  summarize(n = n()) %>%
  pivot_wider(names_from = Product, values_from = n)
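The same table can also be built with count(), which is shorthand for group_by() + summarise(n = n()). The values_fill = 0 argument is an addition not in the question: it fills location/product combinations that never occur with 0 instead of NA. A minimal sketch, assuming a data frame with Location and Product columns; the toy data (hypothetical name sales) is made up for illustration:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per product sold at a location
sales <- tibble(
  Location = c("AB", "ON", "MN", "AB", "ON"),
  Product  = c("Type1", "Type2", "Type1", "Type3", "Type1")
)

# count() groups by Location and Product, counts rows, and returns an ungrouped tibble
tab <- sales %>%
  count(Location, Product) %>%
  pivot_wider(names_from = Product, values_from = n, values_fill = 0)

print(tab)
```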

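Now suppose that instead of a single Product column I have US_Product, Japan_Product, ..., Germany_Product. How can I create the contingency tables in a for loop? Note: when I create a vector of products such as p<-c("Product1", "Product2",..., "Product3") and loop over it, I get an error message, because these are strings rather than variable names.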

Here is a minimal example:

library(dplyr)
library(tidyr)

Location <- c("AB","ON","MN","AB","ON")
Product1 <- c("Type1","Type2","Type1","Type3","Type1")
Product2 <- c("Type3","Type2","Type3","Type3","Type2")
Product3 <- c("Type1","Type2","Type1","Type1","Type1")
data <- tibble(Location, Product1, Product2, Product3)
data %>%
  group_by(Location, Product1) %>%
  summarize(n = n()) %>%
  pivot_wider(names_from = Product1, values_from = n) # this works as expected

# now I want to do the same thing in a loop
prodV <- c("Product1","Product2","Product3")
for (i in c(1:3)){
  var <- prodV[i]
  # does not work as intended: var holds a string, not a column name
  data %>%
    group_by(Location, var) %>%
    summarize(n = n()) %>%
    pivot_wider(names_from = var, values_from = n)
}
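A small check of what goes wrong in the loop: group_by() looks up names in the data first and then in the calling environment. Because var is not a column of data, its value (the string "Product1") is used to create a new, constant grouping column called var, so the data are never grouped by the Product1 column. A pipeline inside a for loop is also not auto-printed, so even a correct body needs print() or an assignment. The data from the example are rebuilt here so the block runs on its own:

```r
library(dplyr)

Location <- c("AB", "ON", "MN", "AB", "ON")
Product1 <- c("Type1", "Type2", "Type1", "Type3", "Type1")
data <- tibble(Location, Product1)

var <- "Product1"
grouped <- data %>% group_by(Location, var)

# The grouping variables are Location and a new constant column named "var",
# not the Product1 column
print(group_vars(grouped))
print(unique(grouped$var))
```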

【Comments】:

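Welcome to Stack Overflow! Could you make your problem reproducible by sharing a sample of your data so that others can help (please do not use str(), head() or screenshots)? You can use the reprex and datapasta packages to help with that. See also Help me Help you & How to make a great R reproducible example?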

Tags: r string loops variables contingency

【Solution 1】:

If we need to loop over the columns, one option is map from purrr:

library(dplyr)
library(purrr)
library(tidyr)

p <- c("Product1", "Product2", "Product3") # column names to loop over
map(p, ~
      data %>%
        group_by_at(vars("Location", .x)) %>%
        summarize(n = n()) %>%
        pivot_wider(names_from = all_of(.x), values_from = n))

Using a reproducible example:

data(mtcars)
p <- c("cyl", "vs", "am")
map(p, ~
      mtcars %>%
        group_by_at(vars("gear", .x)) %>%
        summarise(n = n()) %>%
        pivot_wider(names_from = all_of(.x), values_from = n))
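map() returns an unnamed list, so it can help to name the input vector first so each table can be looked up by its column name. A sketch on the same mtcars example using purrr::set_names(); note that it swaps in count() with !!rlang::sym(.x) (which turns the string into a column symbol) instead of the answer's group_by_at(), and adds values_fill = 0:

```r
library(dplyr)
library(purrr)
library(tidyr)

p <- c("cyl", "vs", "am")

tables <- p %>%
  set_names() %>%   # names each element after itself: c(cyl = "cyl", ...)
  map(~ mtcars %>%
        count(gear, !!rlang::sym(.x)) %>%  # string -> symbol, injected with !!
        pivot_wider(names_from = all_of(.x), values_from = n, values_fill = 0))

print(tables$cyl)
```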

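Or, if we use a for loop, create an empty list ('out') to store the output of each iteration, loop over the 'p' values, change only the .x part of the map code (to the loop variable p1), and assign each output to the corresponding element of the 'out' list: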

out <- vector("list", length(p))
names(out) <- p
for (p1 in p) {
  out[[p1]] <- data %>%
    group_by_at(vars("Location", p1)) %>%
    summarize(n = n()) %>%
    pivot_wider(names_from = all_of(p1), values_from = n)
}

【Comments】:

【Solution 2】:

Not sure if the following is what you are after. Here is a base R solution for making the contingency tables:

p <- c("US_Product","Japan_product","Germany_Product")
res <- Map(function(x) table(df[c("Location",x)]),p)

which gives

> res
$US_Product
        US_Product
Location a b c
      XX 2 0 1
      YY 1 1 2

$Japan_product
        Japan_product
Location d e f
      XX 0 2 1
      YY 3 0 1

$Germany_Product
        Germany_Product
Location g i j
      XX 0 3 0
      YY 1 1 2

Dummy data

df <- structure(list(
  Location = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L),
                       .Label = c("XX", "YY"), class = "factor"),
  US_Product = structure(c(1L, 3L, 1L, 2L, 1L, 3L, 3L),
                         .Label = c("a", "b", "c"), class = "factor"),
  Japan_product = structure(c(2L, 2L, 3L, 3L, 1L, 1L, 1L),
                            .Label = c("d", "e", "f"), class = "factor"),
  Germany_Product = structure(c(2L, 2L, 2L, 2L, 3L, 1L, 3L),
                              .Label = c("g", "i", "j"), class = "factor")),
  class = "data.frame", row.names = c(NA, -7L))
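Another base R option, offered here as an alternative to table() rather than as part of the answer, is stats::xtabs() with a formula built by reformulate(). This sketch rebuilds the same dummy data in compact form (character columns instead of factors) so it runs on its own:

```r
df <- data.frame(
  Location        = c("XX", "XX", "XX", "YY", "YY", "YY", "YY"),
  US_Product      = c("a", "c", "a", "b", "a", "c", "c"),
  Japan_product   = c("e", "e", "f", "f", "d", "d", "d"),
  Germany_Product = c("i", "i", "i", "i", "j", "g", "j")
)

p <- c("US_Product", "Japan_product", "Germany_Product")

# setNames(p, p) makes lapply() return a list named by column
res <- lapply(setNames(p, p), function(x) {
  # reformulate(c("Location", x)) builds the one-sided formula ~ Location + <x>
  xtabs(reformulate(c("Location", x)), data = df)
})

print(res$US_Product)
```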

【Comments】:

【Solution 3】:

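I was able to solve the problem by using group_by_at instead of group_by. According to dplyr: whats the difference between group_by and group_by_ functions?, if you need quoted input you should use the SE (standard evaluation) version of a function rather than the NSE (non-standard evaluation) version. See the link for a detailed explanation.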

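prodV <- c("Product1","Product2","Product3")
for (i in seq_along(prodV)) {
  var <- prodV[i]
  # group_by_at() accepts column names given as strings
  a <- data %>%
    group_by_at(vars("Location", var)) %>%
    summarize(n = n()) %>%
    pivot_wider(names_from = all_of(var), values_from = n)
  print(a)  # results are not auto-printed inside a for loop
}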
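In dplyr 1.0 and later, group_by_at() still works but is superseded by across(); for example, group_by(across(all_of(c("Location", var)))) is equivalent to group_by_at(vars("Location", var)). Below is a sketch of the same loop with current idioms, assuming dplyr >= 1.0 and tidyr >= 1.1: rlang::sym() converts the string into a symbol that !! injects into the call, and all_of() selects the column named by a string in pivot_wider(). The results are stored in a named list instead of being printed:

```r
library(dplyr)
library(tidyr)

Location <- c("AB", "ON", "MN", "AB", "ON")
Product1 <- c("Type1", "Type2", "Type1", "Type3", "Type1")
Product2 <- c("Type3", "Type2", "Type3", "Type3", "Type2")
Product3 <- c("Type1", "Type2", "Type1", "Type1", "Type1")
data <- tibble(Location, Product1, Product2, Product3)

prodV <- c("Product1", "Product2", "Product3")
out <- list()
for (var in prodV) {
  out[[var]] <- data %>%
    # !!rlang::sym(var) turns the string "Product1" into the column Product1
    count(Location, !!rlang::sym(var)) %>%
    pivot_wider(names_from = all_of(var), values_from = n, values_fill = 0)
}

print(out$Product2)
```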

【Comments】: