【Question title】: How to convert a string to a variable and loop through group_by?
【Posted】: 2020-05-04 17:42:19
【Question】:

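Suppose I have a dataset with two columns, Location and Product, that records the products sold at each location. I created a contingency table of how many of each product were sold at each location: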

library(dplyr)
library(tidyr)

data %>%
  group_by(Location, Product) %>%
  summarize(n = n()) %>%
  pivot_wider(names_from = Product, values_from = n)

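Now suppose that instead of a single Product column I have US_Product, Japan_Product, ..., Germany_Product. How can I create these contingency tables in a for loop? Note: when I create a vector of product column names such as p<-c("Product1", "Product2",..., "Product3") and loop over it, I get an error, because these are strings rather than variable names.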

Here is a minimal example:

library(dplyr)
library(tidyr)

Location <- c("AB","ON","MN","AB","ON")
Product1 <- c("Type1","Type2","Type1","Type3","Type1")
Product2 <- c("Type3","Type2","Type3","Type3","Type2")
Product3 <- c("Type1","Type2","Type1","Type1","Type1")
data <- tibble(Location, Product1, Product2, Product3)
data %>%
  group_by(Location, Product1) %>%
  summarize(n = n()) %>%
  pivot_wider(names_from = Product1, values_from = n) # this works as expected

# now I want to do the same thing in a loop
prodV <- c("Product1","Product2","Product3")
for (i in c(1:3)){
  var <- prodV[i]
  # does not work as intended: var holds a string such as "Product1",
  # which group_by() does not treat as a column name
  data %>%
    group_by(Location, var) %>%
    summarize(n = n()) %>%
    pivot_wider(names_from = var, values_from = n)
}

【Discussion】:

Tags: r string loops variables contingency


【Solution 1】:

If we need to loop over the columns, one option is map from purrr:

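library(dplyr)
library(purrr)
library(tidyr)
p <- c("Product1", "Product2", "Product3") # column names as strings (prodV in the question)
map(p, ~
         data %>%
           group_by_at(vars("Location", .x)) %>%
           summarize(n = n()) %>%
           # all_of() marks .x as a string holding a column name
           pivot_wider(names_from = all_of(.x), values_from = n))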

Using a reproducible example:

data(mtcars)
p <- c("cyl", "vs", "am")
map(p, ~ 
         mtcars %>% 
             group_by_at(vars('gear', .x)) %>% 
             summarise(n = n()) %>%
             pivot_wider(names_from = .x, values_from = n) ) 
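A more compact variant of the map() approach, sketched here under the assumption of dplyr >= 1.0: count() is shorthand for group_by() + summarise(n = n()), rlang::sym() with !! turns each string into a column reference, and purrr's set_names() names the result list after p. The result name res is illustrative only.

```r
library(dplyr)
library(purrr)
library(tidyr)

p <- c("cyl", "vs", "am")

# set_names(p) makes map() return a list named "cyl", "vs", "am"
res <- map(set_names(p), function(col) {
  mtcars %>%
    # count() groups by gear and the column named in `col`, then counts rows;
    # rlang::sym() + !! converts the string into a column reference
    count(gear, !!rlang::sym(col)) %>%
    # one row per gear, one column per value of `col` (NA where a combination is absent)
    pivot_wider(names_from = all_of(col), values_from = n)
})
res$cyl
```

Because the list is named, each table can be pulled out with res$cyl, res$vs, and so on.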

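Alternatively, if we use a for loop, create an empty list ('out') to store the output of each iteration, loop over the values of 'p', and change only the .x part of the map code, assigning each result to the corresponding element of the list 'out':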

out <- vector('list', length(p))
names(out) <- p
for(p1 in p) {
       out[[p1]] <- data %>%
                      group_by_at(vars("Location", p1)) %>%
                      summarize(n = n()) %>%
                      pivot_wider(names_from = p1, values_from = n)
   }

【Discussion】:

    【Solution 2】:

    Not sure if the following is what you are after, but here is a base R solution for making the contingency tables:

    p <- c("US_Product","Japan_product","Germany_Product")
    res <- Map(function(x) table(df[c("Location",x)]),p)
    

    which gives

    > res
    $US_Product
            US_Product
    Location a b c
          XX 2 0 1
          YY 1 1 2
    
    $Japan_product
            Japan_product
    Location d e f
          XX 0 2 1
          YY 3 0 1
    
    $Germany_Product
            Germany_Product
    Location g i j
          XX 0 3 0
          YY 1 1 2
    

    Dummy data

    df <- structure(list(Location = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
    2L), .Label = c("XX", "YY"), class = "factor"), US_Product = structure(c(1L, 
    2L), .Label = c("XX", "YY"), class = "factor"), US_Product = structure(c(1L, 
    3L, 1L, 2L, 1L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), 
        Japan_product = structure(c(2L, 2L, 3L, 3L, 1L, 1L, 1L), .Label = c("d", 
        "e", "f"), class = "factor"), Germany_Product = structure(c(2L, 
        2L, 2L, 2L, 3L, 1L, 3L), .Label = c("g", "i", "j"), class = "factor")), class = "data.frame", row.names = c(NA, 
    -7L))
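The same base R idea applied to the question's own minimal example, as a sketch: lapply() over a named copy of prodV plays the role of Map() above, and table() cross-tabulates the two-column data frame data[c("Location", x)]. The name tabs is illustrative only.

```r
data <- data.frame(
  Location = c("AB", "ON", "MN", "AB", "ON"),
  Product1 = c("Type1", "Type2", "Type1", "Type3", "Type1"),
  Product2 = c("Type3", "Type2", "Type3", "Type3", "Type2"),
  Product3 = c("Type1", "Type2", "Type1", "Type1", "Type1")
)
prodV <- c("Product1", "Product2", "Product3")

# setNames() names the list after the columns; table() on a two-column
# data frame gives a Location x product contingency table
tabs <- lapply(setNames(prodV, prodV), function(x) table(data[c("Location", x)]))
tabs$Product1

# optional: a wide data frame similar to the pivot_wider() output,
# with 0 instead of NA for absent combinations and Location as row names
as.data.frame.matrix(tabs$Product1)
```

Unlike the dplyr versions, table() fills combinations that never occur with 0 rather than NA.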
    

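    【Discussion】:

      【Solution 3】:

      I was able to solve the problem by using group_by_at instead of group_by. According to dplyr: whats the difference between group_by and group_by_ functions?, when you need to pass quoted input (strings) you should use the SE (standard evaluation) version of a function rather than the NSE (non-standard evaluation) version; see that link for a detailed explanation.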

      prodV <- c("Product1","Product2","Product3")
      for (i in c(1:3)){
        var <- prodV[i]
        a<-data%>% 
          group_by_at(vars("Location",var))%>%
          summarize(n=n()) %>%
          pivot_wider(names_from = var, values_from = n)   
        print(a)
      }
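In dplyr 1.0 and later, group_by_at() and the other scoped verbs are marked as superseded, so here is a minimal sketch of the same loop using tidy evaluation instead, assuming dplyr >= 1.0 and tidyr >= 1.0: rlang::sym() converts the string into a symbol, !! injects it into group_by() as if it had been typed, and all_of() passes the string to pivot_wider(). Storing the results in a list named out is an illustrative choice.

```r
library(dplyr)
library(tidyr)

data <- tibble(
  Location = c("AB", "ON", "MN", "AB", "ON"),
  Product1 = c("Type1", "Type2", "Type1", "Type3", "Type1"),
  Product2 = c("Type3", "Type2", "Type3", "Type3", "Type2"),
  Product3 = c("Type1", "Type2", "Type1", "Type1", "Type1")
)

prodV <- c("Product1", "Product2", "Product3")
out <- list()
for (var in prodV) {
  out[[var]] <- data %>%
    # rlang::sym() turns the string into a symbol; !! injects it into group_by()
    group_by(Location, !!rlang::sym(var)) %>%
    summarize(n = n(), .groups = "drop") %>%
    # all_of() tells tidyselect that var is a string holding a column name
    pivot_wider(names_from = all_of(var), values_from = n)
}
out$Product1
```

Saving each table in out (or calling print()) matters, because a pipeline evaluated inside a for loop is not printed automatically.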
      

      【Discussion】:
