【Question title】: R: Generating multiple Excel files based on aggregated data
【Posted】: 2018-01-24 14:12:55
【Question description】:

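I am working on a small project in R, and my goal is to create one Excel file for each Site in my data frame. The data frame consists of comments from a survey, where each row represents one response for a given site. There are 10 columns: the first holds the Site, and the other 9 hold comments, one column per topic.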

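These comment columns can be grouped into the following blocks -

Block 1: Overall = Seating + Decor + Reception + Toilets

Block 2: Comfort & Speed = Comfort + Speed

Block 3: Operations = Efficiency + Courtesy + Responsiveness

A reproducible data frame is shown below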

#Load libraries
 library(dplyr)
 library(xlsx)
 
#Reproducible Data Frame

df=data.frame(Site=c("Tokyo Harbor","Tokyo Harbor","Tokyo Harbor","Arlington","Arlington","Cairo Skyline","Cairo Skyline"),
       Seating=c("comfy never a problem to find","difficult","ease and quick","nobody to help","nice n comfy","old seats","nt bad"),
         Decor=c("very beautiful","i loved it!!!","nice","great","nice thanks","no response","yea nice"),
     Reception=c("always neat","I wasn't happy with the decor on this site","great!","immaculate","happy very helpful","","I wont bother again"),
       Toilets=c("well maintained","nicely managed","long queues could do better","","cleaner toilets needed!","no toilet roll in the mens loo","flush for god's sake!!!"),
       Comfort=c("very comfortable and heated","I felt like I was home","","couldn't be better","very nice and kush","not comment","fresh eyes needed"),
         Speed=c("rapid service","no delays ever got everything I needed on time","","","I have grown accustomed to the speed of service","machines","super duper quick"),
    Efficiency=c("very efficient, the servers were great","spot on","","I was quite disappointed in the efficiency","clockwork","parfait",""),
      Courtesy=c("Staff were very polite","smiling faces everywhere, loved it","very welcoming and kind","the hostess was a bit rude","trés impoli","noo",""),
Responsiveness=c("On the ball all the time","super quick whenever help was needed","","","","want more service like this",""))

#Transform all columns with empty cells to NAs

df[df==""]  <- NA 

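My goal

Create one Excel file per site, with the comments grouped into blocks (as defined above). Each sheet in the Excel file represents one block, so there are three sheets in total.

In more detail:

Step 1 - For each site, combine the comments into the three blocks, then filter out the empty comments.

Step 2 - Write an Excel file with three sheets, one for each block

I would like the Excel files to be saved with the following name format -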

COMMENTS_SITENAME_2017.xlsx

So for this df, the desired output would be three Excel files, since there are three sites...

COMMENTS_Tokyo Harbor_2017.xlsx

COMMENTS_Arlington_2017.xlsx

COMMENTS_Cairo Skyline_2017.xlsx

My attempt

I first defined my blocks, which I use later to select the comments

###########################
#STEP 1: Define the blocks

#Block 1: Overall = Seating + Decor + Reception + Toilets
BlockOverall=c(names(df)[2],names(df)[3],names(df)[4],names(df)[5])

#Block 2: Comfort & Speed = Comfort + Speed
BlockComfortSpeed=c(names(df)[6],names(df)[7])

#Block 3: Operations = Efficiency + Courtesy + Responsiveness
BlockOps=c(names(df)[8],names(df)[9],names(df)[10])

Then I group the comments according to these blocks and filter out the missing values

###############################################
#STEP 2: Group comments based on defined blocks

#Group Overall
Data_Overall= df %>%
select(BlockOverall)

Data_Overall = Data_Overall %>%
do(.,data.frame(Comments_Overall=unlist(Data_Overall,use.names = F))) %>%
filter(complete.cases(.))

#Group Comfort & Speed
Data_ComfortSpeed= df %>%
select(BlockComfortSpeed)

Data_ComfortSpeed = Data_ComfortSpeed %>%
do(.,data.frame(Comments_ComfortSpeed=unlist(Data_ComfortSpeed,use.names = F))) %>%
filter(complete.cases(.))

#Group Operations
Data_Operations= df %>%
select(BlockOps)

Data_Operations = Data_Operations %>%
do(.,data.frame(Comments_Operations=unlist(Data_Operations,use.names = F))) %>%
filter(complete.cases(.))
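The three grouping steps above differ only in the columns they stack and the name of the output column, so they could be folded into one helper. The following is a minimal sketch in base R, not the original poster's code: `stack_block` and the small `demo` data frame are hypothetical names introduced here for illustration.

```r
# Hypothetical helper (not from the original post): stack the columns of one
# block into a single comment column and drop missing or empty comments.
stack_block <- function(data, cols, out_name) {
  # as.character() keeps the text intact even if the columns are factors
  comments <- unlist(lapply(data[cols], as.character), use.names = FALSE)
  comments <- comments[!is.na(comments) & comments != ""]
  out <- data.frame(comments, stringsAsFactors = FALSE)
  names(out) <- out_name
  out
}

# Small stand-in data frame; the column names follow the question's df
demo <- data.frame(
  Site    = c("A", "A", "B"),
  Comfort = c("cosy", NA, "fine"),
  Speed   = c(NA, "quick", NA),
  stringsAsFactors = FALSE
)

stacked <- stack_block(demo, c("Comfort", "Speed"), "Comments_ComfortSpeed")
print(stacked)
```

With the question's df, the same call could presumably be repeated with BlockOverall, BlockComfortSpeed, and BlockOps as the `cols` argument.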

Finally, I write the data to Excel

#Write each group to an individual tab in an Excel file

library(xlsx)
write.xlsx(Data_Overall,"Comments_Global_2017.xlsx",sheetName = 'Overall',row.names = F) #Tab 1
write.xlsx(Data_ComfortSpeed,"Comments_Global_2017.xlsx",sheetName = 'Comfort_&_Speed',row.names = F,append = T) #Tab 2
write.xlsx(Data_Operations,"Comments_Global_2017.xlsx",sheetName = 'Operations',row.names = F,append = T) #Tab 3

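At the global level (all sites combined), this works fine. What I can't figure out is how to turn it into a for loop that iterates over all the sites in the data frame and generates site-level Excel files.

As a novice programmer, I would greatly appreciate any pointers or suggestions!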

【Question discussion】:

    Tags: r dplyr


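    【Solution 1】:

    If you use purrr from the tidyverse, you can avoid the for loop.

    If you take your code above and wrap it in a basic function, you can use purrr::map to call that function once for each site name.

    Your setup: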

    #Load libraries
    library(dplyr)
    library(xlsx)
    library(purrr)
    
    #Reproducible Data Frame
    
    df=data.frame(Site=c("Tokyo Harbor","Tokyo Harbor","Tokyo Harbor","Arlington","Arlington","Cairo Skyline","Cairo Skyline"),
                  Seating=c("comfy never a problem to find","difficult","ease and quick","nobody to help","nice n comfy","old seats","nt bad"),
                  Decor=c("very beautiful","i loved it!!!","nice","great","nice thanks","no response","yea nice"),
                  Reception=c("always neat","I wasn't happy with the decor on this site","great!","immaculate","happy very helpful","","I wont bother again"),
                  Toilets=c("well maintained","nicely managed","long queues could do better","","cleaner toilets needed!","no toilet roll in the mens loo","flush for god's sake!!!"),
                  Comfort=c("very comfortable and heated","I felt like I was home","","couldn't be better","very nice and kush","not comment","fresh eyes needed"),
                  Speed=c("rapid service","no delays ever got everything I needed on time","","","I have grown accustomed to the speed of service","machines","super duper quick"),
                  Efficiency=c("very efficient, the servers were great","spot on","","I was quite disappointed in the efficiency","clockwork","parfait",""),
                  Courtesy=c("Staff were very polite","smiling faces everywhere, loved it","very welcoming and kind","the hostess was a bit rude","trés impoli","noo",""),
                  Responsiveness=c("On the ball all the time","super quick whenever help was needed","","","","want more service like this",""))
    
    #Transform all columns with empty cells to NAs
    
    df[df==""]  <- NA 
    

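    The steps inside the function:

    1. Take your data frame and filter it by the site name passed as the argument
    2. Run all of your steps from above
    3. Write the site's df to a spreadsheet

    The function: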

    export_site_data <- function(site.name){
      ###########################
      #STEP 0: filter by site
      df <- df %>% filter(Site %in% site.name)
    
    
      ###########################
      #STEP 1: Define the blocks
    
      #Block 1: Overall = Seating + Decor + Reception + Toilets
      BlockOverall=c(names(df)[2],names(df)[3],names(df)[4],names(df)[5])
    
      #Block 2: Comfort & Speed = Comfort + Speed
      BlockComfortSpeed=c(names(df)[6],names(df)[7])
    
      #Block 3: Operations = Efficiency + Courtesy + Responsiveness
      BlockOps=c(names(df)[8],names(df)[9],names(df)[10])
    
    
    
      ###############################################
      #STEP 2: Group comments based on defined blocks
    
      #Group Overall
      Data_Overall= df %>%
        select(BlockOverall)
    
      Data_Overall = Data_Overall %>%
        do(.,data.frame(Comments_Overall=unlist(Data_Overall,use.names = F))) %>%
        filter(complete.cases(.))
    
      #Group Comfort & Speed
      Data_ComfortSpeed= df %>%
        select(BlockComfortSpeed)
    
      Data_ComfortSpeed = Data_ComfortSpeed %>%
        do(.,data.frame(Comments_ComfortSpeed=unlist(Data_ComfortSpeed,use.names = F))) %>%
        filter(complete.cases(.))
    
      #Group Operations
      Data_Operations= df %>%
        select(BlockOps)
    
      Data_Operations = Data_Operations %>%
        do(.,data.frame(Comments_Operations=unlist(Data_Operations,use.names = F))) %>%
        filter(complete.cases(.))
    
      library(xlsx)
      write.xlsx(Data_Overall, paste0("Comments_",site.name,"_2017.xlsx"), sheetName = 
                   'Overall',row.names = F) #Tab 1
      write.xlsx(Data_ComfortSpeed, paste0("Comments_",site.name,"_2017.xlsx"), sheetName = 
                   'Comfort_&_Speed',row.names = F,append = T) #Tab 2
      write.xlsx(Data_Operations, paste0("Comments_",site.name,"_2017.xlsx"), sheetName = 
                   'Operations',row.names = F,append = T) #Tab 3
    }
    

    Iterate over the site names with map

    site.name <- unique(df$Site)
    site.name %>% map(export_site_data )
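Because the function is called only for its side effect of writing files, a plain for loop would work just as well as map. The sketch below shows the loop shape in base R. To keep it runnable without the xlsx package, it uses a hypothetical stand-in, `export_site_name`, which only builds the file name; in practice the call inside the loop would be export_site_data from the answer.

```r
# Stand-in for export_site_data() (hypothetical): it only builds the file
# name, so the loop runs without xlsx or Java installed.
export_site_name <- function(site.name) {
  paste0("Comments_", site.name, "_2017.xlsx")
}

# Site names as they appear in the question's data frame
sites <- c("Tokyo Harbor", "Arlington", "Cairo Skyline")

# as.character() guards against Site being stored as a factor
written <- character(0)
for (s in as.character(sites)) {
  written <- c(written, export_site_name(s))
}
print(written)
```

The loop and map visit the sites in the same order, so choosing between them is mostly a matter of taste.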
    

    Result:

    list.files(pattern = "Comments_")
    [1] "Comments_Arlington_2017.xlsx"     "Comments_Cairo Skyline_2017.xlsx"
    [3] "Comments_Tokyo Harbor_2017.xlsx" 
    

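    【Discussion】:

    • It works on the reproducible data frame I provided, but when I tested it on another data frame with the same structure and more sites, the first few Excel files were generated fine and then I got this error: Error in mapply(setCellValue, cells[seq_len(nrow(cells)), colIndex[ic]], : zero-length inputs cannot be mixed with those of non-zero length
    • Hmm, you could check which site it gets stuck on and step through the process for that site. If I had to guess, Data_Overall, Data_ComfortSpeed, or Data_Operations has no rows for that site, which makes one of the functions throw an error during the loop.
    • I think I've got it, and it works fine now. I replaced write.xlsx with a custom write.xlsx function that takes sheets with no rows into account. Thanks for your help!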
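The commenter's custom function is not shown in the thread. One way to handle a block with no comments, sketched here under that caveat, is to swap a zero-row data frame for a one-row placeholder before passing it to write.xlsx. `pad_empty` and its `placeholder` argument are hypothetical names, and the helper assumes a single-column data frame like the ones built in the answer.

```r
# Hypothetical guard (not the commenter's actual code): return the data
# unchanged when it has rows, otherwise a one-row placeholder data frame
# that keeps the name of the first column.
pad_empty <- function(data, placeholder = "No comments") {
  if (nrow(data) > 0) {
    return(data)
  }
  out <- data.frame(placeholder, stringsAsFactors = FALSE)
  names(out) <- names(data)[1]
  out
}

# A block with no comments for some site
empty_block <- data.frame(Comments_Overall = character(0),
                          stringsAsFactors = FALSE)
padded <- pad_empty(empty_block)
print(padded)

# A block that already has rows passes through untouched
full_block <- data.frame(Comments_Overall = c("nice", "great"),
                         stringsAsFactors = FALSE)
unchanged <- pad_empty(full_block)
```

Inside export_site_data, each block could then be wrapped as pad_empty(Data_Overall) and so on before writing.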