【Question title】: R: Loop to create new data frames in R
【Posted】: 2020-08-09 10:27:23
【Question】:

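I'm trying to create a loop that creates multiple files for VCS stations, each named after its station name. Below is the code that does this for one station, and I'm trying to turn it into a loop so that it can be done for 68 stations. (That is, if I were copying and pasting, I would replace P205187 with a different station name, e.g. P205200.) I have the individual station names (e.g. P205187) in a data frame called VCS.Sites. Can anyone point me in the right direction? New R user here, and I'm stuck!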

P205187 <- VCSrawdata[VCSrawdata$Network_ID=="P205187",]  #create a file for VCS station P205187
  #clean up after subset
  P205187$Network_ID <- factor(P205187$Network_ID)


# create annual file for VCS station P205187
P205187_annual <- group_by(P205187,Year,DESCRIPTION)
P205187_annual <- summarise(P205187_annual,Sum_Annual = sum(Value), Mean_Annual = mean(Value), CountDays=n())

# create monthly file for VCS station P205187
P205187_monthly <- group_by(P205187,Year, Month,DESCRIPTION)
P205187_monthly <- summarise(P205187_monthly,Sum_Monthly = sum(Value),Mean_monthly = mean(Value),CountDays=n())

【Question comments】:

    Tags: r loops


    【Solution 1】:

    You can do this nicely with an lapply loop. Something like this:

    library(dplyr)

    # Placeholder: put all station IDs here (the two IDs from the question are shown)
    list_of_ids <- c("P205187", "P205200")

    monthly <- function(id){
      # subset the raw data to the station given by id
      station <- VCSrawdata[VCSrawdata$Network_ID==id,]
      # clean up after subset: drop unused factor levels
      station$Network_ID <- factor(station$Network_ID)

      # annual summary for this station (computed here, but not returned)
      station_annual <- group_by(station,Year,DESCRIPTION)
      station_annual <- summarise(station_annual,Sum_Annual = sum(Value), Mean_Annual = mean(Value), CountDays=n())

      # monthly summary for this station
      station_monthly <- group_by(station,Year, Month,DESCRIPTION)
      station_monthly <- summarise(station_monthly,Sum_Monthly = sum(Value),Mean_monthly = mean(Value),CountDays=n())

      return(station_monthly)
    }

    # one monthly summary per station, in the same order as list_of_ids
    monthlies <- lapply(list_of_ids, monthly)
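
As a small follow-up sketch, the station IDs can be taken from the data itself and the result list can be named after them, so that each summary is retrievable by station. The toy VCSrawdata below is a hypothetical stand-in for the real data, monthly is a compact variant of the function above, and the final bind_rows step is an optional extra that is not part of the original answer.

```r
library(dplyr)

# Hypothetical stand-in for the real VCSrawdata
VCSrawdata <- data.frame(
  Network_ID  = c("P205187", "P205187", "P205200"),
  Year        = c(2020, 2020, 2020),
  Month       = c(1, 1, 2),
  DESCRIPTION = "Rain",
  Value       = c(1, 3, 10)
)

# Compact variant of the per-station monthly summary
monthly <- function(id) {
  VCSrawdata %>%
    filter(Network_ID == id) %>%
    group_by(Year, Month, DESCRIPTION) %>%
    summarise(Sum_Monthly  = sum(Value),
              Mean_monthly = mean(Value),
              CountDays    = n(),
              .groups      = "drop")
}

# Take the IDs from the data rather than typing them by hand
list_of_ids <- unique(as.character(VCSrawdata$Network_ID))

# Name each list element after its station
monthlies <- setNames(lapply(list_of_ids, monthly), list_of_ids)

# Optional: stack everything into one data frame with a station column
all_monthly <- bind_rows(monthlies, .id = "Network_ID")
```

With names attached, a single station's result is available as monthlies$P205187 or monthlies[["P205187"]].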
    

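    【Comments】:

      【Solution 2】:

      It sounds like the goal is to write CSV files. We can use group_walk from dplyr (the variant of group_map that is called for its side effects) to loop over all stations and write the CSVs.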

      library(dplyr)

      VCSrawdata %>%
          group_by(Network_ID) %>%
          group_walk(~ {
              # annual summary for the current station, written to <ID>_annual.csv
              .x %>%
                  group_by(Year, DESCRIPTION) %>%
                  summarize(sum_annual = sum(Value),
                            mean_annual = mean(Value),
                            countDays = n()) %>%
                  write.csv(file = paste0(.y$Network_ID, "_annual.csv"))

              # monthly summary for the current station, written to <ID>_month.csv
              .x %>%
                  group_by(Year, Month, DESCRIPTION) %>%
                  summarize(sum_month = sum(Value),
                            mean_month = mean(Value),
                            countDays = n()) %>%
                  write.csv(file = paste0(.y$Network_ID, "_month.csv"))
          }
          )
      

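      Notes:

      1. .x is the tibble of rows for the current Network_ID group (by default without the Network_ID column itself)
      2. .y is a one-row tibble holding the group key. In this case, its only column is Network_ID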
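
To make the roles of .x and .y concrete, here is a minimal sketch on a hypothetical toy data frame. It uses group_map instead of group_walk only so that the values can be returned and inspected; the data and variable names are illustrative assumptions.

```r
library(dplyr)

# Hypothetical toy data with two stations
toy <- data.frame(
  Network_ID = c("A1", "A1", "B2"),
  Year       = c(2019, 2020, 2020),
  Value      = c(1, 2, 5)
)

# Column names that .x carries for each group
cols_per_group <- toy %>%
  group_by(Network_ID) %>%
  group_map(~ names(.x))

# Group key that .y carries for each group
keys_per_group <- toy %>%
  group_by(Network_ID) %>%
  group_map(~ .y$Network_ID)

print(cols_per_group[[1]])
print(unlist(keys_per_group))
```

Here .x contains only Year and Value, because the grouping column is dropped by default, while .y supplies the station ID that the answer uses to build the file name.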

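      【Comments】:

        【Solution 3】:

        Simply generalize your process into a function, then pass the station names as a parameter in a loop or an apply function to iterate over the stations. With this approach you avoid flooding the global environment with many separate objects, and instead keep a single named list of many underlying elements, which is easier to serialize and organize.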

        library(dplyr)

        summarize_stations <- function(station_name) { 
        
           # subset to one station and drop unused factor levels
           tmp_df <- VCSrawdata[VCSrawdata$Network_ID==station_name,] 
           tmp_df$Network_ID <- factor(tmp_df$Network_ID) 
        
           # create annual file for VCS station
           tmp_annual <- summarise(group_by(tmp_df,Year,DESCRIPTION),
                                   Sum_Annual = sum(Value), 
                                   Mean_Annual = mean(Value), 
                                   CountDays=n()) 
        
           # create monthly file for VCS station
           tmp_monthly <- summarise(group_by(tmp_df, Year, Month,DESCRIPTION),
                                    Sum_Monthly = sum(Value), 
                                    Mean_Monthly = mean(Value), 
                                    CountDays=n())
        
           # RETURN NAMED LIST OF BOTH AGGREGATIONS
           return(list(annual=tmp_annual, monthly=tmp_monthly))
        }
        
        # simplify=FALSE keeps the result as a list named by station
        station_list <- sapply(VCS.Sites$station_names, summarize_stations, simplify=FALSE)
        
        
        # ACCESS UNDERLYING ELEMENTS
        station_list$P205187$annual
        station_list$P205187$monthly
        ...
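
Because the question ultimately asks for per-station files, one possible next step is to walk the named list and write each element to disk. This is a minimal sketch, assuming a station_list shaped like the one above; the toy values, the out_dir location, and the file-name pattern are illustrative assumptions rather than part of the original answer.

```r
# Hypothetical stand-in for a station_list built as above
station_list <- list(
  P205187 = list(
    annual  = data.frame(Year = 2020, Sum_Annual = 4),
    monthly = data.frame(Year = 2020, Month = 1, Sum_Monthly = 4)
  ),
  P205200 = list(
    annual  = data.frame(Year = 2020, Sum_Annual = 10),
    monthly = data.frame(Year = 2020, Month = 2, Sum_Monthly = 10)
  )
)

# Assumed output folder; a temporary directory keeps the sketch side-effect free
out_dir <- tempdir()

# One CSV per station and aggregation, e.g. P205187_annual.csv
for (station in names(station_list)) {
  for (period in names(station_list[[station]])) {
    out_file <- file.path(out_dir, paste0(station, "_", period, ".csv"))
    write.csv(station_list[[station]][[period]], out_file, row.names = FALSE)
  }
}

# Alternatively, keep the whole structure in a single file
rds_file <- file.path(out_dir, "station_list.rds")
saveRDS(station_list, rds_file)
```

The single .rds file can be reloaded later with readRDS, which restores the nested list with its names intact.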
        

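        You can even use by (the object-oriented wrapper to tapply) to subset VCSrawdata by Network_ID (assuming it includes all the stations you need). To do so, slightly adjust the function to receive a data frame as its parameter, which lets you skip the subset line.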

        summarize_stations <- function(tmp_df) { 
        
          # REMOVE SUBSET LINE
          # tmp_df <- VCSrawdata[VCSrawdata$Network_ID=="P205187",] 
          
          # ...keep same code as above
        }
        
        station_list <- by(VCSrawdata, VCSrawdata$Network_ID, FUN=summarize_stations)
        
        
        # ACCESS UNDERLYING ELEMENTS
        station_list$P205187$annual
        station_list$P205187$monthly
        ...
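
For a version that runs end to end, here is a minimal sketch of the by approach on a hypothetical toy data frame. To stay self-contained it swaps in base R's aggregate for the dplyr summaries, so the column names differ from the answer's; the data and the summarize_station name are illustrative assumptions.

```r
# Hypothetical toy data with two stations
toy <- data.frame(
  Network_ID = c("A1", "A1", "A1", "B2"),
  Year       = c(2019, 2020, 2020, 2020),
  Month      = c(12, 1, 2, 1),
  Value      = c(1, 2, 3, 10)
)

# Receives the rows of one station; no subsetting needed inside
summarize_station <- function(df) {
  list(
    annual  = aggregate(Value ~ Year, data = df, FUN = sum),
    monthly = aggregate(Value ~ Year + Month, data = df, FUN = sum)
  )
}

# by() splits toy on Network_ID and calls the function once per station
station_list <- by(toy, toy$Network_ID, FUN = summarize_station)

print(names(station_list))
print(station_list$A1$annual)
```

Since by does the splitting, the function never needs to know the station ID, and the result can be indexed by station name just like the sapply version.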
        

        【Comments】:
