【Question title】: Aggregating output from multiple input files in R
【Posted】: 2012-11-19 22:25:59
【Question】:

I currently have the R code below. It reads data that looks like this:

track_id    day hour    month   year    rate    gate_id pres_inter  vmax_inter
9   10  0   7   1   9.6451E-06  2   97809   23.545
9   10  0   7   1   9.6451E-06  17  100170  13.843
10  3   6   7   1   9.6451E-06  2   96662   31.568
13  22  12  8   1   9.6451E-06  1   94449   48.466
13  22  12  8   1   9.6451E-06  17  96749   30.55
16  13  0   8   1   9.6451E-06  4   98702   19.205
16  13  0   8   1   9.6451E-06  16  98585   18.143
19  27  6   9   1   9.6451E-06  9   98838   20.053
19  27  6   9   1   9.6451E-06  17  99221   17.677
30  13  12  6   2   9.6451E-06  2   97876   27.687
30  13  12  6   2   9.6451E-06  16  99842   18.163
32  20  18  6   2   9.6451E-06  1   99307   17.527


##################################################################
# Input / Output variables
##################################################################
library(plyr)  # provides ddply() and count(), both used below
for (N in (59:96)){
  if (N < 10){
#     TrackID <- "000$N"
     TrackID <- paste("000",N, sep="")
  }
  else{
#     TrackID <- "00$N"
     TrackID <- paste("00",N, sep="")
  }
  print(TrackID)

# For 2010_08_24 trackset
#  fname_in <- paste('input/2010_08_24/intersections_track_calibrated_jma_from1951_',TrackID,'.csv', sep="")
#  fname_out <- paste('output/2010_08_24/tracks_crossing_regional_polygon_',TrackID,'.csv', sep="")
# For 2012_05_01 trackset
  fname_in <- paste('input/2012_05_01/intersections_track_param_',TrackID,'.csv', sep="")
  fname_out <- paste('output/2012_05_01/tracks_crossing_regional_polygon_',TrackID,'.csv', sep="")
  fname_out2 <- paste('output/2012_05_01/GateID_',TrackID,'.csv', sep="")

#######################################################################
# we read the gate crossing track data
  cat('reading the crosstat output file', fname_in, '\n')
  header <- read.table(fname_in, nrows=1)
  track <- read.table(fname_in, sep=',', skip=1)
  colnames(track) <- c("ID", "day", "month", "year", "hour", "rate", "gate_id", "pres_inter", "vmax_inter")

#  track_id=track[,1]
#  pres_inter=track[,15]

# For each track ID, keep the row with the maximum vmax_inter
  ByTrack <- ddply(track, "ID", function(x) x[which.max(x$vmax_inter),])
  ByGate <- count(track, vars="gate_id")

# Write the output file with a single record per storm                     
  cat('Writing the full output file', fname_out, '\n')
  write.table(ByTrack, fname_out, col.names=T, row.names=F, sep = ',')

# Write the output file with the number of crossings per gate
   cat('Writing the full output file', fname_out2, '\n')
   write.table(ByGate, fname_out2, col.names=T, row.names=F, sep = ',')
}

The last part of my code outputs a file grouped by GateID, giving the frequency of occurrence. It looks like this:

gate_id freq
1   935
2   2096
3   1363
4   963
5   167
6   17
7   43
8   62
9   208
10  267
11  64
12  162
13  178
14  632
15  807
16  2003
17  838
18  293

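The problem is that I write a file like this for each of 96 different input files. Instead of writing 96 separate files, I would like to compute these aggregates for each input file, then sum the frequencies across all 96 inputs and print a SINGLE output file. Can anyone help?

Thanks, K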

【Discussion】:

    Tags: r count aggregate


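    【Solution 1】:

    You will need a function along the following lines. It picks up every .csv file in the current working directory, so that directory must contain only the files you want to analyze.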

    myFun <- function(out.file = "mydata") {
    files <- list.files(pattern = "\\.(csv|CSV)$")
    # Use this next line if you are going use the file name as a variable/output etc
    files.noext <- substr(basename(files), 1, nchar(basename(files)) - 4)
    
    for (i in seq_along(files)) {  # seq_along() is safe when no files match
        temp <- read.csv(files[i], header = FALSE)
        # YOUR CODE HERE
        # Use the code you have already written but operate on files[i] or temp
        # Save the important stuff into one data frame that grows
        # Think carefully ahead of time what structure makes the  most sense
        }
    
    datafile <- paste(out.file, ".csv", sep = "")
    write.csv(yourDataFrame, file = datafile)
    }
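
One way to fill in the placeholder above, offered as a minimal sketch rather than the answerer's actual code: count the gate crossings in each file, stack the per-file counts, and sum them by gate. It assumes every input file is comma-separated with one header line and holds gate_id in column 7, as in the question's read.table() call. The helper names count_gates and aggregate_gates and the output name GateID_all.csv are hypothetical. It uses base R only (table() and aggregate()) in place of plyr's count().

```r
# Count gate crossings in a single file.
# Assumption: comma-separated, one header line, gate_id in column `gate_col`.
count_gates <- function(path, gate_col = 7) {
  track <- read.table(path, sep = ",", skip = 1)
  counts <- as.data.frame(table(gate_id = track[[gate_col]]),
                          responseName = "freq",
                          stringsAsFactors = FALSE)
  counts$gate_id <- as.integer(counts$gate_id)
  counts
}

# Sum the per-file counts over all files and write one output file.
aggregate_gates <- function(files, out_file) {
  per_file <- lapply(files, count_gates)        # one small table per file
  all_counts <- do.call(rbind, per_file)        # stack them
  total <- aggregate(freq ~ gate_id, data = all_counts, FUN = sum)
  write.csv(total, out_file, row.names = FALSE)
  invisible(total)
}

# Example call using the file naming from the question
# (the output file name is made up for illustration):
# files <- sprintf("input/2012_05_01/intersections_track_param_%04d.csv", 59:96)
# aggregate_gates(files, "output/2012_05_01/GateID_all.csv")
```

Summing the per-file counts gives the same totals as counting over all rows at once, while keeping only the small per-file count tables in memory.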
    

    【Comments】:

    • Thank you, I'll keep working on this tomorrow! I appreciate your time.