How to add average rasters within the for-loop that creates the rasters? (R answers)

【Question title】: How to add average rasters within for-loop that creates the rasters? R
【Posted】: 2016-10-03 20:36:16
【Question description】:

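I have several directories, each with 700+ binary-encoded rasters, and I average the output rasters per directory. Currently I create the rasters one by one in a for loop, then load the newly created rasters back into R and sum them to get the monthly rainfall total.

However, since I don't need the individual rasters, only the average raster, I have a hunch that I could do all of this within one loop and save only the output average raster instead of every intermediate raster. I am coming up short on how to program this in R.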

setwd("~/Desktop/CMORPH/Levant-Clip/200001")

dir.output <- '~/Desktop/CMORPH/Levant-Clip/200001' ### change as needed to give output location
path <- list.files("~/Desktop/CMORPH/MonthlyCMORPH/200001",pattern="*.bz2", full.names=T, recursive=T)

for (i in 1:length(path)) {
  files = bzfile(path[i], "rb")
  data <- readBin(files,what="double",endian = "little", n = 4948*1649, size=4) #Mode of the vector to be read
  close(files) # release the connection before the next file
  data[data == -999] <- NA #convert missing data from -999 (CMORPH notation) to NAs
  y<-matrix((data=data), ncol=1649, nrow=4948)
  r <- raster(y)
  e <- extent(-180, 180, -90, 83.6236) ### choose the extent based on the netcdf file info 
  tr <- t(r) #transpose 
  re <- setExtent(tr,extent(e)) ### set the extent to the raster
  ry <- flip(re, direction = 'y')
  projection(ry) <- "+proj=longlat +datum=WGS84 +ellps=WGS84"
  C_Lev <- crop(ry, Levant) ### Clip to Levant
  M_C_Lev<-mask(C_Lev, Levant)
  writeRaster(M_C_Lev, file.path(dir.output, basename(path[i])), format = 'GTiff', overwrite = T) ###the basename allows the file to be named the same as the original
}
# 
raspath <- list.files ('~/Desktop/CMORPH/Levant-Clip/200001',pattern="*.tif",     full.names=T, recursive=T)
rasstk <- stack(raspath)
sum200001<-sum(rasstk)
writeRaster(sum200001, file.path(dir.output, 'sum200001'), format = 'GTiff', overwrite = T) ### 'sum200001' is a hypothetical output name for the monthly total

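Currently it takes about 75 minutes to execute this code, and I have about 120 more directories to go, so I am looking for a faster solution.

Thank you for any and all comments and input. Best, Evan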

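【Question discussion】:

It seems to me that writing the rasters is not necessary, since stack also accepts a list of raster objects as input. So you could replace writeRaster with sequentially assigning M_C_Lev to the elements of a list. This could use a lot of memory, though. Also, if you are sure that all rasters have the same extent, consider using the "quick" option in stack.
Also, I seem to remember that 'brick' may be faster than 'stack'.

Tags: r for-loop raster r-raster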

【Solution 1】:

To elaborate on my earlier comment, you could try:

setwd("~/Desktop/CMORPH/Levant-Clip/200001")

dir.output <- '~/Desktop/CMORPH/Levant-Clip/200001' ### change as needed to give output location
path <- list.files("~/Desktop/CMORPH/MonthlyCMORPH/200001",pattern="*.bz2", full.names=T, recursive=T)
raster_list = list()
for (i in 1:length(path)) {
  files = bzfile(path[i], "rb")
  data <- readBin(files,what="double",endian = "little", n = 4948*1649, size=4) #Mode of the vector to be read
  close(files) # release the connection before the next file
  data[data == -999] <- NA #convert missing data from -999 (CMORPH notation) to NAs
  y<-matrix((data=data), ncol=1649, nrow=4948)
  r <- raster(y)
  if (i == 1) {
    e <- extent(-180, 180, -90, 83.6236) ### choose the extent based on the netcdf file info 

  }
  tr <- t(r) #transpose 
  re <- setExtent(tr,extent(e)) ### set the extent to the raster
  ry <- flip(re, direction = 'y')
  projection(ry) <- "+proj=longlat +datum=WGS84 +ellps=WGS84"
  C_Lev <- crop(ry, Levant) ### Clip to Levant
  M_C_Lev<-mask(C_Lev, Levant)
  raster_list[[i]] = M_C_Lev
}
# 

rasstk <- stack(raster_list, quick = TRUE) # OR rasstk <- brick(raster_list, quick = TRUE)
avg200001<-mean(rasstk)
writeRaster(avg200001, file.path(dir.output, 'avg200001'), format = 'GTiff', overwrite = T) ### 'avg200001' is a hypothetical output name for the monthly average

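Using the "quick" option in stack should definitely speed things up, especially if you have many rasters.

Another possibility is to compute the average first and do the "spatial processing" afterwards. For example: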

for (i in 1:length(path)) {
  files = bzfile(path[i], "rb")
  data <- readBin(files,what="double",endian = "little", n = 4948*1649, size=4) #Mode of the vector to be read
  close(files) # release the connection before the next file
  data[data == -999] <- NA #convert missing data from -999 (CMORPH notation) to NAs

  if (i == 1) {
    totdata  <- data
    num_nonNA <- as.numeric(!is.na(data))
  } else {
    totdata = rowSums(cbind(totdata,data), na.rm = TRUE)
    # We have to count the number of "valid" entries so that the average is correct !
    num_nonNA = rowSums(cbind(num_nonNA,as.numeric(!is.na(data))),na.rm = TRUE)
  }
}

avg_data = totdata/num_nonNA # Compute the average

# Now do the "spatial" processing

y<-matrix(avg_data, ncol=1649, nrow=4948)
r <- raster(y)
e <- extent(-180, 180, -90, 83.6236) ### choose the extent based on the netcdf file info 
tr <- t(r) #transpose 
re <- setExtent(tr,extent(e)) ### set the extent to the raster
ry <- flip(re, direction = 'y')
projection(ry) <- "+proj=longlat +datum=WGS84 +ellps=WGS84"
C_Lev <- crop(ry, Levant) ### Clip to Levant (crop the georeferenced raster, not the raw vector)
M_C_Lev<-mask(C_Lev, Levant)
writeRaster(M_C_Lev, file.path(dir.output, 'avg200001'), format = 'GTiff', overwrite = T) ### 'avg200001' is a hypothetical output name for the monthly average

This may be faster or slower, depending on "how much" of the original data you are cropping away.
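To make the sum-and-count idea above concrete, here is a minimal sketch on toy data using base R only. The three short vectors stand in for three daily files and are made up for illustration; the real arrays would come from readBin as in the loop above.

```r
# Toy stand-in for three daily files of four cells each (hypothetical values)
days <- list(c(1, NA, 3, NA), c(2, 4, NA, NA), c(3, NA, 6, NA))

total   <- numeric(4)  # running sum, skipping NA values
n_valid <- numeric(4)  # running count of non-NA values per cell

for (d in days) {
  valid <- !is.na(d)
  total[valid] <- total[valid] + d[valid]
  n_valid <- n_valid + valid
}

running_mean <- total / n_valid  # 0/0 yields NaN for a cell that was never observed

# Reference: stack everything in memory and average per cell
reference <- rowMeans(do.call(cbind, days), na.rm = TRUE)
print(running_mean)
print(reference)
```

Both vectors should agree, but the running version only ever holds the current array plus the two accumulators.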

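HTH,

Lorenzo

【Discussion】:

Lorenzo, thank you for the answer. The code is still running and I will reply with timing results. Unfortunately the flip and transpose are necessary; they are part of how the CMORPH precipitation data is encoded in binary. I will try to flip and transpose after the loop though, once it is stacked.
Glad to help. Transposing and flipping only the average image seems like a good idea, but I don't know whether it will play well with crop and mask. It also occurred to me that if you run into memory problems you could try something different: accumulate the values inside the for loop, then compute the average outside the loop. That way you would have only two "items" in memory: the current cumulative sum and the "current" single-date file...
Added a possibly faster approach to the answer. Have a look.
So, after several comparisons, I have not improved the run time by much. Lorenzo, your second approach threw "Error: cannot allocate vector of size n Mb" after an hour or so. I am now going back to the "foreach" and "doParallel" libraries to speed things up.
Hi Evan. Two quick things: 1) I updated the code of the second approach, because I saw that using "rowSums" instead of apply reduces the time a lot. I tried running a 700-iteration loop that creates a 4948*1649 array each time, and the loop finished in less than 5 minutes. Maybe you want to give it a try.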

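【Solution 2】:

I am adding another answer to clarify and simplify things a bit, also in relation to the comments in chat. The code below should do what you asked: loop over the files, read the "data", compute the sum over all files, and convert it to a raster with the specified dimensions.

Note that for testing purposes I replaced the loop over file names with a simple loop from 1 to 720, and replaced the file reading with creating an array of the same length as yours, filled with values from 1 to 4 and some NAs!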

totdata <- array(dim = 4948*1649)  # Define Dummy array
for (i in 1:720) {
  message("Working on file: ", i)
  data <- array(rep(c(1,2,3,4),4948*1649/4), dim = 4948*1649) # Create a "fake" 4948*1649 array  each time to simulate data reading
  data[1:1000] <- -999   # Set some values to NA
  data[data == -999] <- NA #convert missing data from -999

  totdata <- rowSums(cbind(totdata, data), na.rm = T)   # Let's sum the current array with the cumulative sum so far
}

# Now reshape to matrix and convert to raster, etc.
y  <- matrix(totdata, ncol=1649, nrow=4948)
r  <- raster(y)
e  <- extent(-180, 180, -90, 83.6236) ### choose the extent based on the netcdf file info
tr <- t(r) #transpose
re <- setExtent(tr,e) ### set the extent to the raster
ry <- flip(re, direction = 'y')
projection(ry) <- "+proj=longlat +datum=WGS84 +ellps=WGS84"

This generates a "correct" raster:

> ry
class       : RasterLayer 
dimensions  : 1649, 4948, 8159252  (nrow, ncol, ncell)
resolution  : 0.07275667, 0.1052902  (x, y)
extent      : -180, 180, -90, 83.6236  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
data source : in memory
names       : layer 
values      : 0, 2880  (min, max)

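It contains the sum of the different arrays: you will notice that the maximum value is 720 * 4 = 2880. (One caveat: a cell that is always NA will come out as 0 rather than NA.)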
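If that caveat matters (for rainfall, a true 0 and "never observed" mean different things), one possible workaround is to track how many valid values each cell received and reset the never-observed cells to NA afterwards. A small sketch on made-up vectors, base R only; `a`, `b` and `n_valid` are illustrative names, not part of the code above:

```r
# Two hypothetical "files" of four cells; cell 1 is missing in both
a <- c(NA, 1, 0, 4)
b <- c(NA, 2, 0, NA)

total   <- rowSums(cbind(a, b), na.rm = TRUE)  # always-NA cell becomes 0
n_valid <- rowSums(!is.na(cbind(a, b)))        # valid observations per cell

total[n_valid == 0] <- NA  # restore NA where nothing was ever observed
print(total)
```

Cell 3 keeps its genuine 0, while cell 1 goes back to NA.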

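On my laptop, this takes about 5 minutes!

In practice:

To avoid memory problems, I do not read all the data into memory. Each of your arrays is more or less 64 MB, so I cannot load them all and then do the sum (unless I have 50 GB of RAM to throw away, and even then it would be slow). Instead I used the associative property of summation and computed a "cumulative" sum in each cycle of the loop. This way you only work with two arrays of 8 million elements at a time: the one you read from file "i", and the one holding the current sum.
To avoid unnecessary computation, I sum the one-dimensional arrays exactly as they come out of reading the binary files. You do not need to reshape the array into a matrix inside the loop, because you can do that once on the final "summed" array and then convert it to matrix form.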
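The memory figures can be checked with a few lines of arithmetic, assuming the arrays are held as R doubles (8 bytes each). The second half is a toy check, on made-up integer data, that adding arrays one at a time gives the same result as summing them all at once:

```r
n_cells         <- 4948 * 1649                 # cells per CMORPH grid
bytes_per_array <- n_cells * 8                 # R stores doubles in 8 bytes
mb_per_array    <- bytes_per_array / 1e6       # roughly 65 MB per array
gb_all_files    <- 720 * bytes_per_array / 1e9 # roughly 47 GB for 720 arrays

# Toy check: a running sum equals the all-at-once sum
set.seed(1)
arrays <- replicate(5, sample(0:4, 10, replace = TRUE), simplify = FALSE)
running <- numeric(10)
for (a in arrays) running <- running + a
all_at_once <- rowSums(do.call(cbind, arrays))
print(all.equal(running, all_at_once))
```

The running version never needs more than two arrays in memory, which is the whole point of the approach.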

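I hope this works for you, and that I am not missing something obvious!

As far as I can tell, if this approach is still slow, then your problem lies elsewhere (for example in the data reading: with 720 files, spending 3 seconds reading each file means about 35 minutes of processing time).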
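One way to find out whether reading is the bottleneck is to time a single read and scale it up. The sketch below writes a tiny temporary .bz2 file so that it runs anywhere; in practice you would point bzfile at one of your real files and read 4948*1649 values. The temporary file and the 1000-value payload are stand-ins, not your data.

```r
n_files <- 720
est_minutes <- n_files * 3 / 60  # 3 s per file over 720 files, in minutes

# Create a small bzip2-compressed file of 4-byte floats to read back
tmp <- tempfile(fileext = ".bz2")
con <- bzfile(tmp, "wb")
writeBin(as.numeric(1:1000), con, size = 4, endian = "little")
close(con)

# Time one read, the same way the loop reads each file
elapsed <- system.time({
  con <- bzfile(tmp, "rb")
  vals <- readBin(con, what = "double", n = 1000, size = 4, endian = "little")
  close(con)
})[["elapsed"]]

projected_minutes <- elapsed * n_files / 60  # projected read time for one directory
unlink(tmp)
print(c(est_minutes = est_minutes, projected_minutes = projected_minutes))
```

If the projected read time for a real file is already close to your total run time, speeding up the raster operations will not help much.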

HTH,

Lorenzo

【Discussion】: