Spatial data / computing metrics on neighbors in R: answers

【Question title】: spatial data / compute metrics on neighbors in R
【Posted】: 2014-04-28 11:37:40
【Question】:

I have 2D spatial data in the form (xBin, yBin, value). For example:

library(data.table)
DT = data.table(x=c(rep(1,3),rep(2,3),rep(3,3)),y=rep(c(1,2,3),3),value=100*c(1:9))

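For each bin, I want to compute the sum of the variable "value" over all neighboring bins. A bin counts as a neighbor if both of its indices, x and y, are within one unit of the current bin.

For example, for x=2, y=2, I want to compute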

valueNeighbors(x=2,y=2) = value(x=1,y=1)+value(1,2)+value(1,3)
+value(2,1)+value(2,3)
+value(3,1)+value(3,2)+value(3,3)

My real data has ~1,000^2 bins. How can I do this efficiently?
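
As a quick check of the neighbor definition above, the sum for x=2, y=2 can be computed by brute force. This is only a sketch on the 3x3 example, using a plain data.frame in place of the data.table; the names is_neighbor and value_neighbors_22 are made up for illustration, and the approach is not meant to scale to ~1,000^2 bins.

```r
# Same example data as in the question, stored as a base data.frame.
DT <- data.frame(x = rep(1:3, each = 3),
                 y = rep(1:3, times = 3),
                 value = 100 * (1:9))

# A bin is a neighbor of (2, 2) if both indices are within one unit
# and it is not the bin (2, 2) itself.
is_neighbor <- abs(DT$x - 2) <= 1 & abs(DT$y - 2) <= 1 &
  !(DT$x == 2 & DT$y == 2)

# Sum of "value" over the eight neighbors of (2, 2).
value_neighbors_22 <- sum(DT$value[is_neighbor])
value_neighbors_22  # 4000
```

The total over all nine bins is 4500, and removing the center bin's 500 leaves 4000.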

【Comments】:

Tags: r 2d data.table spatial data-manipulation

【Answer 1】:

Perhaps with the raster package:

library(raster)
X <- matrix(1:20, 4)
r <- raster(X)
r
# 3x3 window of ones, summed; cells outside the raster are padded with 0
agg <- as.matrix(focal(r, matrix(1, 3, 3), sum, pad = TRUE, padValue = 0))
agg

     [,1] [,2] [,3] [,4] [,5]
[1,]   14   33   57   81   62
[2,]   24   54   90  126   96
[3,]   30   63   99  135  102
[4,]   22   45   69   93   70

Which approach is faster on a large dataset?

X <- matrix(1:1000000, 1000)
S <- matrix(NA, nrow(X), ncol(X))
r <- raster(X)

system.time(
as.matrix(focal(r, matrix(1, 3, 3), sum, pad = TRUE, padValue = 0))
)
user  system elapsed 
0.39    0.08    0.47

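For a 1000x1000 matrix, I could not get a result in a reasonable time with the solution proposed by Winsemius (Win 7 x64, 8GB RAM).

【Comments】:

Much more efficient than that loop. It also allows applying a nice weight matrix.
@IShouldBuyABoat Why is it so efficient?
You can get the C code here: r-forge.r-project.org/scm/viewvc.php/pkg/raster/src/… and r-forge.r-project.org/scm/viewvc.php/pkg/raster/src/…
It occurred to me that a very fast algorithm should be possible by adding appropriately shifted copies of the same matrix. I will demonstrate below.
@IShouldBuyABoat Please tell me what you did below to make it faster. I don't understand (it is late here and I am sleepy).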

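【Answer 2】:

Here is a possible solution using some of the spatial packages in R. Note that it is not very polished, but it gets the job done. I have not checked the results by hand, and I do not know how fast this approach is compared with the matrix solutions offered here.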

DT<-data.frame(x=c(rep(1,3),rep(2,3),rep(3,3)),y=rep(c(1,2,3),3),value=100*c(1:9))
require(sp)
require(raster) # provides raster(), extent(), rasterize() and rasterToPolygons()
coordinates(DT)<-~x+y # Create spatial object (points)
rast<-raster(extent(DT),ncol=3,nrow=3)
grid<-rasterize(DT,rast)
grid<-rasterToPolygons(grid) # Create polygons

require(spdep)
neigh<-poly2nb(grid) # Create neighbour list
weights<-nb2listw(neigh,style="B",zero.policy=TRUE) # Create weights (binary)
grid$spatial.lag<-lag.listw(weights,grid$value,zero.policy=TRUE) # Add the neighbour sums to the polygons

You can convert the spatial object back to a data frame simply with

DT2<-data.frame(grid)

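Note that the ID variable corresponds to the row number in the initial data.

【Comments】:

【Answer 3】:

I don't think data.table is the right tool. Its concept of row indexing does not suit this operation well (although I may be spouting outdated information):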

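X <- matrix(1:20, 4)
S <- matrix(NA, nrow(X), ncol(X))
for (x in seq_len(nrow(X))) {    # loop over row indices, not over every element of row(X)
  for (y in seq_len(ncol(X))) {  # likewise for column indices
    # sum all cells within one unit of (x, y); this includes the cell itself
    S[x, y] <- sum(X[abs(row(X) - x) < 2 & abs(col(X) - y) < 2])
  }
}
S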
#---------
     [,1] [,2] [,3] [,4] [,5]
[1,]   14   33   57   81   62
[2,]   24   54   90  126   96
[3,]   30   63   99  135  102
[4,]   22   45   69   93   70

With efficiency in mind, this algorithm will be faster... but still much slower than raster::focal:

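rows <- nrow(X); cols <- ncol(X)
for (x in seq_len(rows)) {
  for (y in seq_len(cols)) {
    # clamp the 3x3 window to the matrix bounds instead of building logical masks
    S[x, y] <- sum(X[max(1, x - 1):min(rows, x + 1), max(1, y - 1):min(cols, y + 1)])
  }
}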

Perhaps faster:

# X is the 1000 x 1000 test matrix defined earlier, hence the hard-coded -1000 indices
system.time(S2 <- X +
    rbind(cbind(X[-1, -1], 0), 0) +         # diagonal shifts of the matrix
    rbind(cbind(0, X[-1, -1000]), 0) +
    rbind(0, cbind(X[-1000, -1], 0)) +
    rbind(0, cbind(0, X[-1000, -1000])) +
    rbind(X[-1, ], 0) +                     # these create the sums on the same rows or columns
    rbind(0, X[-1000, ]) +
    cbind(X[, -1], 0) +
    cbind(0, X[, -1000]))
   user  system elapsed 
  0.563   0.065   0.630 
> identical(S,S2) # compare to the focal-method above
[1] TRUE
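
The shifted-copies idea can also be written without hard-coding the matrix size. The following is a minimal sketch in base R, not the answerer's code: neighbor_sum and include_center are hypothetical names introduced here. It assumes that bins outside the grid contribute 0, which it implements by zero-padding the matrix and adding up the nine shifted views. Note that the outputs shown in the answers include the center cell (for example, 14 = 1 + 2 + 5 + 6 at position [1,1]); setting include_center = FALSE subtracts it, which matches the neighbor definition in the question.

```r
# Sum over the 3x3 window around every cell, treating out-of-range cells as 0.
neighbor_sum <- function(X, include_center = TRUE) {
  nr <- nrow(X)
  nc <- ncol(X)
  # Zero-padded copy with one extra row and column on every side.
  P <- matrix(0, nr + 2, nc + 2)
  P[2:(nr + 1), 2:(nc + 1)] <- X
  S <- matrix(0, nr, nc)
  # Add the nine shifted views of the padded matrix (offsets -1, 0, +1).
  for (dx in 0:2) {
    for (dy in 0:2) {
      S <- S + P[dx + seq_len(nr), dy + seq_len(nc), drop = FALSE]
    }
  }
  # Optionally remove the cell's own value so that only neighbors are counted.
  if (!include_center) S <- S - X
  S
}

X <- matrix(1:20, 4)
S_with_center <- neighbor_sum(X)
S_neighbors_only <- neighbor_sum(X, include_center = FALSE)
```

Only nine matrix additions are performed regardless of the matrix size, so the work is done in vectorized code rather than in an R-level loop over cells.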

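【Comments】:

How do I convert my data.table into a matrix like this?
@IShouldBuyABoat I can't run the loop for a 1000*1000 matrix. How is raster::focal so fast?
focal is probably a special-purpose function written in C.
@IShouldBuyABoat True. Thanks.
It should be noted that on every iteration of that loop, the row and col functions create new matrices the same size as X. It is no surprise that more efficient methods exist. If I did not need to account for the "edges", I could write it without those functions.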
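
On the question raised in the comments of how to turn the (x, y, value) table into a matrix, one possible approach is matrix indexing with a two-column index. This is a sketch under stated assumptions: the bin indices are positive integers starting at 1, each (x, y) pair appears at most once, and missing bins should be 0. The names M and back are illustrative, and a plain data.frame stands in for the data.table.

```r
# Example data from the question, as a base data.frame.
DT <- data.frame(x = rep(1:3, each = 3),
                 y = rep(1:3, times = 3),
                 value = 100 * (1:9))

# Start from an all-zero grid, then fill M[x, y] with the value of each row.
M <- matrix(0, nrow = max(DT$x), ncol = max(DT$y))
M[cbind(DT$x, DT$y)] <- DT$value

# Convert a matrix back to long (x, y, value) form.
back <- data.frame(x = as.vector(row(M)),
                   y = as.vector(col(M)),
                   value = as.vector(M))
```

With this layout, rows of M correspond to x and columns to y, so a matrix of neighbor sums computed from M can be converted back to long form the same way.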