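Fast algorithm for calculating a second-order adjacency matrix from a first-order adjacency matrix in a probabilistic directed graph: answers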

【Question title】: Fast algorithm for calculating second-order adjacency matrix from first-order adjacency matrix with a probabilistic directed graph
【Posted】: 2017-06-21 09:09:10
【Question description】:

I am working with an adjacency matrix that looks like this:

N <- 5
A <- matrix(round(runif(N^2),1),N)
diag(A) <- 0

1> A
     [,1] [,2] [,3] [,4] [,5]
[1,]  0.0  0.1  0.2  0.6  0.9
[2,]  0.8  0.0  0.4  0.7  0.5
[3,]  0.6  0.8  0.0  0.8  0.6
[4,]  0.8  0.1  0.1  0.0  0.3
[5,]  0.2  0.9  0.7  0.9  0.0

It is probabilistic and directed.

Here is a slow way to compute the probability that i is linked to j through at least one other node:

library(foreach)
`%ni%` <- Negate(`%in%`) #opposite of `in`
union.pr <- function(x){#Function to calculate the union of many probabilities
  if (length(x) == 1){return(x)}
  pr <- sum(x[1:2]) - prod(x[1:2])
  i <- 3
  while(i <= length(x)){
    pr <- sum(pr,x[i]) - prod(pr,x[i])
    i <- 1+i
  }
  pr
}

second_order_adjacency <- function(A, i, j){#function to calculate probability that i is linked to j through some other node
  pr <- foreach(k = (1:nrow(A))[1:nrow(A) %ni% c(i,j)], .combine = c) %do% {
    A[i,k]*A[k,j]
  }
  union.pr(pr) 
}
#loop through the indices...
A2 <- A * NA
for (i in 1:N){
  for (j in 1:N){
    if (i!=j){
      A2[i,j] <- second_order_adjacency(A, i, j)
    }
  }
}
diag(A2) <- 0
1>   A2
         [,1]     [,2]     [,3]     [,4]     [,5]
[1,] 0.000000 0.849976 0.666112 0.851572 0.314480
[2,] 0.699040 0.000000 0.492220 0.805520 0.831888
[3,] 0.885952 0.602192 0.000000 0.870464 0.790240
[4,] 0.187088 0.382128 0.362944 0.000000 0.749960
[5,] 0.954528 0.607608 0.440896 0.856736 0.000000

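This algorithm scales like N^2, and I have thousands of nodes. My matrix is also not all that sparse: it has lots of small numbers and a few large ones. I could parallelize it, but that would only divide the run time by the number of cores. Is there some vectorization trick that would let me take advantage of the relative speed of vectorized operations?

tl;dr: how can I quickly compute a second-order adjacency matrix in a probabilistic directed graph?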

【Comments】:

Because of the structure, this has to scale as N^2. I would replace your union.pr function with 1-prod(1-pr), which I believe would greatly improve your run time.

Tags: r algorithm matrix vectorization adjacency-matrix

【Solution 1】:

Your union.pr function is about 500 times slower than the simple, efficient approach. Replace union.pr with 1-prod(1-pr) and you get a speedup of about 500x.

x <- runif(1000)*0.01

t1 <- proc.time()
for (i in 1:10000){
  y <- union.pr(x)
}
t1 <- proc.time()-t1
print(t1)
# user  system elapsed 
# 21.09    0.02   21.18 

t2 <- proc.time()
for (i in 1:10000){
  y <- 1-prod(1-x)
}
t2 <- proc.time() - t2
print(t2)
# user  system elapsed 
# 0.04    0.00    0.03
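
The replacement works because, if the individual paths are treated as independent events, the probability that at least one occurs equals one minus the probability that none occurs. As a quick sanity check (a sketch only: `union.fast` is a name introduced here for illustration, and the seed and test vector are arbitrary), the two forms can be compared up to floating-point tolerance:

```r
# Pairwise inclusion-exclusion, as defined in the question
union.pr <- function(x) {
  if (length(x) == 1) return(x)
  pr <- sum(x[1:2]) - prod(x[1:2])
  i <- 3
  while (i <= length(x)) {
    pr <- sum(pr, x[i]) - prod(pr, x[i])
    i <- i + 1
  }
  pr
}

# Closed form: P(at least one) = 1 - P(none)
# (union.fast is a hypothetical helper name, not part of the original posts)
union.fast <- function(x) 1 - prod(1 - x)

set.seed(1)                 # arbitrary seed, for reproducibility only
x <- runif(1000) * 0.01
agree <- isTRUE(all.equal(union.pr(x), union.fast(x)))
print(agree)
```

Each step of the loop computes pr + x - pr*x, which is the same as 1 - (1 - pr)(1 - x), so the loop and the closed form differ only by rounding.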


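【Solution 2】:

@Julius's answer was a useful reminder of some basic probability rules, but it did not speed up the calculation. The following function, however, helps a lot: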

second_order_adjacency2 <- function(A, i, j){#function to calculate probability that i is linked to j through some other node
  a1 <- A[i,1:nrow(A) %ni% c(i,j)]
  a2 <- t(A)[j,1:nrow(A) %ni% c(i,j)]
  1-prod(1-a1*a2)
}

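It still scales like N^2, because it is called in a loop, but it takes advantage of vectorization when computing the various paths from i to j. As a result, it is much faster.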
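
The same idea can be pushed one level further by computing an entire row of the result at once. The following is a minimal sketch, not part of the original answers: `second_order_matrix` is a hypothetical name, and it assumes that A is square with a zero diagonal (as in the question). Under that assumption the terms for k = i and k = j contribute a factor of 1 to the product, so they do not need to be excluded explicitly. Products are taken as sums of logs so that `colSums` can do the work.

```r
# Sketch: row-at-a-time computation of the second-order adjacency matrix.
# Assumes diag(A) == 0, so intermediate nodes k = i and k = j drop out.
second_order_matrix <- function(A) {
  stopifnot(nrow(A) == ncol(A), all(diag(A) == 0))
  N <- nrow(A)
  A2 <- matrix(0, N, N)
  for (i in seq_len(N)) {
    # Recycling multiplies row k of A by A[i, k]:
    # M[k, j] = A[i, k] * A[k, j], the probability of the path i -> k -> j
    M <- A[i, ] * A
    # 1 - prod_k (1 - M[k, j]), with the product computed via logs
    A2[i, ] <- 1 - exp(colSums(log1p(-M)))
  }
  diag(A2) <- 0
  A2
}

# Usage with the matrix shown in the question
A <- matrix(c(0.0, 0.1, 0.2, 0.6, 0.9,
              0.8, 0.0, 0.4, 0.7, 0.5,
              0.6, 0.8, 0.0, 0.8, 0.6,
              0.8, 0.1, 0.1, 0.0, 0.3,
              0.2, 0.9, 0.7, 0.9, 0.0), nrow = 5, byrow = TRUE)
A2 <- second_order_matrix(A)
print(round(A2, 6))
```

This replaces the N^2 calls to `second_order_adjacency2` with N vectorized steps. Each step allocates a temporary N x N matrix, so memory use per step grows with N^2; whether it is faster in practice for thousands of nodes would need to be measured.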
