Ways to improve for loop for matrix manipulations depending on another matrix: answers

【Question title】: Ways to improve for loop for matrix manipulations depending on another matrix
【Posted】: 2016-02-06 15:04:22
【Question】:

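I know that improving for loops has been asked about many times before. We can use the apply family of functions to improve for loops in R.

However, is there a way to improve manipulations of a matrix that depend on another matrix? What I mean is that the elements of test that I set to 2 are determined by another matrix, index: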

for (i in 1:nrow(test)){
  test[i,index[i,]]  <- 2
}    # where index is predetermined matrix

Another example: I set the values in test based on the order of the elements in the rows of another matrix, anyMatrix:

for (i in 1:nrow(test)){
   test[i,] <- order(anyMatrix[i,])
}

I could use lapply or sapply here, but they return a list, and converting it back to a matrix takes just as long.

Reproducible example:

test <- matrix(0, nrow = 10, ncol = 10)
set.seed(1234)
index <- matrix(sample.int(10, 10*10, TRUE), 10, 10)
anyMatrix <- matrix(rnorm(10*10), nrow = 10, ncol = 10)

for (i in 1:nrow(test)){
  test[i,index[i,]]  <- 2
}

for (i in 1:nrow(test)){
   test[i,] <- order(anyMatrix[i,])
}

【Comments】:

Tags: r matrix lapply mapply

【Solution 1】:

You really have two separate problems here.

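Problem 1: Given a matrix index, for each row i and column j you want to set test[i,j] to 2 if j appears in row i of index. This can be done with simple matrix indexing, passing a 2-column matrix of indices where the first column holds the row of every element you want to index and the second column holds the column of every element you want to index: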

test[cbind(as.vector(row(index)), as.vector(index))] <- 2
test
#       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#  [1,]    2    2    0    2    2    2    2    0    2     2
#  [2,]    2    0    2    2    2    2    2    0    2     2
#  [3,]    2    2    2    2    0    0    2    2    0     0
#  [4,]    2    2    0    0    0    2    2    2    0     2
#  [5,]    2    2    2    2    0    0    0    0    2     0
#  [6,]    0    0    0    0    0    2    2    2    2     0
#  [7,]    2    0    2    2    2    2    2    0    0     0
#  [8,]    2    0    2    2    2    2    0    2    0     2
#  [9,]    2    2    2    2    0    0    2    0    2     2
# [10,]    2    0    2    0    0    2    2    2    2     0

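Since this does everything in a single vectorized operation, it should be faster than looping through the rows and handling them one at a time. Here's an example with 1 million rows and 10 columns: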

OP <- function(test, index) {
  for (i in 1:nrow(test)){
    test[i,index[i,]]  <- 2
  }
  test
}
josliber <- function(test, index) {
  test[cbind(as.vector(row(index)), as.vector(index))] <- 2
  test
}
test.big <- matrix(0, nrow = 1000000, ncol = 10)
set.seed(1234)
index.big <- matrix(sample.int(10, 1000000*10, TRUE), 1000000, 10)
identical(OP(test.big, index.big), josliber(test.big, index.big))
# [1] TRUE
system.time(OP(test.big, index.big))
#    user  system elapsed 
#   1.564   0.014   1.591 
system.time(josliber(test.big, index.big))
#    user  system elapsed 
#   0.408   0.034   0.444

Here, the vectorized approach is about 3.5 times faster (1.591 s versus 0.444 s elapsed).
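To make the two-column index matrix from Problem 1 more concrete, here is a minimal sketch on a tiny matrix. The names small, idx and pairs and their values are made up for illustration and are not part of the question; the sketch only relies on base R's row() and matrix indexing.

```r
# Hypothetical 3 x 4 example (not the question's data).
small <- matrix(0, nrow = 3, ncol = 4)

# Row i of idx lists the columns of small to set in row i.
# matrix() fills column by column, so the rows of idx are (1,1), (2,3), (4,4).
idx <- matrix(c(1, 2, 4,
                1, 3, 4), nrow = 3, ncol = 2)

# row(idx) gives the row number of every cell of idx; pairing it with the
# values of idx yields one (row, column) pair per cell.
pairs <- cbind(as.vector(row(idx)), as.vector(idx))

# Indexing with a 2-column matrix addresses exactly those cells.
small[pairs] <- 2

# Loop version from the question, for comparison.
by_loop <- matrix(0, nrow = 3, ncol = 4)
for (i in seq_len(nrow(by_loop))) {
  by_loop[i, idx[i, ]] <- 2
}

print(small)
print(identical(small, by_loop))
```

Duplicate pairs, such as (1, 1) appearing twice here, are harmless because the same value is assigned each time.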

Problem 2: You want to set row i of test to the result of applying order to the corresponding row of anyMatrix. You can do this with apply:

(test <- t(apply(anyMatrix, 1, order)))
#       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#  [1,]    1   10    7    8    4    5    3    6    2     9
#  [2,]    8    7    1    6    3    4    9    5   10     2
#  [3,]    4    9    7    1    3    2    6   10    5     8
#  [4,]    1    2    6    4   10    3    9    8    7     5
#  [5,]    9    6    5    1    2    7   10    4    8     3
#  [6,]    9    3    8    6    5   10    1    4    7     2
#  [7,]    3    7    2    5    6    8    9    4    1    10
#  [8,]    9    8    1    3    4    6    7   10    5     2
#  [9,]    8    4    3    6   10    7    9    5    2     1
# [10,]    4    1    9    3    6    7    8    2   10     5

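I wouldn't expect much of a change in runtime here, because apply really just loops through the rows, much like the loop in your own solution. Still, I'd prefer this solution because it takes less typing and is a more "R" way of doing things.

Note that the two problems used quite different code, which is pretty typical of R data manipulation: there are many specialized operators, and you need to pick the one that fits your task. I don't think there is a single function, or even a small set of functions, that can handle every matrix manipulation that is based on data from another matrix.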
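As a small sketch of why the t() call is needed in the apply version: apply stores the result for each row as a column, so the raw result comes back transposed. The matrix m and the seed below are arbitrary choices for illustration, not data from the question.

```r
set.seed(1)  # arbitrary seed, only so the illustration is repeatable
m <- matrix(rnorm(12), nrow = 3, ncol = 4)

# Loop version, as in the question.
by_loop <- matrix(0L, nrow = nrow(m), ncol = ncol(m))
for (i in seq_len(nrow(m))) {
  by_loop[i, ] <- order(m[i, ])
}

# apply() over rows puts each row's result in a column: raw is 4 x 3.
raw <- apply(m, 1, order)
print(dim(raw))

# Transposing restores the 3 x 4 layout that the loop produces.
by_apply <- t(raw)
print(all(by_apply == by_loop))
```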

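【Comments】:

Thanks, but how is cbind faster in the first one? Wouldn't cbind take more time than the usual for loop? Do you have a benchmark?
@rmania I've updated this answer to include a benchmark, which shows that the vectorized indexing operation yields a speedup over the looping alternative. In R, replacing many repeated fast operations with a single operation that performs all of them at once often yields a large speedup.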