Why is `unlist(lapply)` faster than `sapply`? Answers

【Question title】: Why is `unlist(lapply)` faster than `sapply`?
【Posted】: 2013-09-12 19:50:57
【Question description】:

And if it is faster, why do we need sapply at all?

library(microbenchmark)

# 20,000 small lists, each with an element named "a"
x <- list(a=1, b=1)
y <- list(a=1)
JSON <- rep(list(x,y),10000)
microbenchmark(sapply(JSON, function(x) x$a),
               unlist(lapply(JSON, function(x) x$a)),
               sapply(JSON, "[[", "a"),
               unlist(lapply(JSON, "[[", "a"))
               )

Unit: milliseconds
                                  expr      min       lq   median       uq      max neval
         sapply(JSON, function(x) x$a) 25.22623 28.55634 29.71373 31.76492 88.26514   100
 unlist(lapply(JSON, function(x) x$a)) 17.85278 20.25889 21.61575 22.67390 78.54801   100
               sapply(JSON, "[[", "a") 18.85529 20.06115 21.53790 23.42480 38.56610   100
       unlist(lapply(JSON, "[[", "a")) 11.33859 11.69198 12.25329 13.37008 27.81361   100

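【Question comments】:

Probably because sapply calls lapply.
We don't need sapply. It exists purely for convenience.
Note what happens to the names with unlist(list(a=1:2)); unlist(..., use.names=FALSE) is often faster (sometimes noticeably so) and safer.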
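
The point about names can be seen in a small sketch. The list `x` below is made up for illustration (the second element `b` is not from the thread); it shows how unlist builds names from the element names and how use.names=FALSE skips that step.

```r
# Illustrative input: one element of length 2, one of length 1
x <- list(a = 1:2, b = 3)

# Default: names are derived from the list names; an element longer
# than 1 gets an index appended to its name
named <- unlist(x)
names(named)    # "a1" "a2" "b"

# With use.names = FALSE no names are built at all
plain <- unlist(x, use.names = FALSE)
names(plain)    # NULL
```

Code that later indexes the result by name would have to use "a1" and "a2" rather than "a", which is why dropping the names can be the safer choice.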

Tags: r

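【Solution 1】:

In addition to running lapply, sapply runs simplify2array to try to fit the output into an array. To find out whether that is possible, the function has to check that all of the individual outputs have the same length. It does this with an expensive unique(lapply(..., length)), which accounts for most of the time difference you are seeing: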

b <- lapply(JSON, "[[", "a")

microbenchmark(lapply(JSON, "[[", "a"),
               unlist(b),
               unique(lapply(b, length)),
               sapply(JSON, "[[", "a"),
               sapply(JSON, "[[", "a", simplify = FALSE),
               unlist(lapply(JSON, "[[", "a"))
)

# Unit: microseconds
#                                       expr       min        lq   median        uq       max neval
#                    lapply(JSON, "[[", "a") 14809.151 15384.358 15774.26 16905.226 24944.863   100
#                                  unlist(b)   920.047  1043.719  1158.62  1223.091  8056.231   100
#                  unique(lapply(b, length)) 10778.065 11060.452 11456.11 12581.414 19717.740   100
#                    sapply(JSON, "[[", "a") 24827.206 25685.535 26656.88 30519.556 93195.751   100
#  sapply(JSON, "[[", "a", simplify = FALSE) 14283.541 14922.780 15526.42 16654.058 26865.022   100
#            unlist(lapply(JSON, "[[", "a")) 15334.026 16133.146 16607.12 18476.182 30080.544   100
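
As a rough illustration of why the length check is needed, the sketch below calls sapply on three small made-up inputs (the variable names are hypothetical, not from the thread). The shape of the result depends on whether all outputs share a common length.

```r
# Every output has length 1: the result is a plain vector
same_len_1 <- sapply(list(1, 2, 3), function(x) x)

# Every output has length 2: the result is a matrix, one column per input
same_len_2 <- sapply(list(1:2, 3:4), function(x) x)

# Output lengths differ: no simplification, the result stays a list
mixed_len <- sapply(list(1, 1:2), function(x) x)

is.numeric(same_len_1) && is.null(dim(same_len_1))   # TRUE
dim(same_len_2)                                      # 2 2
is.list(mixed_len)                                   # TRUE
```

unlist(lapply(...)) does not make this distinction: it always flattens to a vector, so it can skip the check.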

【Comments】:

【Solution 2】:

As droopy and Roland pointed out, sapply is a wrapper around lapply designed for convenience. sapply uses simplify2array, which is slower than unlist:

> microbenchmark(unlist(as.list(1:1000)), simplify2array(as.list(1:1000)), times=1000)
Unit: microseconds
                            expr     min       lq  median       uq      max neval
         unlist(as.list(1:1000))  99.734 109.0230 113.912 118.3120 21343.92  1000
 simplify2array(as.list(1:1000)) 892.712 931.0895 947.957 976.3125 22241.52  1000

Also, when the result is a matrix, sapply is slower than other base functions, for example:

a <- list(c(1,2,3,4), c(1,2,3,4), c(1,2,3,4))
microbenchmark(t(do.call(rbind, lapply(a, function(x)x))), sapply(a, function(x)x))
Unit: microseconds
                                        expr    min     lq median     uq     max neval
 t(do.call(rbind, lapply(a, function(x) x))) 29.823 30.801 32.512 33.734  94.845   100
                    sapply(a, function(x) x) 57.201 58.179 59.156 60.134 111.956   100

But sapply is easier to use, especially in the second case.
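
To make the convenience argument concrete, here is a small sketch that checks the two expressions from the benchmark above produce the same matrix. The names `via_rbind` and `via_sapply` are introduced only for this illustration.

```r
a <- list(c(1,2,3,4), c(1,2,3,4), c(1,2,3,4))

# Manual route: bind the vectors as rows, then transpose
via_rbind <- t(do.call(rbind, lapply(a, function(x) x)))

# Convenience route: sapply builds the matrix directly
via_sapply <- sapply(a, function(x) x)

dim(via_rbind)                 # 4 3
dim(via_sapply)                # 4 3
all(via_rbind == via_sapply)   # TRUE
```

Both routes give a matrix with one column per list element; sapply gets there in a single call.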

【Comments】: