[Posted]: 2019-05-11 11:55:08
[Question]:
I have a situation where split-apply-combine may run into runtime memory problems. The task is to identify the elements common to all simulations.
numListFull <- replicate(1000, sample(1:55000, sample(54900:55000, 1),
                                      replace = FALSE))
format(object.size(numListFull), units = "auto", standard = "SI")
# [1] "66 MB"
# Create the list of nums shared by all simulations
numListAll <- numListFull[[1]]
numList <- lapply(numListFull[2:length(numListFull)],
                  function(x) intersect(x, numListAll))
format(object.size(numList), units = "auto", standard = "SI")
# [1] "65.7 MB"
numListAll <- Reduce(intersect, numList)
format(object.size(numListAll), units = "auto", standard = "SI")
# [1] "166.4 kB"
When the number of replications is increased from 300 to 1000, the three sizes become 219.9 MB, 219.9 MB and 87.5 kB.
Sometimes there are even more than 10000 replications, ten times the latter case. Do you know of a better approach that avoids running into memory problems on the machine?
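One alternative worth considering: since each simulation samples without replacement, a value belongs to every simulation exactly when it occurs `length(numListFull)` times overall. A minimal count-based sketch (shown on a smaller stand-in for `numListFull`, assuming the values stay within a known range) avoids materializing any intermediate list of intersections:

```r
set.seed(1)
# Smaller stand-in for the question's numListFull (20 sims over 1:500)
numListFull <- replicate(20, sample(1:500, sample(450:500, 1)), simplify = FALSE)

# Count occurrences of each value; with replace = FALSE a value occurs at
# most once per simulation, so count == number of sims means "in all sims".
counts <- integer(500)
for (sim in numListFull) {
  counts[sim] <- counts[sim] + 1L   # accumulate without unlist()ing everything
}
common <- which(counts == length(numListFull))

# Agrees with the pairwise-intersection approach
stopifnot(identical(sort(Reduce(intersect, numListFull)), common))
```

The only extra allocation is one integer vector as long as the value range, regardless of how many replications there are.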
Would something like the following be reasonable?
numList <- lapply(split(2:length(numListFull),
                        rep_len(1:100, length(numListFull) - 1)),  # one group label per index 2..N
                  function(ind) {
                    lapply(numListFull[ind],
                           function(x) intersect(x, numListAll))
                  })
format(object.size(numList), units = "auto", standard = "SI")
# [1] "87.5 MB"
Update: A plain for loop, of course, works like a charm without any memory problems, but at the cost of parallelization!
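For reference, a minimal sketch of such a loop (again on a small stand-in for `numListFull`): the only extra object is the running intersection, which can only shrink, so peak memory stays close to the input itself. Since intersection is associative, chunks of the list could still be reduced in parallel and the partial results intersected at the end.

```r
set.seed(1)
# Smaller stand-in for the question's numListFull (20 sims over 1:500)
numListFull <- replicate(20, sample(1:500, sample(450:500, 1)), simplify = FALSE)

# Running intersection: no intermediate list is ever built.
numListAll <- numListFull[[1]]
for (i in 2:length(numListFull)) {
  numListAll <- intersect(numListAll, numListFull[[i]])
  if (length(numListAll) == 0L) break  # nothing left in common; stop early
}

stopifnot(identical(sort(numListAll), sort(Reduce(intersect, numListFull))))
```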
[Discussion]:
Tags: r memory runtime apply lapply