Parallel processing in R with mclapply: function does not work (answers)

【Question title】: Parallel processing in R with mclapply: function does not work
【Posted】: 2018-09-01 00:19:16
【Question】:

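I have a fairly large number of keywords that I need to compare against a much larger corpus of documents, counting the number of occurrences.

Since the calculation takes hours, I decided to try parallel processing. On this forum I found the mclapply function from the parallel package, which looks helpful.

Being new to R, I cannot get the code to work (see the short version below). More specifically, I get this error:

Error in get(as.character(FUN), mode = "function", envir = envir) : object 'FUN' of mode 'function' was not found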

rm(list=ls())

df <- c("honda civic 1988 with new lights","toyota auris 4x4 140000 km","nissan skyline 2.0 159000 km")
keywords <- c("honda","civic","toyota","auris","nissan","skyline","1988","1400","159")

countstrings <- function(x){str_count(x, paste(sprintf("\\b%s\\b", keywords), collapse = '|'))}

# Normal way with one processor
number_of_keywords <- countstrings(df)
# Result: [1] 3 2 2

# Attempt at parallel processing
library(stringr)
library(parallel)
no_cores <- detectCores() - 1
cl <- makeCluster(no_cores)
number_of_keywords <- mclapply(cl, countstrings(df))
stopCluster(cl)
#Error in get(as.character(FUN), mode = "function", envir = envir) : 
#object 'FUN' of mode 'function' was not found

Any help is appreciated!

【Comments】:

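Are you on Windows?
Yes, I am on Windows.
Tip: whenever you try to use mclapply(), parLapply(), etc., start with plain old lapply(). Once that works, the hurdle to using mclapply() and friends is much lower. (In this case you would have hit the same problem with lapply(), and knowing that might have helped you solve it yourself.)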
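
The tip above can be sketched as follows. This is a minimal illustration, not the asker's final code: `count_keywords` is a hypothetical helper that uses base R's gregexpr() in place of stringr::str_count() so the example needs no extra package, and the worker count of 2 is arbitrary. It first gets the call working with lapply(), where the first argument is the data and the second is the function itself rather than the result of calling it. It then reuses the same call shape with parLapply() on a socket cluster, because mclapply() relies on forking and cannot use more than one core on Windows.

```r
library(parallel)

df <- c("honda civic 1988 with new lights",
        "toyota auris 4x4 140000 km",
        "nissan skyline 2.0 159000 km")
keywords <- c("honda", "civic", "toyota", "auris", "nissan",
              "skyline", "1988", "1400", "159")

# Hypothetical helper: base-R stand-in for the str_count() call in the question.
# Keywords are passed as an argument so workers do not depend on a global.
count_keywords <- function(x, keywords) {
  pattern <- paste(sprintf("\\b%s\\b", keywords), collapse = "|")
  lengths(regmatches(x, gregexpr(pattern, x, perl = TRUE)))
}

# Step 1: plain lapply(). X is the data, FUN is the function object.
serial <- lapply(df, count_keywords, keywords = keywords)

# Step 2: the same call shape on a socket cluster (also works on Windows).
cl <- makeCluster(2)  # 2 workers is an arbitrary choice for this sketch
parallel_res <- parLapply(cl, df, count_keywords, keywords = keywords)
stopCluster(cl)

unlist(serial)
unlist(parallel_res)
```

Both calls return a list with one count per document; the only change between the two steps is the cluster object passed as the first argument.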

Tags: r parallel-processing lapply

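【Solution 1】:

This function should be faster. Here is another way to do it with parallel processing, using parSapply (which returns a vector rather than a list):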

# function to count
count_strings <- function(x, words)
{
    sum(unlist(strsplit(x, ' ')) %in% words)
}

library(parallel)
mcluster <- makeCluster(detectCores()) # using all cores

number_of_keywords <- parSapply(mcluster, df, count_strings, keywords, USE.NAMES = FALSE)
stopCluster(mcluster) # release the workers when done

number_of_keywords
# [1] 3 2 2
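
The answer's function receives the keywords as an argument, so nothing else has to be sent to the workers. If the function instead reads a global variable, as countstrings() in the question does, each worker needs its own copy. The sketch below shows one way to do that with clusterExport(). The name `countstrings_base` is hypothetical, gregexpr() is an assumed stand-in for stringr::str_count(), and the worker count of 2 is arbitrary.

```r
library(parallel)

df <- c("honda civic 1988 with new lights",
        "toyota auris 4x4 140000 km",
        "nissan skyline 2.0 159000 km")
keywords <- c("honda", "civic", "toyota", "auris", "nissan",
              "skyline", "1988", "1400", "159")

# Like countstrings() in the question, this looks up `keywords` outside the
# function body. gregexpr() replaces str_count() here (an assumption made to
# keep the sketch free of package dependencies).
countstrings_base <- function(x) {
  pattern <- paste(sprintf("\\b%s\\b", keywords), collapse = "|")
  lengths(regmatches(x, gregexpr(pattern, x, perl = TRUE)))
}

cl <- makeCluster(2)  # arbitrary worker count for this sketch

# Copy `keywords` from the current environment to every worker.
clusterExport(cl, "keywords", envir = environment())

# If the workers called str_count(), they would also need the package loaded:
# clusterEvalQ(cl, library(stringr))

counts <- parSapply(cl, df, countstrings_base, USE.NAMES = FALSE)
stopCluster(cl)

counts
```

Passing data as explicit arguments, as the answer does, avoids the export step entirely and is usually easier to reason about.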

【Comments】: