Error: could not find function "includePackage"

【Question title】: Error: could not find function "includePackage"
【Posted】: 2016-05-20 09:45:23
【Question description】:

I am trying to run a random forest algorithm in SparkR with Spark 1.5.1 installed. I don't understand why I get this error:

  Error: could not find function "includePackage"

In addition, when I use the mapPartitions function in my code, I also get this error:

  Error: could not find function "mapPartitions"

Please find the code below:

rdd <- SparkR:::textFile(sc, "http://localhost:50070/explorer.html#/Datasets/Datasets/iris.csv",5) 

includePackage(sc,randomForest) 

rf <- mapPartitions(rdd, function(input) {
  ## my function code for RF
})

【Comments】:

Tags: r apache-spark machine-learning sparkr

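【Solution 1】:

This is more of a comment and a follow-up question than an answer (I don't have enough reputation to comment), but to take this one step further: if we use the collect method to convert the RDD back into an R data frame, won't that be counterproductive when the data is very large, because running it in R will take too long?

Does this also mean we can use any R package, such as markovChain or a neural network package, with the same approach?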

【Comments】:

【Solution 2】:

Please check which functions are available in SparkR at http://spark.apache.org/docs/latest/api/R/index.html. That list does not include the functions mapPartitions() or includePackage().
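For context, the question's code already calls textFile through `SparkR:::`, which is how R reaches functions that a package does not export. The following is only a sketch, assuming Spark 1.5.x, where mapPartitions and includePackage appear to remain as unexported internals of the SparkR package. They are not a supported API and may change or disappear in other versions. The HDFS path is hypothetical, and the function body is a placeholder rather than a random forest implementation.

```r
library(SparkR)

sc <- sparkR.init()  # Spark 1.5.x entry point

# Hypothetical path; replace with the real location of the file.
rdd <- SparkR:::textFile(sc, "hdfs:///path/to/iris.csv", 5)

# Ask the workers to load randomForest (it must be installed on every node).
SparkR:::includePackage(sc, randomForest)

# Each partition arrives as a list of text lines; return a list per partition.
counts <- SparkR:::mapPartitions(rdd, function(part) {
  list(length(part))  # placeholder for the per-partition model code
})

SparkR:::collect(counts)
```

Because these are private functions, the collect-based approach below is the safer choice when the data fits in the driver's memory.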

#For reading csv in sparkR

sparkRdf <- read.df(sqlContext, "./nycflights13.csv", 
                    "com.databricks.spark.csv", header="true")

#Possible way to use `randomForest` is to convert the `sparkR` data frame to `R` data frame
Rdf <- collect(sparkRdf) 

#compute as usual in `R` code
install.packages("randomForest")
library(randomForest)
......
#convert back to sparkRdf 
sparkRdf <- createDataFrame(sqlContext, Rdf)
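The "compute as usual in R" step is elided above. As a minimal sketch of what it could look like, the example below uses R's built-in iris data set as a stand-in for the collected `Rdf`. The formula, the `ntree` value, and the `predicted` column name are illustrative assumptions, not taken from the original post.

```r
library(randomForest)

# Stand-in for the data frame returned by collect(); iris ships with R.
Rdf <- iris

set.seed(42)  # make the forest reproducible

# Classify Species from the four measurement columns.
fit <- randomForest(Species ~ ., data = Rdf, ntree = 100)

# Predict on the same data and attach the result as a new character column.
Rdf$predicted <- as.character(predict(fit, Rdf))

print(fit)
```

This only works when the collected data fits in the memory of the R process, which is the concern raised in Solution 1.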

【Comments】:

Sorry, I meant randomForest.