[Posted]: 2017-12-19 00:44:16
[Problem description]:
Basically, I have cleaned up my dataset a bit, removing headers, bad values, and so on. I am now trying to train a random forest classifier on it so that it can make predictions. What I have so far:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.tree.RandomForest

object Churn {
  def main(args: Array[String]): Unit = {
    // set up the Spark context
    val conf = new SparkConf().setAppName("Churn")
    val sc = new SparkContext(conf)
    // load the CSV and map each row to a LabeledPoint
    val csv = sc.textFile("file://filename.csv")
    val data = csv.map { line =>
      val parts = line.split(",").map(_.trim)
      val stringvec = Array(parts(1)) ++ parts.slice(4, 20)
      val label = parts(20).toDouble
      val vec = stringvec.map(_.toDouble)
      LabeledPoint(label, Vectors.dense(vec))
    }
    // 70/30 train/test split
    val splits = data.randomSplit(Array(0.7, 0.3))
    val (training, testing) = (splits(0), splits(1))
    val model = RandomForest.trainClassifier(training) // this line triggers the error below
  }
}
But I get an error like this:
error: overloaded method value trainClassifier with alternatives:
(input: org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint],strategy: org.apache.spark.mllib.tree.configuration.Strategy,numTrees: Int,featureSubsetStrategy: String,seed: Int)org.apache.spark.mllib.tree.model.RandomForestModel
cannot be applied to (org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint])
val model = RandomForest.trainClassifier(training)
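For context on what the compiler is complaining about: the error lists the available overloads of `RandomForest.trainClassifier`, and none of them accepts just an `RDD[LabeledPoint]` — every overload also requires hyperparameters such as the number of classes, the number of trees, and so on. A minimal sketch of a call matching MLlib's full-argument overload is below; the hyperparameter values are illustrative placeholders, not recommendations, and `training` is the RDD from the code above.

```scala
import org.apache.spark.mllib.tree.RandomForest

val numClasses = 2                            // e.g. binary churn label
val categoricalFeaturesInfo = Map[Int, Int]() // empty map: treat all features as continuous
val numTrees = 10
val featureSubsetStrategy = "auto"            // let Spark choose based on numTrees
val impurity = "gini"
val maxDepth = 5
val maxBins = 32

val model = RandomForest.trainClassifier(
  training, numClasses, categoricalFeaturesInfo,
  numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins)
```

The overload in the error message that takes a `Strategy` object is an alternative way to bundle these same settings into one configuration value.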
Googling it got me nowhere. I would appreciate it if you could explain what this error is and why I am getting it. Then I can fix the problem myself.
[Question discussion]:
Tags: scala apache-spark