【发布时间】:2017-03-03 04:19:08
【问题描述】:
我正在尝试将 csv 文件加载为 JavaRDD 字符串,然后想在 JavaRDD 向量中获取数据
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.feature.HashingTF;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.stat.MultivariateStatisticalSummary;
import org.apache.spark.mllib.stat.Statistics;
import breeze.collection.mutable.SparseArray;
import scala.collection.immutable.Seq;
public class Trial {
public void start() throws InstantiationException, IllegalAccessException,
ClassNotFoundException {
run();
}
private void run(){
SparkConf conf = new SparkConf().setAppName("csvparser");
JavaSparkContext jsc = new JavaSparkContext(conf);
JavaRDD<String> data = jsc.textFile("C:/Users/kalraa2/Documents/trial.csv");
JavaRDD<Vector> datamain = data.flatMap(null);
MultivariateStatisticalSummary mat = Statistics.colStats(datamain.rdd());
System.out.println(mat.mean());
}
private List<Vector> Seq(Vector dv) {
// TODO Auto-generated method stub
return null;
}
public static void main(String[] args) throws Exception {
Trial trial = new Trial();
trial.start();
}
}
该程序运行时没有任何错误,但尝试在 spark-machine 上运行它时我什么也得不到。谁能告诉我字符串RDD到Vector RDD的转换是否正确。
我的 csv 文件只包含一列是浮点数
【问题讨论】:
标签: java apache-spark apache-spark-mllib