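【Posted】: 2021-01-06 18:15:06
【Question】:
I have a simple parallel algorithm in Java implemented with Spark, but I'm not sure how to run it on a Google Dataproc cluster. I found plenty of resources online that use Python or Scala, but not enough for Java. Here is the code: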
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class Prime {
    List<Integer> primes = new ArrayList<>();

    // Method to calculate and collect the prime numbers below n
    public void countPrime(int n) {
        for (int i = 2; i < n; i++) {
            boolean isPrime = true;
            // check if the number is prime or not
            for (int j = 2; j < i; j++) {
                if (i % j == 0) {
                    isPrime = false;
                    break; // exit the inner for loop
                }
            }
            // add the primes into the List
            if (isPrime) {
                primes.add(i);
            }
        }
    }

    // Main method to run the program
    public static void main(String[] args) {
        // creating the JavaSparkContext object
        SparkConf conf = new SparkConf().setAppName("haha").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // new Prime object; the primes are computed here, on the driver
        Prime prime = new Prime();
        prime.countPrime(100000);
        // parallelize the collection into 4 partitions
        JavaRDD<Integer> rdd = sc.parallelize(prime.primes, 4);
        long count = rdd.filter(e -> e == 2 || e % 2 != 0).count();
        System.out.println(count);
        sc.close();
    }
}
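In the code as posted, the prime search runs entirely on the driver before Spark is involved, and the filter `e == 2 || e % 2 != 0` keeps every prime, so a cluster would have very little to do. One possible restructuring is to move the primality test into a side-effect-free predicate that each executor can apply to its own partition. The sketch below is not from the original post: `PrimeCheck`, `isPrime` and `countPrimesBelow` are hypothetical names, trial division up to the square root is swapped in for the original `j < i` loop, and a `java.util.stream` parallel stream stands in for the RDD so the example runs without Spark on the classpath.

```java
import java.util.stream.IntStream;

// Hypothetical helper, not part of the original post.
final class PrimeCheck {

    // Trial division up to sqrt(n); replaces the original loop over all j < i.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        if (n < 4) return true;        // 2 and 3 are prime
        if (n % 2 == 0) return false;  // even numbers above 2 are not
        for (int d = 3; (long) d * d <= n; d += 2) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Counts primes in [2, limit). The parallel stream is only a local stand-in;
    // with Spark, the same predicate could be passed to an RDD instead, e.g.
    //   sc.parallelize(candidates, 4).filter(PrimeCheck::isPrime).count();
    // where `candidates` (an assumed name) is a List<Integer> of numbers to test.
    static long countPrimesBelow(int limit) {
        return IntStream.range(2, limit)
                .parallel()
                .filter(PrimeCheck::isPrime)
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countPrimesBelow(100_000)); // prints 9592
    }
}
```

For Dataproc itself, the hardcoded `setMaster("local")` would likely keep the job in local mode on a single node, since values set directly on `SparkConf` take precedence over what the submit command supplies. Assuming that call is dropped and the class is packaged into a jar, the job could then be submitted with `gcloud dataproc jobs submit spark`, passing the cluster, region, main class and jar through its `--cluster`, `--region`, `--class` and `--jars` flags.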
【Comments】:
Tags: java parallel-processing google-cloud-dataproc