[Posted]: 2019-04-23 13:58:52
[Problem description]:
My Spark cluster runs in standalone mode.
I hit the following error when using spark-submit to deploy a Spring Boot application to the Spark cluster.
I had already removed several jars from spark/jars that were incompatible with my Spring Boot jar, such as gson and servlet-api.
Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 10.10.10.53, executor 0): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2287)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1417)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2293)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
...
My command:
bin/spark-submit \
--master spark://localhost:7077 \
path_to_jar/xxx.jar
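For what it's worth, this particular ClassCastException (List$SerializationProxy vs. Seq) often appears when the executors cannot load the application's own classes, e.g. when the driver is a long-running web app. One commonly suggested variant (not part of the original command; the jar path is the same placeholder as above) ships the application jar to the executors explicitly:

```
# Hedged sketch: explicitly distribute the application jar to the executors
# via spark.jars, in addition to submitting it as the primary jar.
bin/spark-submit \
  --master spark://localhost:7077 \
  --conf spark.jars=path_to_jar/xxx.jar \
  path_to_jar/xxx.jar
```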
My build.gradle:
dependencies {
    compile fileTree(dir: 'libs', include: ['*.jar'])
    compile('org.springframework.boot:spring-boot-starter-web:2.1.3.RELEASE') {
        exclude module: 'logback-classic'
        exclude module: 'slf4j-log4j12'
    }
    compile('org.springframework.boot:spring-boot-starter-thymeleaf:2.1.3.RELEASE') {
        exclude module: 'logback-classic'
        exclude module: 'slf4j-log4j12'
    }
    compile('org.springframework.boot:spring-boot-configuration-processor:2.1.3.RELEASE')
    compile('com.google.code.gson:gson:2.8.5')
    compileOnly(group: 'org.apache.hadoop', name: 'hadoop-common', version: '2.7.7') {
        exclude module: 'servlet-api'
    }
    compileOnly(group: 'org.apache.spark', name: 'spark-core_2.12', version: '2.4.0')
    compileOnly(group: 'org.apache.spark', name: 'spark-mllib_2.12', version: '2.4.0')
}
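As an aside on the gson/servlet-api conflicts mentioned above: instead of deleting jars from spark/jars, a common alternative is to shade and relocate the conflicting packages inside the application jar. A minimal sketch, assuming the Shadow plugin (`com.github.johnrengelman.shadow`) is applied; the relocated package name is a placeholder:

```
// build.gradle sketch (assumes the Shadow plugin is applied)
shadowJar {
    // Relocate gson inside the fat jar so it cannot clash
    // with the copy that ships in spark/jars.
    relocate 'com.google.gson', 'myapp.shaded.gson' // target package is illustrative
}
```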
The SparkContext is autowired into the Spring Boot application.
SparkContextBean.java
@Configuration
public class SparkContextBean {

    @Autowired
    private SparkProperties sparkProperties;

    @Bean
    @ConditionalOnMissingBean(SparkConf.class)
    public SparkConf sparkConf() {
        return new SparkConf().setAppName(sparkProperties.getAppname());
    }

    @Bean
    @ConditionalOnMissingBean(JavaSparkContext.class)
    public JavaSparkContext javaSparkContext() throws Exception {
        return new JavaSparkContext(sparkConf());
    }
}
Spark code:
// hsidata is a JavaPairRDD<Integer, short[][]> value
Tuple2<double[], double[]> mk = hsidata.mapToPair(pair -> {
    short[][] data = pair._2;
    return JTool.CalcMK(data);
}).reduce((right, left) -> {
    double[] mean = right._1;
    int bands = mean.length;
    double[] K = right._2;
    int n = bands * (bands + 1) / 2;
    for (int i = 0; i < bands; i++)
        mean[i] = mean[i] + left._1[i];
    for (int i = 0; i < n; i++)
        K[i] = K[i] + left._2[i];
    return new Tuple2<>(mean, K);
});
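The reduce above combines per-partition results by summing the mean vectors element-wise and summing the packed upper-triangular K arrays (n = bands*(bands+1)/2 entries). A standalone sketch of that combine step in plain Java, with a small holder class standing in for Tuple2 so it runs without Spark:

```java
public class CombineSketch {
    // Minimal stand-in for scala.Tuple2<double[], double[]>.
    public static final class MeanK {
        public final double[] mean; // per-band accumulator
        public final double[] k;    // packed upper-triangular accumulator
        public MeanK(double[] mean, double[] k) { this.mean = mean; this.k = k; }
    }

    // Mirrors the reduce lambda: accumulate `left` into `right` element-wise.
    public static MeanK combine(MeanK right, MeanK left) {
        int bands = right.mean.length;
        int n = bands * (bands + 1) / 2; // size of the packed triangular array
        for (int i = 0; i < bands; i++)
            right.mean[i] += left.mean[i];
        for (int i = 0; i < n; i++)
            right.k[i] += left.k[i];
        return right;
    }
}
```

Note that, like the original lambda, this mutates the `right` operand's arrays in place, which Spark tolerates in reduce but is worth keeping in mind.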
[Discussion]:
- Please provide the sample Spark application code to help troubleshoot this issue.
- The code has been updated, thanks for the reminder :)
Tags: spring spring-boot apache-spark gradle