[Posted at]: 2020-03-07 01:36:45
[Problem description]:
I have a simple Spark application, but for the life of me I cannot get the output jar to run.
I simply run `mvn clean install` and then launch the jar with `java -jar SparkUdemy2-1.0-SNAPSHOT.jar`.
Below I've attached the Maven file and a small code snippet.
I made sure these dependencies exist in my local .m2 repository. What's wrong? The imports resolve without any problem.
Maven
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>SparkUdemy2</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>15.0</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>2.4.5</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.10</artifactId>
            <version>2.0.0</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.2.0</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.5.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-jar-plugin</artifactId>
                <version>3.0.2</version>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>com.learning.SparkMain</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
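(Editor's note, as a hedged sketch rather than a confirmed fix:) `maven-jar-plugin` produces a *thin* jar that contains only this project's classes, so `java -jar` typically dies at startup with `NoClassDefFoundError` for the Spark classes. One common remedy is to add `maven-shade-plugin` to the `<plugins>` section to build a self-contained jar; the version number below is an assumption:

```xml
<!-- Hypothetical addition: bundle all compile-scope dependencies into one runnable jar -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.4</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <!-- Keep the Main-Class entry in the shaded jar's manifest -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass>com.learning.SparkMain</mainClass>
                    </transformer>
                    <!-- Merge META-INF/services files, which Spark relies on -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
```

Note also that the POM mixes Scala builds: `spark-core_2.12` (2.4.5) alongside `spark-sql_2.10` (2.0.0). Spark artifacts compiled for different Scala versions cannot coexist at runtime, so the dependencies would need to be aligned (e.g. all `_2.12` at the same Spark version) regardless of how the jar is packaged.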
Code
package com.learning;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Arrays;

public class SparkMain {
    public static void main(String[] args) {
        // Configure spark in local cluster - use all available cores on the machine
        // Without this, the application would be running on a single thread
        SparkConf conf = new SparkConf().setAppName("LearningSpark");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> initialRDD = sc.textFile("s3n://s3-spark-data-bucket/input.txt");
        JavaPairRDD<Long, String> dat =
                initialRDD
                        .map(sentence -> sentence.replaceAll("[^a-zA-Z\\s]", ""))
                        .filter(line -> line.trim().length() > 0)
                        .flatMap(line -> Arrays.asList(line.split(" ")).iterator())
                        .mapToPair(word -> new Tuple2<>(word, 1L))
                        .reduceByKey((v1, v2) -> v1 + v2)
                        .mapToPair(tuple -> new Tuple2<>(tuple._2, tuple._1))
                        .sortByKey(false);
        dat.foreach(item -> System.out.println(item));
        sc.close();
    }
}
[Discussion]:
- Spark is a distributed system. I assume your code is meant to execute inside Spark. How are you setting it up?
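(Editor's note, a hedged sketch prompted by the comment above:) the code never calls `setMaster(...)`, so even with dependencies on the classpath the `SparkConf` has no master URL and context creation fails with "A master URL must be set in your configuration". The conventional way to supply one is to launch the packaged jar through `spark-submit`; the paths below assume a local Spark installation whose version matches the POM's `spark-core`:

```
# Sketch: run the jar via spark-submit with a local master using all cores
spark-submit \
  --class com.learning.SparkMain \
  --master "local[*]" \
  target/SparkUdemy2-1.0-SNAPSHOT.jar
```

Alternatively, for a quick local run the master can be hard-coded as `new SparkConf().setAppName("LearningSpark").setMaster("local[*]")`, which matches the intent of the comment in the code about using all available cores.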
Tags: java maven apache-spark