1. Create a Maven project, add the Scala plugin in IDEA, and add the Scala SDK
https://www.cnblogs.com/bajiaotai/p/15381309.html
2. Add the required dependency jars by configuring pom.xml
2.1 Sample pom.xml (Spark version: 3.0.0, Scala version: 2.12)
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.dxm.sparksql</groupId>
    <artifactId>sparksql</artifactId>
    <version>1.0-SNAPSHOT</version>

    <!-- Properties holding the Spark version and the Scala binary version -->
    <properties>
        <spark.version>3.0.0</spark.version>
        <scala.version>2.12</scala.version>
    </properties>

    <dependencies>
        <!-- Spark core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- Spark on YARN -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-yarn_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- Spark SQL -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- MySQL JDBC driver -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.27</version>
        </dependency>
        <!-- Spark Hive integration -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- Hive execution engine -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.2.1</version>
        </dependency>
    </dependencies>
</project>
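Note: the pom above only declares dependencies. IDEA compiles the Scala sources itself through the Scala plugin, but if you also want mvn package to compile them, a Scala compiler plugin is normally added to the build section as well. A minimal sketch using the widely used scala-maven-plugin (the plugin version here is an assumption, pick one that fits your environment):

<build>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>4.4.0</version>
            <executions>
                <execution>
                    <goals>
                        <!-- compile Scala sources in main and test -->
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>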
2.2 Matching the Spark version with the Scala version
The link below lists the mapping between Spark and Scala versions and the corresponding dependency configuration:
https://www.cnblogs.com/bajiaotai/p/16270971.html
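As a quick illustration of the convention (the concrete versions below are just an example, verify them against the link above): the suffix of the artifactId is the Scala binary version, so a Spark 3.2.0 build for Scala 2.13 would only change the properties:

<properties>
    <spark.version>3.2.0</spark.version>
    <scala.version>2.13</scala.version>
</properties>
<!-- spark-core_${scala.version} then resolves to spark-core_2.13:3.2.0 -->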
2.3 Check the Scala version at runtime from Scala code
println(util.Properties.versionString) // prints e.g. "version 2.12.10"
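To cross-check from the Spark side, the running Spark version can be printed the same way. A minimal sketch (the object name VersionCheck is made up for this example):

import org.apache.spark.sql.SparkSession

object VersionCheck extends App {
  // Scala version of the runtime, e.g. "version 2.12.10"
  println(util.Properties.versionString)

  val spark = SparkSession.builder.master("local").getOrCreate()
  // Spark version of the runtime, e.g. "3.0.0"
  println(spark.version)

  spark.stop()
}

If the versions printed here do not match ${spark.version} and the _${scala.version} suffix in the pom, you have found the mismatch described in 2.4.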
2.4 FAQ: errors caused by mismatched Spark and Scala versions
When the Scala binary version in the artifactId (the _2.12 suffix) does not match the Scala SDK the code is compiled with, the job typically fails at startup with errors such as java.lang.NoSuchMethodError or java.lang.NoClassDefFoundError: scala/Product$class; aligning the versions as described in 2.2 resolves them.
3. Testing the code
import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, SparkSession}

// The case class must be defined outside the object so the implicit Encoder needed by toDF can be derived
case class Person(name: String, action: String)

object TestSparkSQLEnv extends App {

  //1. Initialize the SparkSession object
  val spark = SparkSession
    .builder
    .master("local")
    //.appName("SparkSql Entrance Class SparkSession")
    //.config("spark.some.config.option", "some-value")
    .getOrCreate()

  //2. Get the SparkContext from the SparkSession
  private val sc: SparkContext = spark.sparkContext

  //3. Set the log level
  // Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
  // This overrides any user-defined log settings, e.g. log4j.properties
  sc.setLogLevel("ERROR")

  import spark.implicits._

  //4. Create a DataFrame from an RDD of case-class instances
  private val rdd2DfByCaseClass: DataFrame = spark.sparkContext
    .makeRDD(Array(Person("疫情", "何时"), Person("结束", "呢")))
    .toDF("名称", "行动")

  rdd2DfByCaseClass.show()
  // +----+----+
  // |名称|行动|
  // +----+----+
  // |疫情|何时|
  // |结束| 呢|
  // +----+----+

  //5. Stop the SparkSession and release resources
  spark.stop()
}
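For comparison, the same DataFrame can be built without a case class by calling toDF on a local Seq of tuples through the same implicits. A minimal sketch (the object name TestToDF is made up for this example):

import org.apache.spark.sql.SparkSession

object TestToDF extends App {
  val spark = SparkSession.builder.master("local").getOrCreate()
  import spark.implicits._

  // Column names come from the toDF arguments instead of case-class fields
  val df = Seq(("疫情", "何时"), ("结束", "呢")).toDF("名称", "行动")
  df.show()

  spark.stop()
}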
4. Closing remarks
If the code runs successfully, congratulations, your environment is set up correctly. If you run into problems, please leave a comment and we can work through them together.