Posted: 2018-05-08 06:22:36
Question:
I have written unit tests referencing the DataframeGenerator example, which lets you generate mock dataframes on the fly.
After the following commands complete successfully:
sbt clean
sbt update
sbt compile
running either of the following commands gives the error shown in the output:
sbt assembly
sbt test -- -oF
Output
...
[info] SearchClicksProcessorTest:
17/11/24 14:19:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/24 14:19:07 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
17/11/24 14:19:18 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/11/24 14:19:18 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
17/11/24 14:19:19 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
[info] - testExplodeMap *** FAILED ***
[info] ExceptionInInitializerError was thrown during property evaluation.
[info] Message: "None"
[info] Occurred when passed generated values (
[info]
[info] )
[info] - testFilterByClicks *** FAILED ***
[info] NoClassDefFoundError was thrown during property evaluation.
[info] Message: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
[info] Occurred when passed generated values (
[info]
[info] )
[info] - testGetClicksData *** FAILED ***
[info] NoClassDefFoundError was thrown during property evaluation.
[info] Message: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
[info] Occurred when passed generated values (
[info]
[info] )
...
[info] *** 3 TESTS FAILED ***
[error] Failed: Total 6, Failed 3, Errors 0, Passed 3
[error] Failed tests:
[error] com.company.spark.ml.pipelines.search.SearchClicksProcessorTest
[error] (root/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 73 s, completed 24 Nov, 2017 2:19:28 PM
Things I have tried, without success:
- running sbt test with the F flag to show the full stack trace (no stack trace is printed, as shown above)
- rebuilding the project in IntelliJ IDEA
My questions:
- What are the possible causes of this error?
- How can I enable stack-trace output in SBT so I can debug it?
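On the stack-trace question: arguments after `--` on a plain `sbt test` invocation are not forwarded to ScalaTest; `sbt "testOnly * -- -oF"` does forward them. Alternatively, the flag can be set permanently in build.sbt. A minimal sketch (sbt 0.13-style syntax, matching the build file below):

```scala
// build.sbt — always pass ScalaTest's "full stack trace" reporter flag (-oF)
// to the test runner, so failures print complete stack traces on every run
testOptions in Test += Tests.Argument(TestFrameworks.ScalaTest, "-oF")
```

With this in place, a plain `sbt test` will print full stack traces for the `ExceptionInInitializerError`, which usually reveals the real root cause.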
EDIT-1: My unit-test class contains several methods like the following:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
import org.scalacheck.Prop
import org.scalatest.FunSuite
import org.scalatest.prop.Checkers
import com.holdenkarau.spark.testing.DataframeGenerator

class SearchClicksProcessorTest extends FunSuite with Checkers {
  import spark.implicits._

  test("testGetClicksData") {
    // schema of the generated input dataframe
    val schemaIn = StructType(List(
      StructField("rank", IntegerType),
      StructField("city_id", IntegerType),
      StructField("target", IntegerType)
    ))
    // expected schema of the transformed output dataframe
    val schemaOut = StructType(List(
      StructField("clicked_res_rank", IntegerType),
      StructField("city_id", IntegerType)
    ))
    val dataFrameGen = DataframeGenerator.arbitraryDataFrame(spark.sqlContext, schemaIn)
    val property = Prop.forAll(dataFrameGen.arbitrary) { dfIn: DataFrame =>
      dfIn.cache()
      val dfOut: DataFrame = dfIn.transform(SearchClicksProcessor.getClicksData)
      dfIn.schema === schemaIn &&
        dfOut.schema === schemaOut &&
        dfIn.filter($"target" === 1).count() === dfOut.count()
    }
    check(property)
  }
}
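The test body references a `spark` value whose definition is not shown above. One common pattern (an assumption here, not the original code) is a shared-session trait that every suite mixes in, so suites never race to build their own sessions:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper (not part of the original code): a single, lazily
// created local SparkSession shared by every test suite in the forked JVM.
trait SharedSparkSession {
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[2]")
    .appName("search-clicks-tests")
    .getOrCreate()
}

// Usage sketch:
// class SearchClicksProcessorTest extends FunSuite with Checkers
//   with SharedSparkSession { ... }
```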
and build.sbt looks like this:
// core settings
organization := "com.company"
scalaVersion := "2.11.11"
name := "repo-name"
version := "0.0.1"
// cache options
offline := false
updateOptions := updateOptions.value.withCachedResolution(true)
// aggregate options
aggregate in assembly := false
aggregate in update := false
// fork options
fork in Test := true
//common libraryDependencies
libraryDependencies ++= Seq(
scalaTest,
typesafeConfig,
...
scalajHttp
)
libraryDependencies ++= allAwsDependencies
libraryDependencies ++= SparkDependencies.allSparkDependencies
assemblyMergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
  ...
  case _ => MergeStrategy.first
}
// hyphens are not legal in plain Scala identifiers, so the
// project names must be backtick-quoted
lazy val `module-1` = project in file("directory-1")
lazy val `module-2` = (project in file("directory-2")).
  dependsOn(`module-1`).
  aggregate(`module-1`)
lazy val root = (project in file(".")).
  dependsOn(`module-2`).
  aggregate(`module-2`)
Discussion:
-
Have a look at this issue and consider explaining the problem raised there
-
What do your test build file and source code look like?
-
My guess is that the tests are executed in parallel and each one tries to create a brand-new
SparkSession, so I would disable parallel test execution --> stackoverflow.com/q/11899723/1305344 -
It looks like this error is unrelated to @Holden's DataFrameGenerator (running the tests without it leads to the same error). I have narrowed the problem down to creating dataframes with spark.createDataFrame(rdd: RDD, schema: StructType); in particular, turning a sample Seq(Row) into an RDD requires spark.parallelize, which I believe triggers the error. I still can't get past it, though, so any insight would help.
-
I also tried @Jacek's suggestion of disabling parallelism in the tests, with no luck
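For reference, the settings suggested in the comments can be sketched in build.sbt like this (sbt 0.13-style syntax, matching the build file above): run the suites sequentially in a single forked JVM, so at most one SparkSession is ever created.

```scala
// build.sbt — run tests in a forked JVM (already set above) and disable
// parallel execution so suites do not race to create SparkSessions
fork in Test := true
parallelExecution in Test := false
```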
Tags: scala apache-spark sbt scalatest scalacheck