【Posted】: 2018-12-05 08:47:23
【Problem description】:
When I read a CSV file with Spark and convert it to a Dataset, I get the error below, and I cannot figure out the cause. My code is provided below; the CSV file can also be downloaded from http://eforexcel.com/wp/wp-content/uploads/2017/07/10000-Sales-Records.zip.
I am using Scala 2.12.3 and Spark 2.4.0.
Error message:
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`itemType`' given input columns: [Order ID, Total Profit, Country, Total Revenue, Ship Date, Unit Cost, Sales Channel, Unit Price, Total Cost, Units Sold, Order Date, Order Priority, Region, Item Type];
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:110)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:107)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:278)
...
...
Here is my code:
import spark.implicits._

case class Sales(region: String,
                 country: String,
                 itemType: String,
                 salesChannel: String,
                 orderPriority: String,
                 orderDate: String,
                 orderId: Long,
                 shipDate: String,
                 unitsSold: Integer,
                 unitsPrice: Double,
                 unitCost: Double,
                 totalRevenue: Double,
                 totalCost: Double,
                 totalProfit: Double)

val ds = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("src/main/resources/datasets/10000 Sales Records.csv")
  .as[Sales]
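For context, the exception lists the columns Spark actually sees ([Order ID, ..., Item Type]): the CSV headers contain spaces and different casing, so they never match the camelCase case-class fields such as itemType. A minimal, Spark-free sketch of the header-to-field conversion the columns would need before `.as[Sales]` can resolve them (the `toCamel` helper name is my own, not part of any API):

```scala
// Hypothetical helper: convert a spaced CSV header such as "Item Type"
// into the camelCase field name "itemType" expected by the case class.
def toCamel(header: String): String = {
  val parts = header.trim.split("\\s+")
  // Lower-case the first word, capitalize the rest, then join.
  (parts.head.toLowerCase +: parts.tail.map(_.capitalize)).mkString
}
```

In Spark this could then be applied to every column before the `.as[Sales]` call, for example with something like `df.columns.foldLeft(df)((d, c) => d.withColumnRenamed(c, toCamel(c)))`, assuming the renamed headers then line up with the case-class fields.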
【Discussion】:
Tags: scala csv apache-spark apache-spark-dataset apache-spark-2.0