Spark-scala: withColumn is not a member of Unit

【Question title】: Spark-scala: withColumn is not a member of Unit
【Posted】: 2020-10-15 16:03:03
【Question description】:

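I am trying to read a CSV file into a Spark DataFrame. The file has no header row, but I want the columns to have names. How can I do that? I am not sure whether this is correct, but I wrote this command: val df = spark.read.format("csv").load("/path/genchan1.txt").show()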

and got _c0 and _c1 as the column names. I then tried to change a column name to the one I want with: val df1 = df.withColumnRenamed("_c0","Series"), but I get the error that "withColumnRenamed" is not a member of Unit.

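PS: I have already imported spark.implicits._ and spark.sql.functions.

Please help me understand whether there is a way to add column headers to the dataset, and why I am running into this error.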

【Question discussion】:

Tags: dataframe apache-spark apache-spark-sql

【Solution 1】:

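The return type of show is Unit, so in your code df holds Unit rather than a DataFrame. Remove show from the end of the chain and call it separately.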

val df = spark.read.format("csv").load("/path/genchan1.txt")
df.show()

Then you can use all of the DataFrame methods on df:

val df1 = df.withColumnRenamed("_c0","Series")
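
As a complement to the fix above, here is a minimal sketch of naming every column in one call with toDF, which may be more convenient than renaming _c0 and _c1 one at a time. The object name HeaderlessCsvNames, the helper nameColumns, the second column name "Value", and the in-memory sample rows are all assumptions made for illustration; they are not from the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object HeaderlessCsvNames {
  // Renames all columns at once. The number of names must match the
  // number of columns. "Value" is a hypothetical name for _c1.
  def nameColumns(df: DataFrame): DataFrame = df.toDF("Series", "Value")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("headerless-csv-names")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the two-column file read without a header (_c0, _c1).
    val raw: DataFrame = Seq(("A", "1"), ("B", "2")).toDF("_c0", "_c1")

    // show() only prints the rows. Its result is Unit, which is why
    // nothing DataFrame-related can be called on it afterwards.
    val printed: Unit = raw.show()

    val named = nameColumns(raw)
    named.printSchema()

    spark.stop()
  }
}
```

The explicit Unit annotation on printed is only there to make the type visible; in real code you would simply call raw.show() as a statement.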

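【Discussion】:

Thank you very much for the quick reply! :)

【Solution 2】:

If you know the structure of the CSV file in advance, a better solution is to define a schema and apply it to the DataFrame while loading the data.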

Sample code for quick reference:

import org.apache.spark.sql.types._

val customSchema = StructType(Array(
  StructField("Series", StringType, true),
  StructField("Column2", StringType, true),
  StructField("Column3", IntegerType, true),
  StructField("Column4", DoubleType, true))
)

val df = spark.read.format("csv")
  .option("header", "false") // the file does not have a header row
  .schema(customSchema)
  .load("/path/genchan1.txt")

df.show()
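
The same schema can also be written more compactly as a DDL-style string, which the CSV reader accepts in Spark 2.3 and later as far as I know. The sketch below assumes that version and uses a temporary sample file with invented rows so that it runs on its own; the object name DdlSchemaRead, the helper readWithSchema, and the sample data are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object DdlSchemaRead {
  // Same four columns as customSchema, expressed as a DDL string.
  val ddlSchema = "Series STRING, Column2 STRING, Column3 INT, Column4 DOUBLE"

  // Reads a header-less CSV file and applies the schema at load time.
  def readWithSchema(spark: SparkSession, path: String): DataFrame =
    spark.read.format("csv")
      .option("header", "false")
      .schema(ddlSchema)
      .load(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ddl-schema-read")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical sample file standing in for /path/genchan1.txt.
    val sample = Files.createTempFile("genchan", ".txt")
    Files.write(sample, "A,x,1,1.5\nB,y,2,2.5\n".getBytes("UTF-8"))

    val df = readWithSchema(spark, sample.toString)
    df.printSchema() // column names and types come from ddlSchema
    df.show()

    spark.stop()
  }
}
```

Either form gives named, typed columns from the start, so no renaming step is needed afterwards.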

【Discussion】: