[Posted]: 2019-03-03 11:36:57
[Problem description]:
I want to select a few columns, add or divide some columns, fill some columns with blanks, and store them under new names using aliases. In SQL it would look like this:
select " " as col1, b as b1, c+d as e from table
How can I achieve this in Spark?
[Discussion]:
Tags: scala apache-spark hadoop bigdata
You can also use the native DataFrame functions. For example, given:
import org.apache.spark.sql.functions._

val df1 = Seq(
  ("A", 1, 5, 3),
  ("B", 3, 4, 2),
  ("C", 4, 6, 3),
  ("D", 5, 9, 1)).toDF("a", "b", "c", "d")
selecting the columns as:
df1.select(lit(" ").as("col1"),
col("b").as("b1"),
(col("c") + col("d")).as("e"))
gives you the expected result:
+----+---+---+
|col1| b1| e|
+----+---+---+
| | 1| 8|
| | 3| 6|
| | 4| 9|
| | 5| 10|
+----+---+---+
[Discussion]:
For a string column, you can convert it to int with col("a").cast("int"). If this helped you, please accept the answer.
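The cast mentioned in the comment can be sketched as follows. This is a minimal, self-contained example (the session setup and column names are assumptions, not part of the original answer); note that non-numeric strings become null after the cast:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// A minimal sketch, assuming a local Spark session.
val spark = SparkSession.builder().master("local[*]").appName("cast-demo").getOrCreate()
import spark.implicits._

// Here "a" holds numeric strings, so the cast succeeds;
// a value like "A" would become null instead of failing.
val df = Seq(("1", 10), ("2", 20)).toDF("a", "b")
val casted = df.select(col("a").cast("int").as("a_int"), col("b"))
casted.show()
```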
With Spark SQL you can do it as well:
import org.apache.spark.sql.functions._

val df1 = Seq(
  ("A", 1, 5, 3),
  ("B", 3, 4, 2),
  ("C", 4, 6, 3),
  ("D", 5, 9, 1)).toDF("a", "b", "c", "d")
df1.createOrReplaceTempView("table")
df1.show()
val df2 = spark.sql("select ' ' as col1, b as b1, c+d as e from table")
df2.show()
Input:
+---+---+---+---+
| a| b| c| d|
+---+---+---+---+
| A| 1| 5| 3|
| B| 3| 4| 2|
| C| 4| 6| 3|
| D| 5| 9| 1|
+---+---+---+---+
Output:
+----+---+---+
|col1| b1| e|
+----+---+---+
| | 1| 8|
| | 3| 6|
| | 4| 9|
| | 5| 10|
+----+---+---+
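As a middle ground between the two answers, selectExpr accepts the same SQL expressions directly on the DataFrame, without registering a temp view. A minimal sketch (the session setup is an assumption; the data matches the answer above):

```scala
import org.apache.spark.sql.SparkSession

// A minimal sketch, assuming a local Spark session.
val spark = SparkSession.builder().master("local[*]").appName("selectExpr-demo").getOrCreate()
import spark.implicits._

val df1 = Seq(
  ("A", 1, 5, 3),
  ("B", 3, 4, 2),
  ("C", 4, 6, 3),
  ("D", 5, 9, 1)).toDF("a", "b", "c", "d")

// Each argument is parsed as a SQL expression, including aliases.
val df2 = df1.selectExpr("' ' as col1", "b as b1", "c + d as e")
df2.show()
```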
[Discussion]: