[Posted]: 2019-03-03 11:36:57
[Problem description]:
I want to select a few columns, add or divide some columns, fill some columns with blanks, and store them under new names using aliases. In SQL it would look like this:
select " " as col1, b as b1, c+d as e from table
How can I achieve this in Spark?
[Discussion]:
Tags: scala apache-spark hadoop bigdata
You can also use the native DataFrame functions. For example, given:
import org.apache.spark.sql.functions._

val df1 = Seq(
  ("A", 1, 5, 3),
  ("B", 3, 4, 2),
  ("C", 4, 6, 3),
  ("D", 5, 9, 1)).toDF("a", "b", "c", "d")
selecting the columns as:
df1.select(lit(" ").as("col1"),
col("b").as("b1"),
(col("c") + col("d")).as("e"))
gives you the expected result:
+----+---+---+
|col1| b1| e|
+----+---+---+
| | 1| 8|
| | 3| 6|
| | 4| 9|
| | 5| 10|
+----+---+---+
[Discussion]:
For a string column, you can convert it to int with col("a").cast("int"). If this helped you, please accept the answer.
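The cast mentioned in the comment can be sketched as follows. This is a minimal, self-contained example (the session setup and column names are assumptions, not part of the original answer); note that non-numeric strings become null after the cast:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// A minimal sketch, assuming a local Spark session.
val spark = SparkSession.builder().master("local[*]").appName("cast-demo").getOrCreate()
import spark.implicits._

// Here "a" holds numeric strings, so the cast succeeds;
// a value like "A" would become null instead of failing.
val df = Seq(("1", 10), ("2", 20)).toDF("a", "b")
val casted = df.select(col("a").cast("int").as("a_int"), col("b"))
casted.show()
```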
With Spark SQL you can do it as well:
import org.apache.spark.sql.functions._

val df1 = Seq(
  ("A", 1, 5, 3),
  ("B", 3, 4, 2),
  ("C", 4, 6, 3),
  ("D", 5, 9, 1)).toDF("a", "b", "c", "d")
df1.createOrReplaceTempView("table")
df1.show()
val df2 = spark.sql("select ' ' as col1, b as b1, c+d as e from table")
df2.show()
Input:
+---+---+---+---+
| a| b| c| d|
+---+---+---+---+
| A| 1| 5| 3|
| B| 3| 4| 2|
| C| 4| 6| 3|
| D| 5| 9| 1|
+---+---+---+---+
Output:
+----+---+---+
|col1| b1| e|
+----+---+---+
| | 1| 8|
| | 3| 6|
| | 4| 9|
| | 5| 10|
+----+---+---+
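As a middle ground between the two answers, selectExpr accepts the same SQL expressions directly on the DataFrame, without registering a temp view. A minimal sketch (the session setup is an assumption; the data matches the answer above):

```scala
import org.apache.spark.sql.SparkSession

// A minimal sketch, assuming a local Spark session.
val spark = SparkSession.builder().master("local[*]").appName("selectExpr-demo").getOrCreate()
import spark.implicits._

val df1 = Seq(
  ("A", 1, 5, 3),
  ("B", 3, 4, 2),
  ("C", 4, 6, 3),
  ("D", 5, 9, 1)).toDF("a", "b", "c", "d")

// Each argument is parsed as a SQL expression, including aliases.
val df2 = df1.selectExpr("' ' as col1", "b as b1", "c + d as e")
df2.show()
```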
[Discussion]: