Concatenate names of the columns in a dataframe which contain true

【Question title】: Concatenate names of the columns in a dataframe which contain true
【Posted】: 2018-08-05 15:58:00
【Question】:

I have a dataframe that looks like this:

|id|type|isBlack|isHigh|isLong|  
|1 |A   |true   |false |null  |  
|2 |B   |true   |true  |true  |  
|3 |C   |false  |null  |null  |

I am trying to concatenate the names of the columns that contain 'true' into another column, to get this:

|id|type|isBlack|isHigh|isLong|Description          |
|1 |A   |true   |false |null  |isBlack              |
|2 |B   |true   |true  |true  |isBlack,isHigh,isLong|
|3 |C   |false  |null  |null  |null                 |

I have a predefined list of the column names I need to check (in this example, Seq("isBlack", "isHigh", "isLong")). All of them exist in the dataframe, and the list may be fairly long.

【Question comments】:

Tags: scala apache-spark apache-spark-sql

【Solution 1】:

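import org.apache.spark.sql.functions.{col, concat_ws, when}

// Names of the boolean flag columns to check
val cols = Seq("isBlack", "isHigh", "isLong")

df.withColumn("description", concat_ws(",", cols.map(x => when(col(x), x)): _*)).show(false)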
+---+----+-------+------+------+---------------------+
|id |type|isBlack|isHigh|isLong|description          |
+---+----+-------+------+------+---------------------+
|1  |A   |true   |false |null  |isBlack              |
|2  |B   |true   |true  |true  |isBlack,isHigh,isLong|
|3  |C   |false  |null  |null  |                     |
+---+----+-------+------+------+---------------------+

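First, map each column to its own name when its value is true. With no otherwise clause, when yields null for false or null values: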

cols.map(x => when(col(x), x))

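Then merge the resulting columns with concat_ws, which skips nulls (so a row with no true flags gets an empty string rather than null): concat_ws(",", cols.map(x => when(col(x), x)): _*)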
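
The output above shows an empty string for id 3, whereas the question asks for null in that row. One possible way to close that gap is sketched below: wrap the concatenated value in another when that keeps it only if it is non-empty. The object name DescriptionExample, the helper withDescription, and the local SparkSession setup are assumptions made for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat_ws, when}

object DescriptionExample {
  // Hypothetical helper: joins the names of the flag columns whose value is true.
  // The outer `when` has no `otherwise`, so an empty result becomes null.
  def withDescription(df: DataFrame, flagCols: Seq[String]): DataFrame = {
    val joined = concat_ws(",", flagCols.map(c => when(col(c), c)): _*)
    df.withColumn("description", when(joined =!= "", joined))
  }

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only (assumed setup).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("description-example")
      .getOrCreate()
    import spark.implicits._

    // Same rows as in the question; None stands for null.
    val df = Seq[(Int, String, Option[Boolean], Option[Boolean], Option[Boolean])](
      (1, "A", Some(true), Some(false), None),
      (2, "B", Some(true), Some(true), Some(true)),
      (3, "C", Some(false), None, None)
    ).toDF("id", "type", "isBlack", "isHigh", "isLong")

    withDescription(df, Seq("isBlack", "isHigh", "isLong")).show(false)
    spark.stop()
  }
}
```

If an empty string is acceptable for rows with no true flags, the original one-liner is sufficient and this extra step can be skipped.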

【Comments】: