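【Posted】: 2020-03-01 21:48:25
【Question】:
I am concatenating two array columns and converting the result back to an array. When I then apply explode, nothing happens: each row still comes back as a single value. I am using Spark 2.3. Is there something odd going on here?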
>>> df = spark.createDataFrame([(1,25,['A','B','B','C'],['A','B','B','C']),(1,20,['A','A','B','C'],['A','B','B','C']),(1,20,['A','C','B','C'],['A','B','B','C']),(2,26,['X','Y','Z','C'],['A','B','B','C'])],['id','age','one','two'])
>>> df.show()
+---+---+------------+------------+
| id|age|         one|         two|
+---+---+------------+------------+
|  1| 25|[A, B, B, C]|[A, B, B, C]|
|  1| 20|[A, A, B, C]|[A, B, B, C]|
|  1| 20|[A, C, B, C]|[A, B, B, C]|
|  2| 26|[X, Y, Z, C]|[A, B, B, C]|
+---+---+------------+------------+
>>> df.createOrReplaceTempView('df')
>>> df2 = spark.sql('''select id,age, array(concat_ws(',', one, two)) as three from df''')
>>> df2.show()
+---+---+-----------------+
| id|age|            three|
+---+---+-----------------+
|  1| 25|[A,B,B,C,A,B,B,C]|
|  1| 20|[A,A,B,C,A,B,B,C]|
|  1| 20|[A,C,B,C,A,B,B,C]|
|  2| 26|[X,Y,Z,C,A,B,B,C]|
+---+---+-----------------+
>>> df2.createOrReplaceTempView('df2')
>>> spark.sql('''select id, age, four from df2 lateral view explode(three) tbl as four''').show()  # not exploding
+---+---+---------------+
| id|age|           four|
+---+---+---------------+
|  1| 25|A,B,B,C,A,B,B,C|
|  1| 20|A,A,B,C,A,B,B,C|
|  1| 20|A,C,B,C,A,B,B,C|
|  2| 26|X,Y,Z,C,A,B,B,C|
+---+---+---------------+
Note that I can get the expected result with
>>> df2 = spark.sql('''select id,age, split(concat_ws(',', one, two),',') as three from df''')
but I would just like to know why the first approach does not work.
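For what it's worth, the difference between the two queries can be sketched in plain Python, without Spark. This is only an analogy: the helpers `array_of_concat_ws`, `split_of_concat_ws` and `explode` below are hypothetical stand-ins written for illustration, not Spark APIs. The sketch relies on `concat_ws` returning one string, so that `array(...)` around it builds a one-element array, whereas `split(...)` cuts the string back into separate elements. That would also fit the `three` column above being printed as `[A,B,B,C,A,B,B,C]` with no spaces after the commas, unlike the original array columns.

```python
# Plain-Python analogy only; none of these helpers are Spark functions.

def array_of_concat_ws(sep, *arrays):
    """Stand-in for array(concat_ws(sep, a, b)): join, then wrap in a list."""
    joined = sep.join(item for arr in arrays for item in arr)  # one string
    return [joined]  # a list holding exactly one element

def split_of_concat_ws(sep, *arrays):
    """Stand-in for split(concat_ws(sep, a, b), sep): join, then split again."""
    joined = sep.join(item for arr in arrays for item in arr)  # one string
    return joined.split(sep)  # one element per original item

def explode(values):
    """Stand-in for explode: one output row per array element."""
    return [value for value in values]

one = ['A', 'B', 'B', 'C']
two = ['A', 'B', 'B', 'C']

three_v1 = array_of_concat_ws(',', one, two)
three_v2 = split_of_concat_ws(',', one, two)

print(len(three_v1), explode(three_v1))  # a single row holding the whole string
print(len(three_v2), explode(three_v2))  # eight rows, one per letter
```

Under that reading, explode in the first query is working as designed: it is handed an array with a single element and so emits a single row per input row.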
【Discussion】:
Tags: pyspark