【Posted】: 2016-08-09 00:32:24
【Question】:
I have a DataFrame like this:
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
| brand| diesel| e10| e5| houseNumber| id| isOpen| lat| lng| name| place| postCode| street| Datum|
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|[TOTAL, ARAL, She...|[1.049, 1.029, 1....|[1.249, 1.209, 1....|[1.269, 1.229, 1....|[49, 12-14, , , ...|[4409a024-b190-4b...|[true, true, true...|[50.93128, 50.952...|[6.962356, 6.9616...|[TOTAL KOELN, Ara...|[KOELN, Köln, KOE...|[50676, 50668, 50...|[HOLZMARKT, Riehl...|2016-08-01 10:50:...|
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
Basically every column is an array; the DataFrame was built from nested JSON data.
I tried to explode it, but explode only works on a single column in the select statement. Is there a way in PySpark to unpack all of the arrays at once while keeping the element-wise relationships between columns?
【Discussion】:
Tags: apache-spark dataframe pyspark