[Posted]: 2020-05-18 22:49:03
[Question]:
I'm working with the following DataFrame schema in PySpark (field names have been changed for privacy):
|-- some_data: struct (nullable = true)
| |-- some_array: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- some_nested_array: array (nullable = true)
| | | | |-- element: struct (containsNull = true)
| | | | | |-- some_param_1: long (nullable = true)
| | | | | |-- some_param_2: string (nullable = true)
| | | | | |-- some_param_3: string (nullable = true)
| | | |-- some_param_4: string (nullable = true)
| | | |-- some_param_5: string (nullable = true)
| |-- some_other_array: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- some_param_6: string (nullable = true)
| | | |-- some_param_7: string (nullable = true)
| |-- yet_another_array: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- some_param_8: string (nullable = true)
| | | |-- some_param_9: string (nullable = true)
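I'm struggling to use the `explode` function on the doubly nested array. Ideally, I'd like some way to access the parameters under `some_array` so that I can compare `some_param_1` through `some_param_9`, or even just `some_param_1` through `some_param_5`.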
[Discussion]:
-
Can you show your code and the error you're getting?
Tags: apache-spark pyspark apache-spark-sql