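【Posted】: 2020-09-15 16:50:37
【Question】:
I have a DataFrame with the following schema. Under the translations column, each language struct (no, pt, ...) has a translation_version field whose type is null. I want to cast every translation_version to string. There are 17 languages under translations.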
root
|-- translations: struct (nullable = true)
| |-- no: struct (nullable = true)
| | |-- Description: string (nullable = true)
| | |-- class: string (nullable = true)
| | |-- description: string (nullable = true)
| | |-- translation_version: null (nullable = true) // Want to cast as string
| |-- pt: struct (nullable = true)
| | |-- Description: string (nullable = true)
| | |-- class: string (nullable = true)
| | |-- description: string (nullable = true)
| | |-- translation_version: null (nullable = true)
| |-- fr: struct (nullable = true)
| | |-- Description: string (nullable = true)
| | |-- class: string (nullable = true)
| | |-- description: string (nullable = true)
| | |-- translation_version: null (nullable = true)
I tried df = df.na.fill('null'), but it did not change anything. I also tried casting with the following code:
df = df.withColumn("translations", F.col("translations").cast("struct<struct<translation_version: string>>"))
But this returned the following error:
pyspark.sql.utils.ParseException: u"\nmismatched input '<' expecting ':'(line 1, pos 13)\n\n== SQL ==\nstruct<struct<translation_version: string>>\n-------------^^^\n"
Any idea how to cast translation_version to string for every language?
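For context on the parse error: it points at position 13, right after `struct<struct`, where the parser expected a `:`. That suggests the type string needs a `name:type` pair for every field at every nesting level, not just the one field to be changed. Below is a minimal sketch of building such a type string in plain Python. The helper name `translations_ddl` and the `FIELDS` list are hypothetical, taken from the schema printed above; the three languages stand in for the full list of 17. This has not been run against a Spark cluster.

```python
# Hypothetical helper: builds a Spark DDL type string for the `translations` struct.
# Field names are copied from the schema in the question; order must match the schema.
FIELDS = ["Description", "class", "description", "translation_version"]


def translations_ddl(languages, fields=FIELDS):
    """Return a struct type string in which every leaf field is a string."""
    # Backticks quote each field name so names such as `class` are not read as keywords.
    inner = ",".join("`{}`:string".format(name) for name in fields)
    outer = ",".join("`{}`:struct<{}>".format(lang, inner) for lang in languages)
    return "struct<{}>".format(outer)


# Only three of the 17 languages are listed here for brevity.
ddl = translations_ddl(["no", "pt", "fr"])
print(ddl)
```

Assuming the languages are listed in the same order as in the schema (for example, read from `df.schema["translations"].dataType.names`), the resulting string could then be tried in place of the original one: `df.withColumn("translations", F.col("translations").cast(ddl))`. A struct-to-struct cast matches fields by position, so both the language order and the field order matter.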
【Comments】:
Tags: apache-spark pyspark aws-glue