Assuming your "Subscription_parameters" column is an ArrayType() column.
from pyspark.sql import functions as F
from pyspark.sql import Row, SparkSession
# Get (or create) a SparkSession; createDataFrame is a SparkSession method,
# not a SparkContext one
spark = SparkSession.builder.getOrCreate()
First, create the DataFrame:
df = spark.createDataFrame([Row(Subscription_id=5516,
    Subscription_parameters=["'catchupNotificationsEnabled': True",
                             "'newsNotificationsEnabled': True",
                             "'autoDownloadsEnabled': False"])])
Split the array into three columns by simple indexing:
df = df.select("Subscription_id",
    F.col("Subscription_parameters")[0].alias("catchupNotificationsEnabled"),
    F.col("Subscription_parameters")[1].alias("newsNotificationsEnabled"),
    F.col("Subscription_parameters")[2].alias("autoDownloadsEnabled"))
Your DataFrame is now split correctly; each new column contains a string such as "'catchupNotificationsEnabled': True":
+---------------+---------------------------+------------------------+--------------------+
|Subscription_id|catchupNotificationsEnabled|newsNotificationsEnabled|autoDownloadsEnabled|
+---------------+---------------------------+------------------------+--------------------+
|           5516|       'catchupNotificat...|    'newsNotification...|'autoDownloadsEna...|
+---------------+---------------------------+------------------------+--------------------+
Then I suggest updating each column's value by checking whether it contains "True":
df = df.withColumn('catchupNotificationsEnabled',
        F.when(F.col("catchupNotificationsEnabled").contains("True"), True).otherwise(False))\
    .withColumn('newsNotificationsEnabled',
        F.when(F.col("newsNotificationsEnabled").contains("True"), True).otherwise(False))\
    .withColumn('autoDownloadsEnabled',
        F.when(F.col("autoDownloadsEnabled").contains("True"), True).otherwise(False))
The resulting DataFrame is as expected:
+---------------+---------------------------+------------------------+--------------------+
|Subscription_id|catchupNotificationsEnabled|newsNotificationsEnabled|autoDownloadsEnabled|
+---------------+---------------------------+------------------------+--------------------+
|           5516|                       true|                    true|               false|
+---------------+---------------------------+------------------------+--------------------+
P.S. If the column is not an ArrayType() column, you may need to modify this code slightly. See this question for example.
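
As a rough illustration of that kind of modification, here is a minimal sketch. It assumes (the question does not say this) that the column holds a single string per row that looks like a Python dict literal, e.g. "{'catchupNotificationsEnabled': True, ...}". The helper name parse_subscription_parameters is hypothetical; ast.literal_eval is from the Python standard library.

```python
import ast


def parse_subscription_parameters(raw):
    """Parse a dict-literal string into a {name: bool} dict (hypothetical helper).

    Assumption: raw looks like "{'catchupNotificationsEnabled': True, ...}".
    Returns None for missing or unparsable input instead of raising.
    """
    if raw is None:
        return None
    try:
        # literal_eval only accepts Python literals, so it does not execute code
        parsed = ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        return None
    if not isinstance(parsed, dict):
        return None
    return {str(key): bool(value) for key, value in parsed.items()}


# Assumed sample value, built from the flags used in the answer above
example = ("{'catchupNotificationsEnabled': True, "
           "'newsNotificationsEnabled': True, "
           "'autoDownloadsEnabled': False}")
flags = parse_subscription_parameters(example)
```

In Spark, a function like this could probably be wrapped with F.udf(parse_subscription_parameters, MapType(StringType(), BooleanType())), and each flag then read from the resulting map column with getItem. If your strings have a different shape, the parsing step would need to change accordingly.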