【Question title】: Handle dictionary-like string conversion to a PySpark DataFrame
【Posted】: 2021-10-17 23:26:15
【Question】:

I have a PySpark DataFrame,

created with the following code:

from pyspark.sql import SparkSession

# Reuse the active session (e.g. in a shell or notebook), or create one
spark = SparkSession.builder.getOrCreate()

data = [
    [1, "sai", "ram", '"color":"red","green","blue","flower":"rose","tulip"'],
    [2, "avi", "kumar", '"color":"grey","black","white","flower":"roses","tulips"'],
    [3, "ravi", "prakash", '"color":"pink","cherry red","blue","flower":"rosey","tulipey"'],
]
data_columns = ["id", "f_name", "l_name", "feature_stack"]
d = spark.createDataFrame(data=data, schema=data_columns)
d.show(truncate=False)

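I want to convert the dictionary-like string in feature_stack into DataFrame columns.

Thanks in advance...

【Comments】:

  • Please don't post code as an image; post it as text instead.
  • Commented and edited! Please check.

Tags: python-3.x dataframe dictionary pyspark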


【Solution 1】:

Try using the regexp_extract function to extract the color and flower values from the feature_stack column.

Example:

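from pyspark.sql.functions import col, regexp_extract

# Group 1 of the first pattern is the text between "color": and ,"flower";
# group 1 of the second pattern is everything after "flower":
(
    d.withColumn("color", regexp_extract(col("feature_stack"), '"color":(.*),"flower"', 1))
    .withColumn("flower", regexp_extract(col("feature_stack"), '"flower":(.*)', 1))
    .show(10, False)
)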
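To see what the two patterns capture without starting Spark, here is a minimal sketch in plain Python using the standard `re` module on the first sample row. It assumes Python's regex engine behaves like Spark's Java regex for these simple patterns, which should hold here since they use only literals and a greedy `(.*)`. The helpers `extract` and `to_list` are hypothetical names introduced for illustration, not part of the original answer.

```python
import re

# Sample value copied from the first row of the question's data.
feature_stack = '"color":"red","green","blue","flower":"rose","tulip"'

# Same patterns as in the answer above.
color_pattern = r'"color":(.*),"flower"'
flower_pattern = r'"flower":(.*)'


def extract(pattern, text):
    """Hypothetical helper: return capture group 1, or '' when nothing
    matches (regexp_extract also yields an empty string in that case)."""
    match = re.search(pattern, text)
    return match.group(1) if match else ""


def to_list(raw):
    """Hypothetical helper: turn '"a","b"' into ['a', 'b']."""
    return [item.strip('"') for item in raw.split('","')] if raw else []


color = extract(color_pattern, feature_stack)
flower = extract(flower_pattern, feature_stack)

print(color)            # "red","green","blue"
print(flower)           # "rose","tulip"
print(to_list(color))   # ['red', 'green', 'blue']
print(to_list(flower))  # ['rose', 'tulip']
```

The extracted columns are still quoted, comma-separated strings; if separate values are needed, a split step like `to_list` (or an equivalent split in Spark) would have to follow.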

【Comments】:

  • It works as expected... moving on to the next step...