【Question Title】: Explode array into columns in Spark
【Posted】: 2019-06-14 09:24:23
【Question】:

Hi, I have a JSON like the following:

{"meta":{"clusters":[{"1":"Aged 35 to 49"},{"2":"Male"},{"5":"Aged 15 to 17"}]}}

I would like to get the following dataframe:

+---------------+----+---------------+
|              1|   2| 5             |
+---------------+----+---------------+
|  Aged 35 to 49|Male|  Aged 15 to 17|
+---------------+----+---------------+   

How can I do this in pyspark?
Thanks

【Comments】:

    Tags: apache-spark pyspark apache-spark-sql explode


    【Solution 1】:

    You can parse the JSON column with the get_json_object() function. Use explicit array indexes in the JSON path; a wildcard ([*]) returns the matched value still JSON-quoted.

    Example:

    from pyspark.sql import Row

    df = spark.createDataFrame([Row(jsn='{"meta":{"clusters":[{"1":"Aged 35 to 49"},{"2":"Male"},{"5":"Aged 15 to 17"}]}}')])

    # Index into the clusters array so each value comes back as a plain string
    # (a [*] wildcard would return it with JSON quotes, e.g. "Male").
    df.selectExpr("get_json_object(jsn, '$.meta.clusters[0].1') as `1`",
                  "get_json_object(jsn, '$.meta.clusters[1].2') as `2`",
                  "get_json_object(jsn, '$.meta.clusters[2].5') as `5`").show(10, False)
    

    Output:

    +-------------+----+-------------+
    |1            |2   |5            |
    +-------------+----+-------------+
    |Aged 35 to 49|Male|Aged 15 to 17|
    +-------------+----+-------------+
    
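    If the keys ("1", "2", "5") are not known in advance, a more general sketch is to parse clusters with from_json as an array of maps, explode it, and pivot on the key. This is an alternative approach, not part of the original answer; the schema assumes every cluster entry is a single string-to-string pair.

    ```python
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import (StructType, StructField, ArrayType,
                                   MapType, StringType)

    spark = SparkSession.builder.master("local[1]").appName("demo").getOrCreate()

    jsn = '{"meta":{"clusters":[{"1":"Aged 35 to 49"},{"2":"Male"},{"5":"Aged 15 to 17"}]}}'
    df = spark.createDataFrame([(jsn,)], ["jsn"])

    # Treat each cluster entry as a map so the key names need not be hard-coded.
    schema = StructType([
        StructField("meta", StructType([
            StructField("clusters", ArrayType(MapType(StringType(), StringType())))
        ]))
    ])

    exploded = (df
        .select(F.explode(F.from_json("jsn", schema)["meta"]["clusters"]).alias("m"))
        .select(F.explode("m").alias("key", "value")))   # map -> (key, value) rows

    # Pivot the key/value rows into one row with one column per key.
    result = exploded.groupBy().pivot("key").agg(F.first("value"))
    result.show()
    ```

    The trade-off is an extra shuffle for the pivot, but the code keeps working when new cluster ids appear in the data.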

    【Comments】:
