【Question Title】: Not able to fetch values from MapType column in PySpark
【Posted】: 2021-08-19 20:02:38
【Question Description】:

I have the following values in the dataframe Usage_Service_Union.

(Usage_Service_Union.filter("cvdcu3_event_d in ('onUsageTopic/Subject')") .select("cvdcu3_event_attr_x").show(5,truncate=False))

    +------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |cvdcu3_event_attr_x                                                                                                                                         |
    +------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |[{"page_id" -> "G2130854", "view" -> "FEATURES/DalmFeaturesView", "ignitionCount" -> "236", "vehicletotaluptime" -> "223568", "_description" -> "Topic/Subject user accessed"}]|
    |[{"page_id" -> "G2122100", "view" -> "FEATURES/DalmFeaturesView", "ignitionCount" -> "126", "vehicletotaluptime" -> "81532", "_description" -> "Topic/Subject user accessed"}] |
    |[{"page_id" -> "videos-page", "view" -> "FEATURES/DalmFeaturesView", "ignitionCount" -> "97", "vehicletotaluptime" -> "48017", "_description" -> "Topic/Subject user accessed"}]|
    |[{"page_id" -> "G2157430", "view" -> "FEATURES/DalmFeaturesView", "ignitionCount" -> "126", "vehicletotaluptime" -> "81736", "_description" -> "Topic/Subject user accessed"}] |
    +------------------------------------------------------------------------------------------------------------------------------------------------------------+



The column cvdcu3_event_attr_x is a map column:

     |-- cvdcu3_event_attr_x: map (nullable = true)
     |    |-- key: string
     |    |-- value: string (valueContainsNull = true)

   

How can I fetch the page_id value from the cvdcu3_event_attr_x column? I tried the following solutions, but they all return null:

  1. (Usage_Service_Union.filter("cvdcu3_event_d in ('onUsageTopic/Subject')")
         .select("cvdcu3_event_attr_x.page_id").show(truncate=False))
  2. Usage_Service_Union.registerTempTable("Usage_Service_Union")
     spark.sql("select distinct cvdcu3_event_attr_x['page_id'] from Usage_Service_Union").show(truncate=False)
  3. Usage_Service_Union.registerTempTable("Usage_Service_Union")
     spark.sql("select distinct cvdcu3_event_attr_x.page_id from Usage_Service_Union").show(truncate=False)

In all of the above cases, the output was:

+-------+
|page_id|
+-------+
|null   |
+-------+        

Please help!

I am also attaching a screenshot of solution 1, which did not work for me.

【Comments】:

  • Solution 1 works for me
  • I get null values with that solution, even though page_id itself has no nulls

标签: python dataframe pyspark apache-spark-sql


【Solution 1】:

I found the solution. It turns out the key name is actually {"page_id", i.e. the key includes the opening curly brace and the double quotes. The following worked for me:

from pyspark.sql.functions import col

(Usage_Service_Union.filter("cvdcu3_event_d in ('onUsageTopic/Subject')")
      .select(col("cvdcu3_event_attr_x").getItem('{"page_id"')).show(truncate=False))

【Discussion】:
