Posted: 2021-08-19 20:02:38
Question:
I have the following values in the DataFrame `Usage_Service_Union`:
```python
(Usage_Service_Union
    .filter("cvdcu3_event_d in ('onUsageTopic/Subject')")
    .select("cvdcu3_event_attr_x")
    .show(5, truncate=False))
```

```
|cvdcu3_event_attr_x|
+-------------------+
|[{"page_id" -> "G2130854", "view" -> "FEATURES/DalmFeaturesView", "ignitionCount" -> "236", "vehicletotaluptime" -> "223568", "_description" -> "Topic/Subject user accessed"}]|
|[{"page_id" -> "G2122100", "view" -> "FEATURES/DalmFeaturesView", "ignitionCount" -> "126", "vehicletotaluptime" -> "81532", "_description" -> "Topic/Subject user accessed"}]|
|[{"page_id" -> "videos-page", "view" -> "FEATURES/DalmFeaturesView", "ignitionCount" -> "97", "vehicletotaluptime" -> "48017", "_description" -> "Topic/Subject user accessed"}]|
|[{"page_id" -> "G2157430", "view" -> "FEATURES/DalmFeaturesView", "ignitionCount" -> "126", "vehicletotaluptime" -> "81736", "_description" -> "Topic/Subject user accessed"}]|
```
The column `cvdcu3_event_attr_x` is a map column:

```
 |-- cvdcu3_event_attr_x: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
```
How do I get the `page_id` value out of the `cvdcu3_event_attr_x` column? I tried the following approaches, but each one returns null:
- DataFrame API with dot notation:
  ```python
  (Usage_Service_Union
      .filter("cvdcu3_event_d in ('onUsageTopic/Subject')")
      .select("cvdcu3_event_attr_x.page_id")
      .show(truncate=False))
  ```
- Spark SQL with bracket notation:
  ```python
  Usage_Service_Union.registerTempTable("Usage_Service_Union")
  spark.sql("select distinct cvdcu3_event_attr_x['page_id'] from Usage_Service_Union").show(truncate=False)
  ```
- Spark SQL with dot notation:
  ```python
  Usage_Service_Union.registerTempTable("Usage_Service_Union")
  spark.sql("select distinct cvdcu3_event_attr_x.page_id from Usage_Service_Union").show(truncate=False)
  ```
In all of the above cases, the output is:
```
+-------+
|page_id|
+-------+
|null   |
+-------+
```
Please help!
Discussion:
- Solution 1 works for me.
- I get null values with that solution, even though `page_id` has no null values.
Tags: python dataframe pyspark apache-spark-sql