【Posted】:2023-01-03 15:22:42
【Question】:
I have the following DataFrame:
+-----------------------------------------------------------------------------------------------+-----------------------+
|value |timestamp |
+-----------------------------------------------------------------------------------------------+-----------------------+
|{"after":{"id":1001,"first_name":"Sally","last_name":"Thomas","email":"sally.thomas@acme.com"}}|2023-01-03 11:02:11.975|
|{"after":{"id":1002,"first_name":"George","last_name":"Bailey","email":"gbailey@foobar.com"}} |2023-01-03 11:02:11.976|
|{"after":{"id":1003,"first_name":"Edward","last_name":"Walker","email":"ed@walker.com"}} |2023-01-03 11:02:11.976|
|{"after":{"id":1004,"first_name":"Anne","last_name":"Kretchmar","email":"annek@noanswer.org"}} |2023-01-03 11:02:11.976|
+-----------------------------------------------------------------------------------------------+-----------------------+
root
|-- value: string (nullable = true)
|-- timestamp: timestamp (nullable = true)
Expected result using PySpark:
+----+----------+---------+---------------------+
|id  |first_name|last_name|email                |
+----+----------+---------+---------------------+
|1001|Sally     |Thomas   |sally.thomas@acme.com|
|1002|George    |Bailey   |gbailey@foobar.com   |
|1003|Edward    |Walker   |ed@walker.com        |
|1004|Anne      |Kretchmar|annek@noanswer.org   |
+----+----------+---------+---------------------+
Any help is appreciated.
【Discussion】:
Tags: python apache-spark pyspark