[Posted]: 2020-12-15 14:59:14
[Question]:
I have a Pandas DataFrame in which one column contains JSON data (the JSON structure is simple: a single level, no nested data):
ID,Date,attributes
9001,2020-07-01T00:00:06Z,"{"State":"FL","Source":"Android","Request":"0.001"}"
9002,2020-07-01T00:00:33Z,"{"State":"NY","Source":"Android","Request":"0.001"}"
9003,2020-07-01T00:07:19Z,"{"State":"FL","Source":"ios","Request":"0.001"}"
9004,2020-07-01T00:11:30Z,"{"State":"NY","Source":"windows","Request":"0.001"}"
9005,2020-07-01T00:15:23Z,"{"State":"FL","Source":"ios","Request":"0.001"}"
I want to normalize the JSON content in the attributes column so that each JSON attribute becomes its own column in the DataFrame:
ID,Date,attributes.State,attributes.Source,attributes.Request
9001,2020-07-01T00:00:06Z,FL,Android,0.001
9002,2020-07-01T00:00:33Z,NY,Android,0.001
9003,2020-07-01T00:07:19Z,FL,ios,0.001
9004,2020-07-01T00:11:30Z,NY,windows,0.001
9005,2020-07-01T00:15:23Z,FL,ios,0.001
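I have been trying to use Pandas json_normalize, which expects a dictionary. So I thought I would convert the attributes column to a dictionary, but that did not work as expected, because the resulting dictionary looks like this: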
df.attributes.to_dict()
{0: '{"State":"FL","Source":"Android","Request":"0.001"}',
1: '{"State":"NY","Source":"Android","Request":"0.001"}',
2: '{"State":"FL","Source":"ios","Request":"0.001"}',
3: '{"State":"NY","Source":"windows","Request":"0.001"}',
4: '{"State":"FL","Source":"ios","Request":"0.001"}'}
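The values are still JSON strings rather than parsed objects, and normalization takes the keys (0, 1, 2, ...) as column names instead of the JSON keys.
I feel like I am close, but I can't work out exactly how to do this. Any ideas are welcome.
Thanks!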
[Discussion]:
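One possible approach, offered as a minimal sketch rather than a definitive answer: parse each JSON string with json.loads first, then pass the resulting list of dicts to pd.json_normalize and join the new columns back onto the original frame. It assumes every value in attributes is a valid, flat JSON string and that pandas 1.0 or later is installed (for the top-level pd.json_normalize). The names parsed, attrs and result are illustrative only, and the frame is rebuilt here from the first three sample rows.

```python
import json

import pandas as pd

# Rebuild a small frame from the first three sample rows in the question.
df = pd.DataFrame({
    "ID": [9001, 9002, 9003],
    "Date": [
        "2020-07-01T00:00:06Z",
        "2020-07-01T00:00:33Z",
        "2020-07-01T00:07:19Z",
    ],
    "attributes": [
        '{"State":"FL","Source":"Android","Request":"0.001"}',
        '{"State":"NY","Source":"Android","Request":"0.001"}',
        '{"State":"FL","Source":"ios","Request":"0.001"}',
    ],
})

# Assumption: each cell is a valid JSON string. Parse it into a dict first,
# because json_normalize works on dicts, not on raw strings.
parsed = df["attributes"].apply(json.loads)

# Flatten the list of dicts into columns and prefix them like the desired output.
attrs = pd.json_normalize(parsed.tolist()).add_prefix("attributes.")
attrs.index = df.index  # keep rows aligned even if df has a non-default index

# Replace the original JSON column with the expanded columns.
result = df.drop(columns="attributes").join(attrs)
print(result)
```

Note that Request stays a string ("0.001") because it is quoted in the JSON; converting it with pd.to_numeric would be a separate step if numeric values are needed.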
Tags: python json pandas dataframe normalize