【Question title】: How to convert JSON to a DataFrame using json_normalize?
【Posted】: 2019-08-31 22:31:02
【Question description】:

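I am trying to convert an API response from JSON into a pandas DataFrame. The problem is that the data is nested inside the JSON, so I do not get the right columns in my DataFrame.

The data is collected from an API and has the following format: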

{
    "data": [
        {
            "timestamp": "2019-04-10T11:40:13.437Z",
            "score": 87,
            "sensors": [
                {
                    "comp": "temp",
                    "value": 20.010000228881836
                },
                {
                    "comp": "humid",
                    "value": 34.4900016784668
                },
                {
                    "comp": "co2",
                    "value": 418
                },
                {
                    "comp": "voc",
                    "value": 166
                },
                {
                    "comp": "pm25",
                    "value": 4
                },
                {
                    "comp": "lux",
                    "value": 961.4000244140625
                },
                {
                    "comp": "spl_a",
                    "value": 45.70000076293945
                }
            ],
            "indices": [
                {
                    "comp": "temp",
                    "value": -1
                },
                {
                    "comp": "humid",
                    "value": -2
                },
                {
                    "comp": "co2",
                    "value": 0
                },
                {
                    "comp": "voc",
                    "value": 0
                },
                {
                    "comp": "pm25",
                    "value": 0
                }
            ]
        },
        {
            "timestamp": "2019-04-10T11:40:03.413Z",
            "score": 87,
            "sensors": [
                {
                    "comp": "temp",
                    "value": 20.040000915527344
                },
                {
                    "comp": "humid",
                    "value": 34.630001068115234
                },
                {
                    "comp": "co2",
                    "value": 418
                },
                {
                    "comp": "voc",
                    "value": 169
                },
                {
                    "comp": "pm25",
                    "value": 5
                },
                {
                    "comp": "lux",
                    "value": 960.2000122070312
                },
                {
                    "comp": "spl_a",
                    "value": 46
                }
            ],
            "indices": [
                {
                    "comp": "temp",
                    "value": -1
                },
                {
                    "comp": "humid",
                    "value": -1
                },
                {
                    "comp": "co2",
                    "value": 0
                },
                {
                    "comp": "voc",
                    "value": 0
                },
                {
                    "comp": "pm25",
                    "value": 0
                }
            ]
        },

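As the edited sample shows, the response contains more than one of these data sets.

I have tried the following: I converted the JSON to a dict and then normalized it with the json_normalize function. The code is: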

data = r.json() 

works_data = json_normalize(data=data['data'], record_path=['sensors'],meta=['timestamp'])

df = pd.DataFrame.from_dict(works_data)
print(df)

The result I get is:

       comp        value                 timestamp
0      temp    21.059999  2019-04-10T12:39:05.062Z
1     humid    31.250000  2019-04-10T12:39:05.062Z
2       co2   407.000000  2019-04-10T12:39:05.062Z
3       voc   136.000000  2019-04-10T12:39:05.062Z
4      pm25     3.000000  2019-04-10T12:39:05.062Z
5       lux  1302.099976  2019-04-10T12:39:05.062Z
6     spl_a    46.299999  2019-04-10T12:39:05.062Z

The result I need is the following (linked image: the result).

Can someone help me?

【Comments】:

    Tags: python json pandas api normalize


    【Solution 1】:

    You can reshape your works_data, one record at a time:

    data = {
        "data": [
            {
                "timestamp": "2019-04-10T11:40:13.437Z",
                "score": 87,
                "sensors": [
                    {
                        "comp": "temp",
                        "value": 20.010000228881836
                    },
                    {
                        "comp": "humid",
                        "value": 34.4900016784668
                    },
                    {
                        "comp": "co2",
                        "value": 418
                    },
                    {
                        "comp": "voc",
                        "value": 166
                    },
                    {
                        "comp": "pm25",
                        "value": 4
                    },
                    {
                        "comp": "lux",
                        "value": 961.4000244140625
                    },
                    {
                        "comp": "spl_a",
                        "value": 45.70000076293945
                    }
                ],
                "indices": [
                    {
                        "comp": "temp",
                        "value": -1
                    },
                    {
                        "comp": "humid",
                        "value": -2
                    },
                    {
                        "comp": "co2",
                        "value": 0
                    },
                    {
                        "comp": "voc",
                        "value": 0
                    },
                    {
                        "comp": "pm25",
                        "value": 0
                    }
                ]
            },
            {
                "timestamp": "2019-04-10T11:40:03.413Z",
                "score": 87,
                "sensors": [
                    {
                        "comp": "temp",
                        "value": 20.040000915527344
                    },
                    {
                        "comp": "humid",
                        "value": 34.630001068115234
                    },
                    {
                        "comp": "co2",
                        "value": 418
                    },
                    {
                        "comp": "voc",
                        "value": 169
                    },
                    {
                        "comp": "pm25",
                        "value": 5
                    },
                    {
                        "comp": "lux",
                        "value": 960.2000122070312
                    },
                    {
                        "comp": "spl_a",
                        "value": 46
                    }
                ],
                "indices": [
                    {
                        "comp": "temp",
                        "value": -1
                    },
                    {
                        "comp": "humid",
                        "value": -1
                    },
                    {
                        "comp": "co2",
                        "value": 0
                    },
                    {
                        "comp": "voc",
                        "value": 0
                    },
                    {
                        "comp": "pm25",
                        "value": 0
                    }
                ]
            }]}
    
    
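    import pandas as pd

    # pd.json_normalize is the top-level function in pandas >= 1.0.
    # DataFrame.append was removed in pandas 2.0, so the per-record
    # frames are collected in a list and concatenated once at the end.
    frames = []
    for each in data['data']:
        timestamp = each['timestamp']
        # After transposing: row 0 holds the 'comp' names, row 1 the 'value' readings.
        temp_df = pd.json_normalize(data=each, record_path=['sensors']).T

        columns = list(temp_df.iloc[0])
        data_values = list(temp_df.iloc[1, :])

        # One single-row frame per record: sensor columns plus the timestamp.
        frames.append(pd.DataFrame([data_values + [timestamp]], columns=columns + ['timestamp']))

    df = pd.concat(frames, ignore_index=True)

    print(df)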
    

    Output:

    print(df)
            temp      humid    co2  ...         lux      spl_a                 timestamp
    0  20.010000  34.490002  418.0  ...  961.400024  45.700001  2019-04-10T11:40:13.437Z
    1  20.040001  34.630001  418.0  ...  960.200012  46.000000  2019-04-10T11:40:03.413Z
    
    [2 rows x 8 columns]
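
A possible alternative to the loop, offered only as a sketch: keep the long table that json_normalize already produces with record_path and meta, then pivot it so each sensor name becomes a column. The sample data below is an abbreviated stand-in for the API response in the question, and the names long_df and wide_df are made up for illustration. It assumes every timestamp occurs only once; with duplicate (timestamp, comp) pairs, pivot raises an error and something like pivot_table with an aggregation would be needed instead.

```python
import pandas as pd

# Abbreviated stand-in for the API response from the question (assumed shape).
data = {
    "data": [
        {
            "timestamp": "2019-04-10T11:40:13.437Z",
            "score": 87,
            "sensors": [
                {"comp": "temp", "value": 20.01},
                {"comp": "humid", "value": 34.49},
                {"comp": "co2", "value": 418},
            ],
        },
        {
            "timestamp": "2019-04-10T11:40:03.413Z",
            "score": 87,
            "sensors": [
                {"comp": "temp", "value": 20.04},
                {"comp": "humid", "value": 34.63},
                {"comp": "co2", "value": 418},
            ],
        },
    ]
}

# Long format: one row per (timestamp, sensor) pair, as in the question.
long_df = pd.json_normalize(data["data"], record_path=["sensors"], meta=["timestamp"])

# Wide format: one row per timestamp, one column per sensor name.
wide_df = (
    long_df.pivot(index="timestamp", columns="comp", values="value")
    .rename_axis(columns=None)  # drop the leftover "comp" label on the column axis
    .reset_index()              # turn the timestamp index back into a regular column
)

print(wide_df)
```

Note that pivot sorts the rows by timestamp and the sensor columns alphabetically, so the column order differs from the loop version above.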
    

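    【Discussion】:

    • This is exactly what I need. The only thing is that I have more than one set of these JSON records, so the code currently generates columns from all of the sets and I end up with a lot of columns. What I want is a single set of columns, with more rows added. See the edited JSON data for a better sample.
    • So a new row for each timestamp?
    • Mate, you're awesome! This is exactly what I needed.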
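
For the follow-up above (one shared set of columns, one new row per timestamp), here is a rough plain-Python sketch that skips json_normalize entirely: build one flat dict per record and let the DataFrame constructor align the keys into columns. The helper name records_to_rows and the abbreviated sample records are hypothetical and not part of the original answer. A sensor missing from a record simply shows up as NaN in that row.

```python
import pandas as pd


def records_to_rows(records):
    """Build one flat dict per record: the timestamp plus sensor name -> value."""
    rows = []
    for rec in records:
        row = {"timestamp": rec["timestamp"]}
        for sensor in rec.get("sensors", []):
            row[sensor["comp"]] = sensor["value"]
        rows.append(row)
    return rows


# Hypothetical, abbreviated records in the shape of data['data'] from the question.
records = [
    {
        "timestamp": "2019-04-10T11:40:13.437Z",
        "sensors": [{"comp": "temp", "value": 20.01}, {"comp": "co2", "value": 418}],
    },
    {
        "timestamp": "2019-04-10T11:40:03.413Z",
        "sensors": [{"comp": "temp", "value": 20.04}, {"comp": "lux", "value": 960.2}],
    },
]

# Keys shared between records land in the same column; each record is one row.
df = pd.DataFrame(records_to_rows(records))
print(df)
```

With the real response this would be called as records_to_rows(data['data']).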