Converting JSON output to a dataframe table in Python

【Question title】: Converting JSON output to dataframe table in Python
【Posted】: 2021-05-23 21:04:13
【Question】:

I am using a REST API to fetch a JSON file. I want to convert this JSON into a dataframe so that I can upload it to a database.

The JSON has the following format:

{result:
{__campaign_id__:
{"campaign_id": __campaign_id__ //id of the campaign in platform
  "name": xxx, // Campaign name
  "creatives":{ // list of creatives in the campaign
   __creative_id__:{
     "creative_id":__creative_id__,  // id of the creative
     "name":xxx, //name of the creative
     "device":xxx, // 0- mobile | 1- Desktop | 2- Instream video | 3- Responsive
     "width": x, //width of the placement in px
     "height": y, //height of the placement in px
     "analytics": {
       __live_id__:{
                "dt":xxx, //date in unix timestamp
                "impressions":xxx, //number of tracked ads
                "viewable":xxx,  // number of viewable impressions
                "engagement":xxx, // number of impressions user interacted or viewed video at least 25%
                "engagement_novtr":xxx, // number of impressions user interacted
                "ctr_unique":xxx, //number of unique clicks (one click per one impression)
                "ctr": xxx,  // number of total clicks
                "dwell":xxx, // exposure time
                "videos":[
                    {
                        "dt": 1607558400, //date in unix timestamp. Same as above
                        "unique": xxx, //number of unique video views
                        "id": "Video 1", //name of the video
                        "vtr_0": xxx,   // number of impressions who started watching video 1
                        "vtr_25": xxx,  // number of impressions who watched first quartile of video 1
                        "vtr_50": xxx,  // number of impressions who watched second quartile of video 1
                        "vtr_75": xxx,  // number of impressions who watched third quartile of video 1
                        "vtr_100": xxx  // number of impressions who finished watching the video of video 1
                    }
                ]
            }
        }
    },
   },
  },
}
}

What I want is the data at its finest level of granularity, something like this (simplified):

campaignID   creativeID    device         analytics  
1           1            pc             1  
1           1            pc             2  
1           2            mobile         1 
1           2            pc             2  
2           4            pc             5  
2           4            mobile         6  
2           6            pc             7  
2           5            mobile         7   
3           8            pc             9

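And so on. Basically, I just want to split the data into new rows so that each row represents the finest-grained breakdown of the data, if that makes sense.

My request looks like this: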

nexdReponse = requests.get("myURL", headers=call_headers)
json_nexdData = json.loads(nexdReponse.text)

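My JSON is now in a dictionary that mirrors the JSON hierarchy, so I need to convert a dictionary of nested dictionaries into a dataframe.

I then tried to normalize my data with pd.DataFrame.from_dict(pd.json_normalize(myData)), but it did not give the result I wanted.

Is there a simple way, or a library, that I could use for this? I'm obviously fairly new to this, so I'm just trying to figure out how it works.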

【Comments on the question】:

Tags: python json dataframe rest

【Solution 1】:

The JSON you posted is not valid, so I'll use an arbitrary valid one instead:

{
    "status": {
        "code": 200,
        "message": "ok"
    },
    "pagination": {
        "page": 1,
        "count": 100,
        "total": 292
    },
    "products": [
        {
            "id": 143,
            "created_at": "2019-11-19T04:30:14.000Z",
            "updated_at": "2019-11-19T04:30:19.000Z",
            "blacklisted": false,
            "average_score": 4.76109,
            "total_reviews": 2051,
            "url": "https://go/kytatohbhi",
            "external_product_id": "123455",
            "name": "bolo azadi",
            "description": "kaisebhi",
            "product_specs": [
                {
                    "name": "shradha anusar",
                    "value": "dhania"
                }
            ],
            "category": {
                "id": 1,
                "name": "bandhkarobhai"
            },
            "products_group": {
                "id": 3518659,
                "display_name": "makkede"
            },
            "images": [
                {
                    "original": "https://haule",
                    "square": "https://bulle",
                    "facebook": "https://sulle",
                    "facebook_square": "https://lulle",
                    "kind": "image"
                }
            ]
        },
        {
            "id": 148,
            "created_at": "2019-11-19T04:30:14.000Z",
            "updated_at": "2019-11-19T04:30:19.000Z",
            "blacklisted": false,
            "average_score": 4.76109,
            "total_reviews": 2051,
            "url": "https://kytatohbhi",
            "external_product_id": "123455",
            "name": "kuch bhi....khuch bhi",
            "description": "kabhi alvida na kehna",
            "product_specs": [
                {
                    "name": "shradha anusar",
                    "value": "dhania"
                },
                {
                    "name": "swaad anusar",
                    "value": "namak"
                }
            ],
            "category": {
                "id": 1,
                "name": "bandhkarobhai"
            },
            "products_group": {
                "id": 3518659,
                "display_name": "makkede"
            },
            "images": [
                {
                    "original": "https://haule",
                    "square": "https://bulle",
                    "facebook": "https://sulle",
                    "facebook_square": "https://lulle",
                    "kind": "image"
                }
            ]
        }
    ]
}

After reading the JSON, you can load it into a dataframe. Note that the products key holds a list:

df = pd.json_normalize(data['products'])
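
To make clear what this call does and does not flatten, here is a minimal sketch on a trimmed-down copy of the sample above (the reduced `data` dict is an assumption made for brevity, not the full payload). Nested dicts such as `category` become dotted columns, while lists of dicts such as `product_specs` stay as a single list-valued column, which is why a further expansion step is needed.

```python
import pandas as pd

# Trimmed-down sample mirroring the JSON above (assumed for illustration).
data = {
    "products": [
        {
            "id": 143,
            "name": "bolo azadi",
            "category": {"id": 1, "name": "bandhkarobhai"},
            "product_specs": [{"name": "shradha anusar", "value": "dhania"}],
        },
        {
            "id": 148,
            "name": "kuch bhi....khuch bhi",
            "category": {"id": 1, "name": "bandhkarobhai"},
            "product_specs": [
                {"name": "shradha anusar", "value": "dhania"},
                {"name": "swaad anusar", "value": "namak"},
            ],
        },
    ]
}

df = pd.json_normalize(data["products"])

# Nested dicts are flattened into dotted column names ...
print(sorted(df.columns))
# ... while a list of dicts is left untouched inside one column.
print(type(df.loc[0, "product_specs"]))
```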

Then you can use an explode function to split out the remaining nested columns:

import ast
import pandas as pd

def explode_node(child_df, column_value):
    child_df = child_df.dropna(subset=[column_value]).copy()  # copy so the caller's frame is untouched
    if isinstance(child_df[column_value].iloc[0], str):  # nested data stored as a string: parse it
        child_df[column_value] = child_df[column_value].apply(ast.literal_eval)
    expanded_child_df = (  # normalize each cell, then join it back to its parent row by index
        pd.concat({i: pd.json_normalize(x) for i, x in child_df.pop(column_value).items()})
        .reset_index(level=1, drop=True)
        .join(child_df, how='right', lsuffix='_left', rsuffix='_right')
        .reset_index(drop=True))
    expanded_child_df.columns = map(str.lower, expanded_child_df.columns)
    return expanded_child_df
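
For the shape described in the question, where the IDs are dictionary keys rather than list items, there is no list for json_normalize to iterate over. One possible approach, shown here only as a sketch, is to walk the three levels by hand and emit one row per analytics entry. The sample values, the `flatten_campaigns` helper name, and the `live_id` column name are all made up for illustration, and the nested `videos` list is deliberately skipped.

```python
import pandas as pd

# Made-up sample in the shape described in the question (values are hypothetical).
json_nexdData = {
    "result": {
        "1": {
            "campaign_id": 1,
            "name": "Campaign A",
            "creatives": {
                "10": {
                    "creative_id": 10,
                    "name": "Creative X",
                    "device": 1,
                    "analytics": {
                        "100": {"dt": 1607558400, "impressions": 50, "videos": []},
                        "101": {"dt": 1607644800, "impressions": 70, "videos": []},
                    },
                },
                "11": {
                    "creative_id": 11,
                    "name": "Creative Y",
                    "device": 0,
                    "analytics": {
                        "102": {"dt": 1607558400, "impressions": 30, "videos": []},
                    },
                },
            },
        }
    }
}


def flatten_campaigns(result):
    """Hypothetical helper: one row per campaign / creative / analytics entry."""
    rows = []
    for campaign in result.values():
        for creative in campaign.get("creatives", {}).values():
            for live_id, stats in creative.get("analytics", {}).items():
                row = {
                    "campaign_id": campaign["campaign_id"],
                    "campaign_name": campaign["name"],
                    "creative_id": creative["creative_id"],
                    "creative_name": creative["name"],
                    "device": creative["device"],
                    "live_id": live_id,
                }
                # Copy the scalar metrics; the nested "videos" list is skipped here.
                row.update({k: v for k, v in stats.items() if k != "videos"})
                rows.append(row)
    return pd.DataFrame(rows)


flat = flatten_campaigns(json_nexdData["result"])
print(flat)
```

If the per-video metrics are needed as well, the `videos` list could be kept in the row and expanded afterwards with a helper such as explode_node above.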

【Comments】: