【Question title】: Converting JSON output to dataframe table in Python
【Posted】: 2021-05-23 21:04:13
【Question】:

I am using a REST API to fetch a JSON file. I want to convert this JSON into a dataframe so that I can upload it to a database.

The JSON has the following format:

{result:
{__campaign_id__:
{"campaign_id": __campaign_id__, //id of the campaign in platform
  "name": xxx, // Campaign name
  "creatives":{ // list of creatives in the campaign
   __creative_id__:{
     "creative_id":__creative_id__,  // id of the creative
     "name":xxx, //name of the creative
     "device":xxx, // 0- mobile | 1- Desktop | 2- Instream video | 3- Responsive
     "width": x, //width of the placement in px
     "height": y, //height of the placement in px
     "analytics": {
       __live_id__:{
                "dt":xxx, //date in unix timestamp
                "impressions":xxx, //number of tracked ads
                "viewable":xxx,  // number of viewable impressions
                "engagement":xxx, // number of impressions user interacted or viewed video at least 25%
                "engagement_novtr":xxx, // number of impressions user interacted
                "ctr_unique":xxx, //number of unique clicks (one click per one impression)
                "ctr": xxx,  // number of total clicks
                "dwell":xxx, // exposure time
                "videos":[
                    {
                        "dt": 1607558400, //date in unix timestamp. Same as above
                        "unique": xxx, //number of unique video views
                        "id": "Video 1", //name of the video
                        "vtr_0": xxx,   // number of impressions who started watching video 1
                        "vtr_25": xxx,  // number of impressions who watched first quartile of video 1
                        "vtr_50": xxx,  // number of impressions who watched second quartile of video 1
                        "vtr_75": xxx,  // number of impressions who watched third quartile of video 1
                        "vtr_100": xxx  // number of impressions who finished watching the video of video 1
                    }
                ]
            }
        }
    },
   },
  },
}
}

What I want is the data at its most granular level, something like this (simplified):

campaignID   creativeID    device         analytics  
1           1            pc             1  
1           1            pc             2  
1           2            mobile         1 
1           2            pc             2  
2           4            pc             5  
2           4            mobile         6  
2           6            pc             7  
2           5            mobile         7   
3           8            pc             9    

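And so on. Basically, I just want to split the data into rows so that each row represents the finest breakdown of the data, if that makes sense.

My request looks like this: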

nexdReponse = requests.get("myURL", headers=call_headers)
json_nexdData = json.loads(nexdReponse.text)

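Now my JSON is in a dictionary that mirrors the JSON hierarchy, so I need to convert a dictionary of nested dictionaries into a dataframe.

I then tried to normalize my data with pd.DataFrame.from_dict(pd.json_normalize(myData)), but it did not give me the result I wanted.

Is there a simple way or a library I can use for this? I am clearly a bit new to this, so I am just trying to figure out how it works.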

【Comments】:

    Tags: python json dataframe rest


    【Solution 1】:

    The JSON you posted is not valid, so I will use an arbitrary valid one instead:

    {
        "status": {
            "code": 200,
            "message": "ok"
        },
        "pagination": {
            "page": 1,
            "count": 100,
            "total": 292
        },
        "products": [
            {
                "id": 143,
                "created_at": "2019-11-19T04:30:14.000Z",
                "updated_at": "2019-11-19T04:30:19.000Z",
                "blacklisted": false,
                "average_score": 4.76109,
                "total_reviews": 2051,
                "url": "https://go/kytatohbhi",
                "external_product_id": "123455",
                "name": "bolo azadi",
                "description": "kaisebhi",
                "product_specs": [
                    {
                        "name": "shradha anusar",
                        "value": "dhania"
                    }
                ],
                "category": {
                    "id": 1,
                    "name": "bandhkarobhai"
                },
                "products_group": {
                    "id": 3518659,
                    "display_name": "makkede"
                },
                "images": [
                    {
                        "original": "https://haule",
                        "square": "https://bulle",
                        "facebook": "https://sulle",
                        "facebook_square": "https://lulle",
                        "kind": "image"
                    }
                ]
            },
            {
                "id": 148,
                "created_at": "2019-11-19T04:30:14.000Z",
                "updated_at": "2019-11-19T04:30:19.000Z",
                "blacklisted": false,
                "average_score": 4.76109,
                "total_reviews": 2051,
                "url": "https://kytatohbhi",
                "external_product_id": "123455",
                "name": "kuch bhi....khuch bhi",
                "description": "kabhi alvida na kehna",
                "product_specs": [
                    {
                        "name": "shradha anusar",
                        "value": "dhania"
                    },
                    {
                        "name": "swaad anusar",
                        "value": "namak"
                    }
                ],
                "category": {
                    "id": 1,
                    "name": "bandhkarobhai"
                },
                "products_group": {
                    "id": 3518659,
                    "display_name": "makkede"
                },
                "images": [
                    {
                        "original": "https://haule",
                        "square": "https://bulle",
                        "facebook": "https://sulle",
                        "facebook_square": "https://lulle",
                        "kind": "image"
                    }
                ]
            }
        ]
    }
    

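    After reading the JSON, you can load it into a dataframe. Note that the value under the key products is a list, so it can be passed to json_normalize directly: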

    df = pd.json_normalize(data['products'])
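
    To make the result of that call concrete, here is a minimal sketch on a cut-down copy of the JSON above (only a few fields are kept; the variable names data and specs are just for illustration). It shows that nested dicts become dotted columns while lists stay as list-valued cells, and that json_normalize can also expand a single nested list through its record_path and meta arguments.

```python
import pandas as pd

# Cut-down stand-in for the response above (only a few fields kept for illustration)
data = {
    "products": [
        {"id": 143, "name": "bolo azadi",
         "category": {"id": 1, "name": "bandhkarobhai"},
         "product_specs": [{"name": "shradha anusar", "value": "dhania"}]},
        {"id": 148, "name": "kuch bhi....khuch bhi",
         "category": {"id": 1, "name": "bandhkarobhai"},
         "product_specs": [{"name": "shradha anusar", "value": "dhania"},
                           {"name": "swaad anusar", "value": "namak"}]},
    ]
}

# Nested dicts become dotted columns (e.g. "category.id");
# lists such as "product_specs" are left as list-valued cells.
df = pd.json_normalize(data["products"])
print(df.columns.tolist())

# One row per product_specs entry, with selected parent fields carried along.
# record_prefix avoids a name clash between the spec fields and parent fields.
specs = pd.json_normalize(
    data["products"],
    record_path="product_specs",
    meta=["id", ["category", "name"]],
    record_prefix="spec_",
)
print(specs)
```

    The record_path route only handles one nested list per call, so columns with several independent lists (such as product_specs and images) would each need their own call.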
    

    You can then use an explode function such as the one below to split out the columns that are still nested:

    import ast
    import pandas as pd

    def explode_node(child_df, column_value):
        # Drop rows without nested data; copy so the caller's frame is not modified
        child_df = child_df.dropna(subset=[column_value]).copy()
        if isinstance(child_df[str(column_value)].iloc[0], str):
            child_df[column_value] = child_df[str(column_value)].apply(ast.literal_eval)
        # Normalize each nested value under its parent index, then join the parent columns back
        expanded_child_df = (pd.concat({i: pd.json_normalize(x) for i, x in child_df.pop(str(column_value)).items()}).reset_index(level=1,drop=True).join(child_df, how='right', lsuffix='_left', rsuffix='_right').reset_index(drop=True))
        expanded_child_df.columns = map(str.lower, expanded_child_df.columns)
        return expanded_child_df
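
    The helper above targets list-valued columns. The response described in the question is instead keyed by ID at every level (campaign, creative, analytics entry), so a plain nested loop may be simpler there. The following is only a sketch under that assumption: the sample values are made up, and flatten_campaigns is a hypothetical helper name, not part of pandas or of the API in question.

```python
import pandas as pd

# Hypothetical sample shaped like the response in the question; all values are made up
response = {
    "result": {
        "11": {
            "campaign_id": "11",
            "name": "Campaign A",
            "creatives": {
                "21": {
                    "creative_id": "21",
                    "name": "Creative X",
                    "device": 1,
                    "analytics": {
                        "31": {"dt": 1607558400, "impressions": 100, "videos": []},
                        "32": {"dt": 1607644800, "impressions": 80, "videos": []},
                    },
                },
                "22": {
                    "creative_id": "22",
                    "name": "Creative Y",
                    "device": 0,
                    "analytics": {
                        "33": {"dt": 1607558400, "impressions": 50, "videos": []},
                    },
                },
            },
        }
    }
}


def flatten_campaigns(result):
    """Build one row per (campaign, creative, analytics entry)."""
    rows = []
    for campaign in result.values():
        for creative in campaign.get("creatives", {}).values():
            for live_id, stats in creative.get("analytics", {}).items():
                row = {
                    "campaign_id": campaign["campaign_id"],
                    "campaign_name": campaign.get("name"),
                    "creative_id": creative["creative_id"],
                    "creative_name": creative.get("name"),
                    "device": creative.get("device"),
                    "live_id": live_id,
                }
                # Copy the scalar metrics; the nested "videos" list is skipped here
                row.update({k: v for k, v in stats.items() if k != "videos"})
                rows.append(row)
    return pd.DataFrame(rows)


flat = flatten_campaigns(response["result"])
# "dt" is documented as a unix timestamp in seconds
flat["dt"] = pd.to_datetime(flat["dt"], unit="s")
print(flat)
```

    If the per-video rows are needed as well, the skipped videos list could be kept in the row and expanded afterwards, for example with the explode helper above.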
    
    

    【Comments】:
