【Question title】: How to normalize a JSON file containing a list (that should be kept as a list) in Python | Pandas?
【Posted】: 2021-03-30 14:59:42
【Question】:

I am trying to convert a JSON file into a DataFrame using the json_normalize function.

Source JSON

  • The JSON is a list of dictionaries, each of which looks like this:

    {
          "sport_key": "basketball_ncaab",
          "sport_nice": "NCAAB",
          "teams": [
              "Bryant Bulldogs",
              "Wagner Seahawks"
          ],
          "commence_time": 1608152400,
          "home_team": "Bryant Bulldogs",
          "sites": [
              {
                  "site_key": "marathonbet",
                  "site_nice": "Marathon Bet",
                  "last_update": 1608156452,
                  "odds": {
                      "h2h": [
                          1.28,
                          3.54
                      ]
                  }
              },
              {
                  "site_key": "sport888",
                  "site_nice": "888sport",
                  "last_update": 1608156452,
                  "odds": {
                      "h2h": [
                          1.13,
                          5.8
                      ]
                  }
              },
              {
                  "site_key": "unibet",
                  "site_nice": "Unibet",
                  "last_update": 1608156434,
                  "odds": {
                      "h2h": [
                          1.13,
                          5.8
                      ]
                  }
              }
          ],
          "sites_count": 3
      }
    

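The problem is that one of the intended columns contains a list (and it should stay a list), but including this column in the meta argument of json_normalize raises the following error: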

ValueError: operands could not be broadcast together with shape (22,) (11,)

The error appears when I add 'teams' to the meta list in the following code:

pd.json_normalize(data, 'sites', ['sport_key', 'sport_nice', 'home_team', 'teams'])

【Comments】:

    Tags: python json pandas normalize


    【Answer 1】:
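
One plausible reading of the error, offered as a sketch rather than a confirmed diagnosis: it assumes that json_normalize gathers the meta values into a NumPy object array and repeats each value once per nested row, which is an assumption about pandas internals at the time and not something the thread states. Under that assumption, equal-length teams lists collapse into a 2-D array, and repeating it flattens the array to twice as many elements as there are records. The record count of 11 and the three sites per record below are hypothetical values chosen to match the shapes in the error message.

```python
import numpy as np

# Hypothetical stand-in: 11 records, each with a two-element "teams" list.
teams_per_record = [["Bryant Bulldogs", "Wagner Seahawks"]] * 11
# Hypothetical number of "sites" rows per record.
sites_per_record = [3] * 11

# Equal-length lists become a 2-D array, not a 1-D array of 11 list objects.
values = np.array(teams_per_record, dtype=object)
print(values.shape)  # (11, 2)

try:
    # repeat() without an axis flattens to 22 elements, but only 11 counts are given.
    values.repeat(sites_per_record)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

If this reading is right, both workarounds in this answer succeed for the same reason: they keep the list away from the meta handling, either by assigning it afterwards or by passing it as a plain string.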

    Assuming data is a list of dictionaries, you can still use json_normalize, but you have to assign the teams column separately for each dictionary in data:

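    import pandas as pd

    def normalize(d):
        # Flatten one record's 'sites' list, carrying over the scalar metadata.
        flat = pd.json_normalize(d, 'sites', ['sport_key', 'sport_nice', 'home_team'])
        # Attach the same teams list to every site row of this record.
        return flat.assign(teams=[d['teams']] * len(d['sites']))


    # Normalize each record separately, then stack the results.
    df = pd.concat([normalize(d) for d in data], ignore_index=True)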
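
For reference, here is a self-contained run of the per-record approach above on the single record shown in the question. The site entries are written compactly but carry the same values; nothing else is added.

```python
import pandas as pd

# The record from the question, as a one-element list of dictionaries.
data = [{
    "sport_key": "basketball_ncaab",
    "sport_nice": "NCAAB",
    "teams": ["Bryant Bulldogs", "Wagner Seahawks"],
    "commence_time": 1608152400,
    "home_team": "Bryant Bulldogs",
    "sites": [
        {"site_key": "marathonbet", "site_nice": "Marathon Bet",
         "last_update": 1608156452, "odds": {"h2h": [1.28, 3.54]}},
        {"site_key": "sport888", "site_nice": "888sport",
         "last_update": 1608156452, "odds": {"h2h": [1.13, 5.8]}},
        {"site_key": "unibet", "site_nice": "Unibet",
         "last_update": 1608156434, "odds": {"h2h": [1.13, 5.8]}},
    ],
    "sites_count": 3,
}]

def normalize(d):
    # Flatten the nested 'sites' list of one record.
    flat = pd.json_normalize(d, "sites", ["sport_key", "sport_nice", "home_team"])
    # One reference to the record's teams list per site row.
    return flat.assign(teams=[d["teams"]] * len(d["sites"]))

df = pd.concat([normalize(d) for d in data], ignore_index=True)
print(df[["site_key", "teams"]])
```

Note that `[d["teams"]] * n` repeats a reference to one list object, so all rows of a record share it. If the lists will later be modified in place, building the column with `[list(d["teams"]) for _ in d["sites"]]` gives each row its own copy.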
    

    Alternatively, you can try:

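    # Temporarily join each teams list into one string so it can be passed as meta.
    # This assumes no team name contains a comma.
    data = [{**d, 'teams': ','.join(d['teams'])} for d in data]
    df = pd.json_normalize(data, 'sites', ['sport_key', 'sport_nice', 'home_team', 'teams'])
    # Split the string back into a list.
    df['teams'] = df['teams'].str.split(',')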
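
The join/split round trip above would break a team name that itself contains a comma. A possible variation, not part of the original answer, is to encode the list with `json.dumps` and decode it with `json.loads` instead. The sketch below uses a trimmed record in the shape of the question's data; the team name "Example U, North" is made up purely to show the comma case.

```python
import json
import pandas as pd

# Trimmed record in the shape of the question's data; the comma inside the
# first team name is hypothetical and only illustrates the join/split problem.
data = [{
    "sport_key": "basketball_ncaab",
    "sport_nice": "NCAAB",
    "teams": ["Example U, North", "Wagner Seahawks"],
    "home_team": "Example U, North",
    "sites": [
        {"site_key": "marathonbet", "odds": {"h2h": [1.28, 3.54]}},
        {"site_key": "unibet", "odds": {"h2h": [1.13, 5.8]}},
    ],
}]

# Encode each teams list as a JSON string so it is a scalar meta value.
encoded = [{**d, "teams": json.dumps(d["teams"])} for d in data]
df = pd.json_normalize(
    encoded, "sites", ["sport_key", "sport_nice", "home_team", "teams"]
)
# Decode the JSON strings back into Python lists.
df["teams"] = df["teams"].apply(json.loads)
print(df["teams"].tolist())
```

The design is the same as the comma version (pass a string as meta, then restore the list), with the encoding delegated to the json module so that no character in a team name needs special handling.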
    

    Result:

          site_key     site_nice  last_update      odds.h2h         sport_key sport_nice        home_team                               teams
    0  marathonbet  Marathon Bet   1608156452  [1.28, 3.54]  basketball_ncaab      NCAAB  Bryant Bulldogs  [Bryant Bulldogs, Wagner Seahawks]
    1     sport888      888sport   1608156452   [1.13, 5.8]  basketball_ncaab      NCAAB  Bryant Bulldogs  [Bryant Bulldogs, Wagner Seahawks]
    2       unibet        Unibet   1608156434   [1.13, 5.8]  basketball_ncaab      NCAAB  Bryant Bulldogs  [Bryant Bulldogs, Wagner Seahawks]
    

    【Comments】:

    • Thanks, the second suggestion solved my problem!