【Question Title】: How to flatten nested JSON into a pandas DataFrame
【Posted】: 2020-03-11 05:46:55
【Question Description】:

How can I flatten JSON into a pd.DataFrame like this:

class_id | id | schedule_id | schedule_date | lesson_price | status
    1    | 3  |     1       | 2017-07-11    | USD 25       | ONGOING
    1    | 3  |     2       | 2016-09-24    | USD 15       | OPEN REGISTRATION
    1    | 4  |     1       | 2016-12-17    | USD 19       | ONGOING
    1    | 4  |     2       | 2015-11-12    | USD 29       | ONGOING
    1    | 4  |     3       | 2015-11-10    | USD 14       | ON SCHEDULE
    2    | 1  |     1       | 2017-05-21    | USD 50       | CANCELLED
    2    | 2  |     1       | 2017-06-04    | USD10        | FINISHED
    2    | 2  |     2       | 2018-03-01    | USD12        | CLOSED

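from this JSON?

I tried the approach from the reference, but it only gives me 2 rows, grouped by class_id.

How can I show every schedule entry together with the class_id and the id of its lesson object, as in the desired DataFrame above?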

【Comments】:

    Tags: python json python-3.x pandas dataframe


    【Solution 1】:

    The difficulty with your data structure comes from

    {
      "lesson3": {
        "id": 3,
        "schedule": [
          {
            "schedule_id": "1",
            "schedule_date": "2017-07-11",
            "lesson_price": "USD 25",
            "status": "ONGOING"
          },
          {
            "schedule_id": "2",
            "schedule_date": "2016-09-24",
            "lesson_price": "USD 15",
            "status": "OPEN REGISTRATION"
          }
        ]
      }
    }
    

    It would be better to have

    {
      "name": "lesson3",
      "id": 3,
      "schedule": [
        {
          "schedule_id": "1",
          "schedule_date": "2017-07-11",
          "lesson_price": "USD 25",
          "status": "ONGOING"
        },
        {
          "schedule_id": "2",
          "schedule_date": "2016-09-24",
          "lesson_price": "USD 15",
          "status": "OPEN REGISTRATION"
        }
      ]
    }
    

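    But most of the time we have no control over the data we receive. So we have to drop the per-lesson keys (such as lesson3) and move the objects under them up one level.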
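
    As a rough sketch of that reshaping step, the snippet below lifts each lesson object out from under its key and keeps the key as a `name` field, which yields the "better" shape shown above. The helper name `lift_lessons` and the sample dict are illustrative assumptions, not part of the original data.

```python
def lift_lessons(keyed_lessons):
    """Turn {"lesson3": {...}} into [{"name": "lesson3", ...}] (hypothetical helper)."""
    return [{"name": key, **lesson} for key, lesson in keyed_lessons.items()]


# Sample input modeled on the structure shown above.
sample = {
    "lesson3": {
        "id": 3,
        "schedule": [
            {"schedule_id": "1", "schedule_date": "2017-07-11",
             "lesson_price": "USD 25", "status": "ONGOING"},
        ],
    }
}

lifted = lift_lessons(sample)
print(lifted[0]["name"], lifted[0]["id"])
```

    The solution that follows does not need the `name` field, so it simply discards the key and keeps only the lesson object.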

    Solution

    import requests

    # url is the JSON endpoint from the question (not shown on this page)
    data = requests.get(url).json()
    

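    Extract the individual lessons

    # The lesson key (lesson1, lesson2, ...) is not needed, so iterate over the values only
    data_ = [{'class_id': c['class_id'], 'lessons': v} for c in data['class'] for v in c['data'].values()]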
    

    The data now looks like this

    [
      {
        "class_id": "1",
        "lessons": {
          "id": 3,
          "schedule": [
            {
              "schedule_id": "1",
              "schedule_date": "2017-07-11",
              "lesson_price": "USD 25",
              "status": "ONGOING"
            },
            {
              "schedule_id": "2",
              "schedule_date": "2016-09-24",
              "lesson_price": "USD 15",
              "status": "OPEN REGISTRATION"
            }
          ]
        }
      },
      ...
    ]
    

    Now we can read it into a pandas DataFrame with json_normalize

    from pandas import json_normalize  # pandas >= 1.0

    df = json_normalize(data_, record_path=['lessons', 'schedule'], meta=['class_id', ['lessons', 'id']])
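
    Since the URL is not available here, the following is a self-contained sketch of the same steps using inline sample data. The top-level shape of `data` is an assumption inferred from the keys the code above accesses (`class`, `class_id`, `data`), and only a few of the schedule entries are included.

```python
import pandas as pd

# Inline stand-in for requests.get(url).json(); the shape is an assumption
# inferred from the keys accessed above ("class", "class_id", "data").
data = {
    "class": [
        {
            "class_id": "1",
            "data": {
                "lesson3": {
                    "id": 3,
                    "schedule": [
                        {"schedule_id": "1", "schedule_date": "2017-07-11",
                         "lesson_price": "USD 25", "status": "ONGOING"},
                        {"schedule_id": "2", "schedule_date": "2016-09-24",
                         "lesson_price": "USD 15", "status": "OPEN REGISTRATION"},
                    ],
                }
            },
        },
        {
            "class_id": "2",
            "data": {
                "lesson1": {
                    "id": 1,
                    "schedule": [
                        {"schedule_id": "1", "schedule_date": "2017-05-21",
                         "lesson_price": "USD 50", "status": "CANCELLED"},
                    ],
                }
            },
        },
    ]
}

# One record per lesson, carrying its class_id alongside.
records = [
    {"class_id": c["class_id"], "lessons": lesson}
    for c in data["class"]
    for lesson in c["data"].values()
]

# One row per schedule entry; class_id and lessons.id are repeated as metadata.
df = pd.json_normalize(
    records,
    record_path=["lessons", "schedule"],
    meta=["class_id", ["lessons", "id"]],
)
print(df)
```

    `record_path` names the nested list that becomes the rows, while `meta` lists the parent fields to copy onto each row; a nested meta field such as `['lessons', 'id']` ends up in a column named `lessons.id`.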
    

    Output

      schedule_id schedule_date lesson_price             status class_id lessons.id
    0           1    2017-07-11       USD 25            ONGOING        1          3
    1           2    2016-09-24       USD 15  OPEN REGISTRATION        1          3
    2           1    2016-12-17       USD 19            ONGOING        1          4
    3           2    2015-11-12       USD 29            ONGOING        1          4
    4           3    2015-11-10       USD 14        ON SCHEDULE        1          4
    5           1    2017-05-21       USD 50          CANCELLED        2          1
    6           1    2017-06-04        USD10           FINISHED        2          2
    7           5    2018-03-01        USD12             CLOSED        2          2
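
    The question asks for the columns in a different order, with the lesson id column named `id`. One possible way to get that layout is sketched below; the two-row `df` is copied from the output above as a stand-in, and the `wanted` list is an assumption based on the header of the desired table.

```python
import pandas as pd

# Two rows copied from the output above, used as a stand-in for the full df.
df = pd.DataFrame(
    {
        "schedule_id": ["1", "2"],
        "schedule_date": ["2017-07-11", "2016-09-24"],
        "lesson_price": ["USD 25", "USD 15"],
        "status": ["ONGOING", "OPEN REGISTRATION"],
        "class_id": ["1", "1"],
        "lessons.id": [3, 3],
    }
)

# Rename the metadata column, then select the columns in the requested order.
wanted = ["class_id", "id", "schedule_id", "schedule_date", "lesson_price", "status"]
result = df.rename(columns={"lessons.id": "id"})[wanted]
print(result.columns.tolist())
```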
    

    【Discussion】:
