【Question title】: How to Parse Nested API JSON Files
【Posted】: 2021-08-17 00:26:16
【Question】:

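I'm trying to extract pieces of data from this API, but I'm struggling to understand how to go about it. My end goal is to show, in a pie chart, the carbon intensity output for the England region broken down by each fuel type listed in the data. Because the data is so deeply nested, I'm having trouble accessing specific key values and understanding how to sort and extract particular parts, especially the "generationmix" key. I tried displaying the data with pandas using the code below, but it doesn't show the "generationmix" part and the whole table looks incomplete. I've read that nested JSON can be parsed with the pandas json_normalize() function, but I don't understand any of its parameters, such as record_path or meta.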

Here is a link to the data I'm using, followed by the code I've tried.

Data: https://api.carbonintensity.org.uk/regional/england

My code:

import requests
import pprint
import pandas as pd
import numpy

filename = ("https://api.carbonintensity.org.uk/regional/england")
r = requests.get(filename)
#print("Status Code:", r.status_code)


# Store API response in a variable
response_dict = r.json()
#print(response_dict)

# Normalizing data
multiple_level_data = pd.json_normalize(response_dict, record_path=["data"])
print(multiple_level_data)

【Comments】:

  • What is your expected output? Can you post the DataFrame you want to create?

Tags: python pandas api


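【Answer 1】:

I'm not sure json_normalize can help you with this type of JSON response, but this is how I would do it (granted, the flatten function could be optimized, but you get the idea):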

import requests
import pandas as pd

response = requests.get('https://api.carbonintensity.org.uk/regional/england')

if not str(response.status_code).startswith('2'):
    # Stop on any non-2xx status instead of trying to parse an error body
    raise SystemExit(f'Status code {response.status_code}, exiting')

response_data = response.json()

def flatten(d, sep='.'):
    """Flatten the response into one row per region, period and fuel."""
    cols = [
        'regionid', 'dnoregion', 'shortname',
        'from', 'to', f'intensity{sep}forecast',
        f'intensity{sep}index', f'generationmix{sep}fuel',
        f'generationmix{sep}perc'
    ]
    items = []
    for city in d.get('data', []):
        # Region-level fields; the remaining six slots are filled in below
        item_data = [
            city.get('regionid'), city.get('dnoregion'), city.get('shortname'),
            None, None, None, None, None, None
        ]
        city_items = []
        for city_data in city.get('data', []):
            # Period-level fields: time window and intensity forecast
            item_data_tmp = item_data.copy()
            item_data_tmp[3] = city_data.get('from')
            item_data_tmp[4] = city_data.get('to')
            item_data_tmp[5] = city_data.get('intensity', {}).get('forecast')
            item_data_tmp[6] = city_data.get('intensity', {}).get('index')
            period_items = []
            for generationmix in city_data.get('generationmix', []):
                # One output row per fuel type
                item_data_tmp2 = item_data_tmp.copy()
                item_data_tmp2[7] = generationmix.get('fuel')
                item_data_tmp2[8] = generationmix.get('perc')
                period_items.append(item_data_tmp2)
            # Keep a period that has no generation mix as a single row
            city_items.extend(period_items or [item_data_tmp])
        # Keep a region that has no periods as a single row
        items.extend(city_items or [item_data])
    return cols, items


cols, values = flatten(response_data)
df = pd.DataFrame(values, columns=cols)
print(df)

This prints:

   regionid dnoregion shortname               from                 to  intensity.forecast intensity.index generationmix.fuel  generationmix.perc
0        15   England   England  2021-08-16T15:30Z  2021-08-16T16:00Z                 152             low            biomass                 6.3
1        15   England   England  2021-08-16T15:30Z  2021-08-16T16:00Z                 152             low               coal                 0.9
2        15   England   England  2021-08-16T15:30Z  2021-08-16T16:00Z                 152             low            imports                17.2
3        15   England   England  2021-08-16T15:30Z  2021-08-16T16:00Z                 152             low                gas                30.6
4        15   England   England  2021-08-16T15:30Z  2021-08-16T16:00Z                 152             low            nuclear                10.2
5        15   England   England  2021-08-16T15:30Z  2021-08-16T16:00Z                 152             low              other                 0.0
6        15   England   England  2021-08-16T15:30Z  2021-08-16T16:00Z                 152             low              hydro                 0.3
7        15   England   England  2021-08-16T15:30Z  2021-08-16T16:00Z                 152             low              solar                11.0
8        15   England   England  2021-08-16T15:30Z  2021-08-16T16:00Z                 152             low               wind                23.5
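
For comparison, json_normalize can reach the generationmix records when record_path names both list levels and meta lists the parent fields to repeat on each row. The following is a minimal sketch rather than a definitive recipe: the `sample` dictionary is a hand-trimmed stand-in for the live response (three fuels only, values taken from the table above), and the list-style meta paths assume the response always has this shape.

```python
import pandas as pd

# Assumption: a trimmed copy of the response shape, not the live API payload.
# A real run would use requests.get(...).json() in place of this literal.
sample = {
    "data": [
        {
            "regionid": 15,
            "dnoregion": "England",
            "shortname": "England",
            "data": [
                {
                    "from": "2021-08-16T15:30Z",
                    "to": "2021-08-16T16:00Z",
                    "intensity": {"forecast": 152, "index": "low"},
                    "generationmix": [
                        {"fuel": "biomass", "perc": 6.3},
                        {"fuel": "coal", "perc": 0.9},
                        {"fuel": "imports", "perc": 17.2},
                    ],
                }
            ],
        }
    ]
}

df = pd.json_normalize(
    sample["data"],
    # Path to the innermost list: region -> "data" (periods) -> "generationmix"
    record_path=["data", "generationmix"],
    # Parent fields repeated on every fuel row; nested ones are given as paths
    meta=[
        "regionid", "dnoregion", "shortname",
        ["data", "from"], ["data", "to"],
        ["data", "intensity", "forecast"],
        ["data", "intensity", "index"],
    ],
)
print(df)
```

Each meta path becomes a dot-joined column such as data.from, next to the fuel and perc columns taken from the records. For the pie chart mentioned in the question, something like df.set_index('fuel')['perc'].plot.pie() should then work, provided matplotlib is installed.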

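【Comments】:

    【Answer 2】:

    This assumes that each list contains only one element and that the keys are always present. It should give you a clue about how to handle this: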

    import requests
    
    with requests.Session() as session:
        r = session.get('https://api.carbonintensity.org.uk/regional/england')
        r.raise_for_status()
        j = r.json()
        for d in j['data'][0]['data'][0]['generationmix']:
            print(d)
    

    The output will be:

    {'fuel': 'biomass', 'perc': 6.3}
    {'fuel': 'coal', 'perc': 0.9}
    {'fuel': 'imports', 'perc': 17.2}
    {'fuel': 'gas', 'perc': 30.6}
    {'fuel': 'nuclear', 'perc': 10.2}
    {'fuel': 'other', 'perc': 0}
    {'fuel': 'hydro', 'perc': 0.3}
    {'fuel': 'solar', 'perc': 11}
    {'fuel': 'wind', 'perc': 23.5}
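
If the single-element assumption does not hold, the same traversal can be written without fixed indexes. This is only a sketch: `iter_generation_mix` is a hypothetical helper name, the first region in `sample` reuses values from the output above, and the second region is a made-up stub included purely to show that missing keys are skipped rather than raising KeyError.

```python
def iter_generation_mix(payload):
    """Yield (shortname, from, fuel, perc) for every region and period."""
    for region in payload.get("data", []):
        for period in region.get("data", []):
            for entry in period.get("generationmix", []):
                yield (
                    region.get("shortname"),
                    period.get("from"),
                    entry.get("fuel"),
                    entry.get("perc"),
                )


# Assumption: trimmed stand-in for the API response, not the live payload
sample = {
    "data": [
        {
            "shortname": "England",
            "data": [
                {
                    "from": "2021-08-16T15:30Z",
                    "generationmix": [
                        {"fuel": "gas", "perc": 30.6},
                        {"fuel": "wind", "perc": 23.5},
                    ],
                }
            ],
        },
        # Hypothetical stub region with no "data" list
        {"shortname": "Stub"},
    ]
}

for row in iter_generation_mix(sample):
    print(row)
```

Using .get() with an empty-list default means a region or period that lacks the expected key simply contributes no rows.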

    【Comments】:
