Import JSON file with dicts of dicts of nested dicts into Pandas: Answers

【Question title】: Import JSON file with dicts of dicts of nested dicts into Pandas
【Posted】: 2021-09-25 02:12:52
【Question description】:

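I am downloading a json file from the RKI in Germany (the equivalent of the CDC). It appears to have dicts inside dicts inside dicts. I am really only interested in the dicts of data nested under the "features" key. My problem is that every entry there is nested under the same key, "attributes". This is what the text looks like (I have to work from the text because, due to proxy issues, I can't download it directly into python - grrr.).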

{"objectIdFieldName":"ObjectId","uniqueIdField":
{"name":"ObjectId","isSystemMaintained":true},
"globalIdFieldName":"","fields":
    [{"name":"AdmUnitId","type":"esriFieldTypeInteger","alias":"AdmUnitId","sqlType":"sqlTypeInteger","domain":null,"defaultValue":null},
...etc...    {"name":"ObjectId","type":"esriFieldTypeOID","alias":"ObjectId","sqlType":"sqlTypeInteger","domain":null,"defaultValue":null}],
"features":
    [{"attributes":{"AdmUnitId":0,"BundeslandId":0,"AnzFall":3741781,"AnzTodesfall":91337,"AnzFallNeu":1456,"AnzTodesfallNeu":18,"AnzFall7T":7178,"AnzGenesen":3638200,"AnzGenesenNeu":700,"AnzAktiv":12300,"AnzAktivNeu":700,"Inz7T":8.6,"ObjectId":1}},
    {"attributes":{"AdmUnitId":1,"BundeslandId":1,"AnzFall":64221,"AnzTodesfall":1628,"AnzFallNeu":35,"AnzTodesfallNeu":1,"AnzFall7T":181,"AnzGenesen":62300,"AnzGenesenNeu":0,"AnzAktiv":300,"AnzAktivNeu":0,"Inz7T":6.2,"ObjectId":2}},
    {"attributes":{"AdmUnitId":2,"BundeslandId":2,"AnzFall":77823,"AnzTodesfall":1603,"AnzFallNeu":50,"AnzTodesfallNeu":0,"AnzFall7T":217,"AnzGenesen":75700,"AnzGenesenNeu":0,"AnzAktiv":500,"AnzAktivNeu":0,"Inz7T":11.7,"ObjectId":3}},
    ...etc

When I try pd.read_json(the_file), I get a ValueError saying that the arrays must all be the same length.

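If I open the file and load it with json, creating a dictionary, I get my dict and the dicts I want inside it. I can almost get there, as shown below, but I end up with a list of nested dicts in which the key is always "attributes", and that throws an error.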

import json

with open(r"Q:\AbisF\Covid-19\Lageberichte\Misc\RKI_7Tages.json") as json_data:
    data = json.load(json_data)
# dig down to the data
features = data["features"]
attributes = features["attributes"]   # TypeError: list indices must be integers or slices, not str

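I am wondering whether I am going down the wrong path, and whether there is a way to clean up my list (get rid of the "attributes" level).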

【Question discussion】:

Tags: python json pandas list dictionary

【Solution 1】:

I think your features = data["features"] is now a list of dicts.

You can iterate over those:

features = data["features"]
for feature in features:
    attributes = feature["attributes"]
    print(attributes['AdmUnitId'])  # example item in attributes
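
Since the goal in the question is a DataFrame, the same loop can be collapsed into a list comprehension and handed to pandas. The following is only a minimal sketch: the inline `raw` string is a shortened stand-in for the RKI file (a few fields copied from the question), and the names `raw`, `rows` and `df` are chosen here for illustration.

```python
import json

import pandas as pd

# Shortened stand-in for the RKI file: only four of the fields
# shown in the question are kept (an assumption for brevity).
raw = """
{"features": [
  {"attributes": {"AdmUnitId": 0, "AnzFall": 3741781, "Inz7T": 8.6, "ObjectId": 1}},
  {"attributes": {"AdmUnitId": 1, "AnzFall": 64221, "Inz7T": 6.2, "ObjectId": 2}},
  {"attributes": {"AdmUnitId": 2, "AnzFall": 77823, "Inz7T": 11.7, "ObjectId": 3}}
]}
"""
data = json.loads(raw)

# Drop the "attributes" level: keep one flat dict per feature.
rows = [feature["attributes"] for feature in data["features"]]

# A list of flat dicts maps onto rows; the dict keys become the columns.
df = pd.DataFrame(rows)
print(df)
```

With the real file, `data` would come from the `json.load` call in the question instead of `json.loads(raw)`.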

【Discussion】:

OK, super. Then I just need to transpose the result and build my dataframe one at a time. Many thanks.
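
On the transposing step: it may not be needed. As a hedged alternative, `pd.json_normalize` can flatten the list of features directly, producing one row per feature with column names prefixed by "attributes.". The sample `data` dict below is a shortened, hand-written stand-in for the loaded file, not the real download.

```python
import pandas as pd

# Hand-written stand-in for the dict returned by json.load (assumption).
data = {
    "features": [
        {"attributes": {"AdmUnitId": 0, "AnzTodesfall": 91337, "ObjectId": 1}},
        {"attributes": {"AdmUnitId": 1, "AnzTodesfall": 1628, "ObjectId": 2}},
    ]
}

# Flatten each nested record; columns come out as "attributes.<field>".
df = pd.json_normalize(data["features"])

# Strip the "attributes." prefix so only the field names remain.
df.columns = [name.split(".", 1)[-1] for name in df.columns]
print(df)
```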