How do I convert a dictionary that has nested dictionaries within it into a dataframe in Python? Answers

【Question title】: How do I convert a dictionary that has nested dictionaries within it into a dataframe in Python?
【Posted】: 2022-01-13 19:49:38
【Question description】:

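I recently ran a sentiment analysis in Python using Oracle's AI Language API. I had the API iterate over 1,300 tweets and stored its output in a list, where each element corresponds to one tweet ID. I then created a dictionary whose keys are the tweet IDs and whose values are the API's output for each tweet ID. I now have a massive dictionary with dictionaries nested inside it, and I don't know how to convert it into a pandas DataFrame.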

Here are the first few entries of the dictionary I'm working with.

 {1292750633104289792: {
   "aspects": []
 },
 1275918779831238656: {
   "aspects": []
 },
 1293251961031204865: {
   "aspects": [
     {
       "length": 8,
       "offset": 51,
       "scores": {
         "Negative": 0.18023298680782318,
         "Neutral": 0.0,
         "Positive": 0.8197670578956604
       },
       "sentiment": "Positive",
       "text": "building"
     }
   ]
 },
 1293312774563606531: {
   "aspects": []
 },
 1293375754751881217: {
   "aspects": [
     {
       "length": 4,
       "offset": 5,
       "scores": {
         "Negative": 0.9987309575080872,
         "Neutral": 0.0012690634466707706,
         "Positive": 0.0
       },
       "sentiment": "Negative",
       "text": "poll"
     }
   ]
 }}

Thank you very much.

【Question comments】:

Tags: python pandas dataframe dictionary nested

【Solution 1】:

You can flatten your structure with a nested comprehension and then pass the result to pd.DataFrame:

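import pandas as pd

# data is the dictionary from the question: {tweet_id: {"aspects": [...]}}
r = [{'tweet_id': a,
      'length': i['length'],
      'offset': i['offset'],
      # flatten the nested scores dict into score_Negative, score_Neutral, score_Positive
      **{f'score_{j}': k for j, k in i['scores'].items()},
      'sentiment': i['sentiment'],
      'text': i['text'],
     }
     # one row per aspect; tweets with an empty aspects list produce no rows
     for a, b in data.items() for i in (b['aspects'] if isinstance(b, dict) else b.aspects)]

df = pd.DataFrame(r)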

Output:

              tweet_id  length  offset  score_Negative  score_Neutral  score_Positive sentiment      text
0  1293251961031204865       8      51        0.180233       0.000000        0.819767  Positive  building
1  1293375754751881217       4       5        0.998731       0.001269        0.000000  Negative      poll
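
As an alternative to the hand-written comprehension, the same flattening can probably be done with pandas' own pd.json_normalize. The sketch below is not part of the original answer; it assumes every value in the dictionary is a plain dict with an "aspects" list, as in the sample from the question, and it uses a cut-down copy of that sample. The records name and the final rename step are illustrative choices.

```python
import pandas as pd

# Two entries copied from the question's sample: one without aspects, one with.
data = {
    1292750633104289792: {"aspects": []},
    1293251961031204865: {
        "aspects": [
            {
                "length": 8,
                "offset": 51,
                "scores": {
                    "Negative": 0.18023298680782318,
                    "Neutral": 0.0,
                    "Positive": 0.8197670578956604,
                },
                "sentiment": "Positive",
                "text": "building",
            }
        ]
    },
}

# json_normalize works on a list of records, so move each key into a "tweet_id" field.
records = [{"tweet_id": tweet_id, **result} for tweet_id, result in data.items()]

# One row per aspect; the nested scores dict becomes "scores.Negative", etc.
df = pd.json_normalize(records, record_path="aspects", meta=["tweet_id"])

# Optional: match the column names used in the answer above.
df = df.rename(columns=lambda c: c.replace("scores.", "score_"))
```

As with the comprehension, tweets whose aspects list is empty contribute no rows. This route does not help if the values are SDK result objects rather than dicts.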

【Comments】:

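I just tried this with my original dictionary, but for some reason it doesn't work and I'm not sure why. I get this error: TypeError: 'DetectLanguageSentimentsResult' object is not subscriptable
@Jared That is probably because your full data has aspects values of a non-standard type. Could you post a larger sample of the data?
Here is a link to the dictionary I'm working with; I saved it as a text file and uploaded it to GitHub so that it is easier to view.
@Jared In your code, does the scores key point to a dictionary or to a DetectLanguageSentimentsResult? Based on the error, I think it is the latter. If so, then instead of {f'score_{j}':k for j, k in i['scores'].items()}, you can do {'score_Negative':i['scores'].negative, ....} or something similar, accessing the score values through attributes.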
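Honestly, I'm not really sure; I don't know where this error is coming from. Let me try it. How would the rest of the code handle your ... ? Something like this? {'score_Negative':i['scores'].negative, 'score_Positive':i['scores'].positive, 'score_Neutral':i['scores'].neutral}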
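
The TypeError above suggests that at least some values in the real dictionary are SDK result objects rather than dicts, so they need attribute access instead of subscripting. The thread never settles which levels are objects, so the following is only a sketch under stated assumptions: each result and each aspect may be either a dict or an object exposing the same names as attributes, and scores is a plain dict. The field and flatten helpers are hypothetical, and SimpleNamespace merely stands in for the SDK classes.

```python
from types import SimpleNamespace

import pandas as pd


def field(obj, name):
    """Return obj[name] for dicts, otherwise the attribute obj.name."""
    return obj[name] if isinstance(obj, dict) else getattr(obj, name)


def flatten(data):
    """Build one row per aspect, whatever mix of dicts and objects data holds."""
    rows = []
    for tweet_id, result in data.items():
        for aspect in field(result, "aspects"):
            row = {
                "tweet_id": tweet_id,
                "length": field(aspect, "length"),
                "offset": field(aspect, "offset"),
                "sentiment": field(aspect, "sentiment"),
                "text": field(aspect, "text"),
            }
            # Assumption: scores is a plain dict of label -> float.
            for label, value in field(aspect, "scores").items():
                row[f"score_{label}"] = value
            rows.append(row)
    return pd.DataFrame(rows)


# SimpleNamespace is only a stand-in for the SDK's result objects.
aspect = SimpleNamespace(
    length=4,
    offset=5,
    sentiment="Negative",
    text="poll",
    scores={
        "Negative": 0.9987309575080872,
        "Neutral": 0.0012690634466707706,
        "Positive": 0.0,
    },
)
data = {
    1293312774563606531: SimpleNamespace(aspects=[]),
    1293375754751881217: SimpleNamespace(aspects=[aspect]),
}
df = flatten(data)
```

If scores turns out to be an object as well, the inner loop would need the attribute names the SDK actually uses, which are not confirmed anywhere in this thread.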