【Question title】: Extract data from a complicated data structure in Python
【Posted】: 2025-12-28 09:20:08
【Question】:

I have a data structure like this:

[ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]... ]}  ]  },
  {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ]... ]}  ] }, ...]

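It is a list containing many dictionaries, and each dictionary has 3 pairs: 'uid': 'test_subject145', 'class': '?', 'data': []. In the last pair, 'data', the value is a list that in turn contains a dictionary with 2 pairs, 'chunk': 1 and 'writing': []. The value of 'writing' is a list that again contains many lists. What I want to extract is the content of all those sentences, such as 'this is exciting', 'you are good', etc., and put them into one simple list. The final form should be list_final = ['this is exciting', 'you are good', 'he died', ... ]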

【Comments】:

Tags: python list dictionary extraction


【Solution 1】:

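Given that your original list is named input (a name that shadows the built-in input() function, so pick a different one in real code), just use a list comprehension: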

[elem for dic in input
      for dat in dic.get('data',())
      for writing in dat.get('writing',())
      for elem in writing]

Using .get(.., ()) means the code still works when a key is missing: in that case .get returns the empty tuple (), so that level is simply not iterated.
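
To make the fallback concrete, here is a minimal sketch using made-up records (the name `records` and its contents are illustrative assumptions, not data from the question). One dictionary has no 'data' key and another has an inner dictionary without 'writing'; both are skipped without an error.

```python
# Hypothetical sample: the second record lacks 'data',
# and the third record's inner dict lacks 'writing'.
records = [
    {'uid': 'a', 'data': [{'chunk': 1, 'writing': [['first'], ['second']]}]},
    {'uid': 'b'},
    {'uid': 'c', 'data': [{'chunk': 3}]},
]

# .get(key, ()) returns the empty tuple when the key is absent,
# so the corresponding for-clause iterates zero times.
sentences = [elem for dic in records
                  for dat in dic.get('data', ())
                  for writing in dat.get('writing', ())
                  for elem in writing]

print(sentences)  # ['first', 'second']
```

With plain indexing (dic['data']) the second record would raise a KeyError instead of being skipped.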

With your sample input, we get:

>>> input = [ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]]}  ]  },
...       {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ] ]}  ] }]
>>> 
>>> [elem for dic in input
...       for dat in dic.get('data',())
...       for writing in dat.get('writing',())
...       for elem in writing]
['this is exciting', 'you are good', 'he died', 'go ahead']

【Comments】:

  • ++ for using .get(..., ())
【Solution 2】:

tl;dr

[sentence for dic in data
          for data_dict in dic['data']
          for writing_sub_list in data_dict['writing']
          for sentence in writing_sub_list]

Take it slowly and handle one layer at a time. Then refactor your code to make it smaller.

data = [{'class': '?',
         'data': [{'chunk': 1,
                   'writing': [['this is exciting'], ['you are good']]}],
         'uid': 'test_subject145'},
        {'class': '?',
         'data': [{'chunk': 2,
                   'writing': [['he died'], ['go ahead']]}],
         'uid': 'test_subject166'}]

for d in data:
    print(d)
# {'class': '?', 'data': [{'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}], 'uid': 'test_subject145'}
# {'class': '?', 'data': [{'chunk': 2, 'writing': [['he died'], ['go ahead']]}], 'uid': 'test_subject166'}

for d in data:
    data_list = d['data']
    print(data_list)
# [{'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}]
# [{'chunk': 2, 'writing': [['he died'], ['go ahead']]}]

for d in data:
    data_list = d['data']
    for d2 in data_list:
        print(d2)
# {'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}
# {'chunk': 2, 'writing': [['he died'], ['go ahead']]}

for d in data:
    data_list = d['data']
    for d2 in data_list:
        writing_list = d2['writing']
        print(writing_list)
# [['this is exciting'], ['you are good']]
# [['he died'], ['go ahead']]

for d in data:
    data_list = d['data']
    for d2 in data_list:
        writing_list = d2['writing']
        for writing_sub_list in writing_list:
            print(writing_sub_list)
# ['this is exciting']
# ['you are good']
# ['he died']
# ['go ahead']

for d in data:
    data_list = d['data']
    for d2 in data_list:
        writing_list = d2['writing']
        for writing_sub_list in writing_list:
            # 'sentence' rather than 'str', so the built-in str is not overwritten
            for sentence in writing_sub_list:
                print(sentence)
# this is exciting
# you are good
# he died
# go ahead

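Then, to turn it into something smaller (but harder to read), rewrite the code above like this. It should be easy to see how to get from one to the other: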

strings = [s for d in data for d2 in d['data'] for wsl in d2['writing'] for s in wsl]
# ['this is exciting', 'you are good', 'he died', 'go ahead']

Then make it prettier with better names, like those in William's answer:

[sentence for dic in data
          for data_dict in dic['data']
          for writing_sub_list in data_dict['writing']
          for sentence in writing_sub_list]
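
If the traversal is needed in more than one place, the same four loops can be wrapped in a generator function. This is only a sketch: the name `iter_sentences` is made up for illustration, and it assumes every record has the 'data' and 'writing' keys, as in the sample data.

```python
def iter_sentences(records):
    """Yield each sentence found under record['data'][i]['writing'][j]."""
    for record in records:
        for data_dict in record['data']:
            for writing_sub_list in data_dict['writing']:
                # each sub-list holds the sentence strings themselves
                yield from writing_sub_list


sample = [
    {'uid': 'test_subject145', 'class': '?',
     'data': [{'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}]},
    {'uid': 'test_subject166', 'class': '?',
     'data': [{'chunk': 2, 'writing': [['he died'], ['go ahead']]}]},
]

list_final = list(iter_sentences(sample))
print(list_final)
# ['this is exciting', 'you are good', 'he died', 'go ahead']
```

A generator produces the sentences lazily, so the caller decides whether to build a list, count them, or stop early.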

【Comments】:

    【Solution 3】:

    So I believe the following will work:

    lista = [ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]... ]}  ]  },
              {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ]... ]}  ] }, ...]
    
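    list_of_final_products = []
    
    for itema in lista:
      try:
        for data_item in itema['data']:
          for writa in data_item['writing']:
            for writa_itema in writa:
              # append each sentence string, not the enclosing sub-list
              list_of_final_products.append(writa_itema)
      except:
        # blanket except: any error while walking this item is silently ignored
        pass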
    

    This other question, which I believe helps with understanding, was mentioned above: python getting a list of value from list of dict (thanks to McGrady)

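    【Comments】:

    • Note that the elements of 'writing' are lists too... and so are the elements of 'data':...
    • Added. Thanks, I hadn't noticed that.
    • I think it is still invalid, because itema['data'] is itself a list. So you need to iterate over it rather than look up a key in it.
    • I think it works now, although a blanket except is not advisable. +1.
    • Well, in this case it doesn't matter. But say you do something like .append(some_function(x)) and some_function raises some odd error: you don't always want to catch that (in that place). That is why many software engineers advise never catching all exceptions, only an explicitly stated list of them.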
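
The advice in the last comment can be sketched as follows. The function name `collect_sentences` and the sample records are illustrative assumptions; the choice of KeyError and TypeError is one plausible "explicit list" for this structure, not something the answer itself specifies.

```python
def collect_sentences(records):
    """Flatten the sentences, skipping only records with the wrong shape."""
    result = []
    for record in records:
        try:
            for data_item in record['data']:
                for sub_list in data_item['writing']:
                    result.extend(sub_list)
        except (KeyError, TypeError):
            # KeyError: 'data' or 'writing' is missing.
            # TypeError: a value is not iterable or not subscriptable (e.g. None).
            # Any other exception is not caught and propagates to the caller.
            continue
    return result


records = [
    {'uid': 'ok', 'data': [{'chunk': 1, 'writing': [['he died'], ['go ahead']]}]},
    {'uid': 'no_data'},            # raises KeyError inside the try block
    {'uid': 'bad', 'data': None},  # raises TypeError inside the try block
]

collected = collect_sentences(records)
print(collected)  # ['he died', 'go ahead']
```

Unlike a bare except, this leaves unexpected errors (and KeyboardInterrupt) visible instead of hiding them.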