Extracting values from dictionary list in pandas dataframe: answers

【Question title】: Extracting values from dictionary list in pandas dataframe
【Posted】: 2022-01-23 18:29:50
【Question description】:

I have the following pandas DataFrame:

pd.DataFrame({'keys': {3: 'brandId', 5: 'price', 14: 'sizes', 18: 'brandId', 20: 'price', 29: 'sizes', 30: 'condition', 31: 'condition', 32: 'colour', 33: 'age', 36: 'brand', 40: 'colour', 41: 'brand', 44: 'productType', 50: 'brandId', 52: 'price', 61: 'sizes', 62: 'condition', 63: 'colour', 64: 'age', 67: 'brand', 70: 'productType'}, 'values': {3: 925, 5: {'currencyName': 'GBP', 'priceAmount': '50.00', 'nationalShippingCost': '3.00'}, 14: {'id': 4, 'name': 'UK 4', 'quantity': 1}, 18: 925, 20: {'currencyName': 'GBP', 'priceAmount': '11.00', 'nationalShippingCost': '0.00'}, 29: {'id': 3, 'name': 'S', 'quantity': 1}, 30: {'id': 'used_like_new', 'name': 'Like new'}, 31: {'id': 'brand_new', 'name': 'Brand new'}, 32: {'id': 'multi', 'name': 'Multi'}, 33: {'id': 'modern', 'name': 'Modern'}, 36: 'chinese-laundry', 40: {'id': 'white', 'name': 'White'}, 41: 'chinese-laundry', 44: 'tshirts', 50: 925, 52: {'currencyName': 'GBP', 'priceAmount': '20.00', 'nationalShippingCost': '3.00'}, 61: {'id': 11, 'name': 'M', 'quantity': 1}, 62: {'id': 'brand_new', 'name': 'Brand new'}, 63: {'id': 'black', 'name': 'Black'}, 64: {'id': '90s', 'name': '90s'}, 67: 'chinese-laundry', 70: 'jackets'}})

It looks like this:

    keys    values
3   brandId 925
5   price   {'currencyName': 'GBP', 'priceAmount': '50.00'...
14  sizes   {'id': 4, 'name': 'UK 4', 'quantity': 1}
18  brandId 925
20  price   {'currencyName': 'GBP', 'priceAmount': '11.00'...
29  sizes   {'id': 3, 'name': 'S', 'quantity': 1}
30  condition   {'id': 'used_like_new', 'name': 'Like new'}
...

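I want to flatten out the dictionary for specific values belonging to their key. For example, grab only the value from priceAmount, and the value from name in any other dictionary key.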

So the expected output:

    keys           values
3   brandId        925
5   price          50.00
14  sizes          UK 4
18  brandId        925
20  price          11.00
29  sizes          S
30  condition      Like new

I can do this with the following, but it would take a long time if I had more to replace!


price_data = []
for price in data[data['keys'].str.contains('price', na=False)].values:
    price_data.append(price[1]['priceAmount'])
    
condition_data = []
for condition in data[data['keys'].str.contains('condition', na=False)].values:
    condition_data.append(condition[1]['name'])
    
age_data = []
for age in data[data['keys'].str.contains('age', na=False)].values:
    age_data.append(age[1]['name'])
    
sizes_data = []
for sizes in data[data['keys'].str.contains('sizes', na=False)].values:
    sizes_data.append(sizes[1]['name'])

colour_data = []
for colour in data[data['keys'].str.contains('colour', na=False)].values:
    colour_data.append(colour[1]['name'])

#replace the values
data=data.replace(data[data['keys'].str.contains('price', na=False)]['values'].values, price_data) 
data=data.replace(data[data['keys'].str.contains('condition', na=False)]['values'].values, condition_data) 
data=data.replace(data[data['keys'].str.contains('age', na=False)]['values'].values, age_data) 
data=data.replace(data[data['keys'].str.contains('sizes', na=False)]['values'].values, sizes_data) 
data=data.replace(data[data['keys'].str.contains('colour', na=False)]['values'].values, colour_data)

Is there a faster, smoother alternative?

【Question discussion】:

Tags: python pandas dataframe

【Solution 1】:

Another option is to use a simple list comprehension:

df['values'] = [i.get('priceAmount') or i.get('name') if isinstance(i, dict) else i for i in df['values'].tolist()]

Output:

           keys           values
3       brandId              925
5         price            50.00
14        sizes             UK 4
18      brandId              925
20        price            11.00
29        sizes                S
30    condition         Like new
31    condition        Brand new
32       colour            Multi
33          age           Modern
36        brand  chinese-laundry
40       colour            White
41        brand  chinese-laundry
44  productType          tshirts
50      brandId              925
52        price            20.00
61        sizes                M
62    condition        Brand new
63       colour            Black
64          age              90s
67        brand  chinese-laundry
70  productType          jackets
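
If more keys later need their own field, the same list-comprehension idea can be driven by a lookup table instead of a chain of `or`. The following is only a sketch: `FIELD_FOR_KEY`, `DEFAULT_FIELD`, and `flatten_value` are hypothetical names, not part of the original answer, and the sample frame is a shortened copy of the question's data.

```python
import pandas as pd

# Small sample in the same shape as the question's data.
df = pd.DataFrame({
    'keys': ['brandId', 'price', 'sizes', 'condition', 'brand'],
    'values': [
        925,
        {'currencyName': 'GBP', 'priceAmount': '50.00', 'nationalShippingCost': '3.00'},
        {'id': 4, 'name': 'UK 4', 'quantity': 1},
        {'id': 'used_like_new', 'name': 'Like new'},
        'chinese-laundry',
    ],
})

# Hypothetical mapping: which dict field to keep for each key.
# Keys not listed here fall back to DEFAULT_FIELD.
FIELD_FOR_KEY = {'price': 'priceAmount'}
DEFAULT_FIELD = 'name'


def flatten_value(key, value):
    """Return the chosen field for dict values; pass other values through."""
    if isinstance(value, dict):
        return value.get(FIELD_FOR_KEY.get(key, DEFAULT_FIELD))
    return value


# Pair each key with its value and flatten row by row.
df['values'] = [flatten_value(k, v) for k, v in zip(df['keys'], df['values'])]
print(df)
```

Unlike `i.get('priceAmount') or i.get('name')`, this picks the field by key, so an empty or zero `priceAmount` would not silently fall through to `name`.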

【Discussion】:

【Solution 2】:

Perhaps, if you have access to the dict used to build df, you could use json_normalize() instead.

For example:

d = {
    'keys': {
        3: 'brandId', 5: 'price', 14: 'sizes', 18: 'brandId', 20: 'price', 29: 'sizes', 30: 'condition',
        31: 'condition', 32: 'colour', 33: 'age', 36: 'brand', 40: 'colour', 41: 'brand', 44: 'productType',
        50: 'brandId', 52: 'price', 61: 'sizes', 62: 'condition', 63: 'colour', 64: 'age', 67: 'brand',
        70: 'productType',
    },
    'values': {
        3: 925, 5: {'currencyName': 'GBP', 'priceAmount': '50.00', 'nationalShippingCost': '3.00'},
        14: {'id': 4, 'name': 'UK 4', 'quantity': 1}, 18: 925,
        20: {'currencyName': 'GBP', 'priceAmount': '11.00', 'nationalShippingCost': '0.00'},
        29: {'id': 3, 'name': 'S', 'quantity': 1}, 30: {'id': 'used_like_new', 'name': 'Like new'},
        31: {'id': 'brand_new', 'name': 'Brand new'}, 32: {'id': 'multi', 'name': 'Multi'}, 33:
        {'id': 'modern', 'name': 'Modern'}, 36: 'chinese-laundry', 40: {'id': 'white', 'name': 'White'},
        41: 'chinese-laundry', 44: 'tshirts', 50: 925,
        52: {'currencyName': 'GBP', 'priceAmount': '20.00', 'nationalShippingCost': '3.00'},
        61: {'id': 11, 'name': 'M', 'quantity': 1}, 62: {'id': 'brand_new', 'name': 'Brand new'},
        63: {'id': 'black', 'name': 'Black'}, 64: {'id': '90s', 'name': '90s'}, 67: 'chinese-laundry',
        70: 'jackets',
    },
}

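Note that this dict is a bit unusual: keys and values are separated at the top level. To use json_normalize(), we want to merge them so that every record is complete (key and value together), which factors out the numeric keys. Because key names repeat in this sample, later entries overwrite earlier ones, so only the last value for each key survives. Also, since I assume there are many records in values (perhaps a list of dicts?), you would have to do this for each of them.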

>>> mod_d = {d['keys'][i]: v for i, v in d['values'].items()}
>>> mod_d
{'brandId': 925,
 'price': {'currencyName': 'GBP',
  'priceAmount': '20.00',
  'nationalShippingCost': '3.00'},
 'sizes': {'id': 11, 'name': 'M', 'quantity': 1},
 'condition': {'id': 'brand_new', 'name': 'Brand new'},
 'colour': {'id': 'black', 'name': 'Black'},
 'age': {'id': '90s', 'name': '90s'},
 'brand': 'chinese-laundry',
 'productType': 'jackets'}

With this, we can now use json_normalize():

>>> df = pd.json_normalize(mod_d)
>>> df
   brandId            brand productType price.currencyName price.priceAmount  \
0      925  chinese-laundry     jackets                GBP             20.00   

  price.nationalShippingCost  sizes.id sizes.name  sizes.quantity  \
0                       3.00        11          M               1   

  condition.id condition.name colour.id colour.name age.id age.name  
0    brand_new      Brand new     black       Black    90s      90s  

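【Discussion】:

This looks interesting! I don't understand this part, though: d['keys'][i]: v? What is happening here? It seems to produce the right output even though you never specifically select priceAmount and name to get the values belonging to them.
Yes, I went a bit fast. See the revised answer.
Also, a more likely representation of your data is a single keys dict and a list of values dicts, right? For example: {'keys': {...}, 'values': [{...}, {...}, ...]}. Just a guess.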
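
If the data really has that shape, the re-keying step can be applied once per record and the resulting list passed to json_normalize(). This is a minimal sketch under that assumption: the `raw` layout and its two sample records are made up for illustration, and `records` is a hypothetical name.

```python
import pandas as pd

# Hypothetical layout: one shared `keys` dict plus a list of `values` dicts,
# one per record. The sample records below are made up for illustration.
raw = {
    'keys': {3: 'brandId', 5: 'price', 14: 'sizes'},
    'values': [
        {3: 925, 5: {'currencyName': 'GBP', 'priceAmount': '50.00'},
         14: {'id': 4, 'name': 'UK 4', 'quantity': 1}},
        {3: 925, 5: {'currencyName': 'GBP', 'priceAmount': '11.00'},
         14: {'id': 3, 'name': 'S', 'quantity': 1}},
    ],
}

# Re-key each record by name instead of by numeric position.
records = [
    {raw['keys'][i]: v for i, v in rec.items()}
    for rec in raw['values']
]

# json_normalize accepts a list of dicts and yields one row per record,
# with nested fields flattened into dotted column names.
df = pd.json_normalize(records)
print(df[['brandId', 'price.priceAmount', 'sizes.name']])
```

Each record keeps its own keys here, so nothing is overwritten the way it is when all entries are merged into a single dict.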

【Solution 3】:

pandas string methods allow access to values inside lists/tuples/dicts:

import numpy as np

df['val'] = np.where(df['keys'] == 'price', 
                     df['values'].str['priceAmount'], 
                     df['values'].str['name'])

df['val'] = df['val'].fillna(df['values'])

           keys                                             values              val
3       brandId                                                925              925
5         price  {'currencyName': 'GBP', 'priceAmount': '50.00'...            50.00
14        sizes           {'id': 4, 'name': 'UK 4', 'quantity': 1}             UK 4
18      brandId                                                925              925
20        price  {'currencyName': 'GBP', 'priceAmount': '11.00'...            11.00
29        sizes              {'id': 3, 'name': 'S', 'quantity': 1}                S
30    condition        {'id': 'used_like_new', 'name': 'Like new'}         Like new
31    condition           {'id': 'brand_new', 'name': 'Brand new'}        Brand new
32       colour                   {'id': 'multi', 'name': 'Multi'}            Multi
33          age                 {'id': 'modern', 'name': 'Modern'}           Modern
36        brand                                    chinese-laundry  chinese-laundry
40       colour                   {'id': 'white', 'name': 'White'}            White
41        brand                                    chinese-laundry  chinese-laundry
44  productType                                            tshirts          tshirts
50      brandId                                                925              925
52        price  {'currencyName': 'GBP', 'priceAmount': '20.00'...            20.00
61        sizes             {'id': 11, 'name': 'M', 'quantity': 1}                M
62    condition           {'id': 'brand_new', 'name': 'Brand new'}        Brand new
63       colour                   {'id': 'black', 'name': 'Black'}            Black
64          age                       {'id': '90s', 'name': '90s'}              90s
67        brand                                    chinese-laundry  chinese-laundry
70  productType                                            jackets          jackets
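
To see why the fillna() step is needed, it can help to look at what the `.str[...]` accessor returns on a mixed column. This is a small sketch on a made-up three-element Series (`s`, `names`, and `filled` are illustrative names): entries that are not dicts come back as missing values, which fillna() then replaces with the originals.

```python
import pandas as pd

# A mixed object Series: an int, a dict, and a plain string.
s = pd.Series([925, {'id': 4, 'name': 'UK 4'}, 'chinese-laundry'])

# .str['name'] looks up the key in dict entries; entries where the lookup
# does not apply (the int and the plain string) become missing values.
names = s.str['name']
print(names)

# Fall back to the original value wherever the lookup produced a missing value.
filled = names.fillna(s)
print(filled)
```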

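【Discussion】:

Why do we have df['keys'] == 'price'?
You stated the selection in your question: "I want to flatten out the dictionary for specific values belonging to their key. For example, grab only the value from priceAmount, and the value from name in any other dictionary key." priceAmount seems to be associated only with price in the keys column.
Thanks for the explanation! I looked up the documentation to further solidify this approach.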