【问题标题】:Extracting values from dictionary list in pandas dataframe从熊猫数据框中的字典列表中提取值
【发布时间】:2022-01-23 18:29:50
【问题描述】:

我有以下熊猫数据框:

pd.DataFrame({'keys': {3: 'brandId', 5: 'price', 14: 'sizes', 18: 'brandId', 20: 'price', 29: 'sizes', 30: 'condition', 31: 'condition', 32: 'colour', 33: 'age', 36: 'brand', 40: 'colour', 41: 'brand', 44: 'productType', 50: 'brandId', 52: 'price', 61: 'sizes', 62: 'condition', 63: 'colour', 64: 'age', 67: 'brand', 70: 'productType'}, 'values': {3: 925, 5: {'currencyName': 'GBP', 'priceAmount': '50.00', 'nationalShippingCost': '3.00'}, 14: {'id': 4, 'name': 'UK 4', 'quantity': 1}, 18: 925, 20: {'currencyName': 'GBP', 'priceAmount': '11.00', 'nationalShippingCost': '0.00'}, 29: {'id': 3, 'name': 'S', 'quantity': 1}, 30: {'id': 'used_like_new', 'name': 'Like new'}, 31: {'id': 'brand_new', 'name': 'Brand new'}, 32: {'id': 'multi', 'name': 'Multi'}, 33: {'id': 'modern', 'name': 'Modern'}, 36: 'chinese-laundry', 40: {'id': 'white', 'name': 'White'}, 41: 'chinese-laundry', 44: 'tshirts', 50: 925, 52: {'currencyName': 'GBP', 'priceAmount': '20.00', 'nationalShippingCost': '3.00'}, 61: {'id': 11, 'name': 'M', 'quantity': 1}, 62: {'id': 'brand_new', 'name': 'Brand new'}, 63: {'id': 'black', 'name': 'Black'}, 64: {'id': '90s', 'name': '90s'}, 67: 'chinese-laundry', 70: 'jackets'}})

看起来像这样:

    keys    values
3   brandId 925
5   price   {'currencyName': 'GBP', 'priceAmount': '50.00'...
14  sizes   {'id': 4, 'name': 'UK 4', 'quantity': 1}
18  brandId 925
20  price   {'currencyName': 'GBP', 'priceAmount': '11.00'...
29  sizes   {'id': 3, 'name': 'S', 'quantity': 1}
30  condition   {'id': 'used_like_new', 'name': 'Like new'}
...

我想为属于其键的特定值展平字典。例如,在任何其他字典键中仅获取来自priceAmount 的值,以及来自name 的值。

所以预期的输出:

    keys           values
3   brandId        925
5   price          50.00
14  sizes          UK 4
18  brandId        925
20  price          11.00
29  sizes          S
30  condition      Like new}

我可以使用以下内容来做到这一点,如果我要替换更多内容,这将花费很长时间!


price_data = []
for price in data[data['keys'].str.contains('price', na=False)].values:
    price_data.append(price[1]['priceAmount'])
    
condition_data = []
for condition in data[data['keys'].str.contains('condition', na=False)].values:
    condition_data.append(condition[1]['name'])
    
age_data = []
for age in data[data['keys'].str.contains('age', na=False)].values:
    age_data.append(age[1]['name'])
    
sizes_data = []
for sizes in data[data['keys'].str.contains('sizes', na=False)].values:
    sizes_data.append(sizes[1]['name'])

colour_data = []
for colour in data[data['keys'].str.contains('colour', na=False)].values:
    colour_data.append(colour[1]['name'])

#replace the values
data=data.replace(data[data['keys'].str.contains('price', na=False)]['values'].values, price_data) 
data=data.replace(data[data['keys'].str.contains('condition', na=False)]['values'].values, condition_data) 
data=data.replace(data[data['keys'].str.contains('age', na=False)]['values'].values, age_data) 
data=data.replace(data[data['keys'].str.contains('sizes', na=False)]['values'].values, sizes_data) 
data=data.replace(data[data['keys'].str.contains('colour', na=False)]['values'].values, colour_data) 

有没有更快更流畅的替代方案?

【问题讨论】:

    标签: python pandas dataframe


    【解决方案1】:

    另一种选择是使用简单的列表推导:

    df['values'] = [i.get('priceAmount') or i.get('name') if isinstance(i, dict) else i for i in df['values'].tolist()]
    

    输出:

               keys           values
    3       brandId              925
    5         price            50.00
    14        sizes             UK 4
    18      brandId              925
    20        price            11.00
    29        sizes                S
    30    condition         Like new
    31    condition        Brand new
    32       colour            Multi
    33          age           Modern
    36        brand  chinese-laundry
    40       colour            White
    41        brand  chinese-laundry
    44  productType          tshirts
    50      brandId              925
    52        price            20.00
    61        sizes                M
    62    condition        Brand new
    63       colour            Black
    64          age              90s
    67        brand  chinese-laundry
    70  productType          jackets
    

    【讨论】:

      【解决方案2】:

      也许,如果您可以访问用于制作dfdict,则可以改用json_normalize()

      例如:

      d = {
          'keys': {
              3: 'brandId', 5: 'price', 14: 'sizes', 18: 'brandId', 20: 'price', 29: 'sizes', 30: 'condition',
              31: 'condition', 32: 'colour', 33: 'age', 36: 'brand', 40: 'colour', 41: 'brand', 44: 'productType',
              50: 'brandId', 52: 'price', 61: 'sizes', 62: 'condition', 63: 'colour', 64: 'age', 67: 'brand',
              70: 'productType',
          },
          'values': {
              3: 925, 5: {'currencyName': 'GBP', 'priceAmount': '50.00', 'nationalShippingCost': '3.00'},
              14: {'id': 4, 'name': 'UK 4', 'quantity': 1}, 18: 925,
              20: {'currencyName': 'GBP', 'priceAmount': '11.00', 'nationalShippingCost': '0.00'},
              29: {'id': 3, 'name': 'S', 'quantity': 1}, 30: {'id': 'used_like_new', 'name': 'Like new'},
              31: {'id': 'brand_new', 'name': 'Brand new'}, 32: {'id': 'multi', 'name': 'Multi'}, 33:
              {'id': 'modern', 'name': 'Modern'}, 36: 'chinese-laundry', 40: {'id': 'white', 'name': 'White'},
              41: 'chinese-laundry', 44: 'tshirts', 50: 925,
              52: {'currencyName': 'GBP', 'priceAmount': '20.00', 'nationalShippingCost': '3.00'},
              61: {'id': 11, 'name': 'M', 'quantity': 1}, 62: {'id': 'brand_new', 'name': 'Brand new'},
              63: {'id': 'black', 'name': 'Black'}, 64: {'id': '90s', 'name': '90s'}, 67: 'chinese-laundry',
              70: 'jackets',
          },
      }
      

      请注意,这个字典有点不寻常,keysvalues 在顶层分开。为了将它们放在一起,以便可以使用json_normalize(),我们希望将它们合并,以便所有记录都是完整的(键和值)。这会分解出每个数字键。请注意,由于我假设值中有许多记录(可能是字典列表?),因此您必须对它们中的每一个都执行此操作。

      >>> mod_d = {d['keys'][i]: v for i, v in d['values'].items()}
      >>> mod_d
      {'brandId': 925,
       'price': {'currencyName': 'GBP',
        'priceAmount': '20.00',
        'nationalShippingCost': '3.00'},
       'sizes': {'id': 11, 'name': 'M', 'quantity': 1},
       'condition': {'id': 'brand_new', 'name': 'Brand new'},
       'colour': {'id': 'black', 'name': 'Black'},
       'age': {'id': '90s', 'name': '90s'},
       'brand': 'chinese-laundry',
       'productType': 'jackets'}
      

      有了这个,我们现在可以使用json_normalize()

      >>> df = pd.json_normalize(mod_d)
      >>> df
         brandId            brand productType price.currencyName price.priceAmount  \
      0      925  chinese-laundry     jackets                GBP             20.00   
      
        price.nationalShippingCost  sizes.id sizes.name  sizes.quantity  \
      0                       3.00        11          M               1   
      
        condition.id condition.name colour.id colour.name age.id age.name  
      0    brand_new      Brand new     black       Black    90s      90s  
      , 
      

      【讨论】:

      • 这看起来很有趣!虽然我不明白这部分:d['keys'][i]: v?这里发生了什么,因为它似乎产生了正确的输出,即使您没有从 priceAmountname 中专门选择来获取属于这些的值?
      • 是的,我走得有点快。查看修改后的答案。
      • 另外,您的数据更可能的表示是只有一个keys dict 和values dicts 的列表,对吧?例如:{'keys': {...}, 'values': [{...}, {...}, ...]}。只是猜测。
      【解决方案3】:

      pandas 字符串方法允许访问列表/元组/字典中的值:

      df['val'] = np.where(df['keys'] == 'price', 
                           df['values'].str['priceAmount'], 
                           df['values'].str['name'])
      
      df['val'] = df['val'].fillna(df['values'])
      
       keys                                             values              val
      3       brandId                                                925              925
      5         price  {'currencyName': 'GBP', 'priceAmount': '50.00'...            50.00
      14        sizes           {'id': 4, 'name': 'UK 4', 'quantity': 1}             UK 4
      18      brandId                                                925              925
      20        price  {'currencyName': 'GBP', 'priceAmount': '11.00'...            11.00
      29        sizes              {'id': 3, 'name': 'S', 'quantity': 1}                S
      30    condition        {'id': 'used_like_new', 'name': 'Like new'}         Like new
      31    condition           {'id': 'brand_new', 'name': 'Brand new'}        Brand new
      32       colour                   {'id': 'multi', 'name': 'Multi'}            Multi
      33          age                 {'id': 'modern', 'name': 'Modern'}           Modern
      36        brand                                    chinese-laundry  chinese-laundry
      40       colour                   {'id': 'white', 'name': 'White'}            White
      41        brand                                    chinese-laundry  chinese-laundry
      44  productType                                            tshirts          tshirts
      50      brandId                                                925              925
      52        price  {'currencyName': 'GBP', 'priceAmount': '20.00'...            20.00
      61        sizes             {'id': 11, 'name': 'M', 'quantity': 1}                M
      62    condition           {'id': 'brand_new', 'name': 'Brand new'}        Brand new
      63       colour                   {'id': 'black', 'name': 'Black'}            Black
      64          age                       {'id': '90s', 'name': '90s'}              90s
      67        brand                                    chinese-laundry  chinese-laundry
      70  productType                                            jackets          jackets
      
      
      

      【讨论】:

      • 为什么我们有df['keys'] == 'price'
      • 您在问题中陈述了选择;I want to flatten out the dictionary for specific values belonging to their key. For example, grab only the value from priceAmount, and the value from name in any other dictionary key.。 priceAmount 似乎只与 keys 列中的 price 相关联
      • 感谢您的解释!我查看了进一步巩固这种方法的文档。
      猜你喜欢
      • 1970-01-01
      • 2021-01-02
      • 2018-11-26
      • 2020-05-23
      • 1970-01-01
      • 2019-02-22
      • 2017-06-16
      • 2019-05-18
      • 2020-08-14
      相关资源
      最近更新 更多