【Question title】: Merging list of dictionaries to remove all duplicates
【Posted】: 2021-12-03 10:22:37
【Question】:

I'm trying to write some simple Python code that merges a list of dictionaries into a condensed list, because at the moment I have a lot of duplicates.

From this:

[
    {
        "module": "RECEIPT BISCUITS",
        "product_range": "ULKER BISCUITS",
        "receipt_category": "BISCUITS"
    },
    {
        "module": "RECEIPT BISCUITS",
        "product_range": "ULKER",
        "receipt_category": "BISCUITS"
    },
    {
        "module": "RECEIPT BISCUITS",
        "product_range": "ULKER BISCUITS GOLD",
        "receipt_category": "BISCUITS GOLD"
    },
    {
        "module": "RECEIPT COFFEE",
        "product_range": "BLACK GOLD",
        "receipt_category": "BLACK GOLD"
    }
]

To this:

[
    {
        "module": "RECEIPT BISCUITS",
        "product_range": ["ULKER BISCUITS", "ULKER"],
        "receipt_category": ["BISCUITS", "BISCUITS GOLD"]
    },
    {
        "module": "RECEIPT COFFEE",
        "product_range": ["BLACK GOLD"],
        "receipt_category": ["BLACK GOLD"]
    }
]

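The module is what the entries are grouped by; the other two fields should be stored as lists, even when there is only one value. By the way, this is in JSON format.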

【Question comments】:

Tags: python json list dictionary merge


【Answer 1】:

collections.defaultdict to the rescue for your data-restructuring needs!

import collections

data = [
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS GOLD", "receipt_category": "BISCUITS GOLD"},
    {"module": "RECEIPT COFFEE", "product_range": "BLACK GOLD", "receipt_category": "BLACK GOLD"},
]

grouped = collections.defaultdict(lambda: collections.defaultdict(list))
group_key = "module"

for datum in data:
    datum = datum.copy()  # Copy so we can .pop without consequence
    group = datum.pop(group_key)  # Get the key (`module` value)
    for key, value in datum.items():  # Loop over the rest and put them in the group
        grouped[group][key].append(value)

collated = [
    {
        group_key: group,
        **values,
    }
    for (group, values) in grouped.items()
]

print(collated)

This prints:

[
  {'module': 'RECEIPT BISCUITS', 'product_range': ['ULKER BISCUITS', 'ULKER', 'ULKER BISCUITS GOLD'], 'receipt_category': ['BISCUITS', 'BISCUITS', 'BISCUITS GOLD']},
  {'module': 'RECEIPT COFFEE', 'product_range': ['BLACK GOLD'], 'receipt_category': ['BLACK GOLD']}
]

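Note that this does not deduplicate the grouped values (in the output above, 'BISCUITS' appears twice in receipt_category), because I'm not sure whether the order of the values matters to you, and therefore whether to use sets (which don't preserve order).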
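If the order does matter, one way to drop duplicates while keeping the order in which each value was first seen is to use a dict as an ordered set, since dicts preserve insertion order in Python 3.7+. This is a minimal sketch under that assumption; `collate_unique` is a hypothetical helper name, and it requires the values to be hashable (they are strings here):

```python
import collections
import json


def collate_unique(records, group_key="module"):
    """Hypothetical helper: group records by group_key and collect every other
    field into a list of unique values, in first-seen order."""
    # group -> field name -> dict used as an ordered set (keys are the values, mapped to None)
    grouped = collections.defaultdict(lambda: collections.defaultdict(dict))
    for record in records:
        group = record[group_key]
        for key, value in record.items():
            if key != group_key:
                # Re-adding an existing key keeps its original position, so duplicates vanish.
                grouped[group][key][value] = None
    # list() on a dict returns its keys in insertion order.
    return [
        {group_key: group, **{key: list(seen) for key, seen in fields.items()}}
        for group, fields in grouped.items()
    ]


data = [
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS GOLD", "receipt_category": "BISCUITS GOLD"},
    {"module": "RECEIPT COFFEE", "product_range": "BLACK GOLD", "receipt_category": "BLACK GOLD"},
]

collated = collate_unique(data)
print(json.dumps(collated, indent=2))
```

For the sample data, RECEIPT BISCUITS ends up with product_range ["ULKER BISCUITS", "ULKER", "ULKER BISCUITS GOLD"] and receipt_category ["BISCUITS", "BISCUITS GOLD"]. Unlike the answer's code, this reads the records without copying or popping, so the input list is left untouched.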

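Changing list to set and append to add will make the values unique. Keep in mind that json.dumps cannot serialize a set, so convert the sets back to lists before writing the result out as JSON.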
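A sketch of that set-based variant on the same sample data. Converting each set with sorted() is an assumption made here to get a deterministic, JSON-serializable list; any other conversion to a list would work too, but the original insertion order is not recoverable from a set:

```python
import collections
import json

data = [
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS GOLD", "receipt_category": "BISCUITS GOLD"},
    {"module": "RECEIPT COFFEE", "product_range": "BLACK GOLD", "receipt_category": "BLACK GOLD"},
]

# Same grouping as the answer, but each field collects a set, so duplicates are dropped.
grouped = collections.defaultdict(lambda: collections.defaultdict(set))
group_key = "module"

for datum in data:
    datum = datum.copy()  # Copy so that .pop() doesn't modify the original dict
    group = datum.pop(group_key)
    for key, value in datum.items():
        grouped[group][key].add(value)

# json.dumps can't serialize sets, so turn each set into a sorted list first.
collated = [
    {group_key: group, **{key: sorted(values) for key, values in fields.items()}}
    for group, fields in grouped.items()
]

print(json.dumps(collated, indent=2))
```

Because of the sorting, product_range for RECEIPT BISCUITS comes out alphabetically as ["ULKER", "ULKER BISCUITS", "ULKER BISCUITS GOLD"] rather than in input order.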

【Answer comments】:
