【Question title】: Flatten a list of dictionaries that contains a list of dictionaries in Python
【Posted】: 2020-01-27 09:28:51
【Question description】:

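How can I organize this information so that it flattens into a list of dictionaries without duplicates? The type and name do not matter and may repeat, but the tuple (company, year_semester, area, model, category, measure) must not repeat. The model, category, or measure may be missing from the input.

Input: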

data = [
    {
        "type": "Y",
        "id": "22",
        "name": "IX",
        "attri": {
            "company": "Peugeot",
            "year_semester": "2020_1",
            "area": "prod",
            "items": [
                {"model": "XYZ", "category": "CC", "measure": "2"},
                {"model": "XYZ", "category": "CC", "measure": "3"},
            ],
        },
    },
    {
        "type": "Y",
        "id": "11",
        "name": "IY",
        "attri": {
            "company": "Mazda",
            "year_semester": "2019_1",
            "area": "prod",
            "items": [{"model": "XYZ", "category": "CC", "measure": "2"}],
        },
    },
    {
        "type": "Y",
        "id": "11",
        "name": "IY",
        "attri": {
            "company": "Mazda",
            "year_semester": "2019_1",
            "area": "prod",
            "items": [
                {  # This shouldn't appear because it is repeated
                    "model": "XYZ",  # This shouldn't appear because it is repeated
                    "category": "CC",  # This shouldn't appear because it is repeated
                    "measure": "2",  # This shouldn't appear because it is repeated
                },
                {"model": "VVVV", "category": "CC", "measure": "4"},
            ],
        },
    },
    {
        "type": "Y",
        "id": "44",
        "name": "IY",
        "attri": {"company": "Honda", "year_semester": "2018_1", "area": "prod", "items": []},
    },
]

Output:

expected = [
    {
        "type": "Y",
        "id": "22",
        "name": "IX",
        "company": "Peugeot",
        "year_semester": "2020_1",
        "area": "prod",
        "model": "XYZ",
        "category": "CC",
        "measure": "2",
    },
    {
        "type": "Y",
        "id": "22",
        "name": "IX",
        "company": "Peugeot",
        "year_semester": "2020_1",
        "area": "prod",
        "model": "XYZ",
        "category": "CC",
        "measure": "3",
    },
    {
        "type": "Y",
        "id": "11",
        "name": "IY",
        "company": "Mazda",
        "year_semester": "2019_1",
        "area": "prod",
        "model": "XYZ",
        "category": "CC",
        "measure": "2",
    },
    {
        "type": "Y",
        "id": "11",
        "name": "IY",
        "company": "Mazda",
        "year_semester": "2019_1",
        "area": "prod",
        "model": "VVVV",
        "category": "CC",
        "measure": "4",
    },
    {
        "type": "Y",
        "id": "44",
        "name": "IY",
        "company": "Honda",
        "year_semester": "2018_1",
        "area": "prod",
    },
]

I tried this, taken from another post:

def flatten(d, sep="_"):
    import collections

    obj = collections.OrderedDict()

    def recurse(t, parent_key=""):
        if isinstance(t, list):
            for i in range(len(t)):
                recurse(t[i], parent_key + sep + str(i) if parent_key else str(i))
        elif isinstance(t, dict):
            for k, v in t.items():
                recurse(v, parent_key + sep + k if parent_key else k)
        else:
            obj[parent_key] = t

    recurse(d)

    return obj

But I am not sure how to avoid the duplicates, and I cannot get the output organized the way I want. Do you have any ideas?
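To make the mismatch concrete, here is a small sketch of what a generic recursive flattener of this kind produces for a single record. The name `flatten_generic` and the one-record sample are only illustrative; the point is that every nested value ends up as one key in a single wide dictionary (with list indices baked into the key names), rather than as one row per item.

```python
from collections import OrderedDict


def flatten_generic(d, sep="_"):
    """Recursively flatten nested dicts/lists into a single flat mapping."""
    obj = OrderedDict()

    def recurse(t, parent_key=""):
        if isinstance(t, list):
            # List positions become part of the key, e.g. "items_0", "items_1".
            for i, value in enumerate(t):
                recurse(value, parent_key + sep + str(i) if parent_key else str(i))
        elif isinstance(t, dict):
            for k, v in t.items():
                recurse(v, parent_key + sep + k if parent_key else k)
        else:
            obj[parent_key] = t

    recurse(d)
    return obj


# Illustrative sample: the first record from the question.
record = {
    "type": "Y",
    "id": "22",
    "name": "IX",
    "attri": {
        "company": "Peugeot",
        "year_semester": "2020_1",
        "area": "prod",
        "items": [
            {"model": "XYZ", "category": "CC", "measure": "2"},
            {"model": "XYZ", "category": "CC", "measure": "3"},
        ],
    },
}

flat_generic = flatten_generic(record)
for key, value in flat_generic.items():
    print(key, "=", value)
```

The keys come out as `attri_company`, `attri_items_0_model`, `attri_items_1_model`, and so on, so this approach gives neither the plain field names nor the one-dictionary-per-item layout shown in the expected output.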

【Comments】:

  • This dictionary is not syntactically valid.
  • The input and output are malformed. Please fix them.

Tags: python list dictionary flat


【Solution 1】:

Well, your sample is malformed.

I assume your data is:

data = [
    {
        "type": "Y",
        "id": "22",
        "name": "IX",
        "attri": {
            "company": "Peugeot",
            "year_semester": "2020_1",
            "area": "prod",
            "items": [
                {"model": "XYZ", "category": "CC", "measure": "2"},
                {"model": "XYZ", "category": "CC", "measure": "3"},
            ],
        },
    },
    {
        "type": "Y",
        "id": "11",
        "name": "IY",
        "attri": {
            "company": "Mazda",
            "year_semester": "2019_1",
            "area": "prod",
            "items": [{"model": "XYZ", "category": "CC", "measure": "2"}],
        },
    },
    {
        "type": "Y",
        "id": "11",
        "name": "IY",
        "attri": {
            "company": "Mazda",
            "year_semester": "2019_1",
            "area": "prod",
            "items": [
                {  # This shouldn't appear because it is repeated
                    "model": "XYZ",  # This shouldn't appear because it is repeated
                    "category": "CC",  # This shouldn't appear because it is repeated
                    "measure": "2",  # This shouldn't appear because it is repeated
                },
                {"model": "VVVV", "category": "CC", "measure": "4"},
            ],
        },
    },
    {
        "type": "Y",
        "id": "44",
        "name": "IY",
        "attri": {"company": "Honda", "year_semester": "2018_1", "area": "prod", "items": []},
    },
]

You can use a function to "flatten" your data, for example:

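def flatten(record):
    """Return one flat dict for a record, merging in its item fields."""
    attri = record["attri"]
    items = attri["items"]
    # Top-level fields plus the scalar fields of "attri".
    flat = {
        "type": record["type"],
        "id": record["id"],
        "name": record["name"],
        "company": attri["company"],
        "year_semester": attri["year_semester"],
        "area": attri["area"],
    }
    # Note: each item overwrites the previous one, so only the last
    # item of a record ends up in the returned dict.
    for item in items:
        flat.update(
            {
                "model": item["model"],
                "category": item["category"],
                "measure": item["measure"],
            }
        )
    return flat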

To flatten your data, you can do:

result = list(map(flatten, data))

The result is:

[{'area': 'prod',
  'category': 'CC',
  'company': 'Peugeot',
  'id': '22',
  'measure': '3',
  'model': 'XYZ',
  'name': 'IX',
  'type': 'Y',
  'year_semester': '2020_1'},
 {'area': 'prod',
  'category': 'CC',
  'company': 'Mazda',
  'id': '11',
  'measure': '2',
  'model': 'XYZ',
  'name': 'IY',
  'type': 'Y',
  'year_semester': '2019_1'},
 {'area': 'prod',
  'category': 'CC',
  'company': 'Mazda',
  'id': '11',
  'measure': '4',
  'model': 'VVVV',
  'name': 'IY',
  'type': 'Y',
  'year_semester': '2019_1'},
 {'area': 'prod',
  'company': 'Honda',
  'id': '44',
  'name': 'IY',
  'type': 'Y',
  'year_semester': '2018_1'}]

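Of course, this is not exactly what you want: each record yields a single dictionary, so only the last entry of its items list is kept. But you can adapt it.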
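One possible adaptation, as a sketch rather than a definitive answer: emit one dictionary per item and skip any row whose (company, year_semester, area, model, category, measure) tuple has already been seen. The function name `flatten_records`, the `KEY_FIELDS` constant, and the choice to emit one row for a record with an empty items list are assumptions made here to match the expected output in the question.

```python
# Fields that together must be unique across the output (from the question).
KEY_FIELDS = ("company", "year_semester", "area", "model", "category", "measure")


def flatten_records(records):
    """Return one flat dict per item, skipping rows whose key tuple repeats."""
    seen = set()
    rows = []
    for record in records:
        attri = record.get("attri", {})
        # Top-level fields first, then the scalar fields of "attri".
        base = {k: v for k, v in record.items() if k != "attri"}
        base.update({k: v for k, v in attri.items() if k != "items"})
        # Assumption: a record without items still yields one row,
        # simply lacking model/category/measure.
        for item in attri.get("items") or [{}]:
            row = {**base, **item}
            key = tuple(row.get(field) for field in KEY_FIELDS)
            if key not in seen:
                seen.add(key)
                rows.append(row)
    return rows


def make_record(id_, name, company, year_semester, items):
    """Helper (illustrative only) to build a record shaped like the sample data."""
    return {
        "type": "Y",
        "id": id_,
        "name": name,
        "attri": {
            "company": company,
            "year_semester": year_semester,
            "area": "prod",
            "items": items,
        },
    }


data = [
    make_record("22", "IX", "Peugeot", "2020_1", [
        {"model": "XYZ", "category": "CC", "measure": "2"},
        {"model": "XYZ", "category": "CC", "measure": "3"},
    ]),
    make_record("11", "IY", "Mazda", "2019_1", [
        {"model": "XYZ", "category": "CC", "measure": "2"},
    ]),
    make_record("11", "IY", "Mazda", "2019_1", [
        {"model": "XYZ", "category": "CC", "measure": "2"},  # repeated, skipped
        {"model": "VVVV", "category": "CC", "measure": "4"},
    ]),
    make_record("44", "IY", "Honda", "2018_1", []),
]

rows = flatten_records(data)
for row in rows:
    print(row)
```

Using a set of key tuples keeps the check for duplicates cheap and preserves the input order of the rows that survive. Missing fields are represented as `None` in the key only, so the Honda row is emitted without model, category, or measure.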
