【Question title】: Convert delimited string and value into hierarchical JSON with Python
【Posted】: 2018-04-19 14:17:57
【Question】:

I have data in the following format:

[['Director', 9010],
['Director - Product Manager', 9894],
['Director - Product Manager - Project Manager', 9080],
['Director - Product Manager - Project Manager - Staff', 5090],
['Director - Product Manager - Project Manager - Staff 2', 5087],
['Director - Product Manager - Project Manager 2', 9099],...]

and I would like output that looks like this:

{
    'title': 'Director',
    'id': 9010,
    'children': [
        {
            'title': 'Product Manager',
            'id': 9894,
            'children': [
                {
                    'title': 'Project Manager',
                    'id': 9080,
                    'children': [
                        ...
                    ]
                },{
                    'title': 'Project Manager 2',
                    'id': 9099,
                    'children': [
                        ...
                    ]
                },
                ...
            ]
        },
        ...
    ]
}

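I have been experimenting with dictionaries, but I am struggling to match each id to its title. Any suggestions are appreciated.

【Comments】:

  • Please show us what you have tried.

Tags: python json tree hierarchical-data delimited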


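【Solution 1】:

One approach that works is to make the outermost container a list rather than a dict. As we walk through each title in a title string, we look in the current list for a dict with that title. If the title is not in the current list, we create a new dict for it. Either way, that dict's children list becomes the new current list and we move on to the next title.

I have also written a recursive prune function that removes all the empty children lists once all the data has been entered, in case you want to do that.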

import json

lst = [
    ['Director', 9010],
    ['Director - Product Manager', 9894],
    ['Director - Product Manager - Project Manager', 9080],
    ['Director - Product Manager - Project Manager - Staff', 5090],
    ['Director - Product Manager - Project Manager - Staff 2', 5087],
    ['Director - Product Manager - Project Manager 2', 9099],
]

# Search for a matching name in the current list.
# If it doesn't exist, create it.
def insert(lst, name, idnum):
    for d in lst:
        if d['title'] == name:
            break
    else:
        d = {'title': name, 'id': idnum, 'children': []}
        lst.append(d)
    return d['children']

# Remove empty child lists
def prune(lst):
    for d in lst:
        if d['children']:
            prune(d['children'])
        else:
            del d['children']

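# Insert the data into the master list.
# `current` tracks the child list being searched; a separate name avoids
# rebinding `lst` inside the loop that iterates over it.
master = []
for names, idnum in lst:
    current = master
    for name in [s.strip() for s in names.split(' - ')]:
        current = insert(current, name, idnum)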

prune(master)

# Get the top level dict from the master list
data = master[0]
print(json.dumps(data, indent=4))

Output

{
    "title": "Director",
    "id": 9010,
    "children": [
        {
            "title": "Product Manager",
            "id": 9894,
            "children": [
                {
                    "title": "Project Manager",
                    "id": 9080,
                    "children": [
                        {
                            "title": "Staff",
                            "id": 5090
                        },
                        {
                            "title": "Staff 2",
                            "id": 5087
                        }
                    ]
                },
                {
                    "title": "Project Manager 2",
                    "id": 9099
                }
            ]
        }
    ]
}
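
One caveat with the approach above: insert() stores the current row's id on every node it has to create, so the result is only correct when each parent row appears before its children, as it does in the sample data. Below is a minimal sketch of a variant that sets the id only on the last title of each path. The names find_or_create and build_tree are hypothetical, and using None as a placeholder id for a parent that has not been seen yet is an assumption, not part of the original answer.

```python
import json

def find_or_create(children, title):
    """Return the node titled `title` in `children`, creating it if absent."""
    for node in children:
        if node['title'] == title:
            return node
    # Placeholder id (assumption): filled in when the node's own row is read.
    node = {'title': title, 'id': None, 'children': []}
    children.append(node)
    return node

def build_tree(rows, sep=' - '):
    """Build a list of root nodes from [path, id] rows given in any order."""
    roots = []
    for path, idnum in rows:
        children = roots
        node = None
        for title in (part.strip() for part in path.split(sep)):
            node = find_or_create(children, title)
            children = node['children']
        # Only the last title in the path owns this row's id.
        node['id'] = idnum
    return roots

rows = [
    ['Director - Product Manager', 9894],   # child listed before its parent
    ['Director', 9010],
]
print(json.dumps(build_tree(rows), indent=4))
```

This keeps the same list-of-dicts layout, so the prune step from the answer could still be applied to the result afterwards.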

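【Comments】:

    【Solution 2】:

    Take d as your input and iterate over the input list.

    Since each sublist has two elements, unpack the position and the id into the loop variables p and id respectively.

    For example, suppose you are processing the list ['Director - Product Manager - Project Manager - Staff', 5090].

    To get the title of each position, split the position string on - and strip the leading and trailing whitespace. For example,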

    >>> d[3][0]
    'Director - Product Manager - Project Manager - Staff'
    >>> list(map(str.strip, d[3][0].split('-')))
    ['Director', 'Product Manager', 'Project Manager', 'Staff']
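
Note that in Python 3, map returns a lazy iterator rather than a list, so a list comprehension is the more portable way to do this split. A small sketch follows; split_titles is a hypothetical helper name, and splitting on ' - ' (with the surrounding spaces) instead of '-' is an assumption that protects titles which themselves contain a hyphen.

```python
def split_titles(path, sep=' - '):
    """Split a delimited path into a list of stripped titles."""
    return [part.strip() for part in path.split(sep)]

print(split_titles('Director - Product Manager - Project Manager - Staff'))
# ['Director', 'Product Manager', 'Project Manager', 'Staff']

print(split_titles('Director - Vice-President'))
# ['Director', 'Vice-President']
```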
    

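    The output dictionary, together with the position that precedes Staff (that is, Project Manager), is passed to a recursive search method, which collects every match for the lookup value and returns them as a list. Take the last match.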

    >>> recursive_search([data,],'Product Manager')[-1]
    {'children': [{'children': [{'id': 5090, 'title': 'Staff'}, {'id': 5087, 'title': 'Staff 2'}], 'id': 9080, 'title': 'Project Manager'}, {'id': 9099, 'title': 'Project Manager 2'}], 'id': 9894, 'title': 'Product Manager'}
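
To make the search step concrete, here is a small self-contained sketch of recursive_search applied to a hand-built tree. The tree literal is made up for illustration; the function follows the one in the full listing further down, with list(...) around item.values() added only for clarity.

```python
def recursive_search(items, key):
    """Collect every dict, at any depth, that has `key` among its values."""
    found = []
    for item in items:
        if isinstance(item, list):
            found += recursive_search(item, key)
        elif isinstance(item, dict):
            if key in item.values():
                found.append(item)
            found += recursive_search(list(item.values()), key)
    return found

# Illustrative data only.
tree = {
    'id': 9010, 'title': 'Director',
    'children': [
        {'id': 9894, 'title': 'Product Manager'},
    ],
}

matches = recursive_search([tree], 'Product Manager')
parent = matches[-1]  # the last match is used as the parent
parent.setdefault('children', []).append({'id': 9080, 'title': 'Project Manager'})
print(tree)
```

Because the lookup is by title alone, a title that occurs in more than one branch yields several matches, and taking the last one may not pick the intended parent.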
    

    You then need to append a new entry, carrying the new id and title, to the children key of the result above!

    Putting the above together,

    from pprint import pprint

    d = [
        ['Director', 9010],
        ['Director - Product Manager', 9894],
        ['Director - Product Manager - Project Manager', 9080],
        ['Director - Product Manager - Project Manager - Staff', 5090],
        ['Director - Product Manager - Project Manager - Staff 2', 5087],
        ['Director - Product Manager - Project Manager 2', 9099],
    ]

    def recursive_search(items, key):
        # Return every dict, at any depth, that has `key` among its values.
        found = []
        for item in items:
            if isinstance(item, list):
                found += recursive_search(item, key)
            elif isinstance(item, dict):
                if key in item.values():
                    found.append(item)
                found += recursive_search(item.values(), key)
        return found

    data = {}
    for p, id in d:
        # Split the path into titles and strip the surrounding whitespace.
        # A list (not a lazy map object) is needed for len() and indexing.
        desig = [s.strip() for s in p.split('-')]
        if len(desig) > 1:
            # Look up the parent by title and take the last match, if any.
            matches = recursive_search([data], desig[-2])
            if matches:
                res = matches[-1]
                res.setdefault('children', []).append({'id': id, 'title': desig[-1]})
        else:
            data = {'id': id, 'title': p}

    # pprint was imported as a function, so call it directly.
    pprint(data)
    

    Output:

    {'children': [{'children': [{'children': [{'id': 5090, 'title': 'Staff'},
                                              {'id': 5087,
                                               'title': 'Staff 2'}],
                                 'id': 9080,
                                 'title': 'Project Manager'},
                                {'id': 9099, 'title': 'Project Manager 2'}],
                   'id': 9894,
                   'title': 'Product Manager'}],
     'id': 9010,
     'title': 'Director'}
    

    Reference: the recursive_search function used here is a slightly modified version of the dict search mentioned here

    【Comments】:
