【Question Title】: How do I remove repetitions from multiple JSON files?
【Posted】: 2017-10-17 10:47:48
【Question】:

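I have multiple JSON files containing capitals and countries. How do I remove duplicated key-value pairs from all of the files?

The following is one of the JSON files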

{
    "data": [
    {
        "Capital": "Berlin",
        "Country": "Germany"
    },
    {
        "Capital": "New Delhi",
        "Country": "India"
    },
    {
        "Capital": "Canberra",
        "Country": "Australia"
    },
    {
        "Capital": "Beijing.",
        "Country": "China"
    },
    {
        "Capital": "Tokyo",
        "Country": "Japan"
    },
    {
        "Capital": "Tokyo",
        "Country": "Japan"
    },
    {
        "Capital": "Berlin",
        "Country": "Germany"
    },
    {
        "Capital": "Moscow",
        "Country": "Russia"
    },
    {
        "Capital": "New Delhi",
        "Country": "India"
    },
    {
        "Capital": "Ottawa",
        "Country": "Canada"
    }
    ]

}

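There are many JSON files like this one that contain duplicated items. How do I remove the duplicates and keep only the first occurrence? I tried this, but it does not work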

dupes = []
for f in json_files:
    with open(f) as json_data:
        nations = json.load(json_data)['data']
        #takes care of duplicates and stores it in dupes
        dupes.append(x for x in nations if x['Capital'] in seen or seen.add(x['Capital']))
        nations = [x for x in nations if x not in dupes] #want to keep the first occurance of the item present in dupes

    with open(f, 'w') as json_data:
        json.dump({'data': nations}, json_data)

【Comments】:

    Tags: json python-3.x


    【Solution 1】:

    You may not be able to use a cool list comprehension, but a regular loop should work

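    used_nations = set()  # capitals seen so far ({} would create a dict, which has no .add)
    unique_nations = []   # build a new list instead of removing while iterating
    for nation in nations:
        if nation['Capital'] not in used_nations:
            used_nations.add(nation['Capital'])
            unique_nations.append(nation)
    nations = unique_nations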
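
A side note on loop-based approaches in general: calling `list.remove` on a list while iterating over it makes the iterator skip the element that slides into the freed slot, and `remove` deletes the first equal element rather than the current one. The sketch below uses made-up sample data and a hypothetical helper name, `dedupe_by_capital` (neither is from the original post), to contrast in-place removal with building a new list.

```python
def dedupe_by_capital(nations):
    """Return a new list keeping the first item seen for each capital."""
    seen = set()
    unique = []
    for nation in nations:
        if nation['Capital'] not in seen:
            seen.add(nation['Capital'])
            unique.append(nation)
    return unique


# Illustrative data: three identical Tokyo entries followed by Berlin.
sample = [
    {'Capital': 'Tokyo', 'Country': 'Japan'},
    {'Capital': 'Tokyo', 'Country': 'Japan'},
    {'Capital': 'Tokyo', 'Country': 'Japan'},
    {'Capital': 'Berlin', 'Country': 'Germany'},
]

# In-place removal while iterating: one duplicate is skipped and survives.
mutated = list(sample)
seen_capitals = set()
for nation in mutated:
    if nation['Capital'] in seen_capitals:
        mutated.remove(nation)
    else:
        seen_capitals.add(nation['Capital'])

# Building a new list visits every element exactly once.
clean = dedupe_by_capital(sample)
```

With this sample, `mutated` still contains two Tokyo entries, while `clean` contains one Tokyo entry and one Berlin entry.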
    

    【Comments】:

    • @nutmeg64 I'm sure someone will release python.js before long ;)
    【Solution 2】:

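    List comprehensions are great! But... once `if` statements are involved, they can complicate the code.

    This is by no means a rule of thumb. On the contrary, I encourage you to use list comprehensions often. In this particular case, a more spread-out solution is more readable.

    My suggestion is this: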

    import json
    
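    seen = set()   # capitals already encountered (set gives fast membership tests)
    result = []    # first occurrence of each capital, in original order
    
    with open('data.json') as json_data:
        nations = json.load(json_data)['data']
        # keep an item only the first time its capital appears
        for item in nations:
            if item['Capital'] not in seen:
                seen.add(item['Capital'])
                result.append(item)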
    
    with open('data.no_dup.json', 'w') as json_data:
        json.dump({'data': result}, json_data)
    

    Tested and working on Python 3.5.2.

    Note that I removed your outer loop for convenience.
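
If the outer loop over several files is needed again, one possible way to wrap the same idea is sketched below. The function names `dedupe_file` and `dedupe_files` and the `across_files` flag are hypothetical, not part of the original answer. Unlike the snippet above, this sketch overwrites each input file in place, as the question's code does, so try it on copies first.

```python
import json


def dedupe_file(path, seen=None):
    """Rewrite `path`, keeping the first item for each capital.

    `seen` is a set of capitals already encountered. Passing the same
    set for several files also drops duplicates across files.
    """
    if seen is None:
        seen = set()
    with open(path) as fp:
        nations = json.load(fp)['data']
    result = []
    for item in nations:
        if item['Capital'] not in seen:
            seen.add(item['Capital'])
            result.append(item)
    # Overwrites the input file (assumption: that is the desired behaviour).
    with open(path, 'w') as fp:
        json.dump({'data': result}, fp, indent=4)
    return result


def dedupe_files(paths, across_files=False):
    """Deduplicate every file in `paths`, optionally sharing state."""
    shared = set() if across_files else None
    for path in paths:
        dedupe_file(path, shared)
```

Calling `dedupe_files(json_files)` cleans each file independently. With `across_files=True`, a capital that already appeared in an earlier file is also removed from later ones.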

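    【Comments】:

    • Your code works perfectly for what I was hoping to achieve. Thanks!
    【Solution 3】:

    Here is sample code showing how to achieve this for the given JSON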

    import json
    
    files = ['countries.json']
    
    for f in files:
        with open(f,'r') as fp:
            nations = json.load(fp)
        # a set of (key, value) tuples drops entries whose pairs are all identical
        result = [dict(tupleized) for tupleized in
                  set(tuple(item.items()) for item in nations['data'])]
    print(result)
    print(len(result))
    

    Output (from Python 2, hence the u'' prefixes):

    [{u'Country': u'Russia', u'Capital': u'Moscow'}, {u'Country': u'Japan', u'Capital': u'Tokyo'}, {u'Country': u'Canada', u'Capital': u'Ottawa'}, {u'Country': u'India', u'Capital': u'New Delhi'}, {u'Country': u'Germany', u'Capital': u'Berlin'}, {u'Country': u'Australia', u'Capital': u'Canberra'}, {u'Country': u'China', u'Capital': u'Beijing.'}]
    7
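
Note that going through a `set` loses the original order, as the output above shows, and that whole entries are compared rather than just the capital. If order matters, an order-preserving variant might look like the sketch below. The helper name `dedupe_pairs` and the sample data are illustrative only, and the sketch assumes every value in the entries is hashable (strings, as here).

```python
def dedupe_pairs(items):
    """Keep the first occurrence of each distinct set of (key, value) pairs."""
    seen = set()
    unique = []
    for item in items:
        # Sort the pairs so key order inside a dict does not affect equality.
        key = tuple(sorted(item.items()))
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


# Illustrative data: the same capital paired with two different countries.
sample = [
    {'Country': 'Russia', 'Capital': 'Moscow'},
    {'Country': 'Zaire', 'Capital': 'Moscow'},
    {'Country': 'Russia', 'Capital': 'Moscow'},
]
unique = dedupe_pairs(sample)
```

Here `unique` keeps both Moscow entries, because they differ in `Country`; only the exact repeat is dropped. To treat entries with the same capital as duplicates, key on `item['Capital']` instead.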
    

    【Comments】:

    • Note that this only filters out duplicated pairs, so {'Country': 'Russia', 'Capital': 'Moscow'} and {'Country': 'Zaire', 'Capital': 'Moscow'} would both be in result