【Question title】: Remove duplicates and combine multiple lists into one?
【Posted】: 2018-04-10 05:27:36
【Question】:

How can I remove duplicates and combine multiple lists into one, like this:

function([["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]) should return exactly

[["good", ["me.txt", "money.txt"]], ["hello", ["me.txt"]], ["rep", ["money.txt"]]]

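【Comments】:

  • Why not use a dict instead?
  • Possible duplicate of Python group by
  • Have you tried anything yourself? Is there any code you can share?
  • Why is it tagged Python when it is JavaScript?

Tags: python duplicates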


【Solution 1】:

The simplest way is to use a defaultdict.

>>> from collections import defaultdict
>>> l = [["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]
>>> d = defaultdict(list)
>>> for i, j in l:
...     d[i].append(j)                   # append the value to its key's list
...
>>> d
defaultdict(<class 'list'>, {'hello': ['me.txt'], 'good': ['me.txt', 'money.txt'], 'rep': ['money.txt']})

>>> # to get it as a list
>>> out = [[key, d[key]] for key in d]
>>> out
[['hello', ['me.txt']], ['good', ['me.txt', 'money.txt']], ['rep', ['money.txt']]]
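The question also asks for duplicates to be removed and shows the result ordered by key, which the defaultdict grouping alone does not do. A minimal sketch of one way to add both, assuming Python 3 and that "duplicates" means repeated [key, value] pairs; the name `group_pairs` is made up for this example:

```python
from collections import defaultdict

def group_pairs(pairs):
    """Group [key, value] pairs by key, keeping each value once per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        if value not in grouped[key]:      # skip a value already stored for this key
            grouped[key].append(value)
    # Sort by key so the result follows the ordering shown in the question.
    return [[key, grouped[key]] for key in sorted(grouped)]

data = [["hello", "me.txt"], ["good", "me.txt"], ["good", "me.txt"],
        ["good", "money.txt"], ["rep", "money.txt"]]
print(group_pairs(data))
# [['good', ['me.txt', 'money.txt']], ['hello', ['me.txt']], ['rep', ['money.txt']]]
```

The `value not in grouped[key]` check is a linear scan, which is fine for short lists; for many values per key a set would be the usual replacement.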

【Discussion】:

    【Solution 2】:

    Try this (no libraries needed):

    your_input_data = [ ["hello","me.txt"], ["good","me.txt"], ["good","me.txt"], ["good","money.txt"], ["rep", "money.txt"] ]
    
    
    my_dict = {}
    for box in your_input_data:
    
        if box[0] in my_dict:
    
            buffer_items = []
            for items in box[1:]:
                if items not in my_dict[box[0]]:
                    buffer_items.append(items)
    
            remove_dup = list(set(buffer_items + my_dict[box[0]]))
            my_dict[box[0]] = remove_dup
    
        else:
    
            buffer_items = []
            for items in box[1:]:
                buffer_items.append(items)
    
            remove_dup = list(set(buffer_items))
    
            my_dict[box[0]] = remove_dup
    
    
    last_point = [[keys, values] for keys, values in my_dict.items()]
    
    print(last_point)
    

    Good luck...
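One thing to be aware of in the code above: `list(set(...))` does not keep the values in the order they were first seen. A possible library-free variant that does, sketched under the assumption of Python 3.7+ (where dicts preserve insertion order); `merge_preserving_order` is a hypothetical name:

```python
def merge_preserving_order(rows):
    """Merge rows of the form [name, value, ...] by name, keeping first-seen order."""
    merged = {}
    for name, *values in rows:
        if name not in merged:
            merged[name] = {}
        # A dict used as an ordered set: existing keys keep their position,
        # new keys are appended, repeated values are ignored.
        merged[name].update(dict.fromkeys(values))
    return [[name, list(seen)] for name, seen in merged.items()]

rows = [["hello", "me.txt"], ["good", "me.txt"], ["good", "me.txt"],
        ["good", "money.txt"], ["rep", "money.txt"]]
print(merge_preserving_order(rows))
# [['hello', ['me.txt']], ['good', ['me.txt', 'money.txt']], ['rep', ['money.txt']]]
```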

    【Discussion】:

      【Solution 3】:

      You can also use a plain dictionary.

      In [29]: d2 = {}

      In [30]: l1 = [["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]

      In [31]: for i, j in l1:
          ...:     if i not in d2:
          ...:         d2[i] = [j]          # start a new list for a new key
          ...:     else:
          ...:         d2[i].append(j)      # add to the existing list
          ...:

      In [32]: d2
      Out[32]: {'hello': ['me.txt'], 'good': ['me.txt', 'money.txt'], 'rep': ['money.txt']}

      In [33]: out = [[key, d2[key]] for key in d2]

      In [34]: out
      Out[34]:
      [['hello', ['me.txt']],
       ['good', ['me.txt', 'money.txt']],
       ['rep', ['money.txt']]]
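The key check in the loop can also be folded into a single call with `dict.setdefault`. This is only a sketch of that idiom, not the answer's own code; the function name is invented for illustration:

```python
def group_with_setdefault(pairs):
    """Group [key, value] pairs with a plain dict and dict.setdefault."""
    grouped = {}
    for key, value in pairs:
        # setdefault returns the list already stored under key,
        # or stores a new empty list first and returns that.
        grouped.setdefault(key, []).append(value)
    return [[key, values] for key, values in grouped.items()]

l1 = [["hello", "me.txt"], ["good", "me.txt"], ["good", "money.txt"], ["rep", "money.txt"]]
print(group_with_setdefault(l1))
# [['hello', ['me.txt']], ['good', ['me.txt', 'money.txt']], ['rep', ['money.txt']]]
```

Like the loop above, this groups values but does not drop repeated pairs or sort the keys.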
      

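      【Discussion】:

        【Solution 4】:

        Let's first understand the actual problem:

        An example as a hint:

        There is a pattern for this kind of list problem:

        Suppose you have a list: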

        a=[(2006,1),(2007,4),(2008,9),(2006,5)]
        

        and you want to convert it to a dict, with the first element of each tuple as the key and the second element as the value. Something like:

        {2008: [9], 2006: [5], 2007: [4]}
        

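        But there is a catch: some entries share a key but carry different values, for example (2006,1) and (2006,5). You want those values collected under a single key, so the expected output is: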

        {2008: [9], 2006: [1, 5], 2007: [4]}
        

        For this type of problem, we do the following:

        First create a new dict, then follow this pattern:

        if item[0] not in new_dict:
            new_dict[item[0]]=[item[1]]
        else:
            new_dict[item[0]].append(item[1])
        

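        So we first check whether the key is in the new dict. If it is not, we start a new list for it; if it already exists, we append the repeated key's value to its list:

        Full code: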

        a=[(2006,1),(2007,4),(2008,9),(2006,5)]
        
        new_dict={}
        
        for item in a:
            if item[0] not in new_dict:
                new_dict[item[0]]=[item[1]]
            else:
                new_dict[item[0]].append(item[1])
        
        print(new_dict)
        

        Solution to your actual problem:

        list_1=[["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]
        
        no_duplicates={}
        
        for item in list_1:
            if item[0] not in no_duplicates:
                # copy the remaining elements into a new list for this key
                no_duplicates[item[0]]=list(item[1:])
            else:
                no_duplicates[item[0]].extend(item[1:])
        
        list_result=[]
        for key,value in no_duplicates.items():
            list_result.append([key,value])
        print(list_result)
        

        Output (Python 3.7+, where dicts keep insertion order):

        [['hello', ['me.txt']], ['good', ['me.txt', 'money.txt']], ['rep', ['money.txt']]]
        

        【Discussion】:

          【Solution 5】:
          yourList=[["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]
          expectedList=[["good", ["me.txt", "money.txt"]], ["hello", ["me.txt"]], ["rep", ["money.txt"]]]
          
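          def getall(allsec, listKey, uniqlist):
              # First time this key is seen: record it and collect all of its values.
              # For a repeated key the function falls through and returns None.
              if listKey not in uniqlist:
                  uniqlist.append(listKey)
                  return [listKey, [x[1] for x in allsec if x[0] == listKey]]
          
          uniqlist=[]
          # Drop the None placeholders, then sort by key.
          result=sorted(filter(lambda x: x is not None, [getall(yourList,elem[0],uniqlist) for elem in yourList]))
          print(result)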
          

          Hope this helps.
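The code above rescans the whole list once per distinct key. One of the comments under the question points to "Python group by"; a sketch of that approach with `itertools.groupby`, which sorts once and then groups adjacent items, could look like this. The name `group_sorted` is hypothetical, and the sketch assumes the keys and values are hashable and mutually comparable (as the strings here are):

```python
from itertools import groupby
from operator import itemgetter

def group_sorted(pairs):
    """Group [key, value] pairs by key after sorting; exact duplicate pairs are dropped."""
    # groupby only merges neighbouring items, so the pairs must be sorted first.
    # Converting to tuples makes them hashable so set() can remove duplicates.
    ordered = sorted(set(map(tuple, pairs)))
    return [[key, [value for _, value in group]]
            for key, group in groupby(ordered, key=itemgetter(0))]

your_list = [["hello", "me.txt"], ["good", "me.txt"], ["good", "money.txt"], ["rep", "money.txt"]]
print(group_sorted(your_list))
# [['good', ['me.txt', 'money.txt']], ['hello', ['me.txt']], ['rep', ['money.txt']]]
```

Note that the values under each key come out sorted rather than in input order, because whole pairs are sorted before grouping.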

          【Discussion】:

            【Solution 6】:

            Create a function in Python that gives you exactly the output you need, as follows:

            from collections import defaultdict
            
            def function(data):    
                entries = defaultdict(list)
            
                for k, v in data:
                    entries[k].append(v)
            
                return sorted([k, v] for k, v in entries.items())
            
            print(function([["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]))
            

            This displays the function's return value as:

            [['good', ['me.txt', 'money.txt']], ['hello', ['me.txt']], ['rep', ['money.txt']]]  
            

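            It also ensures the keys are sorted. A dictionary is used to handle the removal of duplicate keys (since keys must be unique).

            defaultdict() is used to simplify building the lists inside the dictionary. An alternative is to try appending the new value to an existing key and, if a KeyError exception is raised, add the new key, as follows: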

            def function(data):    
                entries = {}
            
                for k, v in data:
                    try:
                        entries[k].append(v)
                    except KeyError:
                        entries[k] = [v]
            
                return sorted([k, v] for k, v in entries.items())
            

            【Discussion】:

              【Solution 7】:

              This can be solved easily with a dict and sets.

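              def combine_duplicates(given_list):
                  data = {}
                  for element_1, element_2 in given_list:
                      # set.add() returns None, so its result must not be assigned back;
                      # setdefault fetches (or creates) the set, which is then updated in place
                      data.setdefault(element_1, set()).add(element_2)
                  return [[k, list(v)] for k, v in data.items()]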
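Because sets are unordered, `list(v)` may return the values in a different order from one run to the next. If a reproducible result matters, a small variation (a sketch only; `combine_duplicates_sorted` is a name made up here) sorts both the keys and each set of values:

```python
def combine_duplicates_sorted(given_list):
    """Collect the values per key in sets, then emit keys and values in sorted order."""
    data = {}
    for key, value in given_list:
        data.setdefault(key, set()).add(value)   # the set drops repeated values
    return [[key, sorted(data[key])] for key in sorted(data)]

given = [["hello", "me.txt"], ["good", "money.txt"], ["good", "me.txt"],
         ["good", "me.txt"], ["rep", "money.txt"]]
print(combine_duplicates_sorted(given))
# [['good', ['me.txt', 'money.txt']], ['hello', ['me.txt']], ['rep', ['money.txt']]]
```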
              

              【Discussion】:

                【Solution 8】:

                Create an empty array, push index 0 of each sub-array into it, and join to turn all the values into one space-separated string.

                var your_input_data = [ ["hello","hi", "jel"], ["good"], ["good2","lo"], ["good3","lt","ahhahah"], ["rep", "nice","gr8", "job"] ];
                
                var myprint = []
                for(var i in your_input_data){
                   myprint.push(your_input_data[i][0]);
                }
                console.log(myprint.join(' '))
                

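                【Discussion】:

                • Is this supposed to be JavaScript?
                • How about something written in Python (since that is how the question is tagged)? Also, how does this remove duplicates?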