【Question title】: Large amount of lists concatenation [duplicate]
【Posted】: 2019-05-25 15:59:23
【Question】:

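I am trying to write a function that merges lists whenever two or more of them share an element.

Examples:

[[1,2],[3,4,5],[0,4]] becomes [[1,2],[0,3,4,5]]

[[1],[1,2],[0,2]] becomes [[0,1,2]]

[[1, 2], [2, 3], [3, 4]] becomes [[1,2,3,4]]

In effect, whenever lists share an element we merge them into one list and drop the duplicate copies of that element. Each final list must contain unique elements.

I tried writing the following function. It works on small inputs, but with large ones (a list of about 100 or 200 lists) I get the following recursion error: RecursionError: maximum recursion depth exceeded while getting the repr of an object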

def concat(L):
   break_cond = False
   print(L)
   for L1 in L:
       for L2 in L:
           if (bool(set(L1) & set(L2)) and L1 != L2):
               break_cond = True
   if (break_cond):
       i, j = 0, 0
       while i < len(L):
           while j < len(L):
               if (bool(set(L[i]) & set(L[j])) and i != j):
                   L[i] = sorted(L[i] + list(set(L[j]) - set(L[i])))
                   L.pop(j)
               j += 1
           i += 1
       return concat(L)

Also, I would like to use only basic Python rather than a lot of libraries. Any ideas? Thanks.

Example of a list that triggers the error:

[[0, 64], [1, 120, 172], [2, 130], [3, 81, 102], [5, 126], [6, 176], [7, 21, 94], [8, 111, 167], [9, 53, 60, 138], [10, 102, 179], [11, 45, 72], [12, 53, 129], [14, 35, 40, 58, 188], [15, 86], [18, 70, 94], [19, 28], [20, 152], [21, 24], [22, 143, 154], [23, 110, 171], [24, 102, 144], [25, 73, 106, 187], [26, 189], [28, 114, 137], [29, 148], [30, 39], [31, 159], [33, 44, 132, 139], [34, 81, 100, 136, 185], [35, 53], [37, 61, 138], [38, 144, 147, 165], [41, 42, 174], [42, 74, 107, 162], [43, 99, 123], [44, 71, 122, 126], [45, 74, 144], [47, 94, 151], [48, 114, 133], [49, 130, 144], [50, 51], [51, 187], [52, 124, 142, 146, 167, 184], [54, 97], [55, 94], [56, 88, 128, 166], [57, 63, 80], [59, 89], [60, 106, 134, 142], [61, 128, 145], [62, 70], [63, 73, 76, 101, 106], [64, 80, 176], [65, 187, 198], [66, 111, 131, 150], [67, 97, 128, 159], [68, 85, 128], [69, 85, 169], [70, 182], [71, 123], [72, 85, 94], [73, 112, 161], [74, 93, 124, 151, 191], [75, 163], [76, 99, 106, 129, 138, 152, 179], [77, 89, 92], [78, 146, 156], [79, 182], [82, 87, 130, 179], [83, 148], [84, 110, 146], [85, 98, 137, 177], [86, 198], [87, 101], [88, 134, 149], [89, 99, 107, 130, 193], [93, 147], [95, 193], [96, 98, 109], [104, 105], [106, 115, 154, 167, 190], [107, 185, 193], [111, 144, 153], [112, 128, 188], [114, 136], [115, 146], [118, 195], [119, 152], [121, 182], [124, 129, 177], [125, 156], [126, 194], [127, 198], [128, 149], [129, 153], [130, 164, 196], [132, 140], [133, 181], [135, 165, 170, 171], [136, 145], [141, 162], [142, 170, 187], [147, 171], [148, 173], [150, 180], [153, 191], [154, 196], [156, 165], [157, 177], [158, 159], [159, 172], [161, 166], [162, 192], [164, 184, 197], [172, 199], [186, 197], [187, 192]]

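【Question comments】:

  • What is the expected output for [[1, 2], [2, 3], [3, 4]]?
  • @DanielMesejo I added the answer and more explanation to my question; it would be [[1,2,3,4]].
  • I don't get your code. What do you return if break_cond is false? Why do you need recursion instead of a while loop?
  • What about the input [[0,1],[2,3],[1,2]]? Is the output [[0,1],[1,2,3]] or [[0,1,2,3]]?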
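
The while-loop suggestion in the comments above can be sketched as follows. In the question's code, j is set to 0 only once, so within a single call only L[0] is ever merged with other lists. If the remaining overlaps do not involve L[0], the function calls itself again without making progress, which is one plausible cause of the RecursionError. The name concat_iterative is hypothetical, and the sketch assumes that chained overlaps should collapse into one list, as in the question's third example.

```python
def concat_iterative(lists):
    """Merge lists that share elements, repeating until nothing changes."""
    groups = [set(item) for item in lists]  # work on copies, as sets
    merged = True
    while merged:  # a plain loop replaces the recursive call
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if groups[i] & groups[j]:
                    # absorb group j into group i, then rescan from the start
                    groups[i].update(groups.pop(j))
                    merged = True
                    break
            if merged:
                break
    return [sorted(group) for group in groups]


print(concat_iterative([[1, 2], [3, 4, 5], [0, 4]]))  # [[1, 2], [0, 3, 4, 5]]
print(concat_iterative([[0, 1], [2, 3], [1, 2]]))     # [[0, 1, 2, 3]]
```

Under that assumption, the input [[0,1],[2,3],[1,2]] asked about above collapses to a single list. This version rescans after every merge, so it favours clarity over speed.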

Tags: python


【Solution 1】:

You can use the networkx library, because this is a graph theory connected components problem:

import networkx as nx

l = [[1,2],[3,4,5],[0,4]]
#l = [[1],[1,2],[0,2]]
#l = [[1, 2], [2, 3], [3, 4]]

G = nx.Graph()

#Add nodes to Graph    
G.add_nodes_from(sum(l, []))

#Create edges from list of nodes
q = [[(s[i],s[i+1]) for i in range(len(s)-1)] for s in l]

for i in q:

    #Add edges to Graph
    G.add_edges_from(i)

#Find all connected components in graph and list nodes for each component
print([list(i) for i in nx.connected_components(G)])

Output:

[[1, 2], [0, 3, 4, 5]]

If you comment out the first definition of l and uncomment the second, the output is:

[[0, 1, 2]]

The same goes for the third definition:

[[1, 2, 3, 4]]
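
As a small illustration of the edge-building step in the answer above: linking each element to its successor is enough to keep every element of a list in the same component, so there is no need to connect every pair. The helper name consecutive_edges is hypothetical and not part of networkx.

```python
def consecutive_edges(nodes):
    """Pair each element with its successor: [3, 4, 5] -> [(3, 4), (4, 5)]."""
    return list(zip(nodes, nodes[1:]))


print(consecutive_edges([3, 4, 5]))  # [(3, 4), (4, 5)]
print(consecutive_edges([1]))        # []
```

A one-element list produces no edges at all, which is why the answer also adds every element as a node before adding edges.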

【Comments】:

    【Solution 2】:

    You can use a recursive version of breadth-first search, with no imports:

    def group_vals(d, current, _groups, _seen, _master_seen):
      if not any(set(current)&set(i) for i in d if i not in _seen):
        yield list({i for b in _groups for i in b})
        for i in d:
           if i not in _master_seen:
             yield from group_vals(d, i, [i], [i], _master_seen+[i])
      else:
        for i in d:
           if i not in _seen and set(current)&set(i):
             yield from group_vals(d, i, _groups+[i], _seen+[i], _master_seen+[i])
    
    def join_data(_data):
      _final_result = list(group_vals(_data, _data[0], [_data[0]], [_data[0]], []))
      return [a for i, a in enumerate(_final_result) if a not in _final_result[:i]]
    
    c = [[[1,2],[3,4,5],[0,4]], [[1],[1,2],[0,2]], [[1, 2], [2, 3], [3, 4]]]
    print(list(map(join_data, c)))
    

    Output:

    [
     [[1, 2], [0, 3, 4, 5]], 
      [[0, 1, 2]], 
     [[1, 2, 3, 4]]
    ]
    

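    【Comments】:

    • I could not get your code to run on the larger test case posted by the OP. Also, for other readers running this code in Python 2.7, you have to convert the yield from statements, as shown here.
    • @ParagS.Chandakkar The recursion introduces a slight delay for longer input lists. Your answer fails on the last list posted by the OP, BTW.
    • Yes, I know, even my answer fails on the longer test case. I just wanted to know where the flaw in my approach is. I looked at your approach, and it faces the same problem.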
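
The link mentioned in the first comment is not preserved here. As a rough sketch of the usual conversion, a yield from over a generator that only yields values (no send() and no return value) can be replaced by an explicit loop. The flatten functions below are hypothetical and only illustrate the pattern.

```python
def flatten_py3(nested):
    """Python 3 style: delegate to the sub-generator with yield from."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten_py3(item)
        else:
            yield item


def flatten_py2(nested):
    """Python 2.7 compatible: loop over the sub-generator and re-yield."""
    for item in nested:
        if isinstance(item, list):
            for value in flatten_py2(item):
                yield value
        else:
            yield item


print(list(flatten_py3([1, [2, [3]]])))  # [1, 2, 3]
print(list(flatten_py2([1, [2, [3]]])))  # [1, 2, 3]
```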
    【Solution 3】:

    As @ScottBoston said, this is a graph problem known as connected components, and I suggest you use networkx as @ScottBoston pointed out. In case you cannot, here is a version without networkx:

    from itertools import combinations
    
    
    def bfs(graph, start):
        visited, queue = set(), [start]
        while queue:
            vertex = queue.pop(0)
            if vertex not in visited:
                visited.add(vertex)
                queue.extend(graph[vertex] - visited)
        return visited
    
    
    def connected_components(G):
        seen = set()
        for v in G:
            if v not in seen:
                c = set(bfs(G, v))
                yield c
                seen.update(c)
    
    
    def graph(edge_list):
        result = {}
        for source, target in edge_list:
            result.setdefault(source, set()).add(target)
            result.setdefault(target, set()).add(source)
        return result
    
    
    def concat(l):
        edges = []
        s = list(map(set, l))
        for i, j in combinations(range(len(s)), r=2):
            if s[i].intersection(s[j]):
                edges.append((i, j))
        G = graph(edges)
    
        result = []
        unassigned = list(range(len(s)))
        for component in connected_components(G):
            union = set().union(*(s[i] for i in component))
            result.append(sorted(union))
            unassigned = [i for i in unassigned if i not in component]
    
        result.extend(map(sorted, (s[i] for i in unassigned)))
    
        return result
    
    
    print(concat([[1, 2], [3, 4, 5], [0, 4]]))
    print(concat([[1], [1, 2], [0, 2]]))
    print(concat([[1, 2], [2, 3], [3, 4]]))
    

    Output:

    [[0, 3, 4, 5], [1, 2]]
    [[0, 1, 2]]
    [[1, 2, 3, 4]]
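
One possible refinement of the bfs helper above: list.pop(0) takes time proportional to the length of the list, so for larger graphs a collections.deque is the more usual queue. This is a minimal sketch that assumes the same adjacency format (a dict mapping each vertex to a set of neighbours); the name bfs_deque is hypothetical.

```python
from collections import deque


def bfs_deque(graph, start):
    """Return the set of vertices reachable from start."""
    visited = {start}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()  # O(1), unlike list.pop(0)
        for neighbour in graph[vertex]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return visited


example_graph = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
print(bfs_deque(example_graph, 0))  # {0, 1, 2}
```

Marking a vertex as visited when it is enqueued, rather than when it is dequeued, also keeps the same vertex from entering the queue more than once.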
    

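    【Comments】:

    • Thank you very much! I already knew about networkx; in fact, the problem I am trying to solve is a graph problem, which is why I need a concat function. But I need to build it from scratch. Thanks again!
    【Solution 4】:

    If you want a simple form, here is a solution: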

    def concate(l):
        len_l = len(l)
        i = 0
        while i < (len_l - 1):
            for j in range(i + 1, len_l):
    
                # i,j iterate over all pairs of l's elements including new 
                # elements from merged pairs. We use len_l because len(l)
                # may change as we iterate
    
                i_set = set(l[i])
                j_set = set(l[j])
    
                if len(i_set.intersection(j_set)) > 0:
                    # Remove these two from list
                    l.pop(j)
                    l.pop(i)
    
                    # Merge them and append to the orig. list
                    ij_union = list(i_set.union(j_set))
                    l.append(ij_union)
    
                    # len(l) has changed
                    len_l -= 1
    
                    # adjust 'i' because elements shifted
                    i -= 1
    
                    # abort inner loop, continue with next l[i]
                    break
    
            i += 1
        return l
    

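    【Comments】:

    • This is not what the OP wants.
    • @ScottBoston I looked at the examples he gave to see what he wants. I am still confused about how example 1 differs from example 3.
    • He is looking to merge the lists that belong to the same connected component.
    • @ScottBoston OK, got it. I had missed that he wants lists merged when they share a common element.
    • @ScottBoston Check this solution.
    【Solution 5】:

    Here is an iterative approach that should be about as efficient as you can get in pure Python. The one drawback is that it needs an extra pass at the end to remove duplicates.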

    original_list = [[1,2],[3,4,5],[0,4]]
    
    mapping = {}
    rev_mapping = {}
    
    for i, candidate in enumerate(original_list):
        for item in candidate:
            if item in mapping:
                merge_pos = mapping[item]
                # point every element of this candidate at the existing group
                for member in candidate:
                    mapping[member] = merge_pos
                rev_mapping[merge_pos].extend(candidate)
                break
        else:
            # no element seen before: start a new group at index i
            for item in candidate:
                mapping[item] = i
            rev_mapping.setdefault(i, []).extend(candidate)
    
    result = [list(set(item)) for item in rev_mapping.values()]
    print(result)
    

    Output:

    [[1, 2], [0, 3, 4, 5]]
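
One caveat about the loop above: a candidate is merged only into the first existing group it touches, so a list that bridges two earlier groups does not join them. For an input such as [[1, 2], [3, 4], [2, 3]], the element 3 ends up in two groups. A union-find (disjoint-set) structure is one common way to handle that case in pure Python. The following is a sketch of that swapped-in technique, not the answer's method, and the names merge_lists and find are hypothetical.

```python
def merge_lists(lists):
    """Group elements into connected components with a union-find structure."""
    parent = {}

    def find(x):
        # follow parent links to the root, shortening the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for group in lists:
        for item in group:
            parent.setdefault(item, item)  # each new element starts as its own root
        for item in group[1:]:
            root_a, root_b = find(group[0]), find(item)
            if root_a != root_b:
                parent[root_b] = root_a  # union the two components

    components = {}
    for item in parent:
        components.setdefault(find(item), []).append(item)
    return [sorted(members) for members in components.values()]


print(merge_lists([[1, 2], [3, 4], [2, 3]]))     # [[1, 2, 3, 4]]
print(merge_lists([[1, 2], [3, 4, 5], [0, 4]]))  # [[1, 2], [0, 3, 4, 5]]
```

Because find is iterative, this sketch does not depend on the recursion limit, which matters for inputs like the long list in the question.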
    

    【Comments】:

      【Solution 6】:

      If you want to see how the algorithm works, you can use this script, which uses a connectivity matrix:

      import numpy
      
      def Concatenate(L):
          result = []
          Ls_length = len(L)
          conn_mat = numpy.zeros( [Ls_length, Ls_length] )  # you can use a list of lists instead of a numpy array
          check_vector = numpy.zeros( Ls_length )           # you can use a list instead of a numpy array
      
          idx1 = 0
          while idx1 < Ls_length:
              idx2 = idx1 + 1
              conn_mat[idx1,idx1] = 1   # the diagonal is always 1 since every set intersects itself.
              while idx2 < Ls_length:
                  if bool(set(L[idx1]) & set(L[idx2]) ): # 1 if the sets idx1 idx2 intersect, and 0 if they don't.
                      conn_mat[idx1,idx2] = 1            # this is clearly a symmetric matrix.
                      conn_mat[idx2,idx1] = 1
                  idx2 += 1
              idx1 += 1
          print (conn_mat)
      
          idx = 0
          while idx < Ls_length:
              if check_vector[idx] == 1:     # skip the idx-th element of L if it has already been merged.
                  idx += 1
                  continue
      
              connected = GetAllPositiveIntersections(idx, conn_mat, Ls_length)
              r = set()
              for idx_ in connected:
                  r = r.union(set(L[idx_]))
                  check_vector[idx_] = 1
      
              result.append(list(r))
      
          return result
      
      def GetAllPositiveIntersections(idx, conn_mat, Ls_length):
          # the elements that intersect idx are coded with 1s in idx's row (or column, since it's a symmetric matrix) of conn_mat.
          connected = [idx]
          i = 0
          idx_ = idx
          while i < len(connected):
              j = 0
              while j < Ls_length:
                  if bool(conn_mat[idx_][j]):
                      if j not in connected: connected.append(j)   
                  j += 1
              i += 1
              if i < len(connected): idx_ = connected[i]
      
          return list(set(connected))
      

      Then you just run:

      L = [[1,2],[3,4,5],[0,4]]
      r = Concatenate(L)
      print(r)
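
The comments in the script above note that a list of lists can stand in for the numpy array. As a minimal sketch of that variant, the connectivity matrix can be built with plain Python as shown below. The name connection_matrix is hypothetical, and with nested lists the script's conn_mat[a,b] indexing would have to be written as conn_mat[a][b].

```python
def connection_matrix(lists):
    """Build an n x n matrix of 0/1 flags without numpy."""
    sets = [set(item) for item in lists]
    n = len(sets)
    # matrix[i][j] is 1 when lists i and j share an element, else 0;
    # the diagonal is 1 for every non-empty list
    return [[1 if sets[i] & sets[j] else 0 for j in range(n)] for i in range(n)]


for row in connection_matrix([[1, 2], [3, 4, 5], [0, 4]]):
    print(row)
# [1, 0, 0]
# [0, 1, 1]
# [0, 1, 1]
```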
      

      【Comments】:
