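【Question title】: Slice a list of tuples into a list of lists if the first and another element are the same (Python)
【Posted】: 2021-02-21 09:30:54
【Question description】:

I have the following list A, made up of tuples, and I want to split A into a list of lists, as shown in B. The logic is that tuples sharing the same first and fourth elements are packed together into one inner list.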

A = [(1, 'C-30219', 'C-30060', 'C-6235d935d39c258876476e35a7acfd69-1-1', 2),
     (1, 'C-30060', 'C-30022', 'C-6235d935d39c258876476e35a7acfd69-1-1', 3),
     (1, 'C-30022', 'C-30205', 'C-6235d935d39c258876476e35a7acfd69-1-1', 4),
     (3, 'C-30248', 'C-30260', 'C-ac19d0edcf4d4ebe071e8d43be1901e2-1-1', 4),
     (3, 'C-30260', 'C-30108', 'C-ac19d0edcf4d4ebe071e8d43be1901e2-1-1', 5),
     (3, 'C-30108', 'C-30240', 'C-ac19d0edcf4d4ebe071e8d43be1901e2-1-1', 6),
     (5, 'C-30269', 'C-30285', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 9),
     (5, 'C-30285', 'C-30109', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 10),
     (5, 'C-30109', 'C-30211', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 11),
     (5, 'C-30211', 'C-30289', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 12),
     (5, 'C-30072', 'C-30375', 'C-710c460e8dfc2b3a523e077b6c6bdb40-1-1', 15),
     (5, 'C-30375', 'C-30095', 'C-710c460e8dfc2b3a523e077b6c6bdb40-1-1', 16)]

Output:

B = [[(1, 'C-30219', 'C-30060', 'C-6235d935d39c258876476e35a7acfd69-1-1', 2),
      (1, 'C-30060', 'C-30022', 'C-6235d935d39c258876476e35a7acfd69-1-1', 3),
      (1, 'C-30022', 'C-30205', 'C-6235d935d39c258876476e35a7acfd69-1-1', 4)],
     [(3, 'C-30248', 'C-30260', 'C-ac19d0edcf4d4ebe071e8d43be1901e2-1-1', 4),
      (3, 'C-30260', 'C-30108', 'C-ac19d0edcf4d4ebe071e8d43be1901e2-1-1', 5),
      (3, 'C-30108', 'C-30240', 'C-ac19d0edcf4d4ebe071e8d43be1901e2-1-1', 6)],
     [(5, 'C-30269', 'C-30285', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 9),
      (5, 'C-30285', 'C-30109', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 10),
      (5, 'C-30109', 'C-30211', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 11),
      (5, 'C-30211', 'C-30289', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 12)],
     [(5, 'C-30072', 'C-30375', 'C-710c460e8dfc2b3a523e077b6c6bdb40-1-1', 15),
      (5, 'C-30375', 'C-30095', 'C-710c460e8dfc2b3a523e077b6c6bdb40-1-1', 16)]]

Here is my attempt, which produces the desired output but only in a very roundabout way. I am looking for a more efficient, Pythonic way to achieve this.

inter = list(set([(i[0],i[3]) for i in A]))
B = {o_t: [] for o_t in inter}
for i in range(1, len(A)):
    if (A[i][0] == A[i-1][0]
        and A[i][3] == A[i-1][3]):
        B[A[i][0],A[i][3]].append(A[i])
        B[A[i][0],A[i][3]].append(A[i-1])
B = {key: sorted(list(set(B[key])), key = lambda x: x[-1]) for key in B.keys()}
list(B.values())

【Question discussion】:

    Tags: python list slice


    【Solution 1】:

    This is a perfect task for groupby from itertools:

    from itertools import groupby
    A = [(1, 'C-30219', 'C-30060', 'C-6235d935d39c258876476e35a7acfd69-1-1', 2),
         (1, 'C-30060', 'C-30022', 'C-6235d935d39c258876476e35a7acfd69-1-1', 3),
         (1, 'C-30022', 'C-30205', 'C-6235d935d39c258876476e35a7acfd69-1-1', 4),
         (3, 'C-30248', 'C-30260', 'C-ac19d0edcf4d4ebe071e8d43be1901e2-1-1', 4),
         (3, 'C-30260', 'C-30108', 'C-ac19d0edcf4d4ebe071e8d43be1901e2-1-1', 5),
         (3, 'C-30108', 'C-30240', 'C-ac19d0edcf4d4ebe071e8d43be1901e2-1-1', 6),
         (5, 'C-30269', 'C-30285', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 9),
         (5, 'C-30285', 'C-30109', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 10),
         (5, 'C-30109', 'C-30211', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 11),
         (5, 'C-30211', 'C-30289', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 12),
         (5, 'C-30072', 'C-30375', 'C-710c460e8dfc2b3a523e077b6c6bdb40-1-1', 15),
         (5, 'C-30375', 'C-30095', 'C-710c460e8dfc2b3a523e077b6c6bdb40-1-1', 16)]
    B = [list(g) for _,g in groupby(A, key=lambda x: (x[0], x[3]))]
    
    print(B)
    

    Output

    [[(1, 'C-30219', 'C-30060', 'C-6235d935d39c258876476e35a7acfd69-1-1', 2),
      (1, 'C-30060', 'C-30022', 'C-6235d935d39c258876476e35a7acfd69-1-1', 3),
      (1, 'C-30022', 'C-30205', 'C-6235d935d39c258876476e35a7acfd69-1-1', 4)],
     [(3, 'C-30248', 'C-30260', 'C-ac19d0edcf4d4ebe071e8d43be1901e2-1-1', 4),
      (3, 'C-30260', 'C-30108', 'C-ac19d0edcf4d4ebe071e8d43be1901e2-1-1', 5),
      (3, 'C-30108', 'C-30240', 'C-ac19d0edcf4d4ebe071e8d43be1901e2-1-1', 6)],
     [(5, 'C-30269', 'C-30285', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 9),
      (5, 'C-30285', 'C-30109', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 10),
      (5, 'C-30109', 'C-30211', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 11),
      (5, 'C-30211', 'C-30289', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 12)],
     [(5, 'C-30072', 'C-30375', 'C-710c460e8dfc2b3a523e077b6c6bdb40-1-1', 15),
      (5, 'C-30375', 'C-30095', 'C-710c460e8dfc2b3a523e077b6c6bdb40-1-1', 16)]]
    

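    Note: I assume A is sorted by its first and fourth elements. groupby only merges consecutive items with equal keys, so it groups the list [1,1,1,2,2,1,3,3] into [(1,1,1), (2,2), (1,), (3,3)]. It does not collect all the 1's into a single group.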
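
The consecutive-only behaviour described in the note can be checked directly. The following is a minimal sketch, assuming the input may arrive unsorted; the sample list `rows` and the helper name `group_sorted` are made up for illustration and are not part of the original answer. Sorting by the same key first makes equal keys adjacent, at the cost of a sort and of reordering the groups by key.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted sample: the key (1, 'x') appears in two separate runs.
rows = [(1, 'a', 'b', 'x', 1),
        (2, 'c', 'd', 'y', 2),
        (1, 'e', 'f', 'x', 3)]

key = itemgetter(0, 3)  # (first element, fourth element)

# groupby only merges adjacent items, so the two (1, 'x') rows stay apart.
unsorted_groups = [list(g) for _, g in groupby(rows, key=key)]
print(len(unsorted_groups))  # 3


def group_sorted(items):
    """Sort by the grouping key first so that equal keys become adjacent."""
    return [list(g) for _, g in groupby(sorted(items, key=key), key=key)]


sorted_groups = group_sorted(rows)
print(len(sorted_groups))  # 2
```

Because `sorted` is stable, tuples within each group keep their original relative order.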

    【Discussion】:

    • On my original dataset, my code takes 7.816915 seconds, yours 0.672617 seconds, and @balderman's 1.996961 seconds. Since Pythonicity was the criterion, I accepted yours.
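
A comparison like the one in this comment could be reproduced roughly as follows. This is only a sketch: the data is synthetic (the asker's dataset is not available), the function names `with_groupby` and `with_defaultdict` are invented here, and absolute timings will vary by machine and input.

```python
import timeit
from collections import defaultdict
from itertools import groupby

# Synthetic, already-sorted data (hypothetical; not the asker's dataset).
data = [(i // 10, 'src', 'dst', f'id-{i // 5}', i) for i in range(10_000)]


def with_groupby(items):
    """Group consecutive items that share (first, fourth) elements."""
    return [list(g) for _, g in groupby(items, key=lambda x: (x[0], x[3]))]


def with_defaultdict(items):
    """Bucket items by (first, fourth) elements using a dictionary."""
    buckets = defaultdict(list)
    for item in items:
        buckets[(item[0], item[3])].append(item)
    return list(buckets.values())


# On sorted input both approaches should produce identical groups.
same = with_groupby(data) == with_defaultdict(data)
print(same)

# Time each approach over a few repetitions.
for fn in (with_groupby, with_defaultdict):
    seconds = timeit.timeit(lambda: fn(data), number=20)
    print(f'{fn.__name__}: {seconds:.4f} s')
```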
    【Solution 2】:

    See below:

    from collections import defaultdict
    
    data = defaultdict(list)
    A = [(1, 'C-30219', 'C-30060', 'C-6235d935d39c258876476e35a7acfd69-1-1', 2),
         (1, 'C-30060', 'C-30022', 'C-6235d935d39c258876476e35a7acfd69-1-1', 3),
         (1, 'C-30022', 'C-30205', 'C-6235d935d39c258876476e35a7acfd69-1-1', 4),
         (3, 'C-30248', 'C-30260', 'C-ac19d0edcf4d4ebe071e8d43be1901e2-1-1', 4),
         (3, 'C-30260', 'C-30108', 'C-ac19d0edcf4d4ebe071e8d43be1901e2-1-1', 5),
         (3, 'C-30108', 'C-30240', 'C-ac19d0edcf4d4ebe071e8d43be1901e2-1-1', 6),
         (5, 'C-30269', 'C-30285', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 9),
         (5, 'C-30285', 'C-30109', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 10),
         (5, 'C-30109', 'C-30211', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 11),
         (5, 'C-30211', 'C-30289', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 12),
         (5, 'C-30072', 'C-30375', 'C-710c460e8dfc2b3a523e077b6c6bdb40-1-1', 15),
         (5, 'C-30375', 'C-30095', 'C-710c460e8dfc2b3a523e077b6c6bdb40-1-1', 16)]
    
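    # Bucket each tuple under its (first, fourth) key.
    for a in A:
        data[(a[0], a[3])].append(a)
    # Dicts preserve insertion order (Python 3.7+), so groups keep first-seen order.
    B = list(data.values())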
    for b in B:
        print(b)
    

    Output

    [(1, 'C-30219', 'C-30060', 'C-6235d935d39c258876476e35a7acfd69-1-1', 2), (1, 'C-30060', 'C-30022', 'C-6235d935d39c258876476e35a7acfd69-1-1', 3), (1, 'C-30022', 'C-30205', 'C-6235d935d39c258876476e35a7acfd69-1-1', 4)]
    [(3, 'C-30248', 'C-30260', 'C-ac19d0edcf4d4ebe071e8d43be1901e2-1-1', 4), (3, 'C-30260', 'C-30108', 'C-ac19d0edcf4d4ebe071e8d43be1901e2-1-1', 5), (3, 'C-30108', 'C-30240', 'C-ac19d0edcf4d4ebe071e8d43be1901e2-1-1', 6)]
    [(5, 'C-30269', 'C-30285', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 9), (5, 'C-30285', 'C-30109', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 10), (5, 'C-30109', 'C-30211', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 11), (5, 'C-30211', 'C-30289', 'C-d0d36bb9f2a7e248638cff9a04065977-1-1', 12)]
    [(5, 'C-30072', 'C-30375', 'C-710c460e8dfc2b3a523e077b6c6bdb40-1-1', 15), (5, 'C-30375', 'C-30095', 'C-710c460e8dfc2b3a523e077b6c6bdb40-1-1', 16)]
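
Unlike groupby, the dictionary approach does not depend on equal keys being adjacent. The sketch below illustrates this on a small interleaved sample; the list `rows` and the helper `group_by_key` (with its `indices` parameter) are hypothetical names introduced here, and the ordering comment assumes Python 3.7+, where dicts preserve insertion order.

```python
from collections import defaultdict

# Hypothetical interleaved sample: the (1, 'x') rows are not adjacent.
rows = [(1, 'a', 'b', 'x', 1),
        (2, 'c', 'd', 'y', 2),
        (1, 'e', 'f', 'x', 3)]


def group_by_key(items, indices=(0, 3)):
    """Collect items into lists keyed by the elements at the given indices."""
    buckets = defaultdict(list)
    for item in items:
        buckets[tuple(item[i] for i in indices)].append(item)
    # Groups come out in order of first appearance of each key.
    return list(buckets.values())


groups = group_by_key(rows)
for group in groups:
    print(group)
```

Here both (1, 'x') rows end up in the same inner list even though another key sits between them.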
    

    【Discussion】:
