How to compare a list of lists/sets in Python? Answers

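【Question title】: How to compare a list of lists/sets in python?
【Posted】: 2011-08-31 15:17:51
【Question description】:

What is the easiest way to compare two lists/sets and output the differences? Are there any built-in functions that will help me compare nested lists/sets?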

Input:

First_list = [['Test.doc', '1a1a1a', 1111], 
              ['Test2.doc', '2b2b2b', 2222],  
              ['Test3.doc', '3c3c3c', 3333]
             ]  
Secnd_list = [['Test.doc', '1a1a1a', 1111], 
              ['Test2.doc', '2b2b2b', 2222], 
              ['Test3.doc', '8p8p8p', 9999], 
              ['Test4.doc', '4d4d4d', 4444]]

Expected output:

Differences = [['Test3.doc', '3c3c3c', 3333],
               ['Test3.doc', '8p8p8p', 9999], 
               ['Test4.doc', '4d4d4d', 4444]]

【Question comments】:

See the documentation on sets here: docs.python.org/3.8/library/…

Tags: python list compare set tuples

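【Solution 1】:

Note that with this method you will lose the ordering.

# S and T are the two lists of lists to compare (First_list and Secnd_list in the question)
first_set = set(map(tuple, S))
second_set = set(map(tuple, T))
# Python 3: print is a function and map() returns an iterator, so wrap it in list()
print(list(map(list, first_set.union(second_set) - (first_set & second_set))))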
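If the original order matters, one possible variation keeps the sets only for fast membership tests and walks the input lists themselves. This is a minimal sketch, assuming S and T hold the two lists from the question; the name ordered_differences is illustrative and not part of the answer above.

```python
S = [['Test.doc', '1a1a1a', 1111],
     ['Test2.doc', '2b2b2b', 2222],
     ['Test3.doc', '3c3c3c', 3333]]
T = [['Test.doc', '1a1a1a', 1111],
     ['Test2.doc', '2b2b2b', 2222],
     ['Test3.doc', '8p8p8p', 9999],
     ['Test4.doc', '4d4d4d', 4444]]

first_set = set(map(tuple, S))
second_set = set(map(tuple, T))

# Walk the lists (not the sets) so rows come out in their original order:
# first the rows found only in S, then the rows found only in T.
ordered_differences = (
    [row for row in S if tuple(row) not in second_set]
    + [row for row in T if tuple(row) not in first_set]
)
print(ordered_differences)
```

Unlike the pure set version, this keeps duplicate rows if the inputs contain them.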

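【Discussion】:

【Solution 2】:

An old question, but here is the solution I use to return the unique elements that are not found in both lists.

I use it to compare the values returned from a database with the values generated by a directory crawler package. I didn't like the other solutions I found because many of them could not dynamically handle both flat lists and nested lists.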

def differentiate(x, y):
    """
    Retrieve a list of unique elements that do not exist in both x and y.
    Capable of parsing one-dimensional (flat) and two-dimensional (lists of lists) lists.

    :param x: list #1
    :param y: list #2
    :return: list of unique values
    """
    # Validate both lists: check whether exactly one of them is empty
    if len(x) == 0 and len(y) > 0:
        return y  # All y values are unique if x is empty
    elif len(y) == 0 and len(x) > 0:
        return x  # All x values are unique if y is empty

    # Get the input type to convert back to before return
    try:
        input_type = type(x[0])
    except IndexError:
        input_type = type(y[0])

    # Dealing with a 2D dataset (list of lists)
    try:
        # Immutable and Unique - Convert list of tuples into set of tuples
        first_set = set(map(tuple, x))
        secnd_set = set(map(tuple, y))

    # Dealing with a 1D dataset (list of items)
    except TypeError:
        # Unique values only
        first_set = set(x)
        secnd_set = set(y)

    # Symmetric difference: values present in exactly one of the two sets.
    # (Comparing only the longer set against the shorter one would miss
    # values that exist only in the shorter set.)
    return [input_type(i) for i in first_set ^ secnd_set]
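One caveat with relying on TypeError to tell flat lists from nested ones: strings are iterable, so tuple('abc') succeeds and a flat list of strings is treated as if it were nested. The sketch below checks the element type explicitly instead. The helper name symmetric_diff and its inner functions are hypothetical, and it assumes rows are lists or tuples of hashable fields.

```python
def symmetric_diff(x, y):
    """Return the items found in exactly one of x and y (illustrative helper).

    Rows that are lists or tuples are hashed as tuples and returned as lists;
    any other hashable item (str, int, ...) is compared as-is.
    """
    def freeze(item):
        # Only real rows are converted; a string stays a single value.
        return tuple(item) if isinstance(item, (list, tuple)) else item

    def thaw(item):
        # Rows come back as lists, whether they started as lists or tuples.
        return list(item) if isinstance(item, tuple) else item

    first = {freeze(i) for i in x}
    second = {freeze(i) for i in y}
    return [thaw(i) for i in first ^ second]


print(sorted(symmetric_diff(['a.doc', 'b.doc'], ['b.doc', 'c.doc'])))
print(sorted(symmetric_diff([[1, 2], [3, 4]], [[3, 4], [5, 6]])))
```

Because the result comes from a set, its order is arbitrary; the calls above sort it only to make the printed output stable.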

【Discussion】:

【Solution 3】:

By using set comprehensions you can make it a one-liner. If you want:

to get a set of tuples, then:

Differences = {tuple(i) for i in First_list} ^ {tuple(i) for i in Secnd_list}

or to get a list of tuples, then:

Differences = list({tuple(i) for i in First_list} ^ {tuple(i) for i in Secnd_list})

or to get a list of lists (if you really want that), then:

Differences = [list(j) for j in {tuple(i) for i in First_list} ^ {tuple(i) for i in Secnd_list}]
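Sets have no defined order, so the one-liners above may list the rows in any order. If a reproducible order is wanted, one option is to sort the symmetric difference before converting back to lists. A minimal sketch, assuming every column holds mutually comparable values (as in the question's data); the variable name symmetric is illustrative.

```python
First_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222],
              ['Test3.doc', '3c3c3c', 3333]]
Secnd_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222],
              ['Test3.doc', '8p8p8p', 9999],
              ['Test4.doc', '4d4d4d', 4444]]

symmetric = {tuple(i) for i in First_list} ^ {tuple(i) for i in Secnd_list}

# sorted() compares the tuples field by field, so each column must contain
# values of one comparable type (all str or all int here).
Differences = [list(t) for t in sorted(symmetric)]
print(Differences)
```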

PS: I read here: https://stackoverflow.com/a/10973817/4900095 that the map() function is not a Pythonic way of doing things.

【Discussion】:

【Solution 4】:

So you want the difference between two lists of items.

first_list = [['Test.doc', '1a1a1a', 1111], 
              ['Test2.doc', '2b2b2b', 2222], 
              ['Test3.doc', '3c3c3c', 3333]]
secnd_list = [['Test.doc', '1a1a1a', 1111], 
              ['Test2.doc', '2b2b2b', 2222], 
              ['Test3.doc', '8p8p8p', 9999], 
              ['Test4.doc', '4d4d4d', 4444]]

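First I'd turn each list of lists into a list of tuples, because tuples are hashable (lists are not), so you can then convert your list of tuples into a set of tuples: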

first_tuple_list = [tuple(lst) for lst in first_list]
secnd_tuple_list = [tuple(lst) for lst in secnd_list]
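The hashability point can be checked directly. The following sketch uses one row from the question; the names row, list_hashable, frozen and seen are illustrative.

```python
row = ['Test.doc', '1a1a1a', 1111]

try:
    hash(row)
    list_hashable = True
except TypeError:
    # Lists are mutable, so they do not define a hash.
    list_hashable = False

# A tuple with the same fields is hashable and can be stored in a set.
frozen = tuple(row)
seen = {frozen}

print(list_hashable, frozen in seen)
```

A tuple is only hashable if all of its fields are, so rows that themselves contain lists would need a deeper conversion.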

Then you can make sets:

first_set = set(first_tuple_list)
secnd_set = set(secnd_tuple_list)

EDIT (suggested by sdolan): You can do the last two steps for each list in a single line:

first_set = set(map(tuple, first_list))
secnd_set = set(map(tuple, secnd_list))

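Note: map is a functional programming command that applies the function in the first argument (in this case the tuple function) to each item in the second argument (which in our case is a list of lists).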
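As a small illustration of that note, map(tuple, ...) and the equivalent list comprehension produce the same rows. The sketch assumes Python 3, where map returns a lazy iterator that has to be wrapped in list() or set() before it can be inspected.

```python
first_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222]]

# map applies tuple() to each inner list, one row at a time.
via_map = list(map(tuple, first_list))

# The same conversion written as a list comprehension.
via_comprehension = [tuple(lst) for lst in first_list]

print(via_map == via_comprehension)
```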

and find the symmetric difference between the sets:

>>> first_set.symmetric_difference(secnd_set) 
set([('Test3.doc', '3c3c3c', 3333),
     ('Test3.doc', '8p8p8p', 9999),
     ('Test4.doc', '4d4d4d', 4444)])

Note that first_set ^ secnd_set is equivalent to symmetric_difference.
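The equivalence can be spelled out in several ways. In the sketch below, the shortened data and the names by_operator, by_method, by_union and by_differences are illustrative. The method form accepts any iterable, while the ^ operator requires both operands to be sets.

```python
first_set = {('Test.doc', '1a1a1a', 1111), ('Test3.doc', '3c3c3c', 3333)}
secnd_rows = [('Test.doc', '1a1a1a', 1111), ('Test4.doc', '4d4d4d', 4444)]
secnd_set = set(secnd_rows)

by_operator = first_set ^ secnd_set
by_method = first_set.symmetric_difference(secnd_rows)  # a plain list works here
by_union = (first_set | secnd_set) - (first_set & secnd_set)
by_differences = (first_set - secnd_set) | (secnd_set - first_set)

print(by_operator == by_method == by_union == by_differences)
```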

Also, if you don't want to use sets (e.g., when using python 2.2), it is quite straightforward to do. For example, with list comprehensions:

>>> [x for x in first_list if x not in secnd_list] + [x for x in secnd_list if x not in first_list]
[['Test3.doc', '3c3c3c', 3333],
 ['Test3.doc', '8p8p8p', 9999],
 ['Test4.doc', '4d4d4d', 4444]]

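Or with the functional filter command and lambda functions. (You have to test both ways and combine the results.) In Python 3, filter returns an iterator, so each call is wrapped in list() before concatenating:

>>> list(filter(lambda x: x not in secnd_list, first_list)) + list(filter(lambda x: x not in first_list, secnd_list))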
[['Test3.doc', '3c3c3c', 3333],
 ['Test3.doc', '8p8p8p', 9999],
 ['Test4.doc', '4d4d4d', 4444]]

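【Discussion】:

+1: But I think map(tuple, first_list) is cleaner for the tuple conversion. Also, symmetric_difference doesn't need a set for its first argument, so you can skip the set conversion in secnd_set (although it may just do that under the covers).
@sdolan: I agree that map is cleaner. You could also do something like first_set = set(map(tuple, first_list)) and skip the intermediate tuple list. But I was trying to be pedagogical, since tang seemed new to Python (e.g., not putting quotes around his strings), and I personally think list comprehensions are more readable for novices than the more functional map.
Hi! If you are online, could you give me an idea of how to compare lists of lists when they are unordered? I just linked your answer in my own question here. I am learning Python. I can do it with sort(), but that changes the original list :( ..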
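For the last comment, assuming the goal is to test whether two lists of lists hold the same rows regardless of order, two possible approaches leave the originals untouched. The sample data and variable names below are illustrative.

```python
from collections import Counter

a = [['b', 2], ['a', 1]]
b = [['a', 1], ['b', 2]]

# sorted() builds a new list; list.sort() would reorder a in place.
same_by_sorting = sorted(a) == sorted(b)

# Counter compares how often each row occurs, so it also works when the
# rows cannot be ordered, as long as each row converts to a hashable tuple.
same_by_counting = Counter(map(tuple, a)) == Counter(map(tuple, b))

print(same_by_sorting, same_by_counting, a)
```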

【Solution 5】:

>>> First_list = [['Test.doc', '1a1a1a', '1111'], ['Test2.doc', '2b2b2b', '2222'], ['Test3.doc', '3c3c3c', '3333']] 
>>> Secnd_list = [['Test.doc', '1a1a1a', '1111'], ['Test2.doc', '2b2b2b', '2222'], ['Test3.doc', '3c3c3c', '3333'], ['Test4.doc', '4d4d4d', '4444']] 


>>> z = [tuple(y) for y in First_list]
>>> z
[('Test.doc', '1a1a1a', '1111'), ('Test2.doc', '2b2b2b', '2222'), ('Test3.doc', '3c3c3c', '3333')]
>>> x = [tuple(y) for y in Secnd_list]
>>> x
[('Test.doc', '1a1a1a', '1111'), ('Test2.doc', '2b2b2b', '2222'), ('Test3.doc', '3c3c3c', '3333'), ('Test4.doc', '4d4d4d', '4444')]


>>> set(x) - set(z)
set([('Test4.doc', '4d4d4d', '4444')])

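【Discussion】:

+1 Note that set1 - set2 corresponds to the difference (elements in set1 but not in set2), whereas I think he wanted the symmetric difference (set1 ^ set2) to find elements in set1 or set2, but not both, since he didn't specify which set to subtract elements from.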
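The distinction in that comment shows up with the data from the question, where each list has a row the other lacks. A minimal sketch; the names only_in_first, only_in_second and in_exactly_one are illustrative.

```python
First_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222],
              ['Test3.doc', '3c3c3c', 3333]]
Secnd_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222],
              ['Test3.doc', '8p8p8p', 9999],
              ['Test4.doc', '4d4d4d', 4444]]

first_set = {tuple(row) for row in First_list}
secnd_set = {tuple(row) for row in Secnd_list}

only_in_first = first_set - secnd_set    # one direction
only_in_second = secnd_set - first_set   # the other direction
in_exactly_one = first_set ^ secnd_set   # both directions at once

print(len(only_in_first), len(only_in_second), len(in_exactly_one))
```

A one-way difference such as secnd_set - first_set misses the changed Test3.doc row that exists only in First_list.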

【Solution 6】:

Not sure if there is a nice function for this, but the "manual" way of doing it isn't difficult:

differences = []

for list in firstList:
    if list not in secondList:
        differences.append(list)

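【Discussion】:

Note that this will not find lists that are in secondList but not in firstList; you can always check both ways, though, for example: [x for x in first_list if x not in secnd_list] + [x for x in secnd_list if x not in first_list]. It is also good practice not to use the built-in name list (a type/function) as a variable name: once the name has been rebound, the built-in list stays shadowed even after the for loop has finished.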
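Both points from that comment can be folded into the manual loop. The sketch below checks each direction in turn and avoids list as a variable name; the function name two_way_difference and its parameter names are hypothetical.

```python
def two_way_difference(first_rows, second_rows):
    """Collect rows that appear in only one of the two lists."""
    differences = []
    # Rows present in the first list but missing from the second.
    for row in first_rows:
        if row not in second_rows:
            differences.append(row)
    # Rows present in the second list but missing from the first.
    for row in second_rows:
        if row not in first_rows:
            differences.append(row)
    return differences


first_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222],
              ['Test3.doc', '3c3c3c', 3333]]
secnd_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222],
              ['Test3.doc', '8p8p8p', 9999],
              ['Test4.doc', '4d4d4d', 4444]]
print(two_way_difference(first_list, secnd_list))
```

This version needs no hashing and preserves input order, but each membership test scans a whole list, so it is best suited to small inputs.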

【Solution 7】:

I guess you'll have to convert your lists to sets:

>>> a = {('a', 'b'), ('c', 'd'), ('e', 'f')}
>>> b = {('a', 'b'), ('h', 'g')}
>>> a.symmetric_difference(b)
{('e', 'f'), ('h', 'g'), ('c', 'd')}
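The question also mentions lists of sets. Plain sets are not hashable, so a comparable sketch for that case converts each inner set to a frozenset first. The sample data and the names first, second and diff are illustrative.

```python
first = [{'a', 'b'}, {'c', 'd'}, {'e', 'f'}]
second = [{'b', 'a'}, {'h', 'g'}]

# frozenset is the immutable, hashable counterpart of set, so it can be
# stored inside another set. Element order inside each set is irrelevant.
a = {frozenset(s) for s in first}
b = {frozenset(s) for s in second}

diff = a.symmetric_difference(b)
print(diff)
```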

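【Discussion】:

【Solution 8】:

http://docs.python.org/library/difflib.html is a good place to start for what you are looking for.

If you apply it recursively to the deltas, you should be able to handle nested data structures. But it will take some work.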
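As a rough, non-recursive starting point, difflib.SequenceMatcher can diff the two outer lists once each row is made hashable. This sketch only covers the flat list-of-rows case from the question; the variable names are illustrative.

```python
import difflib

first_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222],
              ['Test3.doc', '3c3c3c', 3333]]
secnd_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222],
              ['Test3.doc', '8p8p8p', 9999],
              ['Test4.doc', '4d4d4d', 4444]]

# SequenceMatcher needs hashable elements, so each row becomes a tuple.
a = [tuple(row) for row in first_list]
b = [tuple(row) for row in secnd_list]

differences = []
matcher = difflib.SequenceMatcher(None, a, b)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != 'equal':
        # 'delete' and 'replace' contribute rows from a;
        # 'insert' and 'replace' contribute rows from b.
        differences.extend(list(row) for row in a[i1:i2])
        differences.extend(list(row) for row in b[j1:j2])

print(differences)
```

Unlike the set-based answers, difflib compares sequences position by position, so the same rows in a different order would be reported as changes.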

【Discussion】: