Find the most common integer within nested lists: answers

【Question title】: Find most common occurrence of integer within nested lists
【Posted】: 2018-08-12 08:45:04
【Question】:

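I want to find the integer that occurs most often within any single nested list. I want to return that integer together with its count. If several integers tie for the highest count, I want all of them with their counts. The data currently looks like this: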

list_of_lists = [[11, 53],
                 [2, 137],
                 [2, 7, 31],
                 [2, 2, 7, 31],
                 [3, 3, 3, 29],
                 [2, 2, 2, 3, 137],
                 [2, 2, 7, 31],
                 [11, 53]]

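So the desired output is [[3, 3], [2, 3]]. The number 3 occurs 3 times in the fifth nested list, and the number 2 occurs 3 times in the sixth nested list.

Neither the outer list nor the inner lists have a fixed length, so a solution that handles variable lengths would be much appreciated!

I couldn't find a directly similar question.

Thanks!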
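The expected output only makes sense if counts are taken per sublist rather than over the whole structure. Over the flattened data, 2 occurs 9 times. The quick check below uses `collections.Counter` to show the difference. It is an editorial illustration, not part of the original question.

```python
from collections import Counter
from itertools import chain

list_of_lists = [[11, 53], [2, 137], [2, 7, 31], [2, 2, 7, 31],
                 [3, 3, 3, 29], [2, 2, 2, 3, 137], [2, 2, 7, 31], [11, 53]]

# Counting across all sublists at once: 2 dominates overall, which is not what is asked.
overall = Counter(chain.from_iterable(list_of_lists)).most_common(2)
print(overall)  # [(2, 9), (3, 4)]

# Counting inside each sublist: the top (value, count) pair of every sublist.
per_sublist = [Counter(sub).most_common(1)[0] for sub in list_of_lists]
print(per_sublist)  # [(11, 1), (2, 1), (2, 1), (2, 2), (3, 3), (2, 3), (2, 2), (11, 1)]
```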

【Question discussion】:

Improved formatting and punctuation.

Tags: python list count

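【Solution 1】:

You can use collections.Counter to count the elements of each sublist and then sort all the resulting (value, count) pairs by count in descending order. Grouping them with itertools.groupby then gives a first group that contains every pair sharing the maximum count: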

>>> from itertools import chain, groupby
>>> from collections import Counter
>>> 
>>> ll = [[11, 53], [2, 137], [2, 7, 31], [2, 2, 7, 31], [3, 3, 3, 29], [2, 2, 2, 3, 137], [2, 2, 7, 31], [11, 53]]
>>>
>>> f = lambda t: t[1]
>>> list(next(groupby(sorted(chain(*(Counter(l).items() for l in ll)), key=f, reverse=True), f))[1])
[(3, 3), (2, 3)]
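The one-liner above is dense, so here is the same pipeline unrolled into named steps. This sketch reproduces Solution 1's logic. The intermediate names `count_of`, `pairs`, `ranked` and `top_group` are my own, and `chain.from_iterable` is swapped in for `chain(*...)`; both flatten identically.

```python
from collections import Counter
from itertools import chain, groupby

ll = [[11, 53], [2, 137], [2, 7, 31], [2, 2, 7, 31],
      [3, 3, 3, 29], [2, 2, 2, 3, 137], [2, 2, 7, 31], [11, 53]]

def count_of(pair):
    # pair is a (value, count) tuple produced by Counter.items()
    return pair[1]

# 1. One Counter per sublist, flattened into a single stream of (value, count) pairs.
pairs = chain.from_iterable(Counter(sub).items() for sub in ll)

# 2. Sort by count, highest first; sorted() stays stable with reverse=True,
#    so pairs with equal counts keep their original order.
ranked = sorted(pairs, key=count_of, reverse=True)

# 3. groupby yields runs of equal counts; the first run holds every pair with the maximum count.
_, top_group = next(groupby(ranked, key=count_of))
result = list(top_group)

print(result)  # [(3, 3), (2, 3)]
```

If `ll` is empty, `next` raises `StopIteration`. Writing `next(groupby(ranked, key=count_of), (None, iter([])))` makes the result an empty list instead.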

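【Discussion】:

【Solution 2】:

I tested with a slightly more complicated list, where some values repeat twice and others three times, within the same sublist or in different ones.

We apply a Counter to each sublist and keep a dictionary holding the highest count seen for each value; counts of 1 are ignored. At the end we build the output list, keeping only the values whose count equals the overall maximum.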

list_of_lists =[[11, 11, 53], # 11 is repeated 2 times, 
 [2, 137],                    # it shouldn't appear in the result
 [2, 7, 31],
 [2, 2, 7, 31],
 [3, 3, 3, 4, 4, 4, 5, 5, 5, 29],     # 3 times 3, 4 and 5
 [2, 2, 2, 3, 137],                   # and 3 times 2
 [2, 2, 7, 31],
 [11, 53]]

from collections import Counter, defaultdict

def maxcount(list_of_lists):
    out = defaultdict(int)
    max_repetitions = 0
    for sublist in list_of_lists:
        for value, count in Counter(sublist).items():
            if count > 1 and count > out[value]:
                out[value] = count
                if count > max_repetitions:
                    max_repetitions = count

    return [[val, count] for val, count in out.items() if count == max_repetitions]

print(maxcount(list_of_lists))
# [[2, 3], [3, 3], [4, 3], [5, 3]]
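Because of the `count > 1` guard, `maxcount` returns an empty list when no value repeats inside any sublist. If singletons should be reported in that case, the variant below drops the guard and leaves everything else unchanged. It is a minimal sketch of my own; the name `maxcount_including_singles` is hypothetical.

```python
from collections import Counter, defaultdict

def maxcount_including_singles(list_of_lists):
    # Highest per-sublist count seen so far for each value.
    out = defaultdict(int)
    max_repetitions = 0
    for sublist in list_of_lists:
        for value, count in Counter(sublist).items():
            # No "count > 1" guard, so values that appear only once are still considered.
            if count > out[value]:
                out[value] = count
                max_repetitions = max(max_repetitions, count)
    return [[val, count] for val, count in out.items() if count == max_repetitions]

print(maxcount_including_singles([[1, 2], [3]]))  # [[1, 1], [2, 1], [3, 1]]
```

On inputs that do contain repeats, it returns the same result as `maxcount`.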

I like itertools, so I was curious to compare @Sunitha's solution with this one.

This solution:

%timeit maxcount(list_of_lists)
# 65 µs ± 269 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

@Sunitha's solution, which relies more heavily on itertools:

from itertools import chain, groupby
from collections import Counter

def maxcount_with_itertools(ll):
    f = lambda t: t[1]
    return list(next(groupby(sorted(chain(*(Counter(l).items() for l in ll)), key=f, reverse=True), f))[1])

%timeit maxcount_with_itertools(list_of_lists)
# 70.9 µs ± 1.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

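It is only slightly slower (about 71 µs versus 65 µs per loop in these runs).

【Discussion】:

【Solution 3】:

If you'd like to do it in pure Python, without any imports, here is one way: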
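`%timeit` is an IPython/Jupyter magic, so the timing lines above do not run in a plain Python script. The sketch below is a rough equivalent using the standard-library `timeit` module. It repeats both functions so it is self-contained, and uses `max()` in place of the nested `if` in `maxcount` with the same behaviour. Absolute numbers depend on the machine and Python version, so no timings are claimed here.

```python
import timeit
from collections import Counter, defaultdict
from functools import partial
from itertools import chain, groupby

list_of_lists = [[11, 11, 53], [2, 137], [2, 7, 31], [2, 2, 7, 31],
                 [3, 3, 3, 4, 4, 4, 5, 5, 5, 29], [2, 2, 2, 3, 137],
                 [2, 2, 7, 31], [11, 53]]

def maxcount(list_of_lists):
    # Same logic as maxcount above: highest per-sublist count (> 1) for each value.
    out = defaultdict(int)
    max_repetitions = 0
    for sublist in list_of_lists:
        for value, count in Counter(sublist).items():
            if count > 1 and count > out[value]:
                out[value] = count
                max_repetitions = max(max_repetitions, count)
    return [[val, count] for val, count in out.items() if count == max_repetitions]

def maxcount_with_itertools(ll):
    f = lambda t: t[1]
    return list(next(groupby(sorted(chain(*(Counter(l).items() for l in ll)), key=f, reverse=True), f))[1])

for func in (maxcount, maxcount_with_itertools):
    # Best of 5 repeats, each calling the function 1,000 times; report the per-call time.
    best = min(timeit.repeat(partial(func, list_of_lists), number=1_000, repeat=5))
    print(f"{func.__name__}: {best / 1_000 * 1e6:.1f} µs per call")
```

The two functions do not return identical objects on this input. `maxcount` gives `[[2, 3], [3, 3], [4, 3], [5, 3]]`, one list per value. `maxcount_with_itertools` gives `[(3, 3), (4, 3), (5, 3), (2, 3)]`, one tuple per sublist hit. The timing therefore compares two slightly different answers.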

list_of_lists = [[11, 53],[2, 137],[2, 7, 31],[2, 2, 7, 31],[3, 3, 3, 29],[2, 2, 2, 3, 137],[2, 2, 7, 31],[11, 53]]

# For each sublist, keep the [element, count, index] triple with the highest count
max_occurrences = [max([[elem, sublist.count(elem), index] for elem in sublist], key=lambda i: i[1]) for index, sublist in enumerate(list_of_lists)]
maximum = max(max_occurrences, key=lambda i: i[1])
elements = [elem[:2] for elem in max_occurrences if elem[1] == maximum[1]]
print(elements)

Output:

[[3, 3], [2, 3]]

Another suggestion:

list_of_lists = [[11, 53],[2, 137],[2, 7, 31],[2, 2, 7, 31],[3, 3, 3, 29],[2, 2, 2, 3, 137],[2, 2, 7, 31],[11, 53]]

# Largest per-sublist count, ranking each [element, count] pair by its count
maximum = max([max([[elem, sublist.count(elem)] for elem in sublist], key=lambda i: i[1]) for sublist in list_of_lists], key=lambda i: i[1])
elements = [[elem,sublist.count(elem)] for sublist in list_of_lists for elem in set(sublist) if sublist.count(elem)==maximum[1]]
print(elements)

Output:

[[3, 3], [2, 3]]
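Both suggestions call `sublist.count(elem)` for every element, which rescans the sublist each time and is quadratic in the sublist length. The first suggestion also keeps only one winner per sublist, so a tie inside a single sublist such as `[3, 3, 4, 4]` would drop a value. The sketch below stays import-free but counts each sublist in one pass with a plain dict. The function name `max_occurrences_pure` is hypothetical.

```python
def max_occurrences_pure(list_of_lists):
    """Return [value, count] pairs whose per-sublist count equals the overall maximum."""
    per_sublist = []
    for sublist in list_of_lists:
        counts = {}
        for elem in sublist:
            counts[elem] = counts.get(elem, 0) + 1  # single pass per sublist
        per_sublist.append(counts)

    # default=0 stops an empty input (or only empty sublists) from raising ValueError.
    maximum = max((c for counts in per_sublist for c in counts.values()), default=0)
    return [[elem, c] for counts in per_sublist for elem, c in counts.items() if c == maximum]

list_of_lists = [[11, 53], [2, 137], [2, 7, 31], [2, 2, 7, 31],
                 [3, 3, 3, 29], [2, 2, 2, 3, 137], [2, 2, 7, 31], [11, 53]]
print(max_occurrences_pure(list_of_lists))  # [[3, 3], [2, 3]]
print(max_occurrences_pure([[3, 3, 4, 4]]))  # [[3, 2], [4, 2]]
```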

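【Discussion】:

【Solution 4】:

You can use collections.Counter in 3 steps:

1. Convert each sublist into a Counter object via map.
2. Find the count of the most common value across all sublists via max.
3. Use a list comprehension to keep the (value, count) pairs from those Counter objects whose count equals that maximum.

Here's a demo: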

from collections import Counter

counters = list(map(Counter, list_of_lists))
most_common_count = max(i.most_common(1)[0][1] for i in counters)

res = [(k, v) for i in counters for k, v in i.items() if v == most_common_count]

print(res)

[(3, 3), (2, 3)]
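`most_common(1)` returns an empty list for an empty `Counter`, so `i.most_common(1)[0][1]` raises `IndexError` if any sublist is `[]`. Since the question says sublist lengths vary, a guarded variant may be safer. The sketch below skips empty counters and uses `max(..., default=0)`; the wrapper name `most_common_pairs` is hypothetical.

```python
from collections import Counter

def most_common_pairs(list_of_lists):
    counters = [Counter(sub) for sub in list_of_lists]
    # Empty Counters are falsy, so they are skipped; default=0 covers an input with no elements at all.
    most_common_count = max((c.most_common(1)[0][1] for c in counters if c), default=0)
    return [(k, v) for c in counters for k, v in c.items() if v == most_common_count]

print(most_common_pairs([[11, 53], [], [3, 3, 3, 29], [2, 2, 2, 3, 137]]))  # [(3, 3), (2, 3)]
```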

【Discussion】: