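I tested with a slightly more complicated list: some values are repeated twice, others 3 times, both within the same sublist and across different sublists.
We run a Counter over each sublist and keep a dict holding, for each value, the highest count seen in any single sublist. At the end we build the output list, keeping only the values whose highest per-sublist count equals the overall maximum.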
list_of_lists = [[11, 11, 53],  # 11 is repeated 2 times,
                 [2, 137],      # it shouldn't appear in the result
                 [2, 7, 31],
                 [2, 2, 7, 31],
                 [3, 3, 3, 4, 4, 4, 5, 5, 5, 29],  # 3 times 3, 4 and 5
                 [2, 2, 2, 3, 137],                # and 3 times 2
                 [2, 2, 7, 31],
                 [11, 53]]
from collections import Counter, defaultdict

def maxcount(list_of_lists):
    out = defaultdict(int)  # value -> highest count seen in a single sublist
    max_repetitions = 0     # highest count seen for any value
    for sublist in list_of_lists:
        for value, count in Counter(sublist).items():
            # Only values repeated within a sublist are candidates.
            if count > 1 and count > out[value]:
                out[value] = count
                if count > max_repetitions:
                    max_repetitions = count
    # Keep only the values that reach the overall maximum.
    return [[val, count] for val, count in out.items() if count == max_repetitions]

print(maxcount(list_of_lists))
# [[2, 3], [3, 3], [4, 3], [5, 3]]
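To make the intermediate state visible, here is a minimal sketch of the two steps on a two-sublist excerpt of the test data. The names `sample`, `per_sublist`, `best` and `winners` are made up for this illustration and are not part of the solution above.

```python
from collections import Counter

# Two sublists taken from the test data above.
sample = [[11, 11, 53], [2, 2, 2, 3, 137]]

# Step 1: count the values inside each sublist separately.
per_sublist = [dict(Counter(sublist)) for sublist in sample]
print(per_sublist)
# [{11: 2, 53: 1}, {2: 3, 3: 1, 137: 1}]

# Step 2: for each value, remember the highest count seen in any one sublist.
best = {}
for counts in per_sublist:
    for value, count in counts.items():
        best[value] = max(best.get(value, 0), count)

# Only the values that reach the overall maximum survive.
top = max(best.values())
winners = [value for value, count in best.items() if count == top]
print(winners)
# [2]
```

Here 11 is dropped even though it is repeated, because its best count (2) is below the overall maximum (3).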
I like itertools, so I was curious to compare @Sunitha's solution with this one.
This solution:
%timeit maxcount(list_of_lists)
# 65 µs ± 269 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
@Sunitha's solution, which leans more heavily on itertools:
from itertools import chain, groupby
from collections import Counter

def maxcount_with_itertools(ll):
    f = lambda t: t[1]
    return list(next(groupby(sorted(chain(*(Counter(l).items() for l in ll)),
                                    key=f, reverse=True), f))[1])
%timeit maxcount_with_itertools(list_of_lists)
# 70.9 µs ± 1.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
which is only slightly slower: about 6 µs per call, or roughly 9%, on this input.
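`%timeit` only works inside IPython/Jupyter. As a rough sketch, the same comparison can be run as a plain script with the standard `timeit` module. The helper `normalized` and the variable `same_result` are my own additions for this illustration: the two functions return their pairs in different container types and orders (lists vs. tuples), and the itertools version can list a value once per sublist in which it reaches the maximum, so the results are normalized before comparing. Timings will differ from machine to machine.

```python
import timeit
from collections import Counter, defaultdict
from itertools import chain, groupby

list_of_lists = [[11, 11, 53], [2, 137], [2, 7, 31], [2, 2, 7, 31],
                 [3, 3, 3, 4, 4, 4, 5, 5, 5, 29], [2, 2, 2, 3, 137],
                 [2, 2, 7, 31], [11, 53]]


def maxcount(list_of_lists):
    out = defaultdict(int)
    max_repetitions = 0
    for sublist in list_of_lists:
        for value, count in Counter(sublist).items():
            if count > 1 and count > out[value]:
                out[value] = count
                if count > max_repetitions:
                    max_repetitions = count
    return [[val, count] for val, count in out.items() if count == max_repetitions]


def maxcount_with_itertools(ll):
    f = lambda t: t[1]
    return list(next(groupby(sorted(chain(*(Counter(l).items() for l in ll)),
                                    key=f, reverse=True), f))[1])


def normalized(result):
    # Hypothetical helper: turn pairs into tuples, drop duplicates, sort.
    return sorted(set(tuple(pair) for pair in result))


# Check that both functions agree on this input before timing them.
same_result = (normalized(maxcount(list_of_lists))
               == normalized(maxcount_with_itertools(list_of_lists)))
print(same_result)

# Best of 3 repeats of 200 calls each, reported per call in microseconds.
for func in (maxcount, maxcount_with_itertools):
    best = min(timeit.repeat(lambda: func(list_of_lists), number=200, repeat=3))
    print(f"{func.__name__}: {best / 200 * 1e6:.1f} us per call")
```

Note that the two functions are not equivalent on every input: `maxcount` ignores values that never repeat within a sublist, while the itertools version returns whichever group has the highest count, even if that count is 1.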