Python: Pandas: get all combinations of different columns and their frequencies

【Question title】: Python: Pandas: get all the combination and their frequency of different column
【Posted】: 2017-09-20 22:12:57
【Question description】:

I have a pandas DataFrame with three columns.

    a   b   c
0   1   2   3
1   1   2   3
2   1   2   8
3   1   5   9
4   1   3   7
5   1   3   4

I want to find all combinations of the values in a, b, and c. My expected result looks like this:

[1,2,3]: 2  # from row 0 and row 1
[1,2]  : 3  # from row 0 and row 1 and row 2
[1,3]  : 4  # from row 0, 1, 4, 5
[1,4]  : 1
[1,5]  : 1
[1,7]  : 1
[1,8]  : 1
[1,9]  : 1
[2,3]  : 2
............

Feel free to use any package.

import pandas as pd
pd.DataFrame(data=[[1,2,3],[1,2,3],[1,2,8],[1,5,9],[1,3,7],[1,3,4]],columns=['a','b','c'])

【Comments on the question】:

Tags: python pandas

【Solution 1】:

A disgusting one-liner:

In [114]: collections.Counter(map(str, itertools.chain.from_iterable(list(df.apply(lambda x: list(itertools.chain.from_iterable([list(itertools.combinations(x, k)) for k in range(1, 4)])), axis=1).values))))
Out[114]: 
Counter({'(1, 2)': 3,
         '(1, 2, 3)': 2,
         '(1, 2, 8)': 1,
         '(1, 3)': 4,
         '(1, 3, 4)': 1,
         '(1, 3, 7)': 1,
         '(1, 4)': 1,
         '(1, 5)': 1,
         '(1, 5, 9)': 1,
         '(1, 7)': 1,
         '(1, 8)': 1,
         '(1, 9)': 1,
         '(1,)': 6,
         '(2, 3)': 2,
         '(2, 8)': 1,
         '(2,)': 3,
         '(3, 4)': 1,
         '(3, 7)': 1,
         '(3,)': 4,
         '(4,)': 1,
         '(5, 9)': 1,
         '(5,)': 1,
         '(7,)': 1,
         '(8,)': 1,
         '(9,)': 1})

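Some explanation:

First, the lambda function is applied to each row, thanks to df.apply(..., axis=1).
The lambda function builds every possible combination of the row's values, for every combination size (here 1 to 3).
All combinations found for a row are merged into a single list per row. This is where the first itertools.chain.from_iterable (the one inside the lambda) comes in.
The per-row lists are then merged into one list, using the second (outer) itertools.chain.from_iterable.
Finally, collections.Counter tallies the result and gives the frequencies.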
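The steps above can be sketched as a multi-line version of the one-liner. This is only an illustrative rewrite, not the answerer's code: the helper name row_combinations is hypothetical, rows are read with df.values.tolist() instead of df.apply, and the keys are kept as tuples rather than converted to strings.

```python
from collections import Counter
from itertools import chain, combinations

import pandas as pd

df = pd.DataFrame(
    data=[[1, 2, 3], [1, 2, 3], [1, 2, 8], [1, 5, 9], [1, 3, 7], [1, 3, 4]],
    columns=["a", "b", "c"],
)


def row_combinations(row, max_size):
    """Return every combination of 1..max_size values taken from one row."""
    # Hypothetical helper: merges the per-size combinations of a single row.
    return chain.from_iterable(
        combinations(row, k) for k in range(1, max_size + 1)
    )


# Build the combinations for each row, then merge all rows into one stream.
all_combos = chain.from_iterable(
    row_combinations(row, df.shape[1]) for row in df.values.tolist()
)

# Count how often each combination occurs.
counts = Counter(all_combos)

print(counts[(1, 2)], counts[(1, 3)], counts[(1, 2, 3)])
```

Keeping tuples as keys means a single combination can be looked up directly, for example counts[(1, 3)], without building its string form first.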

Edit

The same solution, but without itertools.chain.from_iterable:

In [25]: collections.Counter([str(k) for l in df.apply(lambda x: [c for i in range(1, 4) for c in itertools.combinations(x, i)], axis=1).values for k in l])

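This time I use list comprehensions to get the same result, which arguably makes the solution more readable. The steps are essentially the same, just without the list-merging fuss.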

【Comments】:

【Solution 2】:

import pandas as pd
from cytoolz import concat, mapcat
from functools import partial
from itertools import combinations

c = lambda x, k: combinations(x, k)

pd.value_counts(list(concat(concat(map(
    partial(c, x),
    range(2, df.shape[1] + 1)
)) for x in df.values.tolist())))

(1, 3)       4
(1, 2)       3
(1, 2, 3)    2
(2, 3)       2
(5, 9)       1
(1, 2, 8)    1
(1, 3, 4)    1
(2, 8)       1
(1, 4)       1
(1, 3, 7)    1
(1, 5, 9)    1
(1, 8)       1
(1, 9)       1
(1, 7)       1
(3, 7)       1
(3, 4)       1
(1, 5)       1
dtype: int64

With @juanpa.arrivillaga's suggestion to use mapcat:

pd.value_counts(list(concat(
    (mapcat(partial(c, x), range(2, df.shape[1] + 1)) for x in df.values.tolist())
)))

(1, 3)       4
(1, 2)       3
(1, 2, 3)    2
(2, 3)       2
(5, 9)       1
(1, 2, 8)    1
(1, 3, 4)    1
(2, 8)       1
(1, 4)       1
(1, 3, 7)    1
(1, 5, 9)    1
(1, 8)       1
(1, 9)       1
(1, 7)       1
(3, 7)       1
(3, 4)       1
(1, 5)       1
dtype: int64
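For readers without cytoolz installed, the same flat-map idea can be approximated with the standard library alone. The sketch below is an assumption-laden stand-in, not the answerer's code: flat_map and combos_of_size_2_plus are hypothetical helper names, the rows are written out as a plain list instead of being read from df, and collections.Counter replaces pd.value_counts.

```python
from collections import Counter
from itertools import chain, combinations

# Same data as the question, written as plain lists of row values.
rows = [[1, 2, 3], [1, 2, 3], [1, 2, 8], [1, 5, 9], [1, 3, 7], [1, 3, 4]]


def flat_map(func, iterable):
    """Apply func to each item and chain the resulting iterables together."""
    return chain.from_iterable(map(func, iterable))


def combos_of_size_2_plus(row):
    """Combinations of size 2 up to the full row length, for one row."""
    return flat_map(lambda k: combinations(row, k), range(2, len(row) + 1))


counts = Counter(flat_map(combos_of_size_2_plus, rows))

# The two most frequent combinations, most frequent first.
top = counts.most_common(2)
print(top)
```

Here flat_map plays roughly the role that mapcat plays in the answer, and Counter.most_common gives the frequency-sorted view that pd.value_counts prints.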

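【Comments】:

I could never come up with an answer this good... so I wrote it down in my Python notes...
Nice, I had never heard of toolz, but it sounds super useful... there is even a Cythonized version. Just a thought: maybe you could use mapcat? I dream of an efficient flatmap function in Python!
@juanpa.arrivillaga OK, I'll take a look.
@piRSquared I believe concat(map(partial(c, x), ...)) can be replaced with mapcat(partial(c, x), ...)
Awesome!

【Solution 3】: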

There may be more efficient ways, but one approach could be the following:

import pandas as pd
from itertools import combinations

from collections import Counter

df = pd.DataFrame(data=[[1,2,3],[1,2,3],[1,2,8],[1,5,9],[1,3,7],[1,3,4]],columns=['a','b','c'])


# Get columns combination
# https://stackoverflow.com/a/43348187/5916727
cc = list(combinations(df.columns, 2))

# Append to new list for combinations
tmp_list = []

for columns in cc:
    tmp_list.append(list(zip(df[columns[0]], df[columns[1]])))

# https://stackoverflow.com/a/32786226/5916727
tmp_list.append(list(zip(df.a, df.b, df.c)))

# Flatten the list
# https://stackoverflow.com/a/952952/5916727
flat_list = [item for sublist in tmp_list for item in sublist]

print(['{0}:{1}'.format(list(item), count) for item, count in Counter(flat_list).items()])

Result:

['[1, 2]:3',
 '[5, 9]:1',
 '[1, 2, 8]:1',
 '[1, 3]:4',
 '[2, 8]:1',
 '[1, 3, 4]:1',
 '[1, 3, 7]:1',
 '[1, 4]:1',
 '[1, 2, 3]:2',
 '[1, 5]:1',
 '[1, 8]:1',
 '[2, 3]:2',
 '[1, 9]:1',
 '[1, 7]:1',
 '[3, 7]:1',
 '[3, 4]:1',
 '[1, 5, 9]:1']
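The code above hard-codes the pair size and the three column names. A possible generalization, offered only as a sketch under the assumption that every combination size from 2 up to the number of columns is wanted, loops over the column combinations and lets Counter.update accumulate the row tuples:

```python
from collections import Counter
from itertools import combinations

import pandas as pd

df = pd.DataFrame(
    data=[[1, 2, 3], [1, 2, 3], [1, 2, 8], [1, 5, 9], [1, 3, 7], [1, 3, 4]],
    columns=["a", "b", "c"],
)

counts = Counter()
for size in range(2, len(df.columns) + 1):
    for cols in combinations(df.columns, size):
        # itertuples(index=False, name=None) yields one plain tuple per row
        # for the selected columns; Counter.update tallies each tuple.
        counts.update(df[list(cols)].itertuples(index=False, name=None))

print(['{0}:{1}'.format(list(item), count) for item, count in counts.items()])
```

Because the tuples are keyed by value rather than by column name, (1, 3) taken from columns a and c and (1, 3) taken from columns a and b are counted together, which matches the expected result in the question.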

【Comments】: