All possible combinations of columns in dataframe -pandas/python: answers

【Question title】: All possible combinations of columns in dataframe -pandas/python
【Posted】: 2017-09-06 23:01:37
【Question】:

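I'm trying to take a dataframe and create another dataframe containing all possible combinations of the columns, along with the difference between the corresponding values, i.e. on 11-apr column AB should be (B-A) = 0, and so on.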

For example, starting with

        Dt              A           B           C          D
        11-apr          1           1           1          1
        10-apr          2           3           1          2

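How can I get a new frame with one column per pair of columns (AB, AC, AD, BC, BD, CD) holding these differences?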

I came across the following post, but couldn't adapt it to work on columns instead of rows.

Aggregate all dataframe row pair combinations using pandas

【Comments】:

Any idea how to do this for 3 columns, say I want to compute 2*B - A - C in the example above?

Tags: pandas combinations

【Solution 1】:

You can use:

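from itertools import combinations
import pandas as pd

df = df.set_index('Dt')

# All unordered column pairs: ('A', 'B'), ('A', 'C'), ...
cc = list(combinations(df.columns, 2))
# For each pair (x, y) compute y - x; keys=cc labels each result with its pair
df = pd.concat([df[c[1]].sub(df[c[0]]) for c in cc], axis=1, keys=cc)
# Flatten the tuple labels into 'AB', 'AC', ...
df.columns = df.columns.map(''.join)
print(df)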
        AB  AC  AD  BC  BD  CD
Dt                            
11-apr   0   0   0   0   0   0
10-apr   1  -1   0  -2  -1   1

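【Comments】:

Thank you, works perfectly. Any idea how to modify this for combinations of 3, e.g. ABC, ABD, BCD, etc., computing 2*B - C - A instead of (B-A)?
Do you mean cc = list(combinations(df.columns,3))?
And then df.columns = df.columns.map('-'.join)?
My list works fine, but in pd.concat([df[c[2]].sub(df[c[1]]) I'm struggling to work in the third reference.
How can I do the same for more variables (more combinations) and add the numbers (or strings) instead of subtracting them? For example A B C D E AB AC AD ..... ABCDE? @jezrael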
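
The follow-up questions in these comments can be approached along the same lines. The following is a minimal sketch and not part of the original answer: it assumes the sample frame from the question (rebuilt here with Dt as the index), unpacks each 3-column combination as (a, b, c) in column order and computes 2*b - a - c, then sums the columns of every combination of every size. The names triples, out and all_sums are illustrative.

```python
from itertools import combinations
import pandas as pd

# Sample data from the question, with Dt already set as the index
df = pd.DataFrame(
    {"A": [1, 2], "B": [1, 3], "C": [1, 1], "D": [1, 2]},
    index=pd.Index(["11-apr", "10-apr"], name="Dt"),
)

# Combinations of three columns: unpack all three names instead of indexing c[0], c[1]
triples = list(combinations(df.columns, 3))
out = pd.concat(
    [2 * df[b] - df[a] - df[c] for a, b, c in triples],
    axis=1,
    keys=["".join(t) for t in triples],
)
print(out)  # 10-apr row: ABC=3, ABD=2, ACD=-2, BCD=-3

# Sums over every combination size, from single columns up to all of them
all_sums = pd.DataFrame({
    "".join(cols): df[list(cols)].sum(axis=1)
    for r in range(1, len(df.columns) + 1)
    for cols in combinations(df.columns, r)
})
print(all_sums)
```

With n columns the second frame has 2**n - 1 columns, so this only stays practical for a small number of columns.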

【Solution 2】:

Make sure your index is Dt

df = df.set_index('Dt')

Use numpy's np.tril_indices and slicing. See below for an explanation of np.tril_indices

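import numpy as np

v = df.values

# Positions strictly below the diagonal: every pair of column positions with i > j
i, j = np.tril_indices(len(df.columns), -1)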

We can create a pd.MultiIndex for the columns. This makes the approach work better for column names longer than one character.

pd.DataFrame(
    v[:, i] - v[:, j],
    df.index,
    [df.columns[j], df.columns[i]]
)

        A     B  A  B  C
        B  C  C  D  D  D
Dt                      
11-apr  0  0  0  0  0  0
10-apr  1 -1 -2  0 -1  1

But we could also do this

pd.DataFrame(
    v[:, i] - v[:, j],
    df.index,
    df.columns[j] + df.columns[i]
)

        AB  AC  BC  AD  BD  CD
Dt                            
11-apr   0   0   0   0   0   0
10-apr   1  -1  -2   0  -1   1

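np.tril_indices explained

This is a numpy function that returns two arrays which, used together, give the positions of the lower triangle of a square matrix. That is handy when operating on all combinations of things, because the lower triangle represents every combination of one axis of the matrix with the other.

Consider the dataframe d for illustration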

d = pd.DataFrame(np.array(list('abcdefghijklmnopqrstuvwxy')).reshape(-1, 5))
d

   0  1  2  3  4
0  a  b  c  d  e
1  f  g  h  i  j
2  k  l  m  n  o
3  p  q  r  s  t
4  u  v  w  x  y

The triangle indices, viewed as coordinate pairs, look like this

i, j = np.tril_indices(5, -1)
list(zip(i, j))

[(1, 0),
 (2, 0),
 (2, 1),
 (3, 0),
 (3, 1),
 (3, 2),
 (4, 0),
 (4, 1),
 (4, 2),
 (4, 3)]

I can set d's values using i and j

d.values[i, j] = 'z'
d

   0  1  2  3  4
0  a  b  c  d  e
1  z  g  h  i  j
2  z  z  m  n  o
3  z  z  z  s  t
4  z  z  z  z  y

You can see that it targets only the lower triangle
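
To make the link to combinations concrete, here is a small sketch (not from the original answer) checking that the strictly lower triangle lists exactly the unordered pairs that itertools.combinations produces, only in a different order. The variable names are illustrative.

```python
from itertools import combinations
import numpy as np

n = 4
i, j = np.tril_indices(n, -1)  # strictly below the diagonal, so i > j

# Read each position as (j, i): one unordered pair of distinct column positions
pairs_from_tril = sorted(zip(j.tolist(), i.tolist()))
pairs_from_itertools = list(combinations(range(n), 2))

print(pairs_from_tril == pairs_from_itertools)  # True
```

The order differs before sorting, which is why the columns in this solution come out as AB, AC, BC, AD, ... rather than AB, AC, AD, BC, ...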

Naive time test
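
A minimal sketch of how such a comparison could be run, assuming a random 100 x 20 integer frame (the size, the column names and the helper names with_concat and with_tril are arbitrary choices, not taken from the original answer). It first checks that both approaches produce the same values, then times each with timeit; actual timings depend on the machine and on library versions.

```python
from itertools import combinations
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(100, 20)),
                  columns=[f"c{k}" for k in range(20)])

def with_concat(df):
    # itertools + pd.concat, as in Solution 1 (labels kept as tuples)
    cc = list(combinations(df.columns, 2))
    return pd.concat([df[b].sub(df[a]) for a, b in cc], axis=1, keys=cc)

def with_tril(df):
    # numpy lower-triangle indexing, as in this solution
    v = df.values
    i, j = np.tril_indices(len(df.columns), -1)
    return pd.DataFrame(v[:, i] - v[:, j], df.index,
                        [df.columns[j], df.columns[i]])

# Both frames hold the same columns in a different order; align before comparing
a = with_concat(df).sort_index(axis=1)
b = with_tril(df).sort_index(axis=1)
same = (list(a.columns) == list(b.columns)
        and np.array_equal(a.to_numpy(), b.to_numpy()))
print("same result:", same)

for fn in (with_concat, with_tril):
    t = timeit.timeit(lambda: fn(df), number=20)
    print(f"{fn.__name__}: {t / 20 * 1e3:.2f} ms per call")
```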

【Comments】:

【Solution 3】:

itertools.combinations will help you:

import itertools
import pandas as pd
# One column per pair (a, b), named 'ab' and holding a - b
pd.DataFrame({'{}{}'.format(a, b): df[a] - df[b] for a, b in itertools.combinations(df.columns, 2)})

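Result (note that this computes A - B, the opposite sign of the B - A asked for in the question; swap the operands to match):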

        AB  AC  AD  BC  BD  CD
Dt                            
11-apr   0   0   0   0   0   0
10-apr  -1   1   0   2   1  -1

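【Comments】:

This also works if you have an additional condition, e.g. df = pd.DataFrame({'{}{}'.format(a, b): df[a] & df[b] for a, b in itertools.combinations(df.columns, 2) if (df[a] & df[b]).any() }). The column labels won't get jumbled as in the previous answers.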
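
A small sketch of that filtered variant, assuming boolean columns (the frame flags and its values are made up for illustration). Pairs whose element-wise AND is never True are skipped, so only the surviving pairs get a column.

```python
import itertools
import pandas as pd

# Hypothetical boolean data; C is never True
flags = pd.DataFrame({
    "A": [True, False, True],
    "B": [True, True, False],
    "C": [False, False, False],
})

both = pd.DataFrame({
    "{}{}".format(a, b): flags[a] & flags[b]
    for a, b in itertools.combinations(flags.columns, 2)
    if (flags[a] & flags[b]).any()  # keep the pair only if both are True in some row
})
print(both)  # only column AB remains
```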

【Solution 4】:

The itertools module should help you create the combinations/permutations you need.

from itertools import combinations
import pandas as pd

# New frame sharing the original index
new_df = pd.DataFrame(index=df.index)

# List of columns
columns = df.columns

# Create all combinations of length 2, e.g. AB, BC, etc.
for combination in combinations(columns, 2):
    combination_string = "".join(combination)
    # Second column minus first, so 'AB' holds B - A
    new_df[combination_string] = df[combination[1]] - df[combination[0]]

print(new_df)


         AB  AC  AD  BC  BD  CD
Dt                            
11-apr   0   0   0   0   0   0
10-apr   1  -1   0  -2  -1   1

【Comments】:

Although slower than 朗吉塔's answer above, this is more readable. Thanks @Nipun for the great answer.