Making all possible combinations per group in Pandas

[Question title]: Making all possible combinations per group in Pandas
[Posted]: 2015-12-13 23:04:07
[Question description]:

I have a very large CSV file with 122290 rows. It is laid out as follows:

Feature, Person
Fever, Pat1
Headache, Pat1
Burping, Pat1
Fever, Pat2
Obese, Pat2
Headache, Pat2
Jaundice, Pat2

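I want to build a new table that contains, for each patient, every combination of two features, so that I can check whether certain symptoms cluster together. I did this in Python with csv.reader, but because it loops over everything, the 122290 rows take hours. Each patient has approx. 305 symptoms, and there are 405 patients. I don't want duplicates such as Feature1 == Feature2. I'm wondering whether this is also possible in Pandas, and if so, could you show how you would start solving it? Thanks!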

Feature1, Feature2, Person
Fever, Headache, Pat1
Fever, Burping, Pat1
Headache, Burping, Pat1
Fever, Obese, Pat2
Fever, Headache, Pat2
Fever, Jaundice, Pat2
Obese, Headache, Pat2
Obese, Jaundice, Pat2
Headache, Jaundice, Pat2

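[Question discussion]:

Do you want a 405 x 305 matrix with a 1 or 0 in each cell?
In the end I have a 3-column matrix with millions of rows. I don't want to use that kind of matrix... it would lead to lots of zeros and heavy memory use, because I want to run statistics on these combinations...

Tags: python csv pandas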

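[Solution 1]:

Use merge. You can self-merge the DataFrame with itself on Person, then remove the extra pairs (those where the features are reversed or a feature is paired with itself).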

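import pandas
# df holds the CSV data (columns Feature, Person); pair every feature of a patient with every other
df2 = pandas.merge(df, df, on='Person', suffixes=['1', '2'])
df2 = df2[df2.Feature1 < df2.Feature2]  # keep each unordered pair once, drop self-pairs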

Result:

Person  Feature1  Feature2
Pat1       Fever  Headache
Pat1     Burping     Fever
Pat1     Burping  Headache
Pat2       Fever     Obese
Pat2       Fever  Headache
Pat2       Fever  Jaundice
Pat2    Headache     Obese
Pat2    Headache  Jaundice
Pat2    Jaundice     Obese
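
To try this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample rows in the question (in practice it would come from the CSV), and the names `pairs` and `alt` are only illustrative. The second half is not part of the answer above: it is an alternative using `itertools.combinations` per patient, shown because the self-merge first builds every ordered pair (roughly n² rows for a patient with n features) before filtering, which may matter at about 305 features per patient. It assumes each feature appears at most once per patient.

```python
from itertools import combinations

import pandas as pd

# Sample rows copied from the question; real data would be loaded from the CSV.
df = pd.DataFrame({
    "Feature": ["Fever", "Headache", "Burping",
                "Fever", "Obese", "Headache", "Jaundice"],
    "Person": ["Pat1", "Pat1", "Pat1",
               "Pat2", "Pat2", "Pat2", "Pat2"],
})

# Approach from the answer: self-merge on Person, which pairs every feature
# of a patient with every feature of the same patient (including itself).
pairs = pd.merge(df, df, on="Person", suffixes=("1", "2"))

# Keeping only Feature1 < Feature2 drops self-pairs and keeps one of each
# reversed pair, so every unordered pair appears exactly once.
pairs = pairs[pairs["Feature1"] < pairs["Feature2"]].reset_index(drop=True)
pairs = pairs[["Person", "Feature1", "Feature2"]]

# Alternative (an assumption, not from the answer): build the pairs per
# patient directly, without materialising the full ordered-pair table.
rows = [
    (person, a, b)
    for person, feats in df.groupby("Person")["Feature"]
    for a, b in combinations(sorted(feats.unique()), 2)
]
alt = pd.DataFrame(rows, columns=["Person", "Feature1", "Feature2"])

print(pairs)
```

Both tables contain the same pairs for the sample data; which one is faster or lighter on memory for the full file would need to be measured.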

[Discussion]: