Find combinations of values in a pandas DataFrame column: answers

【Question title】: find combinations of values in pandas dataframe column
【Posted】: 2017-04-04 19:34:04
【Question description】:

I have a table in a pandas DataFrame:

 id_x             id_y
  a                 b
  b                 c
  c                 d
  d                 a
  b                 a
and so on (around 1000 rows)

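I want to find all the combinations for each id_x and id_y, something like linked chains.

That is, a has combinations through a-b, b-c, c-d; similarly, b has combinations (b-c, c-d, d-a), and a-b should also be counted as a combination for b (a-b = b-a).

I then want to create a DataFrame df2 with: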

id   combinations  count
a          b,c,d     3
b          c,d,a     3
c          d,a,b     3
d          a,b,c     3
and so on ..(distinct product_id_'s)

Or, if possible, with each combination in its own column of the DataFrame:

id   c1  c2   c3...&so on   count
a     b   c   d               3              
b     c   d   a               3

What approach should I follow? My Python skills are at a beginner level. Thanks in advance.

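【Question comments】:

You need to state more explicitly what you want to do. Also, try writing some code that does it.
It is more complicated than that. I think you could add the full expected output for this input; it is a bit unclear what exactly is needed. Thanks.
@jezrael In short, it is a chaining rule: if a->b and b->c and c->d, then the chain for a should be a -> b,c,d

Tags: python pandas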

【Solution 1】:

You could try something like this:

import pandas as pd

# generate the dataframe
pdf = pd.DataFrame(dict(id_x=['a', 'b', 'c', 'd', 'b'], id_y=['b', 'c', 'd', 'a', 'a']))

# generate a second dataframe with the two columns swapped
pdf_swapped = pdf.rename(columns=dict(id_x='id_y', id_y='id_x'))

# append both dataframes to each other (concat aligns on column names)
pdf_doubled = pd.concat([pdf, pdf_swapped])

# evaluate the frequency of each combination
result = pdf_doubled.groupby('id_x').apply(lambda x: x.id_y.value_counts())

This gives the following result:

a     b    2
      d    1
b     a    2
      c    1
c     b    1
      d    1
d     c    1
      a    1

To look up the frequency of the a-b combination, you can do:

result['a', 'b']
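
The frequency table above can also be reshaped into the id / combinations / count layout that the question asks for. The following is only a sketch, assuming that direct partners are what is wanted (it does not follow chains such as a->b->c); the names grouped, combos, counts and df2 are illustrative.

```python
import pandas as pd

pdf = pd.DataFrame(dict(id_x=['a', 'b', 'c', 'd', 'b'],
                        id_y=['b', 'c', 'd', 'a', 'a']))

# Treat each pair as undirected (a-b = b-a) by appending a swapped copy.
pdf_swapped = pdf.rename(columns=dict(id_x='id_y', id_y='id_x'))
pdf_doubled = pd.concat([pdf, pdf_swapped], ignore_index=True)

grouped = pdf_doubled.groupby('id_x')['id_y']

# Distinct direct partners of each id, joined into one string.
combos = grouped.agg(lambda s: ','.join(sorted(set(s))))
# Number of distinct direct partners of each id.
counts = grouped.nunique()

df2 = (pd.DataFrame({'combinations': combos, 'count': counts})
         .rename_axis('id')
         .reset_index())
print(df2)
```

Sorting the partners is a choice made here for stable output; the question lists them in chain order instead.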

【Comments】:

For a, this gives the combinations b and d, but I want b, c and d, because a->b and b->c and c->d, so the chain for a should be a -> b,c,d
I see. How should cycles such as a->b, b->c, c->d, d->a be handled?
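
One possible way to implement the chaining rule from this thread, cycles included, is a breadth-first search that remembers which ids it has already visited, so a loop such as a->b->c->d->a terminates. This is a sketch under assumptions, not the approach of the answer above: pairs are treated as undirected (following a-b = b-a in the question), and the names neighbours, reachable, df2 and wide are illustrative.

```python
from collections import defaultdict, deque

import pandas as pd

pdf = pd.DataFrame(dict(id_x=['a', 'b', 'c', 'd', 'b'],
                        id_y=['b', 'c', 'd', 'a', 'a']))

# Build an undirected adjacency map: every pair links both ids.
neighbours = defaultdict(set)
for x, y in zip(pdf['id_x'], pdf['id_y']):
    neighbours[x].add(y)
    neighbours[y].add(x)


def reachable(start):
    """Return, sorted, every id linked to `start` through a chain of pairs."""
    seen = {start}           # visited ids; this is what stops cycles
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    seen.discard(start)      # an id is not a combination of itself
    return sorted(seen)


rows = [(node, reachable(node)) for node in sorted(neighbours)]
df2 = pd.DataFrame({
    'id': [node for node, _ in rows],
    'combinations': [','.join(chain) for _, chain in rows],
    'count': [len(chain) for _, chain in rows],
})

# Optional wide layout: one column (c1, c2, ...) per combination.
wide = df2['combinations'].str.split(',', expand=True)
wide.columns = [f'c{i + 1}' for i in wide.columns]
print(df2.join(wide))
```

For the five sample rows every id reaches the other three, so each count is 3. If the pairs were meant to be directed, the second `add` call in the loop would be dropped.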