[Posted]: 2015-02-27 05:48:21
[Question]:
I have the following Pandas DataFrame 'df1':
id_client product
client1 product1
client1 product4
client1 product5
client2 product1
client2 product6
client3 product1
First, I want to group by id_client and collect the matching products into a list:
id_client product
client1 [product1,product4,product5]
client2 [product1,product6]
client3 [product1]
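For reference, the grouped view above can be sketched as follows. This is a minimal example that assumes 'df1' holds exactly the six rows shown earlier; the variable name `grouped` is introduced here for illustration only.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "id_client": ["client1", "client1", "client1", "client2", "client2", "client3"],
    "product": ["product1", "product4", "product5", "product1", "product6", "product1"],
})

# Collect each client's products into a Python list.
# The column is selected with brackets rather than as an attribute.
grouped = df1.groupby("id_client")["product"].apply(list).reset_index()
print(grouped)
```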
Then, for each element of each list, I want to add a new row to a new DataFrame 'df2' like this (nb_product is the length of each list):
product nb_product
product1 3
product4 3
product5 3
product1 2
product6 2
product1 1
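As a point of comparison, the target table can also be derived without building the lists first. The sketch below is one possible approach, not necessarily what was intended here: it assumes the sample data shown earlier and that no product value is missing, since `transform("count")` counts only non-null entries.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "id_client": ["client1", "client1", "client1", "client2", "client2", "client3"],
    "product": ["product1", "product4", "product5", "product1", "product6", "product1"],
})

# transform("count") broadcasts each client's product count back onto
# every row of that client, so the result lines up with df1 row by row.
nb_product = df1.groupby("id_client")["product"].transform("count")

df2 = pd.DataFrame({"product": df1["product"], "nb_product": nb_product})
print(df2)
```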
So I first created a new dictionary:
nb_of_combination = {}
nb_of_combination['product'] = []
nb_of_combination['nb_product'] = []
Then I declared the following function:
def nb_of_combination(my_list):
    nb_comb = len(my_list)
    for row in my_list:
        nb_of_combination['product'].append(row)
        nb_of_combination['nb_product'].append(nb_comb)
Then I group 'df1' by the 'id_client' field and apply the function 'nb_of_combination':
df1 = df1.groupby('id_client',as_index=False).apply(lambda x: nb_of_combination(list(x.product)))
But I get the following error:
    df1 = df1.groupby('id_client',as_index=False).apply(lambda x: nb_of_combination(list(x.product)))
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 660, in apply
    return self._python_apply_general(f)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 667, in _python_apply_general
    not_indexed_same=mutated)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 2821, in _wrap_applied_output
    v = next(v for v in values if v is not None)
I don't really understand why, since:
df2 = pd.DataFrame(nb_of_combination)
seems to work fine.
[Discussion]:
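One plausible reading of the traceback: the last frame shown is `next(v for v in values if v is not None)`, and the function passed to `apply` only appends to a dictionary and returns nothing, so every group yields None and `apply` may have no result to assemble. Note also that the dictionary and the function in the question share the name `nb_of_combination`, so the `def` rebinds the name that held the dictionary. The sketch below sidesteps both points by looping over the groups explicitly. It assumes the sample data from the question; the name `rows` is hypothetical and introduced only for this example.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "id_client": ["client1", "client1", "client1", "client2", "client2", "client3"],
    "product": ["product1", "product4", "product5", "product1", "product6", "product1"],
})

# Hypothetical name: 'rows' holds the accumulated columns, so it cannot
# collide with any function name.
rows = {"product": [], "nb_product": []}

# Iterate over the groups directly instead of calling apply() with a
# function that is used only for its side effects.
for _, group in df1.groupby("id_client"):
    # Bracket access is used because current pandas releases also define a
    # DataFrame.product method (an alias of prod), which shadows the column
    # when it is accessed as an attribute.
    products = list(group["product"])
    for product in products:
        rows["product"].append(product)
        rows["nb_product"].append(len(products))

df2 = pd.DataFrame(rows)
print(df2)
```

An explicit loop keeps the side effects visible and leaves 'df1' untouched, whereas assigning the result of `apply` back to 'df1' would overwrite the original data.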