【Posted】: 2017-08-05 16:30:39
【Question】:
I have two DataFrames:
[in] print(testing_df.head(n=5))
print(product_combos1.head(n=5))
[out]
                product_id       length
transaction_id
001             (P01,)           1
002             (P01, P02)       2
003             (P01, P02, P09)  3
004             (P01, P03)       2
005             (P01, P03, P05)  3

   product_id            count  length
0  (P06, P09)            36340  2
1  (P01, P05, P06, P09)  10085  4
2  (P01, P06)            36337  2
3  (P01, P09)            49897  2
4  (P02, P09)            11573  2
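For each row of testing_df, I want to return the product_combos row with the highest count whose length is one more than the testing_df row's length and which contains all of the testing_df row's products. For example, for transaction_id 001 I would want to return product_combos[3] (though really only the P09 part).
For the first part (comparing on length only), I tried: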
# Return the product combos values that are of the appropriate length and the strings match
for i in testing_df['length']:
    for k in product_combos1['length']:
        if (i)+1 == (k):
            matches = list(k)
However, this raises the error:
TypeError: 'numpy.int64' object is not iterable
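The error most likely comes from `list(k)`: `k` is a single `numpy.int64` taken from the `length` column, and `list()` needs something it can iterate over. Wrapping it as `[k]` or appending to a list avoids the TypeError, but it still only collects lengths rather than rows. The sketch below is one possible way to do the whole lookup, assuming `product_id` holds tuples as in the printed output. The helper name `best_combo` and the toy frames are hypothetical, rebuilt from the five rows shown above.

```python
import pandas as pd

# Toy data rebuilt from the head() output above (assumption: product_id holds tuples)
testing_df = pd.DataFrame(
    {
        "product_id": [("P01",), ("P01", "P02"), ("P01", "P02", "P09"),
                       ("P01", "P03"), ("P01", "P03", "P05")],
        "length": [1, 2, 3, 2, 3],
    },
    index=pd.Index(["001", "002", "003", "004", "005"], name="transaction_id"),
)
product_combos1 = pd.DataFrame(
    {
        "product_id": [("P06", "P09"), ("P01", "P05", "P06", "P09"),
                       ("P01", "P06"), ("P01", "P09"), ("P02", "P09")],
        "count": [36340, 10085, 36337, 49897, 11573],
        "length": [2, 4, 2, 2, 2],
    }
)


def best_combo(products, combos):
    """Hypothetical helper: most frequent combo that is one item longer
    than `products` and contains every item in `products`."""
    wanted = set(products)
    right_length = combos["length"] == len(products) + 1
    contains_all = combos["product_id"].apply(lambda combo: wanted.issubset(combo))
    candidates = combos[right_length & contains_all]
    if candidates.empty:
        return None  # no combo of the required length contains these products
    return candidates.loc[candidates["count"].idxmax(), "product_id"]


# One lookup per transaction, keyed by transaction_id
matches = {
    tid: best_combo(products, product_combos1)
    for tid, products in testing_df["product_id"].items()
}
print(matches["001"])
```

With these five sample rows, only transaction 001 has a candidate; the others return `None` because no combo of the required length contains all of their products. Subtracting the original products from the returned tuple, for example `set(matches["001"]) - {"P01"}`, leaves just the added item.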
【Discussion】: