【Posted】: 2016-10-10 17:48:51
【Question】:
I am trying to find, for each First_Word, the (First_Word, Group) pair that occurs most often.
import pandas as pd
df = pd.DataFrame({'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
'apple fell out of the tree', 'partrige in a pear tree']},
columns=['First_Word', 'Group', 'Text'])
   First_Word         Group                        Text
0       apple    apple bins     where to buy apple bins
1       apple   apple trees         i see an apple tree
2      orange  orange juice         i like orange juice
3       apple   apple trees  apple fell out of the tree
4        pear     pear tree     partrige in a pear tree
Then I do a groupby:
grouped = df.groupby(['First_Word', 'Group']).count()
                         Text
First_Word Group
apple      apple bins       1
           apple trees      2
orange     orange juice     1
pear       pear tree        1
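Because grouped is indexed by the two levels (First_Word, Group), the highest count for each First_Word can be computed by grouping again on the first index level only. This is a minimal sketch that rebuilds the sample frame from the question; the names max_per_word and the use of groupby(level=...) are illustrative choices, not part of the original code.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
                   'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
                   'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
                            'apple fell out of the tree', 'partrige in a pear tree']},
                  columns=['First_Word', 'Group', 'Text'])

grouped = df.groupby(['First_Word', 'Group']).count()

# grouped has a two-level row index: (First_Word, Group)
print(list(grouped.index.names))

# Highest Text count within each First_Word, i.e. per value of index level 0
max_per_word = grouped.groupby(level='First_Word')['Text'].max()
print(max_per_word)
```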
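I now want to filter this so that, for each First_Word (the first index level), only the row with the largest Text count remains. In the result below, apple bins has been dropped because apple trees has the larger count.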
                         Text
First_Word Group
apple      apple trees      2
orange     orange juice     1
pear       pear tree        1
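One way to produce this result without apply is sketched below, assuming that rows tied for the maximum should all be kept. It uses groupby(...).transform('max') (a swapped-in technique, not the approach from the question) to broadcast each First_Word's maximum back onto every row of that group, then keeps the rows whose count equals it.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
                   'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
                   'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
                            'apple fell out of the tree', 'partrige in a pear tree']},
                  columns=['First_Word', 'Group', 'Text'])

grouped = df.groupby(['First_Word', 'Group']).count()

# transform('max') returns a Series with the same index as grouped, where each
# row holds the maximum Text count of its First_Word group.
group_max = grouped.groupby(level='First_Word')['Text'].transform('max')

# Keep only rows whose count equals their group's maximum (ties are all kept)
result = grouped[grouped['Text'] == group_max]
print(result)
```

Because the comparison is elementwise, a First_Word whose top two Groups share the same count would keep both rows.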
The question "max value of group" is similar, but when I try something like this:
df.groupby(["First_Word", "Group"]).count().apply(lambda t: t[t['Text']==t['Text'].max()])
I get an error: KeyError: ('Text', 'occurred at index Text'). If I add axis=1 to apply, I get IndexError: ('index out of bounds', 'occurred at index (apple, apple bins)').
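The KeyError is consistent with how DataFrame.apply works: with the default axis=0 the function is called once per column, and each call receives that column as a Series indexed by the (First_Word, Group) MultiIndex. Inside the lambda, t['Text'] is therefore a row lookup for a label 'Text' that does not exist. (With axis=1 each call receives a single row instead, so t['Text'] is a scalar and the comparison no longer yields a row mask, which is likely what triggers the IndexError.) The sketch below checks this and shows an apply-free alternative based on idxmax; the variable names are illustrative, and note that idxmax keeps only the first row per group when counts tie.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
                   'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
                   'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
                            'apple fell out of the tree', 'partrige in a pear tree']},
                  columns=['First_Word', 'Group', 'Text'])

grouped = df.groupby(['First_Word', 'Group']).count()

# With axis=0, apply passes each column (here only 'Text') as a Series
# whose index is the (First_Word, Group) MultiIndex.
col = grouped['Text']
print(col.name)             # the column label
print('Text' in col.index)  # 'Text' is not a row label of that Series

# So t['Text'] inside the lambda is a failing row lookup.
try:
    grouped.apply(lambda t: t[t['Text'] == t['Text'].max()])
    raised_key_error = False
except KeyError:
    raised_key_error = True
print(raised_key_error)

# Apply-free alternative: idxmax returns the full (First_Word, Group) label of
# the largest count in each group; on ties only the first label is returned.
best_labels = grouped.groupby(level='First_Word')['Text'].idxmax()
result = grouped.loc[best_labels.tolist()]
print(result)
```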
【Discussion】: