【Question title】: Pandas GroupBy Two Text Columns And Return The Max Rows Based On Counts
【Posted】: 2016-10-10 17:48:51
【Question description】:

I am trying to find, for each First_Word, the (First_Word, Group) pair with the largest count.

import pandas as pd

df = pd.DataFrame({'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
           'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
           'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
                'apple fell out of the tree', 'partrige in a pear tree']},
          columns=['First_Word', 'Group', 'Text'])

  First_Word         Group                        Text
0      apple    apple bins     where to buy apple bins
1      apple   apple trees         i see an apple tree
2     orange  orange juice         i like orange juice
3      apple   apple trees  apple fell out of the tree
4       pear     pear tree     partrige in a pear tree

Then I do a groupby:

grouped = df.groupby(['First_Word', 'Group']).count()
                         Text
First_Word Group             
apple      apple bins       1
           apple trees      2
orange     orange juice     1
pear       pear tree        1

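I now want to filter this down to only the unique index rows with the maximum Text count. Below you will notice that apple bins has been removed, because apple trees has the max value.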

                         Text
First_Word Group             
apple      apple trees      2
orange     orange juice     1
pear       pear tree        1

The "max value of group" question is similar, but when I try something like this:

df.groupby(["First_Word", "Group"]).count().apply(lambda t: t[t['Text']==t['Text'].max()])

I get an error: KeyError: ('Text', 'occurred at index Text'). If I add axis=1 to apply, I get IndexError: ('index out of bounds', 'occurred at index (apple, apple bins)')

【Comments】:

    Tags: python pandas max


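    【Solution 1】:

    Given grouped, you now want to group by the First_Word index level and find the index label of the row with the maximum Text count in each group (using idxmax):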

    In [39]: grouped.groupby(level='First_Word')['Text'].idxmax()
    Out[39]: 
    First_Word
    apple       (apple, apple trees)
    orange    (orange, orange juice)
    pear           (pear, pear tree)
    Name: Text, dtype: object
    

    You can then use grouped.loc to select rows from grouped by index label:

    import pandas as pd
    df = pd.DataFrame(
        {'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
         'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
         'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
                  'apple fell out of the tree', 'partrige in a pear tree']},
        columns=['First_Word', 'Group', 'Text'])
    
    grouped = df.groupby(['First_Word', 'Group']).count()
    result = grouped.loc[grouped.groupby(level='First_Word')['Text'].idxmax()]
    print(result)
    

    yields

                             Text
    First_Word Group             
    apple      apple trees      2
    orange     orange juice     1
    pear       pear tree        1
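
    One caveat: idxmax returns a single label per group, so if two Group values tie for the highest count within a First_Word, only the first is kept. If ties should all be retained, a minimal sketch (not part of the original answer; the names group_max and result_with_ties are illustrative) is to compare each count with its group maximum via transform:

```python
import pandas as pd

df = pd.DataFrame(
    {'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
     'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
     'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
              'apple fell out of the tree', 'partrige in a pear tree']},
    columns=['First_Word', 'Group', 'Text'])

grouped = df.groupby(['First_Word', 'Group']).count()

# For every row of grouped, the largest Text count within its First_Word group.
# transform returns a Series aligned with grouped's index.
group_max = grouped.groupby(level='First_Word')['Text'].transform('max')

# The boolean mask keeps every row that reaches its group's maximum, ties included.
result_with_ties = grouped[grouped['Text'] == group_max]
print(result_with_ties)
```

    On the sample data there are no ties, so this selects the same three rows as the idxmax approach.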
    

    【Discussion】:
