【Question title】: Finding most common values with Pandas GroupBy and value_counts
【Posted】: 2018-11-08 14:31:34
【Question description】:

I'm working with two columns in a table.

+-------------+--------------------------------------------------------------+
|  Area Name  |                       Code Description                       |
+-------------+--------------------------------------------------------------+
| N Hollywood | VIOLATION OF RESTRAINING ORDER                               |
| N Hollywood | CRIMINAL THREATS - NO WEAPON DISPLAYED                       |
| N Hollywood | CRIMINAL THREATS - NO WEAPON DISPLAYED                       |
| N Hollywood | ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT               |
| Southeast   | ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT               |
| West Valley | CRIMINAL THREATS - NO WEAPON DISPLAYED                       |
| West Valley | CRIMINAL THREATS - NO WEAPON DISPLAYED                       |
| 77th Street | RAPE, FORCIBLE                                               |
| Foothill    | CRM AGNST CHLD (13 OR UNDER) (14-15 & SUSP 10 YRS OLDER)0060 |
| N Hollywood | VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS) 0114 |
+-------------+--------------------------------------------------------------+

I'm using groupby and value_counts to count the Code Description values for each Area Name.

df.groupby(['Area Name'])['Code Description'].value_counts()
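A minimal reproducible setup for the call above, assuming the ten sample rows in the table are the entire DataFrame (the real dataset is evidently larger, since the output further down mentions Wilshire). The rows are copied verbatim from the table; the names `rows` and `counts` are only illustrative.

```python
import pandas as pd

# The ten sample rows from the question's table, copied verbatim.
rows = [
    ("N Hollywood", "VIOLATION OF RESTRAINING ORDER"),
    ("N Hollywood", "CRIMINAL THREATS - NO WEAPON DISPLAYED"),
    ("N Hollywood", "CRIMINAL THREATS - NO WEAPON DISPLAYED"),
    ("N Hollywood", "ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT"),
    ("Southeast", "ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT"),
    ("West Valley", "CRIMINAL THREATS - NO WEAPON DISPLAYED"),
    ("West Valley", "CRIMINAL THREATS - NO WEAPON DISPLAYED"),
    ("77th Street", "RAPE, FORCIBLE"),
    ("Foothill", "CRM AGNST CHLD (13 OR UNDER) (14-15 & SUSP 10 YRS OLDER)0060"),
    ("N Hollywood", "VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS) 0114"),
]
df = pd.DataFrame(rows, columns=["Area Name", "Code Description"])

# The question's own call: a Series indexed by (Area Name, Code Description)
# whose values are how often each pair occurs in df.
counts = df.groupby(["Area Name"])["Code Description"].value_counts()
print(counts)
```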

Is there a way to see only the top "n" values for each Area Name? If I append .nlargest(3) to the code above, it only returns results for a single Area Name:

+---------------------------------------------------------------------------------+
| Wilshire     SHOPLIFTING-GRAND THEFT ($950.01 & OVER)                         7 |
+---------------------------------------------------------------------------------+
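A likely explanation, sketched on the sample rows (assumed to be the whole DataFrame): `.nlargest(3)` appended to the `value_counts()` result runs on the whole flattened Series, not per group. It ranks every (Area Name, Code Description) pair in one pool and keeps the three largest counts overall, so on the full data the top entries can easily all come from one area.

```python
import pandas as pd

# The ten sample rows from the question's table, copied verbatim.
rows = [
    ("N Hollywood", "VIOLATION OF RESTRAINING ORDER"),
    ("N Hollywood", "CRIMINAL THREATS - NO WEAPON DISPLAYED"),
    ("N Hollywood", "CRIMINAL THREATS - NO WEAPON DISPLAYED"),
    ("N Hollywood", "ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT"),
    ("Southeast", "ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT"),
    ("West Valley", "CRIMINAL THREATS - NO WEAPON DISPLAYED"),
    ("West Valley", "CRIMINAL THREATS - NO WEAPON DISPLAYED"),
    ("77th Street", "RAPE, FORCIBLE"),
    ("Foothill", "CRM AGNST CHLD (13 OR UNDER) (14-15 & SUSP 10 YRS OLDER)0060"),
    ("N Hollywood", "VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS) 0114"),
]
df = pd.DataFrame(rows, columns=["Area Name", "Code Description"])

counts = df.groupby("Area Name")["Code Description"].value_counts()

# nlargest on the flattened Series ignores the Area Name grouping:
# it returns the 3 biggest counts across ALL areas combined.
top_overall = counts.nlargest(3)
print(top_overall)
```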

【Question discussion】:

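  • I've made my question clearer.
  • The question was reopened because it asks for the counts of the top N values, whereas the other question asks only for the most common value, so that answer does not apply here.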

Tags: python python-3.x pandas pandas-groupby


【Solution 1】:

Apply head to the value_counts result within each group:

df.groupby('Area Name')['Code Description'].apply(lambda x: x.value_counts().head(3))

Output:

Area Name                                                                
77th Street  RAPE, FORCIBLE                                                  1
Foothill     CRM AGNST CHLD (13 OR UNDER) (14-15 & SUSP 10 YRS OLDER)0060    1
N Hollywood  CRIMINAL THREATS - NO WEAPON DISPLAYED                          2
             VIOLATION OF RESTRAINING ORDER                                  1
             ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT                  1
Southeast    ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT                  1
West Valley  CRIMINAL THREATS - NO WEAPON DISPLAYED                          2
Name: Code Description, dtype: int64
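The same approach as a self-contained sketch, again assuming the ten sample rows are the whole DataFrame. Because `value_counts()` sorts counts in descending order, `head(3)` on each group's result keeps that group's three most frequent descriptions. When counts tie (as in N Hollywood, where three descriptions occur once), which tied descriptions survive follows the order `value_counts()` happens to return, so treat that choice as arbitrary.

```python
import pandas as pd

# The ten sample rows from the question's table, copied verbatim.
rows = [
    ("N Hollywood", "VIOLATION OF RESTRAINING ORDER"),
    ("N Hollywood", "CRIMINAL THREATS - NO WEAPON DISPLAYED"),
    ("N Hollywood", "CRIMINAL THREATS - NO WEAPON DISPLAYED"),
    ("N Hollywood", "ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT"),
    ("Southeast", "ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT"),
    ("West Valley", "CRIMINAL THREATS - NO WEAPON DISPLAYED"),
    ("West Valley", "CRIMINAL THREATS - NO WEAPON DISPLAYED"),
    ("77th Street", "RAPE, FORCIBLE"),
    ("Foothill", "CRM AGNST CHLD (13 OR UNDER) (14-15 & SUSP 10 YRS OLDER)0060"),
    ("N Hollywood", "VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS) 0114"),
]
df = pd.DataFrame(rows, columns=["Area Name", "Code Description"])

# For each area: count its descriptions (largest first) and keep the first 3.
top3 = df.groupby("Area Name")["Code Description"].apply(
    lambda x: x.value_counts().head(3)
)
print(top3)
```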

【Discussion】:

【Solution 2】:

You can do a double groupby:

s = df.groupby('Area Name')['Code Description'].value_counts()
# nlargest on a SeriesGroupBy prepends the group key again, so drop the duplicated Area Name level
res = s.groupby('Area Name').nlargest(3).reset_index(level=1, drop=True)

print(res)

Area Name    Code Description                                            
77th Street  RAPE, FORCIBLE                                                  1
Foothill     CRM AGNST CHLD (13 OR UNDER) (14-15 & SUSP 10 YRS OLDER)0060    1
N Hollywood  CRIMINAL THREATS - NO WEAPON DISPLAYED                          2
             ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT                  1
             VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS) 0114    1
Southeast    ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT                  1
West Valley  CRIMINAL THREATS - NO WEAPON DISPLAYED                          2
Name: Code Description, dtype: int64
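A variant that avoids both `apply` and the second `nlargest`, offered as an alternative rather than either answerer's method: `SeriesGroupBy.value_counts()` already orders each area's counts from largest to smallest, so grouping its result by the first index level and taking `head(3)` keeps the top three per area. The two-level (Area Name, Code Description) index is left as is, so no `reset_index` is needed. The same caveat about ties applies, and the data is again the sample rows, assumed to be the whole DataFrame.

```python
import pandas as pd

# The ten sample rows from the question's table, copied verbatim.
rows = [
    ("N Hollywood", "VIOLATION OF RESTRAINING ORDER"),
    ("N Hollywood", "CRIMINAL THREATS - NO WEAPON DISPLAYED"),
    ("N Hollywood", "CRIMINAL THREATS - NO WEAPON DISPLAYED"),
    ("N Hollywood", "ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT"),
    ("Southeast", "ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT"),
    ("West Valley", "CRIMINAL THREATS - NO WEAPON DISPLAYED"),
    ("West Valley", "CRIMINAL THREATS - NO WEAPON DISPLAYED"),
    ("77th Street", "RAPE, FORCIBLE"),
    ("Foothill", "CRM AGNST CHLD (13 OR UNDER) (14-15 & SUSP 10 YRS OLDER)0060"),
    ("N Hollywood", "VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS) 0114"),
]
df = pd.DataFrame(rows, columns=["Area Name", "Code Description"])

counts = df.groupby("Area Name")["Code Description"].value_counts()
# value_counts() lists each area's rows largest count first, so head(3)
# within each Area Name group keeps that area's three most frequent descriptions.
top3 = counts.groupby(level="Area Name").head(3)
print(top3)
```

Since `head` only filters rows, the result keeps the ordering and index produced by `value_counts()`.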
    

【Discussion】:
