【Question title】: Find the most frequent value per group in a table column
【Posted】: 2021-12-09 19:41:50
【Question description】:

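I need to find the most frequent value of object_of_search for each ethnicity. How can I do that? Subqueries in the SELECT clause and correlated subqueries are not allowed. Something like: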

mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"

But this does not aggregate, and it returns many rows, one for each combination of ethnicity and object_of_search:

 officer_defined_ethnicity | Sas for ethnicity |   Arrest rate    | Most frequent object of search
---------------------------+-------------------+------------------+--------------------------------
 ethnicity2                |                 3 | 66.6666666666667 | Stolen goods
 ethnicity3                |                 2 |              100 | Fireworks
 ethnicity1                |                 5 |               60 | Firearms
 ethnicity3                |                 2 |              100 | Firearms
 ethnicity1                |                 5 |               60 | Cat
 ethnicity1                |                 5 |               60 | Dog
 ethnicity2                |                 3 | 66.6666666666667 | Firearms
 ethnicity1                |                 5 |               60 | Psychoactive substances
 ethnicity1                |                 5 |               60 | Fireworks

It should look like this:

 officer_defined_ethnicity | Sas for ethnicity |   Arrest rate    | Most frequent object of search
---------------------------+-------------------+------------------+--------------------------------
 ethnicity2                |                 3 | 66.6666666666667 | Stolen goods
 ethnicity3                |                 2 |              100 | Fireworks
 ethnicity1                |                 5 |               60 | Firearms

The table is on fiddle.
Query:

SELECT DISTINCT
    stopAndSearches.officer_defined_ethnicity,
    count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity) AS "Sas for ethnicity",
    sum(case when stopAndSearches.outcome = 'Arrest' then 1 else 0 end)
       OVER (PARTITION BY stopAndSearches.officer_defined_ethnicity)::float /
       count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity)::float * 100 AS "Arrest rate",
    mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"
FROM stopAndSearches
GROUP BY stopAndSearches.sas_id, stopAndSearches.officer_defined_ethnicity;

Table:

CREATE TABLE IF NOT EXISTS stopAndSearches(
    "sas_id" bigserial PRIMARY KEY,
    "officer_defined_ethnicity" VARCHAR(255),
    "object_of_search" VARCHAR(255),
    "outcome" VARCHAR(255)
);

【Comments】:

    Tags: sql postgresql greatest-n-per-group


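    【Solution 1】:

    Update: Fiddle

    This should solve the specific "object per ethnicity" problem.

    Note that this does not handle ties in the counts. That was not part of the question/request.

    Adapt your SQL to include this logic to provide that detail: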

    WITH cte AS (
            SELECT officer_defined_ethnicity
                 , object_of_search
                 , COUNT(*) AS n
                 , ROW_NUMBER() OVER (PARTITION BY officer_defined_ethnicity ORDER BY COUNT(*) DESC) AS rn
              FROM stopAndSearches
             GROUP BY officer_defined_ethnicity, object_of_search
         )
    SELECT * FROM cte
     WHERE rn = 1
    ;
    

    Result:

     officer_defined_ethnicity | object_of_search | n | rn
    ---------------------------+------------------+---+----
     ethnicity1                | Cat              | 1 |  1
     ethnicity2                | Stolen goods     | 2 |  1
     ethnicity3                | Fireworks        | 1 |  1
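
    If ties do matter, one possible variation is to swap ROW_NUMBER() for RANK(), which gives every value sharing the top count a rank of 1. This is only a sketch against the question's stopAndSearches table; the column alias rnk is made up for illustration.

    ```sql
    -- Sketch: return ALL most frequent values per ethnicity, including ties.
    -- RANK() assigns the same rank to equal counts, so tied rows all get rnk = 1.
    WITH cte AS (
        SELECT officer_defined_ethnicity
             , object_of_search
             , COUNT(*) AS n
             , RANK() OVER (PARTITION BY officer_defined_ethnicity
                            ORDER BY COUNT(*) DESC) AS rnk  -- rnk: illustrative alias
          FROM stopAndSearches
         GROUP BY officer_defined_ethnicity, object_of_search
    )
    SELECT officer_defined_ethnicity, object_of_search, n
      FROM cte
     WHERE rnk = 1
     ORDER BY officer_defined_ethnicity, object_of_search;
    ```

    With this variant a group can return more than one row, so it no longer guarantees exactly one row per ethnicity.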

    【Comments】:

      【Solution 2】:
      SELECT DISTINCT ON (1)
             officer_defined_ethnicity, object_of_search, count(*) AS ct
      FROM   stop_and_searches
      GROUP  BY 1, 2
      ORDER  BY 1, 3 DESC, 2;
      

      Or, more explicitly:

      SELECT DISTINCT ON (officer_defined_ethnicity)
             officer_defined_ethnicity, object_of_search, count(*) AS ct
      FROM   stop_and_searches
      GROUP  BY officer_defined_ethnicity, object_of_search
      ORDER  BY officer_defined_ethnicity, ct DESC, object_of_search;
      
       officer_defined_ethnicity | object_of_search | ct
      ---------------------------+------------------+----
       ethnicity1                | Cat              | 1
       ethnicity2                | Stolen goods     | 2
       ethnicity3                | Firearms         | 1
      

      db<>fiddle here

      Since DISTINCT ON is applied after GROUP BY, we only need a single query level.

      1. Aggregate in GROUP BY to get the count per (officer_defined_ethnicity, object_of_search).
      2. In DISTINCT ON, pick the row with the highest count per officer_defined_ethnicity.
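
      To make the two steps visible, they can be spelled out as two query levels. This is an illustrative rewrite only, assuming the same stop_and_searches table as the queries above; the subquery alias sub is made up. It should return the same rows as the single-level query.

      ```sql
      -- Step 2: keep the first row per ethnicity, i.e. the one with the highest count.
      SELECT DISTINCT ON (officer_defined_ethnicity)
             officer_defined_ethnicity, object_of_search, ct
      FROM  (
         -- Step 1: one row per (ethnicity, object) pair with its count.
         SELECT officer_defined_ethnicity, object_of_search, count(*) AS ct
         FROM   stop_and_searches
         GROUP  BY officer_defined_ethnicity, object_of_search
         ) sub  -- sub: illustrative alias
      ORDER  BY officer_defined_ethnicity, ct DESC, object_of_search;
      ```

      Because DISTINCT ON is evaluated after GROUP BY within the same SELECT, the subquery adds nothing here, which is why the single-level form is enough.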

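      I added object_of_search as the third ORDER BY item to act as a tiebreaker and produce a deterministic result:
      in case of a tie, the first object_of_search in alphabetical sort order is picked.
      Adapt to your needs.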
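
      If a single row per ethnicity with all three figures from the question is needed, one possible sketch groups by officer_defined_ethnicity only and lets mode() pick the most frequent object_of_search. It assumes the question's stopAndSearches table and PostgreSQL 9.4 or later (for ordered-set aggregates and the FILTER clause). Unlike the explicit ORDER BY tiebreaker, mode() chooses arbitrarily among equally frequent values, so ties are not resolved deterministically.

      ```sql
      -- Sketch: one row per ethnicity, using plain aggregates instead of window functions.
      -- Grouping by sas_id (as in the question) would make every group a single row,
      -- so only officer_defined_ethnicity appears in GROUP BY here.
      SELECT officer_defined_ethnicity
           , count(*)                                   AS "Sas for ethnicity"
           , 100.0 * count(*) FILTER (WHERE outcome = 'Arrest')
                   / count(*)                           AS "Arrest rate"
           , mode() WITHIN GROUP (ORDER BY object_of_search)
                                                        AS "Most frequent object of search"
      FROM   stopAndSearches
      GROUP  BY officer_defined_ethnicity;
      ```

      This uses no subquery in the SELECT list and no correlated subquery, which matches the restriction stated in the question.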

      DISTINCT ON is simpler and typically faster than a subquery with row_number().

      【Comments】:
