【Question title】: Keeping rows from double-counting in a GROUP BY
【Posted】: 2012-09-24 18:41:26
【Question】:

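Here is the basic outline of my schema and my problem: http://sqlfiddle.com/#!1/72ec9/4/2

Note that the periods table can refer to variable spans of time: a whole season, a few games, or a single game. For a given team and year, all period rows represent mutually exclusive time spans.

I wrote a query that joins the tables and uses GROUP BY period.year to total the scores for a season (see the sqlfiddle). However, if a coach held two positions in the same year, the GROUP BY counts the same period row twice. How can I drop the duplicates when a coach held two positions, while still summing the periods when a year consists of multiple periods? If there is a better way to design the schema, I would also appreciate it if you pointed it out.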

【Question comments】:

  • +1 for providing a working demo with your question. Makes it so much easier!

Tags: sql postgresql join aggregate-functions


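【Solution 1】:

The underlying problem (joining multiple tables that each have multiple matches, which multiplies rows before they are aggregated) is explained in a related answer.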
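As a minimal sketch of that row multiplication, the following self-contained query uses made-up sample data (the CTE names and values are assumptions for illustration, not the asker's real tables). One period is linked to the same coach under two positions, so the join returns the period twice and the sum doubles:

```sql
-- Hypothetical sample data: one period (id 10, year 2014, 5 wins)
-- linked to coach 1 under two different positions.
WITH periods (id, year, wins) AS (
   VALUES (10, 2014, 5)
)
, linking (coach, position, period) AS (
   VALUES (1, 1, 10)
        , (1, 2, 10)
)
SELECT pe.year
     , count(*)     AS joined_rows  -- 2: one joined row per position
     , sum(pe.wins) AS wins         -- 10, although the period has only 5 wins
FROM   linking l
JOIN   periods pe ON pe.id = l.period
WHERE  l.coach = 1
GROUP  BY pe.year;
```

The aggregate itself is correct; it simply runs over rows that the join has already duplicated.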

To address it, I first simplified and formatted your query:

select pe.year
     , sum(pe.wins)       AS wins
     , sum(pe.losses)     AS losses
     , sum(pe.ties)       AS ties
     , array_agg(po.id)   AS position_id
     , array_agg(po.name) AS position_names
from   periods_positions_coaches_linking pp
join   positions po ON po.id = pp.position
join   periods   pe ON pe.id = pp.period
where  pp.coach = 1
group  by pe.year
order  by pe.year;

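This produces the same, incorrect result as the original, but it is simpler, faster, and easier to read.

  • As long as you don't use any of its columns in the SELECT list, there is no need to join to the coaches table. I removed it completely and replaced the WHERE condition with where pp.coach = 1.

  • You don't need COALESCE. NULL values are ignored by the aggregate function sum(), so there is no need to substitute 0.

  • Use table aliases to make the query easier to read.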
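The point about COALESCE can be checked with a small standalone query. The inline values are made up for illustration:

```sql
-- sum() skips NULL inputs, so wrapping each value in COALESCE changes nothing.
SELECT sum(x)              AS plain_sum      -- 3: the NULL is ignored
     , sum(coalesce(x, 0)) AS coalesced_sum  -- 3: same result
FROM  (VALUES (1), (NULL), (2)) t(x);
```

One case where the two forms differ: if every input in a group is NULL, sum() returns NULL rather than 0. If a 0 is wanted there, a single coalesce(sum(x), 0) around the aggregate should be enough.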

Next, I solved your problem like this:

SELECT *
FROM  (
   SELECT pe.year
        , array_agg(DISTINCT po.id)   AS position_id
        , array_agg(DISTINCT po.name) AS position_names
   FROM   periods_positions_coaches_linking pp
   JOIN   positions                         po ON po.id = pp.position
   JOIN   periods                           pe ON pe.id = pp.period
   WHERE  pp.coach = 1
   GROUP  BY pe.year
   ) po
LEFT   JOIN (
   SELECT pe.year
        , sum(pe.wins)   AS wins
        , sum(pe.losses) AS losses
        , sum(pe.ties)   AS ties
   FROM  (
      SELECT period
      FROM   periods_positions_coaches_linking
      WHERE  coach = 1
      GROUP  BY period
      ) pp
   JOIN   periods pe ON pe.id = pp.period
   GROUP  BY pe.year
   ) pe USING (year)
ORDER  BY year;
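
  • Aggregate positions and periods separately before joining them.

  • In the first subquery, po, list each position just once with array_agg(DISTINCT ...).

  • In the second subquery, pe ...

    • GROUP BY period, because a coach can hold multiple positions per period.
    • JOIN to the period data after that, then aggregate to get the sums.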
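The second subquery can be tried in isolation with the sketch below. The sample rows are assumptions made up for illustration, and SELECT DISTINCT is swapped in for GROUP BY period, which should have the same effect here: each period reaches the sum only once, no matter how many positions the coach held in it.

```sql
-- Hypothetical sample data; coach 1 holds two positions in period 10.
WITH periods (id, year, wins, losses, ties) AS (
   VALUES (10, 2014, 5, 2, 1)
        , (11, 2014, 3, 4, 0)
        , (12, 2015, 7, 1, 0)
)
, linking (coach, position, period) AS (
   VALUES (1, 1, 10)
        , (1, 2, 10)   -- second position in the same period
        , (1, 1, 11)
        , (1, 1, 12)
)
SELECT pe.year
     , sum(pe.wins)   AS wins
     , sum(pe.losses) AS losses
     , sum(pe.ties)   AS ties
FROM  (
   SELECT DISTINCT period   -- collapse to one row per period first
   FROM   linking
   WHERE  coach = 1
   ) pp
JOIN   periods pe ON pe.id = pp.period
GROUP  BY pe.year
ORDER  BY pe.year;
-- 2014: 8 wins, 6 losses, 1 tie
-- 2015: 7 wins, 1 loss,  0 ties
```

Without the deduplicating subquery, period 10 would be counted twice and 2014 would show 13 wins.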

db<>fiddle here
sqlfiddle

【Comments】:

    【Solution 2】:

    Use distinct, as shown here

    Code:

    select periods.year as year,
    sum(coalesce(periods.wins, 0)) as wins,
    sum(coalesce(periods.losses, 0)) as losses,
    sum(coalesce(periods.ties, 0)) as ties,
    array_agg( distinct positions.id) as position_id,
    array_agg( distinct positions.name) as position_names
    
    from periods_positions_coaches_linking
    
    join coaches on coaches.id = periods_positions_coaches_linking.coach
    join positions on positions.id = periods_positions_coaches_linking.position
    join periods on periods.id = periods_positions_coaches_linking.period
    
    where coaches.id = 1
    
    group by periods.year, positions.id
    order by periods.year;
    

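    【Comments】:

    • This now returns two rows for 2014. I would like one row per year, but I understand if SQL can't do that. If I use this query, is it guaranteed that both rows have the same scores, so that I don't have to do any summing or processing outside the query? (I'm worried about edge cases.)
    • To guarantee the same scores, you would need to add that condition to the where clause.
    • This fails when a coach has multiple positions per period, which multiplies the values in the aggregates. You need to get the distinct periods per coach first ...

    【Solution 3】:

    In your case, the simplest approach might be to divide by the number of positions: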

    select periods.year as year,
           sum(coalesce(periods.wins, 0))/COUNT(distinct positions.id) as wins,
           sum(coalesce(periods.losses, 0))/COUNT(distinct positions.id) as losses,
           sum(coalesce(periods.ties, 0))/COUNT(distinct positions.id) as ties,
           array_agg(distinct positions.id) as position_id,
           array_agg(distinct positions.name) as position_names
    from periods_positions_coaches_linking join
         coaches
         on coaches.id = periods_positions_coaches_linking.coach join
         positions
         on positions.id = periods_positions_coaches_linking.position join
         periods
         on periods.id = periods_positions_coaches_linking.period
    where coaches.id = 1
    group by periods.year
    order by periods.year;
    

    The number of positions multiplies the wins, losses, and ties, so dividing by it adjusts the counts.
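The division appears to rely on the coach holding the same set of positions in every period of the year. A small sketch with made-up data (the names and values are assumptions for illustration) shows what happens when that does not hold: two positions in one period, but only one position in another period of the same year.

```sql
-- Hypothetical sample data: period 10 has two positions, period 11 has one.
WITH periods (id, year, wins) AS (
   VALUES (10, 2014, 3)
        , (11, 2014, 5)
)
, linking (coach, position, period) AS (
   VALUES (1, 1, 10)
        , (1, 2, 10)
        , (1, 1, 11)
)
SELECT pe.year
     , sum(pe.wins)               AS raw_sum    -- 3 + 3 + 5 = 11
     , count(DISTINCT l.position) AS positions  -- 2
     , sum(pe.wins) / count(DISTINCT l.position) AS divided_wins
       -- 11 / 2 = 5 (integer division); the true total is 3 + 5 = 8
FROM   linking l
JOIN   periods pe ON pe.id = l.period
WHERE  l.coach = 1
GROUP  BY pe.year;
```

When every period of a year is duplicated by the same factor, the division gives the right totals. Otherwise, deduplicating the periods before summing is the safer route.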

    【Comments】:
