SQL group by with multiple conditions: answers

【Question title】: SQL group by with multiple conditions
【Posted】: 2014-02-12 21:34:25
【Question】:

I am working with visitor log data and need to summarize it by IP address. The data looks like this:

id | ip_address | type     | message | ...
---+------------+----------+---------+----
1  | 1.2.3.4    | purchase | ...
2  | 1.2.3.4    | visit    | ...
3  | 3.3.3.3    | visit    | ...
4  | 3.3.3.3    | purchase | ...
5  | 4.4.4.4    | visit    | ...
6  | 4.4.4.4    | visit    | ...

It should be summarized to one row per IP address, taking the first row under this ordering:

type="purchase" DESC, type="visit" DESC, id DESC

Yielding:

chosen id | ip_address | type     | message | ...
----------+------------+----------+---------+----
1         | 1.2.3.4    | purchase | ...
4         | 3.3.3.3    | purchase | ...
6         | 4.4.4.4    | visit    | ...

Is there an elegant way to get this data?

An ugly approach is as follows:

SET @row_num = 0;
CREATE TEMPORARY TABLE IF NOT EXISTS tt AS
  SELECT *, @row_num := @row_num + 1 AS row_index
  FROM log
  ORDER BY type="purchase" DESC, type="visit" DESC, id DESC;

Then get the minimum row_index, and its id, for each ip_address (https://stackoverflow.com/questions/121387/fetch-the-row-which-has-the-max-value-for-a-column).

Then join those ids back to the original table.

【Comments】:

Tags: mysql sql greatest-n-per-group

【Solution 1】:

I think this should be what you need:

SELECT yourtable.*
FROM
  yourtable INNER JOIN (
    SELECT   ip_address,
             MAX(CASE WHEN type='purchase' THEN id END) max_purchase,
             MAX(CASE WHEN type='visit' THEN id END) max_visit
    FROM     yourtable
    GROUP BY ip_address) m
  ON yourtable.id = COALESCE(max_purchase, max_visit)

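Please see the fiddle here.

My subquery returns the maximum purchase id for each ip_address (or NULL if there is no purchase) and the maximum visit id. I then join back to the table using COALESCE: if max_purchase is not NULL, the join is on max_purchase; otherwise it is on max_visit.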
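To make the COALESCE step concrete, here is a minimal sketch that computes the per-IP aggregates on the question's sample rows. It runs the SQL through Python's built-in sqlite3 module rather than MySQL, which is an assumption made only so the example is self-contained; the `message` column is omitted and the `chosen_id` column is added purely for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, ip_address TEXT, type TEXT)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (1, "1.2.3.4", "purchase"),
        (2, "1.2.3.4", "visit"),
        (3, "3.3.3.3", "visit"),
        (4, "3.3.3.3", "purchase"),
        (5, "4.4.4.4", "visit"),
        (6, "4.4.4.4", "visit"),
    ],
)

# Per-IP aggregates: CASE without ELSE yields NULL, and MAX ignores NULLs,
# so max_purchase is NULL for an IP that has no purchase rows.
aggregates = conn.execute(
    """
    SELECT ip_address,
           MAX(CASE WHEN type = 'purchase' THEN id END) AS max_purchase,
           MAX(CASE WHEN type = 'visit'    THEN id END) AS max_visit,
           COALESCE(MAX(CASE WHEN type = 'purchase' THEN id END),
                    MAX(CASE WHEN type = 'visit'    THEN id END)) AS chosen_id
    FROM yourtable
    GROUP BY ip_address
    ORDER BY ip_address
    """
).fetchall()

for row in aggregates:
    print(row)
# ('1.2.3.4', 1, 2, 1)
# ('3.3.3.3', 4, 3, 4)
# ('4.4.4.4', None, 6, 6)
```

The last column is the id the join condition matches on: the latest purchase where one exists, otherwise the latest visit.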

【Discussion】:

This is the most straightforward. Very nice approach, thanks!

【Solution 2】:

You can use Bill Karwin's approach here:

SELECT t1.*
FROM (SELECT *, CASE WHEN type = 'purchase' THEN 1 ELSE 0 END is_purchase FROM myTable) t1
LEFT JOIN (SELECT *, CASE WHEN type = 'purchase' THEN 1 ELSE 0 END is_purchase FROM myTable) t2
  ON t1.ip_address = t2.ip_address
  AND (t2.is_purchase > t1.is_purchase
     OR (t2.is_purchase = t1.is_purchase AND t2.id > t1.id))
WHERE t2.id IS NULL

SQL Fiddle here
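The self-join above keeps a row only when no "better" row exists for the same ip_address, where better means a purchase beating a visit, or a higher id within the same kind. A rough sketch of that behaviour on the question's sample rows follows; it uses Python's sqlite3 module as a stand-in for MySQL (an assumption for the sake of a runnable example) and leaves out the `message` column.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (id INTEGER PRIMARY KEY, ip_address TEXT, type TEXT)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        (1, "1.2.3.4", "purchase"),
        (2, "1.2.3.4", "visit"),
        (3, "3.3.3.3", "visit"),
        (4, "3.3.3.3", "purchase"),
        (5, "4.4.4.4", "visit"),
        (6, "4.4.4.4", "visit"),
    ],
)

# t2 matches every row that outranks t1 within the same IP.
# Rows of t1 with no such match come back with t2.id NULL and are kept.
rows = conn.execute(
    """
    SELECT t1.id, t1.ip_address, t1.type
    FROM (SELECT *, CASE WHEN type = 'purchase' THEN 1 ELSE 0 END AS is_purchase
          FROM myTable) t1
    LEFT JOIN (SELECT *, CASE WHEN type = 'purchase' THEN 1 ELSE 0 END AS is_purchase
               FROM myTable) t2
      ON t1.ip_address = t2.ip_address
     AND (t2.is_purchase > t1.is_purchase
          OR (t2.is_purchase = t1.is_purchase AND t2.id > t1.id))
    WHERE t2.id IS NULL
    ORDER BY t1.id
    """
).fetchall()

print(rows)
# [(1, '1.2.3.4', 'purchase'), (4, '3.3.3.3', 'purchase'), (6, '4.4.4.4', 'visit')]
```

For example, row 5 is dropped because row 6 is a visit from the same IP with a higher id, while row 6 itself has nothing that outranks it.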

【Discussion】:

【Solution 3】:

The following query uses a correlated subquery to get the most recent id according to your rules:

-- "table" is a placeholder; substitute your own table name.
select t.ip_address,
       (select t2.id
        from table t2
        where t2.ip_address = t.ip_address
        order by t2.type = 'purchase' desc, t2.id desc
        limit 1
       ) as mostrecent
from (select distinct ip_address
      from table
     ) t;

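The idea is to sort each IP's rows with purchases first (by id descending), then visits, and pick the first row in the list. If you have a table of IP addresses, you do not need the distinct subquery; just use that table.

To get the final result, we can join to this or use in or exists. This version uses a join: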

-- "table" is a placeholder; substitute your own table name.
select t.*
from table t join
     (select (select t2.id
              from table t2
              where t2.ip_address = ips.ip_address
              order by t2.type = 'purchase' desc, t2.id desc
              limit 1
             ) as mostrecent
      from (select distinct ip_address
            from table
           ) ips
     ) ids
     on t.id = ids.mostrecent;

This query works best with an index on table(ip_address, type, id).
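As a hedged illustration of the correlated-subquery approach together with the suggested index, the sketch below runs it on the question's sample rows. Assumptions: Python's sqlite3 module stands in for MySQL, the placeholder table is named `log` as in the question, the index name `idx_log_ip_type_id` is made up, and the `message` column is omitted. No claim is made here about how much the index helps on real data.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, ip_address TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?)",
    [
        (1, "1.2.3.4", "purchase"),
        (2, "1.2.3.4", "visit"),
        (3, "3.3.3.3", "visit"),
        (4, "3.3.3.3", "purchase"),
        (5, "4.4.4.4", "visit"),
        (6, "4.4.4.4", "visit"),
    ],
)

# Composite index matching the columns named in the answer (hypothetical name).
conn.execute("CREATE INDEX idx_log_ip_type_id ON log (ip_address, type, id)")

# For each distinct IP, the scalar subquery picks one id:
# purchases sort first (type = 'purchase' is 1 for them, 0 otherwise), then highest id.
rows = conn.execute(
    """
    SELECT t.id, t.ip_address, t.type
    FROM log t
    JOIN (SELECT (SELECT t2.id
                  FROM log t2
                  WHERE t2.ip_address = ips.ip_address
                  ORDER BY t2.type = 'purchase' DESC, t2.id DESC
                  LIMIT 1) AS mostrecent
          FROM (SELECT DISTINCT ip_address FROM log) ips) ids
      ON t.id = ids.mostrecent
    ORDER BY t.id
    """
).fetchall()

print(rows)
# [(1, '1.2.3.4', 'purchase'), (4, '3.3.3.3', 'purchase'), (6, '4.4.4.4', 'visit')]
```

The result matches the summary requested in the question: one row per IP, preferring the latest purchase and falling back to the latest visit.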

【Discussion】: