[Question title]: Group by days, display days with no data and complex query in left join
[Posted]: 2015-11-14 18:45:55
[Question description]:

I have a complex SQL query in PostgreSQL 9.4.4:

SELECT
  p.id,
  p.name,
  p.page_variant_id,
  p.variant_name,
  (
    SELECT COUNT(*) FROM page_views
    INNER JOIN unique_page_visits upv ON upv.id = page_views.unique_page_visit_id
    WHERE page_views.page_id = p.id AND upv.updated_at >= '2015-08-15' AND
          upv.updated_at <= '2015-08-22'
  ) as views_count,
  (
    SELECT COUNT(*) FROM unique_page_visits upv
    WHERE upv.page_id = p.id  AND upv.updated_at >= '2015-08-15' AND
          upv.updated_at <= '2015-08-22'
  ) as page_visits_count,
  (
    SELECT COUNT(*) FROM conversions
    INNER JOIN conversion_goals cg ON cg.id = conversions.conversion_goal_id
    INNER JOIN unique_page_visits upv ON upv.id = conversions.unique_page_visit_id
    WHERE cg.page_id = p.id  AND conversions.updated_at >= '2015-08-15' AND
          conversions.updated_at <= '2015-08-22' AND cg.name = 'popup'
  ) as conversions_count
FROM
  pages p
WHERE
  p.page_variant_id = '25'
ORDER BY
  p.id ASC

Sample result:

 id | name | page_variant_id | variant_name | views_count | page_visits_count | conversions_count 
----+------+-----------------+--------------+-------------+-------------------+-------------------
 73 | a    |              25 | Original     |           1 |                 1 |                 1
(1 row)

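I don't know whether this query is written in the best way, but it works.
Any improvements are welcome, for example removing the redundancy in the SELECT subqueries, such as: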

AND upv.updated_at >= '2015-08-15' AND upv.updated_at <= '2015-08-22'
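The repeated range conditions can be written once by putting the bounds into a one-row CTE and cross joining it, as in this minimal sketch. The CTE names `params` and `upv` and the inline rows are hypothetical stand-ins for the real `unique_page_visits` table, and `updated_at` is assumed to be a timestamp. `BETWEEN a AND z` is shorthand for `>= a AND <= z`.

```sql
-- Sketch: enter the date range once and reuse it in every condition.
-- "params" and the inline "upv" rows are hypothetical, not the real tables.
WITH params AS (
   SELECT timestamp '2015-08-15' AS a
        , timestamp '2015-08-22' AS z
   )
, upv (page_id, updated_at) AS (       -- stand-in for unique_page_visits
   VALUES (73, timestamp '2015-08-15 10:00')
        , (73, timestamp '2015-08-20 09:30')
        , (73, timestamp '2015-08-25 12:00')   -- outside the range
   )
SELECT count(*) AS page_visits_count
FROM   params, upv
WHERE  upv.page_id = 73
AND    upv.updated_at BETWEEN params.a AND params.z;  -- same as >= a AND <= z
```

In the full query, `params` would be cross joined to `pages`, and each scalar subquery would refer to `params.a` and `params.z` instead of repeating the literals. The sketch should return 2, since the third row falls outside the range.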

The problem is that I have to group the results by day. Every day must appear in the result, even if no rows were found for that day.

I can reuse this code (slightly modified by me; thanks to Erwin Brandstetter):

SELECT *
FROM  (SELECT generate_series('2015-08-15'::date
                            , '2015-08-22'::date
                            , '1 day'::interval)::date) AS d(day)
LEFT   JOIN (
   SELECT date_col::date AS day             -- one row per day, to match the daily series
        , count(*) AS some_count
   FROM   tbl
   WHERE  date_col >= '2015-08-15'::date
   AND    date_col <  '2015-08-23'::date    -- half-open: keeps all of 2015-08-22
-- AND    ... more conditions
   GROUP  BY 1
   ) t USING (day)
ORDER  BY 1;
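Applied to a per-day count, the pattern looks roughly like the sketch below. The rows in `tbl` are invented inline data so the snippet runs on its own in PostgreSQL, and `date_col` is assumed to be a timestamp. It builds the calendar with `generate_series()` in the FROM list, derives the day with a `::date` cast, and wraps the count in `COALESCE` so that empty days show 0 rather than NULL (an optional choice).

```sql
-- Sketch: per-day counts with every calendar day present.
-- tbl is a hypothetical inline stand-in for the real table.
WITH tbl (date_col) AS (
   VALUES (timestamp '2015-08-15 08:00')
        , (timestamp '2015-08-15 17:30')
        , (timestamp '2015-08-18 12:00')
   )
SELECT day, COALESCE(t.some_count, 0) AS some_count  -- 0 instead of NULL for empty days
FROM  (SELECT ts::date AS day                         -- calendar: one row per day
       FROM   generate_series(timestamp '2015-08-15'
                            , timestamp '2015-08-22'
                            , interval '1 day') ts
      ) d
LEFT   JOIN (
   SELECT date_col::date AS day, count(*) AS some_count
   FROM   tbl
   WHERE  date_col >= date '2015-08-15'
   AND    date_col <  date '2015-08-23'               -- half-open: all of 2015-08-22
   GROUP  BY 1
   ) t USING (day)
ORDER  BY day;
```

This should return eight rows: 2 for 2015-08-15, 1 for 2015-08-18 and 0 for the other six days.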

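The main problem is that I need to LEFT JOIN on the created_at field (cast to date) of the tables page_views, conversions and unique_page_visits, not of the pages table (the main query, not the subqueries in the SELECT list).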

Pseudocode:

SELECT * 
FROM
    (SELECT generate_series('2015-08-15'::date
                          , '2015-08-22'::date
                          , '1 day'::interval)::date) AS d(day)

LEFT JOIN (
  SELECT day_from_subquery_not_from_pages::date AS day
  -- other stuff to return proper results AND conditions
) t USING(day)   

Is this possible?
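For one of the three counts, the pseudocode can be filled in roughly as follows: the day comes from `unique_page_visits.updated_at` inside the derived table, while `pages` only restricts which page ids are counted. This is a sketch with made-up inline rows (the CTEs named `pages` and `upv` stand in for the real tables), and it assumes `updated_at` is a timestamp.

```sql
-- Sketch: page visits per day for variant '25', day taken from the visits table.
-- The CTEs "pages" and "upv" are hypothetical inline stand-ins for the real tables.
WITH pages (id, page_variant_id) AS (
   VALUES (73, '25'), (74, '26')
   )
, upv (page_id, updated_at) AS (
   VALUES (73, timestamp '2015-08-16 09:00')
        , (73, timestamp '2015-08-16 18:00')
        , (74, timestamp '2015-08-16 11:00')    -- other variant, not counted
   )
SELECT day, COALESCE(t.page_visits_count, 0) AS page_visits_count
FROM  (SELECT ts::date AS day
       FROM   generate_series(timestamp '2015-08-15'
                            , timestamp '2015-08-22'
                            , interval '1 day') ts
      ) d
LEFT   JOIN (
   SELECT upv.updated_at::date AS day           -- day from the joined table, not from pages
        , count(*) AS page_visits_count
   FROM   pages p
   JOIN   upv ON upv.page_id = p.id
   WHERE  p.page_variant_id = '25'
   AND    upv.updated_at >= timestamp '2015-08-15'
   AND    upv.updated_at <  timestamp '2015-08-23'
   GROUP  BY 1
   ) t USING (day)
ORDER  BY day;
```

The expected result is eight days, with 2 visits on 2015-08-16 and 0 elsewhere.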

Or maybe I'll have to split this big query into subqueries (then I'd have 3 of them...) and use UNION to combine the results? Then I could JOIN ON the days from the subqueries...
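The UNION idea can also work as a single derived table: tag each row with its kind, stack the sources with UNION ALL (plain UNION would silently merge identical rows), and count per calendar day with the aggregate FILTER clause, available since PostgreSQL 9.4. The sketch uses three invented rows in a CTE named `events` in place of the real joins; in practice each UNION ALL branch would be one of the three joins from the original subqueries, each with its own date column.

```sql
-- Sketch: one pass over all event kinds, counted per day with FILTER.
-- "events" and its rows are hypothetical; each branch would be a real join.
WITH events (kind, ts) AS (
             SELECT 'view',       timestamp '2015-08-15 10:00'
   UNION ALL SELECT 'visit',      timestamp '2015-08-15 10:00'
   UNION ALL SELECT 'conversion', timestamp '2015-08-17 14:00'
   )
SELECT day
     , count(e.kind) FILTER (WHERE e.kind = 'view')       AS views_count
     , count(e.kind) FILTER (WHERE e.kind = 'visit')      AS page_visits_count
     , count(e.kind) FILTER (WHERE e.kind = 'conversion') AS conversions_count
FROM  (SELECT ts::date AS day                             -- calendar of all days
       FROM   generate_series(timestamp '2015-08-15'
                            , timestamp '2015-08-22'
                            , interval '1 day') ts
      ) d
LEFT   JOIN events e ON e.ts::date = d.day                -- empty days keep e.kind NULL
GROUP  BY day
ORDER  BY day;
```

Because `count(e.kind)` ignores NULLs, days without events get 0 in every column without needing COALESCE.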

What is the best way to achieve this?

[Discussion]:

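  • I cleaned up my referenced answer after your reminder. Set-returning functions like generate_series() are better placed in the FROM list.
  • As always, the definitions of the underlying tables are essential: exact data types and constraints matter for designing the best query. You could provide a test case in sql fiddle (random example from today).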
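To illustrate the remark about set-returning functions: both statements below should produce the same eight dates, 2015-08-15 through 2015-08-22. The first calls `generate_series()` in the SELECT list; the second keeps it in the FROM list, as recommended. Timestamp arguments are used explicitly so the timestamp variant of the function is chosen.

```sql
-- Set-returning function in the SELECT list (works, but discouraged):
SELECT generate_series(timestamp '2015-08-15'
                     , timestamp '2015-08-22'
                     , interval '1 day')::date AS day;

-- The same series in the FROM list (the recommended form):
SELECT ts::date AS day
FROM   generate_series(timestamp '2015-08-15'
                     , timestamp '2015-08-22'
                     , interval '1 day') ts;
```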

Tags: sql database postgresql join group-by


[Solution 1]:

Guessing the missing details, this query may be what you are looking for:

WITH p AS (
   SELECT '2015-08-15'::date AS a, '2015-08-22'::date AS z  -- enter bounds once
        , id, name, page_variant_id, variant_name
   FROM   pages
   WHERE  page_variant_id = '25'   -- enter ID once
   )
SELECT p.id, p.name, p.page_variant_id, p.variant_name
     , day, v.views_count, pv.page_visits_count, c.conversions_count
FROM   p
     , LATERAL (SELECT day::date FROM generate_series(p.a, p.z, interval '1 day') day) d
LEFT   JOIN (
   SELECT upv.updated_at::date AS day, count(*) AS views_count
   FROM                      p
   JOIN   page_views         pv  ON pv.page_id = p.id
   JOIN   unique_page_visits upv ON upv.id = pv.unique_page_visit_id
   WHERE  upv.updated_at BETWEEN p.a AND p.z
   GROUP  BY 1
   ) v USING (day)
LEFT JOIN (
   SELECT upv.updated_at::date AS day, count(*) AS page_visits_count
   FROM                      p
   JOIN   unique_page_visits upv ON upv.page_id = p.id
   WHERE  upv.updated_at BETWEEN p.a AND p.z
   GROUP  BY 1
   ) pv USING (day)
LEFT JOIN (
   SELECT c.updated_at::date AS day, count(*) AS conversions_count  -- bucket by the column used in the filter
   FROM                      p
   JOIN   conversion_goals   cg  ON cg.page_id = p.id
   JOIN   conversions        c   ON c.conversion_goal_id = cg.id
   JOIN   unique_page_visits upv ON upv.id = c.unique_page_visit_id
   WHERE  cg.name = 'popup'
   AND    c.updated_at BETWEEN p.a AND p.z
   GROUP  BY 1
   ) c USING (day)
ORDER  BY day;
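In the query above, the three derived tables are grouped by day only and joined `USING (day)`, so if `page_variant_id = '25'` matches more than one page, every page row receives the totals of all pages of that variant. A hedged sketch of a per-page variant, shown for `page_visits_count` only (the other two counts would follow the same shape), also groups by page id and joins on both columns. The comma before LATERAL becomes an explicit `CROSS JOIN LATERAL` so that `p.id` is in scope for the ON clause of the following LEFT JOIN. It also replaces BETWEEN with a half-open range, which matters only if `updated_at` is a timestamp: then `BETWEEN ... AND '2015-08-22'` stops at midnight and misses the rest of the last day. The `pages` and `upv` CTEs are made-up inline rows.

```sql
-- Sketch: count per page AND day, then join on both columns.
-- "pages" and "upv" are hypothetical inline stand-ins for the real tables.
WITH pages (id, page_variant_id) AS (
   VALUES (73, '25'), (74, '25')            -- two pages in the same variant
   )
, upv (page_id, updated_at) AS (
   VALUES (73, timestamp '2015-08-15 10:00')
        , (74, timestamp '2015-08-15 11:00')
        , (74, timestamp '2015-08-16 12:00')
   )
, p AS (
   SELECT date '2015-08-15' AS a, date '2015-08-22' AS z, id
   FROM   pages
   WHERE  page_variant_id = '25'
   )
SELECT p.id, d.day, COALESCE(pv.page_visits_count, 0) AS page_visits_count
FROM   p
CROSS  JOIN LATERAL (                       -- one calendar per page
   SELECT ts::date AS day
   FROM   generate_series(p.a::timestamp, p.z::timestamp, interval '1 day') ts
   ) d
LEFT   JOIN (
   SELECT upv.page_id, upv.updated_at::date AS day, count(*) AS page_visits_count
   FROM   p
   JOIN   upv ON upv.page_id = p.id
   WHERE  upv.updated_at >= p.a
   AND    upv.updated_at <  p.z + 1         -- half-open: includes all of the last day
   GROUP  BY 1, 2
   ) pv ON pv.page_id = p.id AND pv.day = d.day
ORDER  BY p.id, d.day;
```

With this sample data the result should have 16 rows (8 days for each page): page 73 has one visit on 2015-08-15, page 74 one each on 2015-08-15 and 2015-08-16.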

[Discussion]:

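  • Wow, it works, you're a genius, Erwin! :) I had to change the SELECT clause to SELECT p.id, p.name, p.page_variant_id, p.variant_name, views_count, page_visits_count, conversions_count, day to get correct results: codepad.org/RujhA4l2. If I were in Vienna, I'd buy you a beer :D
  • @nothing-special-here: Ah, right, I forgot to add the counts to the final SELECT. Well, I'll have a virtual beer later. Cheers!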