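【Question title】: How to improve this SQL for speed?
【Posted】: 2021-12-02 14:40:11
【Question description】:

I wrote this SQL code. It works, but it takes 2 minutes to load, and I would like to make it faster. The code builds the counts behind a bar chart of user activity and invitations for each quarter of 2019. As you can see, I have to define every quarter of that year by hand, which is not ideal: adding more quarters means a lot of manual work. How can I improve this code? Thanks!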


WITH teams AS (SELECT cc.id , cc.name as company_name,clg.id as team_id, 
(select count(*) from companies_learnergroup clg where clg.company_id = cc.id) as number_of_teams_in_company, clg.name as team_name, clg.location_address,
(select count(*) from auth_user u where u.company_id = cc.id) as company_user_count,
(select count(*) from auth_user u where u.team_id = clg.id) as team_user_count,
(select count(*) from auth_user u where u.team_id = clg.id AND  ((u.last_activity BETWEEN 'January 01, 2019, 00:00 AM' AND 'March 31, 2019, 11:59 PM') or (u.last_login BETWEEN 'January 01, 2019, 00:00 AM' AND 'March 31, 2019, 11:59 PM'))) as active_users_q1_2019,
(select count(*) from auth_user u, companies_invitation ci where ci.learner_group_id = clg.id  and u.id = ci.inviter_id and ci.learner_group_id = u.team_id and ci.created_at BETWEEN 'January 01, 2019, 00:00 AM' AND 'March 31, 2019, 11:59 PM') as number_of_users_invited_to_team_by_user_in_same_team_in_q1_2019, 
(select count(*) from auth_user u where u.team_id = clg.id and (u.last_activity BETWEEN 'April 01, 2019, 00:00 AM' AND 'June 30, 2019, 11:59 PM') or (u.last_login BETWEEN 'April 01, 2019, 00:00 AM' AND 'June 30, 2019, 11:59 PM'))) as active_users_q2_2019,
(select count(*) from auth_user u, companies_invitation ci where ci.learner_group_id = clg.id  and u.id = ci.inviter_id and ci.learner_group_id = u.team_id  and ci.created_at BETWEEN 'April 01, 2019, 00:00 AM' AND 'June 30, 2019, 11:59 PM') as number_of_users_invited_to_team_by_user_in_same_team_in_q2_2019, 
(select count(*) from auth_user u where u.team_id = clg.id and (u.last_activity BETWEEN 'July 01, 2019, 00:00 AM' AND 'September 30, 2019, 11:59 PM') or (u.last_login BETWEEN 'July 01, 2019, 00:00 AM' AND 'September 30, 2019, 11:59 PM'))) as active_users_q3_2019,
(select count(*) from auth_user u, companies_invitation ci where ci.learner_group_id = clg.id  and u.id = ci.inviter_id and ci.learner_group_id = u.team_id and ci.created_at BETWEEN 'July 01, 2019, 00:00 AM' AND 'September 30, 2019, 11:59 PM') as number_of_users_invited_to_team_by_user_in_same_team_in_q3_2019, 
(select count(*) from auth_user u where u.team_id = clg.id  AND  ((u.last_activity BETWEEN 'October 01, 2019, 00:00 AM' AND 'December 31, 2019, 11:59 PM') or (u.last_login BETWEEN 'October 01, 2019, 00:00 AM' AND 'December 31, 2019, 11:59 PM'))) as active_users_q4_2019,
(select count(*) from auth_user u, companies_invitation ci where ci.learner_group_id = clg.id  and u.id = ci.inviter_id and ci.learner_group_id = u.team_id and ci.created_at BETWEEN 'October 01, 2019, 00:00 AM' AND 'December 31, 2019, 11:59 PM') as number_of_users_invited_to_team_by_user_in_same_team_in_q4_2019, 
cc.created_at as company_account_created, 
clg.created as team_account_created, cc.office_location, cc.region_of_responsibility, cc.company_type,
cc.company_url,  string_agg(ctag.name, ', ') as retailer_tags FROM companies_company cc
JOIN companies_learnergroup clg on clg.company_id = cc.id
JOIN company_companytag cctag ON cc.id = cctag.company_id
JOIN companies_tag ctag ON ctag.id = cctag.tag_id
WHERE cc.company_type = 'retailer'
and cc.deactivated is null
GROUP BY cc.id, cc.name, clg.id
ORDER BY cc.id) 
     
    SELECT COUNT(*) as "Amount", '2019 Q1 Retailer teams with at least 1 active user and at least 1 invite sent from user in same team' as "Filter" FROM teams 
    WHERE active_users_q1_2019 > 0 AND number_of_users_invited_to_team_by_user_in_same_team_in_q1_2019 > 0
   
    
    UNION
    SELECT COUNT(*) as "Amount", '2019 Q2 Retailer teams with at least 1 active user and at least 1 invite sent from user in same team' FROM teams 
    WHERE active_users_q2_2019 > 0 AND number_of_users_invited_to_team_by_user_in_same_team_in_q2_2019 > 0
    

    
    UNION
    SELECT COUNT(*) as "Amount", '2019 Q3 Retailer teams with at least 1 active user and at least 1 invite sent from user in same team' FROM teams 
    WHERE active_users_q3_2019 > 0 AND number_of_users_invited_to_team_by_user_in_same_team_in_q3_2019 > 0
   
    
    UNION
    SELECT COUNT(*) as "Amount", '2019 Q4 Retailer teams with at least 1 active user and at least 1 invite sent from user in same team in Q4 2019' FROM teams 
    WHERE active_users_q4_2019 > 0 AND number_of_users_invited_to_team_by_user_in_same_team_in_q4_2019 > 0
    

    
) funnel
ORDER BY "Filter"

   

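【Question comments】:

  • If this request is frequent but updates to the tables are not very significant, a materialised view might make sense, possibly with a refresh triggered when the relevant tables are updated.
  • What is the biggest table here? How many rows does auth_user have? Which indexes exist?
  • I see some implicit joins in the subqueries. Given the complexity of the query, using explicit joins might change the execution plan and therefore the performance. At least, that is what the answers to this old question suggest.
  • Where is the time being spent: running the query, moving the data to the front end, or building the chart? Run EXPLAIN ANALYZE on the query to see how long the query itself takes, and add the output to your question.
  • Short comment: OMG.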
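
As a rough illustration of the explicit-join and EXPLAIN ANALYZE suggestions above, the Q1 invitation count could be computed once per team instead of once per row through a correlated subquery. This is only a sketch: the table and column names are copied from the question, while the assumption that ci.created_at is a timestamp column and the half-open date range are not confirmed by the post.

```sql
-- Sketch only. Assumes companies_invitation.created_at is a timestamp column.
-- EXPLAIN (ANALYZE, BUFFERS) executes the statement and reports the actual
-- plan, timings and buffer usage instead of returning the result rows.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ci.learner_group_id AS team_id,
       count(*)            AS invites_from_same_team
FROM companies_invitation ci
JOIN auth_user u
  ON u.id = ci.inviter_id
 AND u.team_id = ci.learner_group_id      -- inviter is a member of the invited team
WHERE ci.created_at >= timestamp '2019-01-01'
  AND ci.created_at <  timestamp '2019-04-01'  -- half-open range covering Q1 2019
GROUP BY ci.learner_group_id;
```

Comparing this plan with the plan of the original statement should show whether the correlated subqueries are where the two minutes go.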

Tags: sql postgresql performance date time


【Solution 1】:

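Not directly related to performance, except that the query as posted does not run. The 7th and 9th selects are missing an opening parenthesis: the segment and (u.last_activity ... should be and ((u.last_activity ..., as it is in the 5th and 11th selects. The trailing ) funnel also produces an error. I will put both down to mistakes made while writing up the post.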
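
The extra opening parenthesis also matters for the result, not just for parsing, because AND binds more tightly than OR in SQL. A minimal, self-contained sketch with constant values (nothing here comes from the poster's schema):

```sql
-- The first column mimics "team filter AND (activity test) OR (login test)"
-- without the outer parentheses; the second adds them.
SELECT (false AND (true) OR (true))   AS without_outer_parens,  -- parsed as (false AND true) OR true
       (false AND ((true) OR (true))) AS with_outer_parens;     -- the team filter applies to both tests
```

Without the outer parentheses the last_login test would no longer be restricted to the team, so simply dropping the unmatched closing parenthesis would make the query run but count users from every team.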
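The biggest problem is that your BETWEEN predicates do not work: they run, but they do not produce correct results. This is because you are doing a TEXT comparison on MONTH names, and in a text comparison Postgres knows nothing about the calendar. Assuming u.last_activity actually holds a date, you can get the quarter directly with extract('quarter' from u.last_activity). If it is not a date (timestamp), it may not be too late to change the data model. See the demo for which months each quarter picks up under text comparison. See also Postgres extract.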
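
A minimal sketch of both points follows. The first statement is self-contained and contrasts month names compared as text with extract(quarter ...). The second is a hypothetical rewrite against the question's auth_user table; it assumes last_activity and last_login are timestamp columns, which the post does not confirm, and the alias seen(seen_at) is introduced here purely for illustration.

```sql
-- 1) Self-contained: month names compared as text vs. the calendar quarter.
--    Under typical collations 'February' sorts before 'January' and 'June'
--    sorts between 'January' and 'March', so the text test disagrees with
--    the calendar for those rows.
SELECT d,
       to_char(d, 'FMMonth') BETWEEN 'January' AND 'March' AS q1_by_text,
       extract(quarter FROM d) = 1                          AS q1_by_date
FROM (VALUES (date '2019-02-14'),
             (date '2019-03-31'),
             (date '2019-06-15')) AS t(d);

-- 2) Sketch against the question's table (assumes timestamp columns):
--    one row per team and quarter, with no per-quarter literals.
--    A user counts in a quarter if last_activity OR last_login falls in it.
SELECT u.team_id,
       extract(quarter FROM seen.seen_at) AS quarter,
       count(DISTINCT u.id)               AS active_users
FROM auth_user u
CROSS JOIN LATERAL (VALUES (u.last_activity), (u.last_login)) AS seen(seen_at)
WHERE seen.seen_at >= timestamp '2019-01-01'
  AND seen.seen_at <  timestamp '2020-01-01'
GROUP BY u.team_id, extract(quarter FROM seen.seen_at);
```

Covering another year would then only require widening the date range and adding extract(year ...) to the grouping, rather than writing four more pairs of subqueries.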

【Comments】:
