【Question title】: How to redefine slow SQL query with COUNT() of large dataset in MySQL
【Posted】: 2026-02-23 09:20:04
【Question description】:

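I have a SQL query like the one below and I need to rewrite it. I believe indexes would also help, but I don't know which columns to include in them.

  • b_answers has approx. tens of thousands of rows
  • b_projects has approx. thousands of rows
  • b_users has a few dozen rows

The AS count_* columns are needed for sorting.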

SELECT
    p.id,
    p.datetime,
    u.name AS u_name,
    p.name,
    p.note,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND changed != '0000-00-00 00:00:00') AS count_filled,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND started = '1') AS count_started,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id) AS count_sent,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '1' AND changed != '0000-00-00 00:00:00') AS count_filled_quiz1_a,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '2' AND changed != '0000-00-00 00:00:00') AS count_filled_quiz1_b,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '1' AND started = '1') AS count_started_quiz1_a,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '2' AND started = '1') AS count_started_quiz1_b,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '1') AS count_sent_quiz1_a,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '2') AS count_sent_quiz1_b,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '3' AND changed != '0000-00-00 00:00:00') AS count_filled_quiz3_a,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '4' AND changed != '0000-00-00 00:00:00') AS count_filled_quiz3_b,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '3' AND started = '1') AS count_started_quiz3_a,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '4' AND started = '1') AS count_started_quiz3_b,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '3') AS count_sent_quiz3_a,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '4') AS count_sent_quiz3_b,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '5' AND changed != '0000-00-00 00:00:00') AS count_filled_quiz5_a,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '6' AND changed != '0000-00-00 00:00:00') AS count_filled_quiz5_b,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '5' AND started = '1') AS count_started_quiz5_a,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '6' AND started = '1') AS count_started_quiz5_b,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '5') AS count_sent_quiz5_a,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '6') AS count_sent_quiz5_b,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '7' AND changed != '0000-00-00 00:00:00') AS count_filled_quiz7_a,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '8' AND changed != '0000-00-00 00:00:00') AS count_filled_quiz7_b,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '7' AND started = '1') AS count_started_quiz7_a,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '8' AND started = '1') AS count_started_quiz7_b,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '7') AS count_sent_quiz7_a,
    (SELECT COUNT(id) FROM b_answers WHERE project = p.id AND quiz = '8') AS count_sent_quiz7_b
FROM 
    b_projects p
LEFT JOIN 
    b_answers a ON a.project = p.id
LEFT JOIN 
    b_users u ON u.id = p.admin
GROUP BY
    p.nazev

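【Question comments】:

  • Which RDBMS is this for? Please add a tag to specify whether you're using mysql, postgresql, sql-server, oracle or db2 - or something else entirely.
  • MySQL, sorry I left that out; I have added it now.

Tags: mysql sql performance indexing count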


【Solution 1】:

Use conditional aggregation! The idea is:

SELECT p.id, p.datetime, u.name AS u_name, p.name, p.note,
       SUM(a.changed <> '0000-00-00 00:00:00') AS count_filled,
       SUM(a.started = '1') AS count_started,
       . . .   -- and so on for the rest of the columns
FROM b_projects p LEFT JOIN 
     b_answers a 
     ON a.project = p.id LEFT JOIN 
     b_users u
     ON u.id = p.admin
GROUP BY p.id, p.datetime, u.name, p.name, p.note
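
As a rough sketch of how a few of the elided columns might look, assuming the table and column names from the question: MySQL evaluates a comparison to 1 or 0, so `SUM(condition)` counts the rows where the condition holds. Using `COUNT(a.id)` for `count_sent` is an assumption here, not part of the answer. Only the quiz 1 counters are shown; the other quiz columns are left out on purpose.

```sql
-- Sketch only: a few per-quiz counters written as conditional aggregates.
SELECT p.id, p.datetime, u.name AS u_name, p.name, p.note,
       SUM(a.changed <> '0000-00-00 00:00:00')                  AS count_filled,
       SUM(a.started = '1')                                     AS count_started,
       COUNT(a.id)                                              AS count_sent,  -- assumed mapping
       SUM(a.quiz = '1' AND a.changed <> '0000-00-00 00:00:00') AS count_filled_quiz1_a,
       SUM(a.quiz = '1' AND a.started = '1')                    AS count_started_quiz1_a,
       SUM(a.quiz = '1')                                        AS count_sent_quiz1_a
       -- the remaining quiz counters would follow the same shape
FROM b_projects p
LEFT JOIN b_answers a ON a.project = p.id
LEFT JOIN b_users u ON u.id = p.admin
GROUP BY p.id, p.datetime, u.name, p.name, p.note;
```

One difference from the correlated subqueries: for a project with no answers, `SUM(...)` returns NULL where `COUNT(id)` returned 0. Wrapping each sum in `COALESCE(..., 0)` restores the zeros if the sort depends on them.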

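【Comments】:

  • Warning -- a JOIN may inflate the SUMs.
  • @RickJames . . . Although possible, that seems unlikely here. The join from b_projects to b_users appears to match (at most) one user. And it is a reasonable guess that b_projects.id is the primary key of that table.
  • Well, this works, but it is still slow. The query took tens of seconds the first time, with half of the dataset. The next run was fast, but I believe that was due to caching.
  • I have now added indexes on the columns a.changed, a.started, a.project, a.quiz and p.admin, and the query takes about 2 seconds, but I would still like to improve the speed. I have read the article about indexes linked below, but I don't know how to create the indexes differently.
  • @sylar32 . . . I'm guessing you have a lot of data -- the group by takes up the time. I don't see how those indexes help. Indexes on a(pid) and u(admin) might help. You could add other columns depending on what is referenced in the SELECT.
【Solution 2】: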

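Just don't reinvent the wheel! Please see this: What columns generally make good indexes?

Indexes are generally needed on columns that are used in comparisons (conditions). So in your case, I think the indexes that could improve the COST would cover the columns in this part of the query: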

LEFT JOIN b_answers a ON a.project = p.id /* consider using an index */

LEFT JOIN b_users u ON u.id = p.admin /* consider using an index */

GROUP BY p.nazev /* consider using an index - STILL UNSURE */

You can check the effect on the query by trial and error.
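
One way to run that trial and error is sketched below, assuming the b_answers columns from the question. The index name and the column order are guesses to be tested, not a recommendation from this answer. A single composite index that starts with `project` behaves differently from separate single-column indexes: it can cover the whole aggregation over b_answers.

```sql
-- Hypothetical index name; column order is an assumption to verify with EXPLAIN.
CREATE INDEX idx_answers_project_quiz
    ON b_answers (project, quiz, started, changed);

-- Inspect the plan for the aggregation over b_answers.
EXPLAIN
SELECT a.project,
       SUM(a.started = '1')                    AS count_started,
       SUM(a.changed <> '0000-00-00 00:00:00') AS count_filled
FROM b_answers a
GROUP BY a.project;
```

If the Extra column of the EXPLAIN output shows "Using index", the query is being answered from the index alone without reading the table rows. If it does not, drop the index and try another column order.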

Hope this helps.

Cheers

【Comments】:

【Solution 3】:

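To get the query right, you need to consider whether the tables are in a 1:many mapping. If so, do you want the counts to be of the "many" or of the "1"?

I will assume you do not want inflated values, so I will get the counts in a derived table first, then join to the other tables: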

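SELECT  p.id, p.datetime, u.name AS u_name, p.name, p.note,
        count_filled, count_started, ...
    FROM
        ( SELECT
                a.project,           -- needed by the join condition below
                SUM(a.changed <> '0000-00-00 00:00:00') AS count_filled,
                SUM(a.started = '1') AS count_started,
                ...
            FROM  b_answers AS a
            GROUP BY  a.project      -- one row of counts per project
        ) AS aa
    JOIN  b_projects p  ON aa.project = p.id
    LEFT JOIN  b_users u  ON u.id = p.admin


Note that this avoids a GROUP BY over the joined rows; the only grouping is on b_answers.project inside the derived table, which gives one speedup. Another is avoiding the "inflate-deflate".

Assuming id is the PRIMARY KEY of each table, no extra indexes are needed.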
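
A possible variant of the derived-table idea is sketched below, assuming the same tables and columns as in the question. It starts from b_projects and LEFT JOINs the per-project counts, so projects without any answers are kept (as they were in the question's query) and show 0 instead of disappearing. The `count_sent` column computed with `COUNT(*)` is an assumption added for illustration.

```sql
-- Sketch: aggregate b_answers once per project, then attach the counts.
SELECT p.id, p.datetime, u.name AS u_name, p.name, p.note,
       COALESCE(aa.count_filled, 0)  AS count_filled,
       COALESCE(aa.count_started, 0) AS count_started,
       COALESCE(aa.count_sent, 0)    AS count_sent
FROM b_projects p
LEFT JOIN (
    SELECT a.project,
           SUM(a.changed <> '0000-00-00 00:00:00') AS count_filled,
           SUM(a.started = '1')                    AS count_started,
           COUNT(*)                                AS count_sent  -- assumed mapping
    FROM b_answers a
    GROUP BY a.project
) AS aa ON aa.project = p.id
LEFT JOIN b_users u ON u.id = p.admin
ORDER BY count_filled DESC;  -- any count_* alias can drive the sort
```

Because the derived table returns at most one row per project, the outer joins cannot multiply rows, and the outer query needs no GROUP BY.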

【Comments】: