How to avoid nested subqueries in SQL: answers

【Question title】: How to avoid nested subqueries in SQL
【Posted】: 2019-10-15 03:57:43
【Question】:

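I just added a tagging system to my website, and I'm trying to figure out the most efficient way to run scalable queries. Here is a basic, working MySQL query that returns the tag matches for a given user: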

SELECT
   scans.scan_index,
   scans.scan_id,
   scans.archive_folder 
FROM
   tags 
   INNER JOIN
      interpretationtags USING (tagid) 
   INNER JOIN
      interpretations USING (interpretation_id) 
   INNER JOIN
      scans 
      ON scans.scan_id = interpretations.scan_id 
      AND scans.archive_folder = interpretations.archive_folder 
   INNER JOIN
      archives 
      ON scans.archive_folder = archives.archive_folder 
WHERE
   archives.user_id = "google-authd...." 
   AND tags.tag = "tag1"
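The question does not include the table definitions, so the sketch below assumes a minimal schema inferred only from the join columns in the query above, fills it with made-up rows, and runs the single-tag query in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL here; the join syntax used is accepted by both. The names make_db, SINGLE_TAG_SQL and the sample values are hypothetical.

```python
import sqlite3

# Hypothetical minimal schema inferred only from the join columns in the
# question; the real MySQL tables will have more columns, keys and indexes.
def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE archives (archive_folder TEXT, user_id TEXT);
        CREATE TABLE scans (scan_index INTEGER, scan_id TEXT, archive_folder TEXT);
        CREATE TABLE interpretations (interpretation_id INTEGER, scan_id TEXT, archive_folder TEXT);
        CREATE TABLE tags (tagid INTEGER, tag TEXT);
        CREATE TABLE interpretationtags (interpretation_id INTEGER, tagid INTEGER);
        -- Made-up rows: scan s1 has tag1 and tag2 on two different
        -- interpretations; scan s2 has only tag1.
        INSERT INTO archives VALUES ('f1', 'user-a');
        INSERT INTO scans VALUES (1, 's1', 'f1'), (2, 's2', 'f1');
        INSERT INTO interpretations VALUES (10, 's1', 'f1'), (11, 's1', 'f1'), (12, 's2', 'f1');
        INSERT INTO tags VALUES (1, 'tag1'), (2, 'tag2');
        INSERT INTO interpretationtags VALUES (10, 1), (11, 2), (12, 1);
    """)
    return db

# The single-tag query from the question, with the user id and the tag
# bound as ? parameters instead of inlined string literals.
SINGLE_TAG_SQL = """
    SELECT scans.scan_index, scans.scan_id, scans.archive_folder
    FROM tags
    INNER JOIN interpretationtags USING (tagid)
    INNER JOIN interpretations USING (interpretation_id)
    INNER JOIN scans ON scans.scan_id = interpretations.scan_id
        AND scans.archive_folder = interpretations.archive_folder
    INNER JOIN archives ON scans.archive_folder = archives.archive_folder
    WHERE archives.user_id = ? AND tags.tag = ?
"""

db = make_db()
# The query has no ORDER BY, so sort in Python to get a stable result.
rows = sorted(db.execute(SINGLE_TAG_SQL, ("user-a", "tag1")).fetchall())
# rows == [(1, 's1', 'f1'), (2, 's2', 'f1')]
```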

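But it gets tricky when I want to query multiple tags for the same scan. You see, tags live on different interpretations, and each scan has multiple interpretations. Here is a working query for two tags that uses a subquery: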

SELECT
   a.scan_index,
   a.scan_id,
   a.archive_folder 
FROM
   (
      SELECT
         scans.scan_index,
         scans.scan_id,
         scans.archive_folder 
      FROM
         tags 
         INNER JOIN
            interpretationtags USING (tagid) 
         INNER JOIN
            interpretations USING (interpretation_id) 
         INNER JOIN
            scans 
            ON scans.scan_id = interpretations.scan_id 
            AND scans.archive_folder = interpretations.archive_folder 
         INNER JOIN
            archives 
            ON scans.archive_folder = archives.archive_folder 
      WHERE
         archives.user_id = "google-auth2..." 
         AND tags.tag = "tag1"
   )
   as a 
   INNER JOIN
      interpretations 
      ON a.scan_id = interpretations.scan_id 
      AND a.archive_folder = interpretations.archive_folder 
   INNER JOIN
      interpretationtags USING(interpretation_id) 
   INNER JOIN
      tags USING(tagid) 
WHERE
   tags.tag = "tag2"
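As a self-contained check, here is the two-tag query run against SQLite with the same hypothetical schema and made-up rows (not the asker's data). The derived table a holds the scans that have tag1; the outer joins then keep only those scans that also have tag2 on some interpretation. The names make_db and TWO_TAG_NESTED_SQL are illustrative only.

```python
import sqlite3

# Hypothetical schema and made-up rows (s1: tag1 + tag2, s2: tag1 only).
def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE archives (archive_folder TEXT, user_id TEXT);
        CREATE TABLE scans (scan_index INTEGER, scan_id TEXT, archive_folder TEXT);
        CREATE TABLE interpretations (interpretation_id INTEGER, scan_id TEXT, archive_folder TEXT);
        CREATE TABLE tags (tagid INTEGER, tag TEXT);
        CREATE TABLE interpretationtags (interpretation_id INTEGER, tagid INTEGER);
        INSERT INTO archives VALUES ('f1', 'user-a');
        INSERT INTO scans VALUES (1, 's1', 'f1'), (2, 's2', 'f1');
        INSERT INTO interpretations VALUES (10, 's1', 'f1'), (11, 's1', 'f1'), (12, 's2', 'f1');
        INSERT INTO tags VALUES (1, 'tag1'), (2, 'tag2');
        INSERT INTO interpretationtags VALUES (10, 1), (11, 2), (12, 1);
    """)
    return db

# Same shape as the question's query: inner derived table filters on the
# first tag, the outer joins re-walk interpretations to test the second tag.
TWO_TAG_NESTED_SQL = """
    SELECT a.scan_index, a.scan_id, a.archive_folder
    FROM (
        SELECT scans.scan_index, scans.scan_id, scans.archive_folder
        FROM tags
        INNER JOIN interpretationtags USING (tagid)
        INNER JOIN interpretations USING (interpretation_id)
        INNER JOIN scans ON scans.scan_id = interpretations.scan_id
            AND scans.archive_folder = interpretations.archive_folder
        INNER JOIN archives ON scans.archive_folder = archives.archive_folder
        WHERE archives.user_id = ? AND tags.tag = ?
    ) AS a
    INNER JOIN interpretations ON a.scan_id = interpretations.scan_id
        AND a.archive_folder = interpretations.archive_folder
    INNER JOIN interpretationtags USING (interpretation_id)
    INNER JOIN tags USING (tagid)
    WHERE tags.tag = ?
"""

db = make_db()
rows = db.execute(TWO_TAG_NESTED_SQL, ("user-a", "tag1", "tag2")).fetchall()
# Only s1 has both tags: rows == [(1, 's1', 'f1')]
```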

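Since this runs on a LAMP stack, I wrote some PHP code that iterates over the tags I want to include in this AND-style search and builds a multiply nested query. Here is one with three tags: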

SELECT
   b.scan_index,
   b.scan_id,
   b.archive_folder 
FROM
   (
      SELECT
         a.scan_index,
         a.scan_id,
         a.archive_folder 
      FROM
         (
            SELECT
               scans.scan_index,
               scans.scan_id,
               scans.archive_folder 
            FROM
               tags 
               INNER JOIN
                  interpretationtags USING (tagid) 
               INNER JOIN
                  interpretations USING (interpretation_id) 
               INNER JOIN
                  scans 
                  ON scans.scan_id = interpretations.scan_id 
                  AND scans.archive_folder = interpretations.archive_folder 
               INNER JOIN
                  archives 
                  ON scans.archive_folder = archives.archive_folder 
            WHERE
               archives.user_id = "google..." 
               AND tags.tag = "tag1"
         )
         as a 
         INNER JOIN
            interpretations 
            ON a.scan_id = interpretations.scan_id 
            AND a.archive_folder = interpretations.archive_folder 
         INNER JOIN
            interpretationtags USING(interpretation_id) 
         INNER JOIN
            tags USING(tagid) 
      WHERE
         tags.tag = "tag2"
   )
   as b 
   INNER JOIN
      interpretations 
      ON b.scan_id = interpretations.scan_id 
      AND b.archive_folder = interpretations.archive_folder 
   INNER JOIN
      interpretationtags USING(interpretation_id) 
   INNER JOIN
      tags USING(tagid) 
WHERE
   tags.tag = "tag3"
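The PHP itself is not shown, so the following is only a hypothetical Python re-creation of the nesting the question describes; the function build_nested_query, the constant BASE_SQL and the aliases t0, t1, … are made up. It makes the growth visible: every tag after the first wraps the previous query in one more derived table plus three more joins. Values are bound as ? placeholders rather than concatenated into the SQL string.

```python
# Innermost level: the question's single-tag query with ? placeholders.
BASE_SQL = """SELECT scans.scan_index, scans.scan_id, scans.archive_folder
FROM tags
INNER JOIN interpretationtags USING (tagid)
INNER JOIN interpretations USING (interpretation_id)
INNER JOIN scans ON scans.scan_id = interpretations.scan_id
    AND scans.archive_folder = interpretations.archive_folder
INNER JOIN archives ON scans.archive_folder = archives.archive_folder
WHERE archives.user_id = ? AND tags.tag = ?"""


def build_nested_query(user_id, tags):
    """Hypothetical builder: wrap one derived table per extra tag.

    Returns (sql, params) with params in placeholder order."""
    if not tags:
        raise ValueError("need at least one tag")
    sql, params = BASE_SQL, [user_id, tags[0]]
    for depth, tag in enumerate(tags[1:]):
        alias = f"t{depth}"  # the question uses a, b, ...; any unique alias works
        sql = (
            f"SELECT {alias}.scan_index, {alias}.scan_id, {alias}.archive_folder\n"
            f"FROM ({sql}) AS {alias}\n"
            f"INNER JOIN interpretations ON {alias}.scan_id = interpretations.scan_id\n"
            f"    AND {alias}.archive_folder = interpretations.archive_folder\n"
            "INNER JOIN interpretationtags USING (interpretation_id)\n"
            "INNER JOIN tags USING (tagid)\n"
            "WHERE tags.tag = ?"
        )
        # The new ? sits after every placeholder of the wrapped query.
        params.append(tag)
    return sql, params


sql, params = build_nested_query("user-a", ["tag1", "tag2", "tag3"])
# params == ["user-a", "tag1", "tag2", "tag3"]; sql contains three SELECTs
```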

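Even four nested subqueries run quickly on minimal data, but once I'm dealing with 100k rows I don't think this is a scalable solution. How can I do this without resorting to this ugly, inefficient code?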

【Question discussion】:

See meta.stackoverflow.com/questions/333952/…. Also, questions about query optimization always need the EXPLAIN output for the query in question.

Tags: mysql performance subquery inner-join

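【Solution 1】:

Without the table structure and sample data it's hard to be certain, but I think you're approaching this from the wrong direction. You should start from the scans, find all of their matching tags, and then filter those (which should be a simple IN expression):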

SELECT
   scans.scan_index,
   scans.scan_id,
   scans.archive_folder 
FROM
   scans
   INNER JOIN
      archives 
      ON scans.archive_folder = archives.archive_folder 
   INNER JOIN
      interpretations 
      ON scans.scan_id = interpretations.scan_id 
      AND scans.archive_folder = interpretations.archive_folder 
   INNER JOIN
      interpretationtags USING (interpretation_id) 
   INNER JOIN
      tags USING (tagid) 
WHERE
   archives.user_id = "google-authd...." 
   AND tags.tag IN("tag1", "tag2")
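A self-contained SQLite sketch of this query, again with a hypothetical schema and made-up rows (scan s1 has tag1 and tag2 on different interpretations, s2 has only tag1). It illustrates the point raised in the comments: the IN filter on its own matches either tag, so s2 is returned even though it lacks tag2, and s1 appears once per matching interpretation. The names make_db and IN_SQL are illustrative only.

```python
import sqlite3

# Hypothetical schema and made-up rows (s1: tag1 + tag2, s2: tag1 only).
def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE archives (archive_folder TEXT, user_id TEXT);
        CREATE TABLE scans (scan_index INTEGER, scan_id TEXT, archive_folder TEXT);
        CREATE TABLE interpretations (interpretation_id INTEGER, scan_id TEXT, archive_folder TEXT);
        CREATE TABLE tags (tagid INTEGER, tag TEXT);
        CREATE TABLE interpretationtags (interpretation_id INTEGER, tagid INTEGER);
        INSERT INTO archives VALUES ('f1', 'user-a');
        INSERT INTO scans VALUES (1, 's1', 'f1'), (2, 's2', 'f1');
        INSERT INTO interpretations VALUES (10, 's1', 'f1'), (11, 's1', 'f1'), (12, 's2', 'f1');
        INSERT INTO tags VALUES (1, 'tag1'), (2, 'tag2');
        INSERT INTO interpretationtags VALUES (10, 1), (11, 2), (12, 1);
    """)
    return db

# The answer's flat query: start from scans, join out to tags, filter with IN.
IN_SQL = """
    SELECT scans.scan_index, scans.scan_id, scans.archive_folder
    FROM scans
    INNER JOIN archives ON scans.archive_folder = archives.archive_folder
    INNER JOIN interpretations ON scans.scan_id = interpretations.scan_id
        AND scans.archive_folder = interpretations.archive_folder
    INNER JOIN interpretationtags USING (interpretation_id)
    INNER JOIN tags USING (tagid)
    WHERE archives.user_id = ? AND tags.tag IN (?, ?)
"""

db = make_db()
rows = sorted(db.execute(IN_SQL, ("user-a", "tag1", "tag2")).fetchall())
# One row per matching (scan, interpretation) pair, OR semantics:
# rows == [(1, 's1', 'f1'), (1, 's1', 'f1'), (2, 's2', 'f1')]
```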

Note that, based on your SELECT field list, I don't think you actually need the JOIN to archives at all.

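【Discussion】:

Thanks Nick. This is close, but it returns every scan with tag1 OR tag2.
@UltrasoundJelly So you want scans that have both tag1 AND tag2? Your question isn't 100% clear.
Sorry, the goal is AND.
You should be able to add GROUP BY scans.scan_index, scans.scan_id, scans.archive_folder HAVING COUNT(DISTINCT tags.tag) = 2 to the query to achieve that.
That does work here, but what if two interpretations on the same scan both use tag1?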
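A minimal SQLite sketch of the GROUP BY ... HAVING COUNT(DISTINCT tags.tag) suggestion, assuming the same hypothetical schema and made-up rows in which scans s1 and s3 each have tag1 on two different interpretations (s1 also has tag2). Because COUNT(DISTINCT tags.tag) counts distinct tag values per scan rather than joined rows, the duplicate tag1 interpretations do not inflate the count. For contrast, the distinct=False path uses COUNT(*), which is wrong in both directions on this data. The function name scans_with_all_tags is hypothetical; it generalizes the "= 2" to any number of tags by binding len(set(wanted)).

```python
import sqlite3

# Hypothetical schema; made-up rows:
#   s1: tag1 (interp 10), tag2 (interp 11), tag1 again (interp 13)
#   s2: tag1 only; s3: tag1 on two interpretations, no tag2
def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE archives (archive_folder TEXT, user_id TEXT);
        CREATE TABLE scans (scan_index INTEGER, scan_id TEXT, archive_folder TEXT);
        CREATE TABLE interpretations (interpretation_id INTEGER, scan_id TEXT, archive_folder TEXT);
        CREATE TABLE tags (tagid INTEGER, tag TEXT);
        CREATE TABLE interpretationtags (interpretation_id INTEGER, tagid INTEGER);
        INSERT INTO archives VALUES ('f1', 'user-a');
        INSERT INTO scans VALUES (1, 's1', 'f1'), (2, 's2', 'f1'), (3, 's3', 'f1');
        INSERT INTO interpretations VALUES (10, 's1', 'f1'), (11, 's1', 'f1'),
            (13, 's1', 'f1'), (12, 's2', 'f1'), (14, 's3', 'f1'), (15, 's3', 'f1');
        INSERT INTO tags VALUES (1, 'tag1'), (2, 'tag2');
        INSERT INTO interpretationtags VALUES (10, 1), (11, 2), (13, 1),
            (12, 1), (14, 1), (15, 1);
    """)
    return db

def scans_with_all_tags(db, user_id, wanted, distinct=True):
    """Scans of user_id that carry every tag in `wanted` (AND semantics)."""
    placeholders = ", ".join("?" for _ in wanted)
    count = "COUNT(DISTINCT tags.tag)" if distinct else "COUNT(*)"
    sql = f"""
        SELECT scans.scan_index, scans.scan_id, scans.archive_folder
        FROM scans
        INNER JOIN archives ON scans.archive_folder = archives.archive_folder
        INNER JOIN interpretations ON scans.scan_id = interpretations.scan_id
            AND scans.archive_folder = interpretations.archive_folder
        INNER JOIN interpretationtags USING (interpretation_id)
        INNER JOIN tags USING (tagid)
        WHERE archives.user_id = ? AND tags.tag IN ({placeholders})
        GROUP BY scans.scan_index, scans.scan_id, scans.archive_folder
        HAVING {count} = ?
        ORDER BY scans.scan_index"""
    # set() so a tag passed twice does not raise the required count.
    return db.execute(sql, [user_id, *wanted, len(set(wanted))]).fetchall()

db = make_db()
both = scans_with_all_tags(db, "user-a", ["tag1", "tag2"])
naive = scans_with_all_tags(db, "user-a", ["tag1", "tag2"], distinct=False)
# both == [(1, 's1', 'f1')]; naive == [(3, 's3', 'f1')]
```

Because the tag list and the count are both bound parameters, the SQL stays one flat query however many tags are requested, instead of gaining a nesting level per tag.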