【Question title】: How to avoid nested subqueries in SQL
【Posted】: 2019-10-15 03:57:43
【Question】:

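I just added a tagging system to my website, and I'm trying to work out the most efficient way to run a query that scales. Here is a basic, working MySQL query that returns the tag matches for a given user: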

SELECT
   scans.scan_index,
   scans.scan_id,
   scans.archive_folder 
FROM
   tags 
   INNER JOIN
      interpretationtags USING (tagid) 
   INNER JOIN
      interpretations USING (interpretation_id) 
   INNER JOIN
      scans 
      ON scans.scan_id = interpretations.scan_id 
      AND scans.archive_folder = interpretations.archive_folder 
   INNER JOIN
      archives 
      ON scans.archive_folder = archives.archive_folder 
WHERE
   archives.user_id = "google-authd...." 
   AND tags.tag = "tag1"

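But it gets tricky when I want to query for multiple tags on the same scan. You see, tags live on different interpretations, and each scan has multiple interpretations. Here is a working query for two tags that uses a subquery: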

SELECT
   a.scan_index,
   a.scan_id,
   a.archive_folder 
FROM
   (
      SELECT
         scans.scan_index,
         scans.scan_id,
         scans.archive_folder 
      FROM
         tags 
         INNER JOIN
            interpretationtags USING (tagid) 
         INNER JOIN
            interpretations USING (interpretation_id) 
         INNER JOIN
            scans 
            ON scans.scan_id = interpretations.scan_id 
            AND scans.archive_folder = interpretations.archive_folder 
         INNER JOIN
            archives 
            ON scans.archive_folder = archives.archive_folder 
      WHERE
         archives.user_id = "google-auth2..." 
         AND tags.tag = "tag1"
   )
   as a 
   INNER JOIN
      interpretations 
      ON a.scan_id = interpretations.scan_id 
      AND a.archive_folder = interpretations.archive_folder 
   INNER JOIN
      interpretationtags USING(interpretation_id) 
   INNER JOIN
      tags USING(tagid) 
WHERE
   tags.tag = "tag2"

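Since this runs on a LAMP stack, I wrote some PHP code that iterates over the tags I want to include in this AND-style search and builds a multiply nested query. Here is one with three tags: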

SELECT
   b.scan_index,
   b.scan_id,
   b.archive_folder 
FROM
   (
      SELECT
         a.scan_index,
         a.scan_id,
         a.archive_folder 
      FROM
         (
            SELECT
               scans.scan_index,
               scans.scan_id,
               scans.archive_folder 
            FROM
               tags 
               INNER JOIN
                  interpretationtags USING (tagid) 
               INNER JOIN
                  interpretations USING (interpretation_id) 
               INNER JOIN
                  scans 
                  ON scans.scan_id = interpretations.scan_id 
                  AND scans.archive_folder = interpretations.archive_folder 
               INNER JOIN
                  archives 
                  ON scans.archive_folder = archives.archive_folder 
            WHERE
               archives.user_id = "google..." 
               AND tags.tag = "tag1"
         )
         as a 
         INNER JOIN
            interpretations 
            ON a.scan_id = interpretations.scan_id 
            AND a.archive_folder = interpretations.archive_folder 
         INNER JOIN
            interpretationtags USING(interpretation_id) 
         INNER JOIN
            tags USING(tagid) 
      WHERE
         tags.tag = "tag2"
   )
   as b 
   INNER JOIN
      interpretations 
      ON b.scan_id = interpretations.scan_id 
      AND b.archive_folder = interpretations.archive_folder 
   INNER JOIN
      interpretationtags USING(interpretation_id) 
   INNER JOIN
      tags USING(tagid) 
WHERE
   tags.tag = "tag3"

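Even 4 nested subqueries run quickly on a minimal amount of data, but I don't think this is a scalable solution once I'm dealing with 100k rows. How can I do this without resorting to this ugly, inefficient code?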

【Question comments】:

Tags: mysql performance subquery inner-join


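【Solution 1】:

Without the table structure and sample data it's hard to be sure, but I think you're approaching this from the wrong direction. You should start from scans, find all the matching tags, and then filter on those (which should be a simple IN expression):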

SELECT
   scans.scan_index,
   scans.scan_id,
   scans.archive_folder 
FROM
   scans
   INNER JOIN
      archives 
      ON scans.archive_folder = archives.archive_folder 
   INNER JOIN
      interpretations 
      ON scans.scan_id = interpretations.scan_id 
      AND scans.archive_folder = interpretations.archive_folder 
   INNER JOIN
      interpretationtags USING (interpretation_id) 
   INNER JOIN
      tags USING (tagid) 
WHERE
   archives.user_id = "google-authd...." 
   AND tags.tag IN("tag1", "tag2")

Note that, based on your SELECT field list, I don't think you actually need to JOIN archives at all.

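【Comments】:

  • Thanks, Nick. That's close, but it returns all scans that have tag1 OR tag2.
  • @UltrasoundJelly So you want the scans that have both tag1 and tag2? Your question isn't 100% clear.
  • Sorry, the goal is AND.
  • You should be able to achieve that by adding GROUP BY scans.scan_index, scans.scan_id, scans.archive_folder HAVING COUNT(DISTINCT tags.tag) = 2 to the query.
  • That does work here, but what if there are two interpretations with tag1 on the same scan?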
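
The GROUP BY / HAVING COUNT(DISTINCT tags.tag) suggestion extends to any number of tags, and the last comment's worry (two interpretations carrying tag1 on the same scan) can be checked on a tiny dataset. The following is a minimal sketch, not the poster's code. It uses Python's sqlite3 with an in-memory database standing in for MySQL. The table definitions, sample rows, user ids, and the helper name scans_with_all_tags are all assumptions, since the thread shows no schema.

```python
import sqlite3

# Assumed minimal schema: the thread never shows table definitions, so the
# columns and types below are guesses based on the names used in the queries.
SCHEMA = """
CREATE TABLE archives (archive_folder TEXT PRIMARY KEY, user_id TEXT);
CREATE TABLE scans (scan_index INTEGER, scan_id INTEGER, archive_folder TEXT);
CREATE TABLE interpretations (
    interpretation_id INTEGER PRIMARY KEY, scan_id INTEGER, archive_folder TEXT);
CREATE TABLE tags (tagid INTEGER PRIMARY KEY, tag TEXT);
CREATE TABLE interpretationtags (interpretation_id INTEGER, tagid INTEGER);
"""


def scans_with_all_tags(conn, user_id, tags):
    """Return the user's scans that carry every tag in `tags` (AND search)."""
    wanted = sorted(set(tags))  # de-duplicate so the HAVING count stays correct
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)  # one bound parameter per tag
    sql = f"""
        SELECT scans.scan_index, scans.scan_id, scans.archive_folder
        FROM scans
        INNER JOIN archives ON scans.archive_folder = archives.archive_folder
        INNER JOIN interpretations
            ON scans.scan_id = interpretations.scan_id
            AND scans.archive_folder = interpretations.archive_folder
        INNER JOIN interpretationtags USING (interpretation_id)
        INNER JOIN tags USING (tagid)
        WHERE archives.user_id = ?
          AND tags.tag IN ({placeholders})
        GROUP BY scans.scan_index, scans.scan_id, scans.archive_folder
        HAVING COUNT(DISTINCT tags.tag) = ?
        ORDER BY scans.scan_index
    """
    return conn.execute(sql, [user_id, *wanted, len(wanted)]).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO archives VALUES (?, ?)",
                 [("folderA", "user1"), ("folderB", "user2")])
conn.executemany("INSERT INTO scans VALUES (?, ?, ?)",
                 [(1, 10, "folderA"), (2, 11, "folderA"),
                  (3, 12, "folderA"), (4, 10, "folderB")])
conn.executemany("INSERT INTO interpretations VALUES (?, ?, ?)",
                 [(100, 10, "folderA"), (101, 10, "folderA"),
                  (102, 11, "folderA"), (103, 11, "folderA"),
                  (104, 12, "folderA"), (105, 10, "folderB")])
conn.executemany("INSERT INTO tags VALUES (?, ?)",
                 [(1, "tag1"), (2, "tag2"), (3, "tag3")])
conn.executemany("INSERT INTO interpretationtags VALUES (?, ?)", [
    (100, 1), (101, 2),  # scan 10/folderA: tag1 and tag2 on different interpretations
    (102, 1), (103, 1),  # scan 11/folderA: tag1 on two interpretations, no tag2
    (104, 2),            # scan 12/folderA: tag2 only
    (105, 1), (105, 2),  # scan 10/folderB: both tags, but owned by user2
])

print(scans_with_all_tags(conn, "user1", ["tag1", "tag2"]))  # → [(1, 10, 'folderA')]
```

In this sketch, scan 11 has two interpretations tagged tag1 and is still excluded, because COUNT(DISTINCT tags.tag) counts distinct tag names rather than joined rows; a plain COUNT(*) = 2 would have let it through. The query text grows by one placeholder per tag instead of one nesting level per tag. How it performs at 100k rows depends on indexes the thread does not describe.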