Complex MySQL query is giving incorrect results

【Question title】: Complex MySQL query is giving incorrect results
【Posted】: 2015-01-07 13:17:38
【Question】:

Note: the previous question and its answer can be found here. Thorough testing showed that the previous answer was wrong: Writing a Complex MySQL Query

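I have 3 tables.

The Words_Learned table contains all the words a user knows, together with the order in which the words were learned. It has 3 columns: 1) word ID, 2) user ID, and 3) the order in which the word was learned.

The Article table contains the articles. It has 3 columns: 1) article ID, 2) unique word count, and 3) article content.

The Words table contains a list of all the unique words contained in each article. It has 2 columns: 1) word ID and 2) article ID.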

The database diagram is as follows: (image not included)

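Now, using this database and using "only" MySQL, I need to do the following.

Given a user ID, it should get a list of all the words that user knows, sorted in the reverse of the order in which they were learned. In other words, the most recently learned words will be at the top of the list.

Let's say a query on a user ID shows that they have memorized the following 3 words, and we track the order in which they learned them:

Octopus - 3
Dog - 2
Spoon - 1

First we get a list of all articles containing the word Octopus, and then do the calculation on just those articles using the Words table. The calculation means that if an article contains more than 10 words that do not appear in the user's vocabulary (pulled from the words_learned table), it is excluded from the list.

Then, we query for all records that contain "dog" but do not contain "octopus".

Then, we query for all records that contain "spoon" but do not contain the words "octopus" or "dog".

This process is repeated until we have found 100 records that meet the criteria.

To implement this process, I did the following (please visit the SQLFiddle link to see the table structures, test data, and my query):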

http://sqlfiddle.com/#!2/48dae/1

In my query you can see the results that are generated, and they are invalid. With a "correct query", the results should be:

Level 1
Level 1
Level 1
Level 2
Level 2
Level 2
Level 3
Level 3

Here is some pseudocode for better understanding.

Do while articles found < 100
{
 for each ($X as known words, in order that those words were learned)
 {
  Select all articles that contain the word $X, where 1) the article has not been included in any previous loops, and 2) the count of "unknown" words is less than 10.

  Keep these articles in order. 
 }
}
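
To make the intended procedure concrete, here is a minimal Python sketch of the loop above, run over in-memory data rather than the database. The function name `select_articles`, its parameters, and the sample data are all hypothetical and exist only for illustration. It follows the pseudocode's "less than 10 unknown words" rule, and it assumes a single pass over the known words is enough, since a second pass could not find any article the first pass missed.

```python
def select_articles(learned, article_words, max_unknown=10, limit=100):
    """Pick article IDs following the pseudocode above.

    learned:       {word: order in which the user learned it}  (hypothetical shape)
    article_words: {article_id: set of unique words in that article}
    """
    known = set(learned)
    picked = []   # article IDs, in the order they were selected
    seen = set()  # articles already selected in an earlier iteration
    # Walk the known words from most recently learned to earliest.
    for word in sorted(learned, key=learned.get, reverse=True):
        for article_id in sorted(article_words):
            words = article_words[article_id]
            if article_id in seen or word not in words:
                continue
            # Count the article's words that the user does not know.
            if len(words - known) < max_unknown:
                picked.append(article_id)
                seen.add(article_id)
                if len(picked) == limit:
                    return picked
    return picked


# Invented sample data matching the Octopus / Dog / Spoon example.
learned = {"octopus": 3, "dog": 2, "spoon": 1}
articles = {
    1: {"octopus", "dog"},
    2: {"dog", "spoon"},
    3: {"spoon"},
    4: {"octopus"} | {f"w{i}" for i in range(10)},  # 10 unknown words: excluded
    5: {"cat"},                                     # no known words: never selected
}
print(select_articles(learned, articles))  # [1, 2, 3]
```

Article 1 is selected in the "octopus" pass, article 2 in the "dog" pass, and article 3 in the "spoon" pass, which is the same grouping as the Level 1 / Level 2 / Level 3 output listed earlier.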

【Question comments】:

Tags: mysql sql database join indexing

【Solution 1】:

select * from (
    -- Articles that contain at least one word the user knows,
    -- with the highest learning order among those words.
    select a.idArticle, a.content, max(`order`) max_order
    from words_learned wl
    join words w on w.idwords = wl.idwords
    join article a on a.idArticle = w.idArticle
    where wl.userId = 4
    group by a.idArticle
) a
left join (
    -- Per article, the number of words the user does not know.
    select count(*) unknown_count, w2.idArticle from words w2
    left join words_learned wl2 on wl2.idwords = w2.idwords
        and wl2.userId = 4
    where wl2.idwords is null
    group by w2.idArticle
) unknown_counts on unknown_counts.idArticle = a.idArticle
-- Keep articles with no unknown words (null) or fewer than 10 of them.
where unknown_count is null or unknown_count < 10
order by max_order desc
limit 100

http://sqlfiddle.com/#!2/6944b/9

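The first derived table selects the distinct articles in which the given user knows one or more words, along with the maximum `order` value of those words. The maximum order value is used to sort the final results so that articles containing high-order words appear first.

The second derived table counts, for each article, the number of words the given user does not know. It is used to exclude any article that contains 10 or more words the user does not know.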
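
As a way to check the two derived tables by hand, the following sketch runs an adaptation of the query against a tiny in-memory database. Several things here are assumptions and not part of the answer: SQLite (via Python's `sqlite3`) is swapped in for MySQL so the example runs without a server, the sample rows are invented, the column list replaces `select *`, and `a.idArticle` is added as a tie-breaker in the `order by` so the output is deterministic. Table and column names follow the query above.

```python
import sqlite3

# SQLite adaptation of the answer's query (assumption: MySQL is not required here).
QUERY = """
select a.idArticle, a.max_order, unknown_counts.unknown_count
from (
    select ar.idArticle, max(wl."order") as max_order
    from words_learned wl
    join words w on w.idwords = wl.idwords
    join article ar on ar.idArticle = w.idArticle
    where wl.userId = :user
    group by ar.idArticle
) a
left join (
    select count(*) as unknown_count, w2.idArticle
    from words w2
    left join words_learned wl2
           on wl2.idwords = w2.idwords and wl2.userId = :user
    where wl2.idwords is null
    group by w2.idArticle
) unknown_counts on unknown_counts.idArticle = a.idArticle
where unknown_counts.unknown_count is null
   or unknown_counts.unknown_count < :max_unknown
order by a.max_order desc, a.idArticle
limit 100
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table article (idArticle integer primary key, content text);
create table words (idwords integer, idArticle integer);
create table words_learned (idwords integer, userId integer, "order" integer);
""")

# Invented test data: user 4 learned words 1, 2, 3 in that order.
conn.executemany("insert into article values (?, ?)",
                 [(i, f"article {i}") for i in range(1, 6)])
conn.executemany("insert into words_learned values (?, ?, ?)",
                 [(1, 4, 1), (2, 4, 2), (3, 4, 3), (10, 1, 1)])
article_words = {
    1: [3, 2],                       # only known words
    2: [2, 1],                       # only known words
    3: [1, 100],                     # one unknown word
    4: [3] + list(range(200, 210)),  # ten unknown words: filtered out
    5: [10],                         # known only to user 1: never joined
}
conn.executemany("insert into words values (?, ?)",
                 [(w, a) for a, ws in article_words.items() for w in ws])

rows = conn.execute(QUERY, {"user": 4, "max_unknown": 10}).fetchall()
for row in rows:
    print(row)  # (idArticle, max_order, unknown_count)
```

With this data, articles 1, 2, and 3 come back in that order with `max_order` values 3, 2, and 1. Article 4 is dropped because its unknown count is exactly 10, and article 5 never appears because user 4 knows none of its words.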

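【Comments】:

Thanks for the reply. It seems my results (the ones I described as incorrect) are similar to yours, aren't they? Hmm....
@Sniper Your fiddle refers to user IDs 4 and 1. Is that a mistake?
It seems so! Wow, your edited code gives the exact answer! Please give me a few hours to do some deep testing!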