【Question title】: Query to find topics depending on the tag
【Posted】: 2026-02-08 11:45:01
【Question description】:

I want to provide a search feature in my application for the following data

topic_id   tag
1          cricket
1          football
2          football
2          basketball
3          cricket
3          basketball
4          chess
4          basketball

Now, when I search for the term cricket AND football, the output should be

 topic_id
    1

When I search for the term cricket OR football, the output should be

 topic_id
    1
    2
    3

I tried something like the following for AND

  select topic_id from table_name where tag like "%cricket%" and topic_id in (select topic_id from table_name where tag like "%football%")

and for OR

 select topic_id from table_name where tag like "%cricket%" OR tag like "%football%"

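My problem is that when the user searches for cricket AND football AND basketball AND chess, my query becomes very unwieldy, because every extra tag adds another nested subquery.

Is there a simple solution? I also tried GROUP_CONCAT, but without success.

【Discussion】:

  • Why not have an Activity model with a many-to-many relationship between activities and topics? Then each activity would have an id, and you could ask topic.activities.include?("baseball")

Tags: sql ruby-on-rails mysql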


【Solution 1】:
 -- table_name, topic_id and tag are the names used in the question
 SELECT topic_id
 FROM table_name
 WHERE tag IN ('cricket', 'football', 'basketball', 'chess')
 GROUP BY topic_id
 HAVING COUNT(*) = 4

  4 is a magic number: it is the length of your AND list. (This assumes each (topic_id, tag) pair occurs only once in the table.)

 For cricket AND football

 it will be 2:

 SELECT topic_id
 FROM table_name
 WHERE tag IN ('cricket', 'football')
 GROUP BY topic_id
 HAVING COUNT(*) = 2

 If you want to use a LIKE condition:

 SELECT topic_id
 FROM table_name
 WHERE tag IN (SELECT DISTINCT tag FROM table_name WHERE tag LIKE '...'
                OR tag LIKE '...'
                OR tag LIKE '...'
                OR tag LIKE '...'
              )
 GROUP BY topic_id
 HAVING COUNT(*) = (SELECT COUNT(DISTINCT tag) FROM table_name
                    WHERE tag LIKE '...'
                       OR tag LIKE '...'
                       OR tag LIKE '...'
                       OR tag LIKE '...'
                   )
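
Since the "magic number" is just the length of the AND list, the counting query can be generated for any number of tags. The following is a minimal sketch, not part of the original answer: `and_query` is a hypothetical helper, it assumes the question's `table_name(topic_id, tag)` schema, and it returns `?` placeholders plus bind values instead of interpolating the tags. It also uses `COUNT(DISTINCT tag)` instead of `COUNT(*)`, a small deviation so that a duplicated (topic_id, tag) row cannot inflate the count.

```ruby
# Hypothetical helper: builds the "count the matches" AND query for any tag list.
# Assumes the question's schema: table_name(topic_id, tag).
def and_query(tags)
  tags = tags.uniq   # a repeated search term must not raise the expected count
  raise ArgumentError, "need at least one tag" if tags.empty?

  placeholders = (["?"] * tags.size).join(", ")
  sql = "SELECT topic_id FROM table_name " \
        "WHERE tag IN (#{placeholders}) " \
        "GROUP BY topic_id " \
        "HAVING COUNT(DISTINCT tag) = #{tags.size}"
  [sql, tags]        # SQL text plus the values to bind, in placeholder order
end

sql, binds = and_query(%w[cricket football basketball chess])
puts sql
p binds
```

The SQL text and the bind array would then be handed to whatever database API is in use; how that is done depends on the driver and is left out here.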

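Update:

This task is easy to solve with an RDBMS that supports all the set operations: UNION, INTERSECT, EXCEPT (or MINUS).

Then any condition, such as:

  1. (Tag1 AND Tag2) OR Tag3 NOT Tag4
  2. Tag1 OR Tag2
  3. Tag1 AND Tag2 AND Tag3
  4. (Tag1 AND Tag2) OR (Tag3 AND Tag4)

can easily be translated into: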

1. (Select * ... Where Tag = Tag1
    INTERSECT
    Select * ... Where Tag = Tag2
   )
   UNION
   (Select * ... Where Tag = Tag3)
   EXCEPT
   (Select * ... Where Tag = Tag4)

2. Select * ... Where Tag = Tag1
   UNION
   Select * ... Where Tag = Tag2

3. Select * ... Where Tag = Tag1
   INTERSECT
   Select * ... Where Tag = Tag2
   INTERSECT
   Select * ... Where Tag = Tag3

4. (Select * ... Where Tag = Tag1
    INTERSECT
    Select * ... Where Tag = Tag2
   )
   UNION
   (Select * ... Where Tag = Tag3
    INTERSECT
    Select * ... Where Tag = Tag4
   )
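
To see how the set operations map onto AND, OR and NOT, here is a small illustration in plain Ruby using the rows from the question. It is only a sketch of the semantics: Ruby's array operators `&`, `|` and `-` stand in for INTERSECT, UNION and EXCEPT, and `topics_with` is a hypothetical stand-in for `Select ... Where Tag = ...`.

```ruby
# Rows copied from the question: [topic_id, tag]
rows = [
  [1, "cricket"], [1, "football"],
  [2, "football"], [2, "basketball"],
  [3, "cricket"], [3, "basketball"],
  [4, "chess"], [4, "basketball"]
]

# Plays the role of: SELECT topic_id ... WHERE tag = <tag>
def topics_with(rows, tag)
  rows.select { |_, t| t == tag }.map(&:first).uniq
end

cricket    = topics_with(rows, "cricket")
football   = topics_with(rows, "football")
basketball = topics_with(rows, "basketball")
chess      = topics_with(rows, "chess")

both   = cricket & football                # INTERSECT: cricket AND football
either = (cricket | football).sort         # UNION:     cricket OR football
# (cricket AND football) OR basketball NOT chess
mixed  = (((cricket & football) | basketball) - chess).sort

p both    # [1]
p either  # [1, 2, 3]
p mixed   # [1, 2, 3]
```

The first two results match the outputs the question asks for.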

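The real problem is that MySQL does not support INTERSECT (true at the time of writing; INTERSECT and EXCEPT were added in MySQL 8.0.31), so it has to be emulated as shown above. The second problem is respecting parentheses and operator precedence.

A possible solution for expressions without parentheses:

  1. Collect all tags joined by AND and build the query as in the first example in this answer.

  2. Take all tags joined by OR (you can use IN or UNION) and combine the results using UNION.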
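
The two steps above can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: `parse_flat`, `and_group_sql` and `or_of_ands_sql` are hypothetical helpers, the expression contains only AND and OR (no parentheses, no NOT), AND binds tighter than OR, and the schema is the question's `table_name(topic_id, tag)`. The parenthesised `UNION` form is MySQL syntax.

```ruby
# Split a flat expression into an OR of AND groups.
# "cricket AND football OR chess" → [["cricket", "football"], ["chess"]]
def parse_flat(expr)
  expr.strip.split(/\s+OR\s+/i).map do |group|
    group.split(/\s+AND\s+/i).map(&:strip)
  end
end

# Step 1: one AND group becomes the counting query from the first example.
def and_group_sql(tags)
  marks = (["?"] * tags.size).join(", ")
  "SELECT topic_id FROM table_name WHERE tag IN (#{marks}) " \
  "GROUP BY topic_id HAVING COUNT(DISTINCT tag) = #{tags.size}"
end

# Step 2: the AND groups are combined with UNION.
def or_of_ands_sql(groups)
  groups = groups.map(&:uniq).reject(&:empty?)
  raise ArgumentError, "empty expression" if groups.empty?

  sql = groups.map { |g| "(#{and_group_sql(g)})" }.join(" UNION ")
  [sql, groups.flatten]   # bind values in the same order as the placeholders
end

groups = parse_flat("cricket AND football OR chess")
sql, binds = or_of_ands_sql(groups)
puts sql
p binds
```

A single tag is simply an AND group of length one, so plain OR searches go through the same path.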

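Another approach is possible only if there are fewer than 64 tags. Each tag then gets its own bit (you need to add a bigint field "tags" to the topics table, holding the topic's tags as a bitmask), and you build the query with MySQL bit operations.

The big drawback of this solution is that it is limited to 64 tags.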
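
A rough sketch of the bitmask idea, with everything specific being an assumption: the bit assignment in `tag_bits`, the `topics` table with its `tags` column, and the helper names are all hypothetical. An AND search checks that every requested bit is set; an OR search checks that at least one is.

```ruby
# Hypothetical bit assignment: one bit position per tag (a BIGINT has 64 bits).
tag_bits = { "cricket" => 0, "football" => 1, "basketball" => 2, "chess" => 3 }

# Combine the bits of the given tags into one integer mask.
def tag_mask(tag_bits, tags)
  tags.reduce(0) { |mask, tag| mask | (1 << tag_bits.fetch(tag)) }
end

# AND search: every requested bit must be set in the stored column.
def all_tags_clause(mask)
  "(tags & #{mask}) = #{mask}"
end

# OR search: at least one requested bit must be set.
def any_tag_clause(mask)
  "(tags & #{mask}) <> 0"
end

wanted = tag_mask(tag_bits, %w[cricket football])   # bits 0 and 1 → 3
puts "SELECT topic_id FROM topics WHERE #{all_tags_clause(wanted)}"
puts "SELECT topic_id FROM topics WHERE #{any_tag_clause(wanted)}"

# The same checks in plain Ruby for topic 1 and topic 2 from the question.
topic1 = tag_mask(tag_bits, %w[cricket football])
topic2 = tag_mask(tag_bits, %w[football basketball])
p((topic1 & wanted) == wanted)   # true:  topic 1 has both tags
p((topic2 & wanted) == wanted)   # false: topic 2 lacks cricket
p((topic2 & wanted) != 0)        # true:  topic 2 has at least one of them
```

Note that this only supports exact tag matches; a LIKE-style search would first have to resolve the pattern to a set of tag bits.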

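【Discussion】:

  • Oh!!! Thanks, it will work, but actually I want LIKE rather than matching the whole string, e.g. tag LIKE '%cricket%', and I think LIKE does not work with IN
  • @Salil - Why not? :) Added a query using the LIKE approach
  • @Michael Pakhantsov: Don't you think that in your example HAVING Count(*) = will return 8, whereas I want 0 rows as the result, because no topic_id has all of the above tags?
  • @Salil. I don't understand which part of the query would return 8. Can you explain?
  • @Michael Pakhantsov: OK, forget it. Tell me, what is the output of that query for the search string cricket AND football AND basketball AND chess?
【Solution 2】:

You need to do a self-join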

-- topic_id must be qualified, otherwise the column reference is ambiguous
select distinct t1.topic_id from
table_name as t1
join
table_name as t2
on
t1.topic_id = t2.topic_id
and
t1.tag = 'cricket'
and
t2.tag = 'football'

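【Discussion】:

  • Please answer for search terms such as cricket AND football AND basketball AND chess
  • What don't you understand about the solution above? You can add as many joins as you need, one per tag.
【Solution 3】:

a AND b AND c AND d: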

SELECT t1.topic_id
FROM tags_table AS t1
INNER JOIN tags_table AS t2
ON t2.topic_id = t1.topic_id AND t2.tag = 'b'
INNER JOIN tags_table AS t3
ON t3.topic_id = t1.topic_id AND t3.tag = 'c'
INNER JOIN tags_table AS t4
ON t4.topic_id = t1.topic_id AND t4.tag = 'd'
WHERE t1.tag = 'a'
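
The join chain above follows a fixed pattern, so it can be generated for any number of tags. Below is a minimal sketch under assumptions: `join_all_tags_sql` is a hypothetical helper, the table is the answer's `tags_table(topic_id, tag)`, and the aliases are positional (t1..tn) so they never depend on the tag text. It adds DISTINCT, which the query above does not have, to guard against duplicate rows.

```ruby
# Hypothetical helper: one self-join per additional tag, aliases t1..tn.
def join_all_tags_sql(tags)
  tags = tags.uniq
  raise ArgumentError, "need at least one tag" if tags.empty?

  joins = (2..tags.size).map do |i|
    "INNER JOIN tags_table AS t#{i} " \
    "ON t#{i}.topic_id = t1.topic_id AND t#{i}.tag = ?"
  end

  parts = ["SELECT DISTINCT t1.topic_id FROM tags_table AS t1"] +
          joins +
          ["WHERE t1.tag = ?"]

  # Placeholders appear in the order: tags 2..n (the joins), then tag 1 (the WHERE).
  [parts.join(" "), tags.drop(1) + tags.first(1)]
end

sql, binds = join_all_tags_sql(%w[a b c d])
puts sql
p binds   # ["b", "c", "d", "a"]
```

Each extra tag costs one more join, so for long AND lists the GROUP BY / HAVING form may be easier to maintain.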

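Unfortunately, OR conditions are harder. A full outer join would be handy, but MySQL lacks that feature.

I suggest making sure there are no ORs inside parentheses (not (a OR b) AND c, but rather (a AND c) OR (b AND c)) and building the query like this:

a OR b OR c OR (some AND clause, such as d AND e):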

SELECT DISTINCT topic_id FROM (
  SELECT topic_id FROM tags_table where tag = 'a'
  UNION ALL
  SELECT topic_id FROM tags_table where tag = 'b'
  UNION ALL
  SELECT topic_id FROM tags_table where tag = 'c'
  UNION ALL
  query_like_the_previous_one_representing_some_AND_clause
) as union_table
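
The rewrite from (a OR b) AND c to (a AND c) OR (b AND c) suggested above is the distributive law, and it can be done mechanically. A small sketch, where `distribute` is a hypothetical helper and the input is assumed to already be an AND of OR groups:

```ruby
# and_of_ors is an AND of OR groups: [["a", "b"], ["c"]] means (a OR b) AND c.
# Picking one tag from every OR group, in all possible ways, gives the
# equivalent OR of AND groups.
def distribute(and_of_ors)
  first, *rest = and_of_ors
  return [] if first.nil?

  first.product(*rest).map(&:uniq).uniq
end

p distribute([%w[a b], %w[c]])        # [["a", "c"], ["b", "c"]]
p distribute([%w[a b], %w[c d]])      # [["a", "c"], ["a", "d"], ["b", "c"], ["b", "d"]]
```

Each resulting AND group maps to one branch of the UNION ALL query. The number of branches is the product of the OR group sizes, so this stays practical only for small expressions.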

In database software other than MySQL, you could possibly use a query like this (I can't test it right now):

SELECT COALESCE(t1.topic_id, t2.topic_id, t3.topic_id, ...)
FROM tags_table AS t1
INNER JOIN tags_table AS t2
ON t2.topic_id = t1.topic_id AND t2.tag = 'b'
FULL OUTER JOIN tags_table AS t3
ON t3.topic_id = t1.topic_id AND t3.tag = 'c'
INNER JOIN tags_table AS t4
ON t4.topic_id = t1.topic_id AND t4.tag = 'd'
WHERE t1.tag = 'a'

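I think this should represent (a AND b) OR (c AND d). Note the COALESCE: because of the full outer join, t1.topic_id may be null.

【Discussion】:

    【Solution 4】:

    Here is a Rails solution that creates self-referencing joins for the AND case and a simple SQL IN for the OR case. The solution assumes a model called TopicTag and, consequently, a table called topic_tags.

    The class method search expects 2 parameters: an array of tags and a string containing either "and" or "or". Note that the tags are interpolated directly into the SQL (including into the table aliases), so they must come from trusted input.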

    class TopicTag < ActiveRecord::Base
    
      def self.search(tags, andor)
    
        # Ensure tags are unique or you will get duplicate table names in the SQL
        tags.uniq!
    
        if andor.downcase == "and"
          first = true
          sql = ""
    
          tags.each do |tag|
            if first
              sql = "SELECT DISTINCT topic_tags.topic_id FROM topic_tags "
              first = false
            else
              sql += " JOIN topic_tags as tag_#{tag} ON tag_#{tag}.topic_id = \
                       topic_tags.topic_id AND tag_#{tag}.tag = '#{tag}'"
            end
          end
          sql += " WHERE topic_tags.tag = '#{tags[0]}'"
          TopicTag.find_by_sql(sql)
    
        else
          TopicTag.find(:all, :select => 'DISTINCT topic_id', 
              :conditions => { :tag => tags})
        end
      end
    
    end
    

    To get more test coverage, the data was extended to include an extra chess record. The database is seeded with the following code

    [1,2].each   {|i| TopicTag.create(:topic_id => i, :tag => 'football')}
    [1,3].each   {|i| TopicTag.create(:topic_id => i, :tag => 'cricket')}
    [2,3,4].each {|i| TopicTag.create(:topic_id => i, :tag => 'basketball')}
    [4,5].each   {|i| TopicTag.create(:topic_id => i, :tag => 'chess')}
    

    The following test code produces the results shown

    tests = [
      %w[football cricket],
      %w[chess],
      %w[chess cricket basketball]
    ]
    
    tests.each do |test|
      %w[and or].each do |op|
        puts test.join(" #{op} ") + " = " + 
          (TopicTag.search(test, op).map(&:topic_id)).join(', ')
      end
    end
    
    football and cricket = 1
    football or cricket = 1, 2, 3
    chess = 4, 5
    chess = 4, 5
    chess and cricket and basketball = 
    chess or cricket or basketball = 1, 2, 3, 4, 5

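    Tested on Rails 2.3.8 using SQLite

    Edit

    If you want to use LIKE, then the OR case also becomes slightly more complex. You should also be aware that using LIKE with a leading '%' can have a significant performance impact if the table you are searching is of any non-trivial size, because an ordinary index on tag cannot be used for a pattern that starts with a wildcard.

    The following version of the model uses LIKE for both cases.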

    class TopicTag < ActiveRecord::Base
    
      def self.search(tags, andor)
    
        tags.uniq!
    
        if andor.downcase == "and"
          first = true
          first_name = ""
          sql = ""
    
          tags.each do |tag|
            if first
              sql = "SELECT DISTINCT topic_tags.topic_id FROM topic_tags "
              first = false
            else
              sql += " JOIN topic_tags as tag_#{tag} ON tag_#{tag}.topic_id = \
                      topic_tags.topic_id AND tag_#{tag}.tag like '%#{tag}%'"
            end
          end
          sql += " WHERE topic_tags.tag like '%#{tags[0]}%'"
          TopicTag.find_by_sql(sql)
    
        else
          first = true
          tag_sql = ""
          tags.each do |tag| 
            if first
              tag_sql = " tag like '%#{tag}%'" 
              first = false
            else
              tag_sql += " OR tag like '%#{tag}%'" 
            end
          end
          TopicTag.find(:all, :select => 'DISTINCT topic_id', 
                :conditions => tag_sql)
        end
      end
    
    end
    
    tests = [
      %w[football cricket],
      %w[chess],
      %w[chess cricket basketball],
      %w[chess ll],
      %w[ll]
    ]
    
    tests.each do |test|
      %w[and or].each do |op|
        result = TopicTag.search(test, op).map(&:topic_id)
        puts ( test.size == 1 ? "#{test}(#{op})" : test.join(" #{op} ") ) + 
             " = " + result.join(', ')
      end
    end
    
    football and cricket = 1
    football or cricket = 1, 2, 3
    chess(and) = 4, 5
    chess(or) = 4, 5
    chess and cricket and basketball = 
    chess or cricket or basketball = 1, 2, 3, 4, 5
    chess and ll = 4
    chess or ll = 1, 2, 3, 4, 5
    ll(and) = 1, 2, 3, 4
    ll(or) = 1, 2, 3, 4
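
One thing the LIKE version leaves open is a search term that itself contains % or _, which LIKE treats as wildcards. The sketch below shows one way to escape them before wrapping the term in %...%. `like_pattern` is a hypothetical helper, and it assumes backslash is the LIKE escape character, which is MySQL's default; other databases may need an explicit ESCAPE clause in the query.

```ruby
# Hypothetical helper: escape LIKE wildcards so the term is matched literally,
# then wrap it for a "contains" search.
# Assumes backslash is the LIKE escape character (the MySQL default).
def like_pattern(term)
  escaped = term.gsub(/[\\%_]/) { |ch| "\\#{ch}" }
  "%#{escaped}%"
end

puts like_pattern("ll")        # %ll%
puts like_pattern("50%_off")   # %50\%\_off%
```

The resulting pattern is meant to be passed as a bound value rather than interpolated into the SQL string.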

    【Discussion】:

    • I just realised that you want to use LIKE on the tags. I'll revisit this